Saturday, June 9, 2012

About Lucene ....

Lucene is a search library. It indexes files and other content as documents and stores them in an index (in the example below, temporarily in memory). When a query arrives, it processes the query against that index and returns the matching results.
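
To make the index-then-query idea concrete, here is a minimal sketch in plain Java of a toy inverted index. It is not how Lucene is implemented; the class name ToyInvertedIndex and its methods are made up for illustration, and the splitting rule (lowercase, break on non-word characters) is an assumption chosen to keep the sketch short.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeSet;

// Conceptual sketch only: a toy inverted index, not Lucene's implementation.
class ToyInvertedIndex {
    // term -> ids of the documents that contain the term
    private final Map<String, TreeSet<Integer>> postings = new HashMap<>();
    private final List<String> documents = new ArrayList<>();

    // Indexing: split the text into lowercase terms and remember
    // which document each term came from.
    int add(String text) {
        int docId = documents.size();
        documents.add(text);
        for (String term : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (term.isEmpty()) {
                continue;
            }
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
        }
        return docId;
    }

    // Searching: look the term up in the index instead of scanning every document.
    List<String> search(String term) {
        List<String> hits = new ArrayList<>();
        TreeSet<Integer> ids = postings.get(term.toLowerCase(Locale.ROOT));
        if (ids != null) {
            for (int id : ids) {
                hits.add(documents.get(id));
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        ToyInvertedIndex index = new ToyInvertedIndex();
        index.add("ravi 23 colombo");
        index.add("sasi 23 kandy");
        index.add("kamal 45 jaffna");
        System.out.println(index.search("23"));     // [ravi 23 colombo, sasi 23 kandy]
        System.out.println(index.search("Jaffna")); // [kamal 45 jaffna]
    }
}
```

The point of the sketch is only the two phases: the work of splitting and recording terms happens once at indexing time, so a search becomes a lookup.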

Let's start with a sample indexing and searching program.

Oh, but first you need to download Apache Lucene and add it as an external library to your Eclipse project. The code uses the Lucene 3.x API.


Then you can try the Java code below:

import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.CorruptIndexException;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.queryParser.MultiFieldQueryParser;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopScoreDocCollector;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

public class SimpleSearch {

    public static void main(String[] args) throws IOException, ParseException {

        // 0. Specify the analyzer for tokenizing text.
        //    The same analyzer should be used for indexing and searching.
        StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_35);

        // 1. Create the index (RAMDirectory keeps it in memory).
        Directory index = new RAMDirectory();
        IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_35, analyzer);
        IndexWriter w = new IndexWriter(index, config);
        addField(w, "ravi", "23", "colombo");
        addField(w, "sasi", "23", "kandy");
        addField(w, "kamal", "45", "jaffna");
        w.close();

        // 2. Build a query that is run against all three fields.
        String querystr = "J*";
        String[] s = new String[]{"Name", "Age", "Address"};
        Query q = new MultiFieldQueryParser(Version.LUCENE_35, s, analyzer).parse(querystr);

        // 3. Search.
        int hitsPerPage = 10;
        IndexReader reader = IndexReader.open(index);
        IndexSearcher searcher = new IndexSearcher(reader);
        TopScoreDocCollector collector = TopScoreDocCollector.create(hitsPerPage, true);
        searcher.search(q, collector);
        ScoreDoc[] hits = collector.topDocs().scoreDocs;

        // 4. Display the stored fields of every hit.
        System.out.println("Found " + hits.length + " hits.");
        for (int i = 0; i < hits.length; ++i) {
            int docId = hits[i].doc;
            Document d = searcher.doc(docId);
            for (int t = 0; t < s.length; t++) {
                System.out.println((i + 1) + ". " + d.get(s[t]));
            }
        }

        // The searcher can only be closed when there
        // is no need to access the documents any more.
        searcher.close();
        // The searcher does not close a reader that was passed to its constructor.
        reader.close();
    }

    // Builds one document from the three values and adds it to the index.
    private static void addField(IndexWriter w, String name, String age, String address)
            throws CorruptIndexException, IOException {
        Field senderNameField = new Field("Name", name,
                Field.Store.YES,
                Field.Index.ANALYZED);

        Field subjectField = new Field("Age", age,
                Field.Store.YES,
                Field.Index.ANALYZED);

        Field addressField = new Field("Address", address,
                Field.Store.YES,
                Field.Index.ANALYZED);

        Document doc = new Document();
        doc.add(subjectField);
        doc.add(senderNameField);
        doc.add(addressField);
        w.addDocument(doc);
    }
}


Output:

Found 1 hits.
1. kamal
1. 45
1. jaffna



Indexing classes

Let's start with the important core indexing classes. Lucene provides five main classes for indexing, listed below:
  1. IndexWriter
  2. Document
  3. Analyzer 
  4. Directory 
  5. Field
IndexWriter - This class creates a new index or opens an existing one, and it is used to add, remove, and update documents. The indexed documents need a place to be stored, so the IndexWriter writes them to a Directory.

In the example :

IndexWriter w = new IndexWriter(index, config);

Document - It represents a collection of fields, as a virtual document. The fields hold the text you extract from your source (for example, from a binary document) and add to the Document.

In the example :

Document doc=new Document();
doc.add(subjectField);
doc.add(senderNameField);
doc.add(addressField);



Analyzer - It is an abstract class, and an instance is passed to the IndexWriter (in the example, through IndexWriterConfig). It breaks the text into tokens before the text is indexed.

In the example :

StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_35);
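
To give a feel for what "breaking text into tokens" means, here is a rough sketch in plain Java. It only imitates the kind of work an analyzer does (splitting, lowercasing, dropping a few common words); the class ToyAnalyzer, its stop-word list, and its splitting rule are all assumptions made for illustration and do not reproduce StandardAnalyzer's actual rules.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Illustration only: a toy analyzer, not Lucene's StandardAnalyzer.
class ToyAnalyzer {
    // A tiny, made-up stop-word list.
    private static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "and", "the", "of", "is"));

    // Lowercases the text, splits it on anything that is not a letter
    // or digit, and drops empty strings and stop words.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(analyze("The City of Colombo")); // [city, colombo]
    }
}
```

This also shows why the same analyzer should be used for indexing and searching: if the index holds "colombo" but the query side kept "Colombo", the two would not match.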

Directory - It is an abstract class that represents the place where the index is stored. It has several subclasses, listed below:
  • SimpleFSDirectory 
  • NIOFSDirectory 
  • MMapDirectory
  • RAMDirectory
  • FileSwitchDirectory
In the example:

Directory index = new RAMDirectory();
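
The idea of one storage abstraction with several implementations can be sketched in plain Java as follows. Everything here is hypothetical (ToyDirectory, ToyRamDirectory, ToyFsDirectory and their methods are invented for this post and are much simpler than Lucene's Directory API); the sketch only shows why the code that writes the index does not need to care whether the index lives in memory or on disk.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

// Hypothetical names throughout: this is not Lucene's Directory API.
abstract class ToyDirectory {
    abstract void writeFile(String name, byte[] data) throws IOException;
    abstract byte[] readFile(String name) throws IOException;
}

// Keeps everything in memory; the contents vanish when the program exits.
class ToyRamDirectory extends ToyDirectory {
    private final Map<String, byte[]> files = new HashMap<>();

    @Override
    void writeFile(String name, byte[] data) {
        files.put(name, data.clone());
    }

    @Override
    byte[] readFile(String name) {
        byte[] data = files.get(name);
        return data == null ? null : data.clone();
    }
}

// Keeps every file under one folder on disk, so the contents survive a restart.
class ToyFsDirectory extends ToyDirectory {
    private final Path root;

    ToyFsDirectory(Path root) {
        this.root = root;
    }

    @Override
    void writeFile(String name, byte[] data) throws IOException {
        Files.write(root.resolve(name), data);
    }

    @Override
    byte[] readFile(String name) throws IOException {
        return Files.readAllBytes(root.resolve(name));
    }
}

class ToyDirectoryDemo {
    public static void main(String[] args) throws IOException {
        byte[] segment = "kamal 45 jaffna".getBytes(StandardCharsets.UTF_8);
        ToyDirectory ram = new ToyRamDirectory();
        ToyDirectory disk = new ToyFsDirectory(Files.createTempDirectory("toy-index"));
        // The calling code is identical for both storage back ends.
        for (ToyDirectory dir : new ToyDirectory[]{ram, disk}) {
            dir.writeFile("segment_1", segment);
            System.out.println(new String(dir.readFile("segment_1"), StandardCharsets.UTF_8));
        }
    }
}
```

In the same spirit, the sample program could be pointed at a file-system directory instead of RAMDirectory without changing the indexing or searching code.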

Field - A document contains several fields. Use this class to give each field a name, a value, and a set of options (such as whether the value is stored and how it is indexed). If you want to give a field a higher priority, you can set a boost on it:

senderNameField.setBoost(1.5f); 

Matches in the sender name field are then meant to rank higher in the search results. For example, if one document has the name "Daya" and another has the address "Dehiwala", and we search for "D*", the document named "Daya" is expected to come first.

In the example :


Field senderNameField = new Field("Name", name,
                Field.Store.YES,
                Field.Index.ANALYZED);
   
Field subjectField = new Field("Age", age,
             Field.Store.YES,
             Field.Index.ANALYZED);
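
To show how a per-field boost can shift the ranking, here is a small plain-Java sketch. The scoring rule (add up the boosts of the fields that match the prefix) is an assumption made up for this illustration, and the class ToyBoostRanking is hypothetical; Lucene's real scoring is considerably more involved.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Illustration only: a made-up scoring rule, not Lucene's scoring formula.
class ToyBoostRanking {

    // Adds up the boosts of all fields whose value starts with the prefix.
    // Fields without an explicit boost count as 1.0.
    static float score(Map<String, String> doc, Map<String, Float> boosts, String prefix) {
        String p = prefix.toLowerCase(Locale.ROOT);
        float score = 0f;
        for (Map.Entry<String, String> field : doc.entrySet()) {
            if (field.getValue().toLowerCase(Locale.ROOT).startsWith(p)) {
                Float boost = boosts.get(field.getKey());
                score += (boost == null) ? 1.0f : boost;
            }
        }
        return score;
    }

    // Small helper that builds a two-field document.
    static Map<String, String> doc(String name, String address) {
        Map<String, String> d = new LinkedHashMap<>();
        d.put("Name", name);
        d.put("Address", address);
        return d;
    }

    public static void main(String[] args) {
        Map<String, Float> boosts = new HashMap<>();
        boosts.put("Name", 1.5f); // same value as in setBoost(1.5f)

        // "Daya" matches in the boosted Name field,
        // "Dehiwala" matches in the unboosted Address field.
        System.out.println(score(doc("Daya", "Kandy"), boosts, "D"));    // 1.5
        System.out.println(score(doc("Ravi", "Dehiwala"), boosts, "D")); // 1.0
    }
}
```

Sorting the documents by this score in descending order would put the "Daya" document first, which is the effect the boost is meant to have.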

Searching classes 

Now let's go through the searching classes. The core classes are listed below:
  1. IndexSearcher
  2. Query
  3. TopDocs
IndexSearcher - First we open the index with the IndexReader class, then we pass the reader object to the IndexSearcher constructor. The IndexSearcher is used to search the indexed documents.

In the example :

IndexReader reader = IndexReader.open(index);
 IndexSearcher searcher = new IndexSearcher(reader);

Query - Query is an abstract class with several subclasses, listed below:
  • BooleanQuery
  • PhraseQuery
  • PrefixQuery
  • MultiPhraseQuery (called PhrasePrefixQuery in early Lucene releases)
  • TermRangeQuery
  • NumericRangeQuery
  • FilteredQuery
  • SpanQuery
In the example:

String querystr =  "J*"; 
String[] s = new String[]{"Name","Age","Address"};
Query  q =new MultiFieldQueryParser(Version.LUCENE_35,s, analyzer).parse(querystr);
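
The query "J*" is a prefix query that is tried against each of the three fields. As far as I know, the query parser lowercases prefix and wildcard terms by default, which would explain why "J*" finds the lowercase indexed term "jaffna". The plain-Java sketch below mimics that behaviour; the class ToyPrefixSearch is hypothetical and only imitates the matching, not Lucene's query parsing or scoring.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Illustration only: imitates a prefix query over several fields.
class ToyPrefixSearch {

    // Each String[] holds the Name, Age and Address values of one document.
    static List<String[]> search(List<String[]> docs, String query) {
        // Drop the trailing '*' and lowercase the rest, so "J*" becomes "j".
        String prefix = query.endsWith("*") ? query.substring(0, query.length() - 1) : query;
        prefix = prefix.toLowerCase(Locale.ROOT);

        List<String[]> hits = new ArrayList<>();
        for (String[] doc : docs) {
            for (String value : doc) {
                if (value.toLowerCase(Locale.ROOT).startsWith(prefix)) {
                    hits.add(doc);
                    break; // one matching field is enough for a hit
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String[]> docs = Arrays.asList(
                new String[]{"ravi", "23", "colombo"},
                new String[]{"sasi", "23", "kandy"},
                new String[]{"kamal", "45", "jaffna"});
        for (String[] hit : search(docs, "J*")) {
            System.out.println(Arrays.toString(hit)); // [kamal, 45, jaffna]
        }
    }
}
```

With the same data, a query such as "2*" would match the Age field of the first two documents, which shows that every listed field takes part in the search.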












