Lucene is a search library. It indexes files and other content as documents and stores the index (temporarily, in this example). After that, it processes a query against the index and returns the matching results.
First, I will take you through a sample of indexing and searching code.
Before that, you need to download Apache Lucene and add it as an external library in your Eclipse project.
Then you can try the Java code below.
import java.io.IOException;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.CorruptIndexException;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.queryParser.MultiFieldQueryParser;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopScoreDocCollector;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;
public class SimpleSearch {

    public static void main(String[] args) throws IOException, ParseException {
        // 0. Specify the analyzer for tokenizing text.
        //    The same analyzer should be used for indexing and searching.
        StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_35);

        // 1. Create the index (kept in memory by RAMDirectory).
        Directory index = new RAMDirectory();
        IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_35, analyzer);
        IndexWriter w = new IndexWriter(index, config);
        addField(w, "ravi", "23", "colombo");
        addField(w, "sasi", "23", "kandy");
        addField(w, "kamal", "45", "jaffna");
        w.close();

        // 2. Build the query: a prefix search over all three fields.
        String querystr = "J*";
        String[] s = new String[]{"Name", "Age", "Address"};
        Query q = new MultiFieldQueryParser(Version.LUCENE_35, s, analyzer).parse(querystr);

        // 3. Search, keeping at most hitsPerPage results.
        int hitsPerPage = 10;
        IndexReader reader = IndexReader.open(index);
        IndexSearcher searcher = new IndexSearcher(reader);
        TopScoreDocCollector collector = TopScoreDocCollector.create(hitsPerPage, true);
        searcher.search(q, collector);
        ScoreDoc[] hits = collector.topDocs().scoreDocs;

        // 4. Display results: every stored field of every hit.
        System.out.println("Found " + hits.length + " hits.");
        for (int i = 0; i < hits.length; ++i) {
            int docId = hits[i].doc;
            Document d = searcher.doc(docId);
            for (int t = 0; t < s.length; t++) {
                System.out.println((i + 1) + ". " + d.get(s[t]));
            }
        }

        // The searcher can only be closed when there
        // is no need to access the documents any more.
        // Closing the searcher does not close a reader that was passed in,
        // so the reader is closed separately.
        searcher.close();
        reader.close();
    }

    // Builds one document with three stored, analyzed fields and adds it to the index.
    private static void addField(IndexWriter w, String name, String age, String address)
            throws CorruptIndexException, IOException {
        Field senderNameField = new Field("Name", name,
                Field.Store.YES,
                Field.Index.ANALYZED);
        Field subjectField = new Field("Age", age,
                Field.Store.YES,
                Field.Index.ANALYZED);
        Field addressField = new Field("Address", address,
                Field.Store.YES,
                Field.Index.ANALYZED);
        Document doc = new Document();
        doc.add(subjectField);
        doc.add(senderNameField);
        doc.add(addressField);
        w.addDocument(doc);
    }
}
Output:
Found 1 hits.
1. kamal
1. 45
1. jaffna
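To see why the query "J*" finds only the "jaffna" document, it can help to picture the index as a map from terms to the documents that contain them. The sketch below is a conceptual stand-in in plain Java, not Lucene's implementation. The class TinyIndex and its methods are hypothetical names made up for this illustration, and lowercasing both the indexed text and the prefix is an assumption that mirrors what the analyzer and query parser appear to do in the example above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Conceptual sketch only: a toy inverted index, not Lucene's data structures.
class TinyIndex {
    // term -> ids of the documents that contain it (terms kept in sorted order)
    private final TreeMap<String, TreeSet<Integer>> postings = new TreeMap<>();
    private final List<String[]> docs = new ArrayList<>();

    // Index one document: lowercase every field value and split it into terms.
    int add(String... fieldValues) {
        int docId = docs.size();
        docs.add(fieldValues);
        for (String value : fieldValues) {
            for (String term : value.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
                if (!term.isEmpty()) {
                    postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
                }
            }
        }
        return docId;
    }

    // Prefix search, like "J*": lowercase the prefix, then walk the sorted
    // terms starting at the prefix until a term no longer matches.
    TreeSet<Integer> searchPrefix(String prefix) {
        String p = prefix.toLowerCase(Locale.ROOT);
        TreeSet<Integer> hits = new TreeSet<>();
        for (Map.Entry<String, TreeSet<Integer>> e : postings.tailMap(p, true).entrySet()) {
            if (!e.getKey().startsWith(p)) {
                break;
            }
            hits.addAll(e.getValue());
        }
        return hits;
    }

    String[] doc(int docId) {
        return docs.get(docId);
    }

    public static void main(String[] args) {
        TinyIndex index = new TinyIndex();
        index.add("ravi", "23", "colombo");
        index.add("sasi", "23", "kandy");
        index.add("kamal", "45", "jaffna");
        for (int id : index.searchPrefix("J")) {
            System.out.println(String.join(", ", index.doc(id))); // kamal, 45, jaffna
        }
    }
}
```

The sorted map is the design choice that matters here: because terms are ordered, a prefix search only has to visit the terms that start with the prefix.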
Indexing classes
First, let me take you through the important core indexing classes. These are the five main indexing classes provided by Lucene, listed below:
- IndexWriter
- Document
- Analyzer
- Directory
- Field
IndexWriter - This class is used to index new or existing documents, and also to add, remove, and update documents. After indexing, it needs a place to store the index, so it uses the Directory class for that.
In the example :
IndexWriter w = new IndexWriter(index, config);
Document - It represents a collection of fields as a virtual document. The text you extract from a source such as a binary document is added to it as fields.
In the example :
Document doc=new Document();
doc.add(subjectField);
doc.add(senderNameField);
doc.add(addressField);
Analyzer - It is an abstract class, and an instance of it is passed to the IndexWriter (in the example, through IndexWriterConfig). It is responsible for breaking text into tokens before the text is indexed.
In the example :
StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_35);
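As a rough picture of what "breaking text into tokens" means, here is a minimal sketch in plain Java. It is a simplified stand-in, not StandardAnalyzer itself: the class SimpleAnalyzerSketch, the method tokenize, and the tiny stop-word list are all hypothetical and chosen only for illustration. It splits on anything that is not a letter or digit, lowercases each token, and drops stop words.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Simplified stand-in for an analyzer; real analyzers follow more detailed rules.
class SimpleAnalyzerSketch {
    // A tiny, made-up stop-word list, just for this example.
    private static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "the", "is", "of"));

    // Split on non-letter/non-digit characters, lowercase, and drop stop words.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.split("[^\\p{L}\\p{N}]+")) {
            String token = raw.toLowerCase(Locale.ROOT);
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("Lucene is a Search Library")); // [lucene, search, library]
    }
}
```

This also hints at why the same analyzer should be used for indexing and searching: if the two sides produced different tokens for the same text, the query terms would not line up with the indexed terms.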
Directory - It is an abstract class that represents the place where the index is stored. It has several subclasses, listed below:
- SimpleFSDirectory
- NIOFSDirectory
- MMapDirectory
- RAMDirectory
- FileSwitchDirectory
In the example:
Directory index = new RAMDirectory();
Field - A document has several fields. Use this class to give each field a name, a value, and a set of options. If you want to give a field a higher priority, you can set a boost on it:
senderNameField.setBoost(1.5f);
Then matches in the sender name field get priority in the search results. For example, if one document has the name "Daya" and another has the address "Dehiwala", and we search for "D*", the first result should be "Daya".
In the example :
Field senderNameField = new Field("Name", name,
Field.Store.YES,
Field.Index.ANALYZED);
Field subjectField = new Field("Age", age,
Field.Store.YES,
Field.Index.ANALYZED);
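The idea behind a field boost can be sketched in plain Java as follows. This is only an illustration of the "Daya" versus "Dehiwala" example, under the assumption that each matching field simply contributes its boost to the document's score. Lucene's real scoring is more involved, and the class BoostSketch and the method search are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Toy ranking: NOT Lucene's scoring formula, just the idea of a field boost.
class BoostSketch {
    // Returns the positions of the matching documents, best score first.
    // A field without an entry in boosts counts with the default weight 1.0.
    static List<Integer> search(List<Map<String, String>> docs,
                                Map<String, Float> boosts, String prefix) {
        String p = prefix.toLowerCase(Locale.ROOT);
        final Map<Integer, Float> scores = new HashMap<>();
        for (int i = 0; i < docs.size(); i++) {
            float score = 0f;
            for (Map.Entry<String, String> field : docs.get(i).entrySet()) {
                if (field.getValue().toLowerCase(Locale.ROOT).startsWith(p)) {
                    score += boosts.getOrDefault(field.getKey(), 1.0f);
                }
            }
            if (score > 0f) {
                scores.put(i, score);
            }
        }
        List<Integer> ranked = new ArrayList<>(scores.keySet());
        ranked.sort((a, b) -> Float.compare(scores.get(b), scores.get(a)));
        return ranked;
    }

    public static void main(String[] args) {
        Map<String, String> first = new HashMap<>();
        first.put("Name", "Kumar");
        first.put("Address", "Dehiwala");
        Map<String, String> second = new HashMap<>();
        second.put("Name", "Daya");
        second.put("Address", "Kandy");
        List<Map<String, String>> docs = Arrays.asList(first, second);

        Map<String, Float> boosts = new HashMap<>();
        boosts.put("Name", 1.5f); // same value as setBoost(1.5f) in the text

        for (int id : search(docs, boosts, "D")) {
            System.out.println(docs.get(id).get("Name")); // Daya, then Kumar
        }
    }
}
```

Both documents match "D", but the match in the boosted Name field scores 1.5 against 1.0 for the Address match, so "Daya" is listed first.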
Searching classes
Now let's go through the searching classes. The core classes are listed below:
- IndexSearcher
- Query
- TopDocs
IndexSearcher - First we need to open the index with the IndexReader class, and then pass that object to the IndexSearcher constructor. It is used for searching the indexed documents.
In the example :
IndexReader reader = IndexReader.open(index);
IndexSearcher searcher = new IndexSearcher(reader);
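In the full example, the searcher hands its hits to a collector that keeps at most hitsPerPage results. One common way to keep only the best N scores while scanning many candidates is a bounded min-heap. The sketch below shows that technique in plain Java, with a hypothetical class TopNSketch and method topN. It is an illustration of the idea, not Lucene's collector code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

// Illustration of "keep the best N hits" with a bounded min-heap.
class TopNSketch {
    // Returns the n highest scores, best first.
    static List<Float> topN(float[] scores, int n) {
        // Min-heap: the lowest score that is still kept sits on top.
        PriorityQueue<Float> heap = new PriorityQueue<>();
        for (float s : scores) {
            heap.offer(s);
            if (heap.size() > n) {
                heap.poll(); // evict the current lowest score
            }
        }
        List<Float> best = new ArrayList<>(heap);
        best.sort(Collections.reverseOrder());
        return best;
    }

    public static void main(String[] args) {
        float[] scores = {0.2f, 0.9f, 0.5f, 0.7f};
        System.out.println(topN(scores, 2)); // [0.9, 0.7]
    }
}
```

The heap never grows beyond n + 1 entries, so memory stays small no matter how many documents match.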
Query - The Query class has several subclasses, listed below:
- BooleanQuery
- PhraseQuery
- PrefixQuery
- MultiPhraseQuery
- TermRangeQuery
- NumericRangeQuery
- FilteredQuery
- SpanQuery
In the example :
String querystr = "J*";
String[] s = new String[]{"Name","Age","Address"};
Query q =new MultiFieldQueryParser(Version.LUCENE_35,s, analyzer).parse(querystr);
Things should be made as simple as possible, but not any simpler.
-Albert Einstein