Boosting Technologies: About Lucene

Lucene is a search library. It indexes files and other content as documents and stores them in an index (temporarily in memory, in this post's example). When a query arrives, it processes the query against that index and returns the matching results.
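To make the "index first, then query" idea concrete, here is a minimal sketch in plain Java. It is only a toy model and does not use Lucene or reflect how Lucene stores its index internally; the class name ToyInvertedIndex and its methods are made up for illustration, and it assumes Java 8 or later.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeSet;

// Toy model only (hypothetical class): NOT Lucene's real index format.
final class ToyInvertedIndex {
    // term -> sorted ids of the documents that contain it
    private final Map<String, TreeSet<Integer>> postings = new HashMap<>();

    // "Indexing": split the text into lowercase terms and record the doc id under each term.
    public void add(int docId, String text) {
        for (String term : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (term.isEmpty()) {
                continue;
            }
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
        }
    }

    // "Searching": look the term up directly instead of scanning every document.
    public List<Integer> search(String term) {
        TreeSet<Integer> ids = postings.get(term.toLowerCase(Locale.ROOT));
        return ids == null ? Collections.<Integer>emptyList() : new ArrayList<>(ids);
    }

    public static void main(String[] args) {
        ToyInvertedIndex index = new ToyInvertedIndex();
        index.add(1, "ravi 23 colombo");
        index.add(2, "sasi 23 kandy");
        index.add(3, "kamal 45 jaffna");
        System.out.println(index.search("23"));     // [1, 2]
        System.out.println(index.search("jaffna")); // [3]
    }
}
```

The point of the sketch is that the work is done once at indexing time, so a search becomes a lookup rather than a scan over all the content.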

First, I will walk you through some sample indexing and searching code.

Before that, you need to download Apache Lucene and add it as an external library in your Eclipse project.

Then you can try the Java code below, which uses the Lucene 3.x API.

import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.CorruptIndexException;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.queryParser.MultiFieldQueryParser;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopScoreDocCollector;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

public class simpleSearch {

    public static void main(String[] args) throws IOException, ParseException {
        // 0. Specify the analyzer for tokenizing text.
        // The same analyzer should be used for indexing and searching.
        StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_35);

        // 1. Create the index (RAMDirectory keeps it in memory).
        Directory index = new RAMDirectory();
        IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_35, analyzer);
        IndexWriter w = new IndexWriter(index, config);
        addField(w, "ravi", "23", "colombo");
        addField(w, "sasi", "23", "kandy");
        addField(w, "kamal", "45", "jaffna");
        w.close();

        // 2. Build a query that is run against all three fields.
        String querystr = "J*";
        String[] s = new String[] {"Name", "Age", "Address"};
        Query q = new MultiFieldQueryParser(Version.LUCENE_35, s, analyzer).parse(querystr);

        // 3. Search.
        int hitsPerPage = 10;
        IndexReader reader = IndexReader.open(index);
        IndexSearcher searcher = new IndexSearcher(reader);
        TopScoreDocCollector collector = TopScoreDocCollector.create(hitsPerPage, true);
        searcher.search(q, collector);
        ScoreDoc[] hits = collector.topDocs().scoreDocs;

        // 4. Display results: print every stored field of each hit.
        System.out.println("Found " + hits.length + " hits.");
        for (int i = 0; i < hits.length; ++i) {
            int docId = hits[i].doc;
            Document d = searcher.doc(docId);
            for (int t = 0; t < s.length; t++) {
                System.out.println((i + 1) + ". " + d.get(s[t]));
            }
        }

        // The searcher and reader can only be closed when there
        // is no need to access the documents any more.
        searcher.close();
        reader.close();
    }

    // Builds one document with Name, Age and Address fields and adds it to the index.
    private static void addField(IndexWriter w, String name, String age, String address)
            throws CorruptIndexException, IOException {
        Field senderNameField = new Field("Name", name, Field.Store.YES, Field.Index.ANALYZED);
        Field subjectField = new Field("Age", age, Field.Store.YES, Field.Index.ANALYZED);
        Field addressField = new Field("Address", address, Field.Store.YES, Field.Index.ANALYZED);

        Document doc = new Document();
        doc.add(subjectField);
        doc.add(senderNameField);
        doc.add(addressField);
        w.addDocument(doc);
    }
}

Output:

Found 1 hits.
1. kamal
1. 45
1. jaffna
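Why does "J*" find "jaffna" even though the query has a capital J? StandardAnalyzer lowercases terms when indexing, and the query parser lowercases prefix and wildcard terms by default, so the search is effectively for "j*" in each of the three fields. The sketch below imitates that behaviour in plain Java without calling Lucene; ToyPrefixSearch and its methods are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Toy model of the "J*" search above (hypothetical class); it does not call Lucene.
final class ToyPrefixSearch {

    // Build one record with the same three fields as the example.
    public static Map<String, String> record(String name, String age, String address) {
        Map<String, String> r = new LinkedHashMap<>();
        r.put("Name", name);
        r.put("Age", age);
        r.put("Address", address);
        return r;
    }

    // A record is a hit when any of the listed fields starts with the lowercased prefix.
    public static List<Map<String, String>> search(List<Map<String, String>> records,
                                                   String[] fields, String query) {
        // Strip one trailing '*' (if present) and lowercase the rest.
        String raw = query.endsWith("*") ? query.substring(0, query.length() - 1) : query;
        String prefix = raw.toLowerCase(Locale.ROOT);
        List<Map<String, String>> hits = new ArrayList<>();
        for (Map<String, String> r : records) {
            for (String field : fields) {
                String value = r.get(field);
                if (value != null && value.toLowerCase(Locale.ROOT).startsWith(prefix)) {
                    hits.add(r);
                    break; // count each record once, even if several fields match
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<Map<String, String>> records = new ArrayList<>();
        records.add(record("ravi", "23", "colombo"));
        records.add(record("sasi", "23", "kandy"));
        records.add(record("kamal", "45", "jaffna"));
        List<Map<String, String>> hits =
                search(records, new String[] {"Name", "Age", "Address"}, "J*");
        System.out.println("Found " + hits.size() + " hits."); // Found 1 hits.
        System.out.println(hits.get(0)); // {Name=kamal, Age=45, Address=jaffna}
    }
}
```

Only the third record matches, and only through its Address field, which is why all three stored fields of that one document are printed under hit number 1.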

Indexing classes

First, let's look at the core indexing classes. Lucene provides five main indexing classes, listed below:

IndexWriter
Document
Analyzer
Directory
Field

IndexWriter - This class creates a new index or opens an existing one, and it is used to add, remove and update documents in that index. The index needs a place to be stored, so IndexWriter uses the Directory class for that.

In the example:

IndexWriter w = new IndexWriter(index, config);

Document - It represents a collection of fields, as a virtual document. The fields hold the text you extract from the source (for example, a binary document) and add to the Document.

In the example:

Document doc = new Document();
doc.add(subjectField);
doc.add(senderNameField);
doc.add(addressField);

Analyzer - It is an abstract class, and an instance is passed to the IndexWriter (in the example, through IndexWriterConfig). It is responsible for breaking text into tokens before the text is indexed.

In the example:

StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_35);
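As a rough picture of what "breaking text into tokens" means, here is a toy analyzer in plain Java. It only splits on non-letter, non-digit characters and lowercases the pieces; StandardAnalyzer does more than this (for example, it removes English stop words by default), so treat ToyAnalyzer as an illustrative, hypothetical stand-in and not as Lucene's algorithm.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Toy analyzer (hypothetical class): split on non-letter/non-digit characters, then lowercase.
final class ToyAnalyzer {

    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String piece : text.split("[^\\p{L}\\p{N}]+")) {
            if (!piece.isEmpty()) { // a leading separator yields one empty piece
                tokens.add(piece.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("Kamal, 45, Jaffna")); // [kamal, 45, jaffna]
    }
}
```

This is also why the same analyzer should be used for indexing and searching: if the index holds "jaffna" but the query side produced "Jaffna", the two terms would not match.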

Directory - It is an abstract class that represents the place where the index is stored. It has several subclasses, listed below:

SimpleFSDirectory
NIOFSDirectory
MMapDirectory
RAMDirectory
FileSwitchDirectory

In the example:

Directory index = new RAMDirectory();

Field - A document has several fields. Use this class to give each field its value and a set of options (such as whether the value is stored and whether it is analyzed). If you want to give a field higher priority, you can boost it:

senderNameField.setBoost(1.5f);

Matches in the sender name field then count for more when the results are ranked. For example, if we have the name "Daya" and the address "Dehiwala" and we search for "D*", the intent is that "Daya" comes first. Be aware, though, that in Lucene 3.x prefix and wildcard queries are rewritten to constant-score queries by default, so the effect of a field boost is most visible with scored queries such as term queries.

In the example:

Field senderNameField = new Field("Name", name,
        Field.Store.YES,
        Field.Index.ANALYZED);

Field subjectField = new Field("Age", age,
        Field.Store.YES,
        Field.Index.ANALYZED);
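To show the idea behind a field boost, here is a deliberately simplified scoring sketch in plain Java. It is an assumption-laden toy: the score is just the sum of the boosts of the fields that match, whereas Lucene's real scoring also takes term frequency, document frequency and length normalization into account. ToyFieldBoost and its score method are hypothetical names.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Toy scoring model (hypothetical class); NOT Lucene's scoring formula.
final class ToyFieldBoost {

    // Score = sum of the boosts of every field whose value equals the term (default boost 1.0).
    public static float score(Map<String, String> doc, Map<String, Float> boosts, String term) {
        float score = 0f;
        for (Map.Entry<String, String> field : doc.entrySet()) {
            if (field.getValue().equalsIgnoreCase(term)) {
                Float boost = boosts.get(field.getKey());
                score += (boost == null) ? 1.0f : boost;
            }
        }
        return score;
    }

    public static void main(String[] args) {
        Map<String, Float> boosts = new HashMap<>();
        boosts.put("Name", 1.5f); // same value as senderNameField.setBoost(1.5f)

        Map<String, String> docA = new LinkedHashMap<>();
        docA.put("Name", "daya");
        docA.put("Address", "colombo");

        Map<String, String> docB = new LinkedHashMap<>();
        docB.put("Name", "ravi");
        docB.put("Address", "daya");

        System.out.println(score(docA, boosts, "daya")); // 1.5
        System.out.println(score(docB, boosts, "daya")); // 1.0
    }
}
```

Both documents contain the term once, but the one that matches in the boosted Name field gets the higher score and would be listed first.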

Searching classes

Now let's go through the searching classes. The core classes are listed below:

IndexSearcher
Query
TopDocs

IndexSearcher - First we need to open the index with the IndexReader class, and then we pass that reader object to the IndexSearcher constructor. IndexSearcher is used to search the indexed documents.

In the example:

IndexReader reader = IndexReader.open(index);
IndexSearcher searcher = new IndexSearcher(reader);

Query - Query is an abstract class with several subclasses, including those listed below:

BooleanQuery
PhraseQuery
PrefixQuery
MultiPhraseQuery (PhrasePrefixQuery in older releases)
TermRangeQuery
NumericRangeQuery
FilteredQuery
SpanQuery

In the example, the Query is not constructed directly; MultiFieldQueryParser parses the query string into a Query:

String querystr = "J*";
String[] s = new String[] {"Name", "Age", "Address"};
Query q = new MultiFieldQueryParser(Version.LUCENE_35, s, analyzer).parse(querystr);
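The query types above can be combined, which is the role BooleanQuery plays. The following plain-Java sketch mimics that idea with predicates over a document's tokens; it is only an analogy under the assumption of Java 8 or later, and ToyQueries, term and prefix are hypothetical names, not Lucene classes or methods.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Toy analogy for combining queries (hypothetical class); it does not use Lucene.
final class ToyQueries {

    // Rough analogue of a term query: the document contains exactly this token.
    public static Predicate<List<String>> term(String t) {
        return tokens -> tokens.contains(t);
    }

    // Rough analogue of PrefixQuery: some token starts with the prefix.
    public static Predicate<List<String>> prefix(String p) {
        return tokens -> tokens.stream().anyMatch(tok -> tok.startsWith(p));
    }

    public static void main(String[] args) {
        List<List<String>> docs = Arrays.asList(
                Arrays.asList("ravi", "23", "colombo"),
                Arrays.asList("sasi", "23", "kandy"),
                Arrays.asList("kamal", "45", "jaffna"));

        // Rough analogue of a BooleanQuery where both clauses must match.
        Predicate<List<String>> query = term("23").and(prefix("k"));

        List<List<String>> hits = docs.stream().filter(query).collect(Collectors.toList());
        System.out.println(hits); // [[sasi, 23, kandy]]
    }
}
```

Only the second document satisfies both clauses: it contains "23" and has a token starting with "k".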


Saturday, June 9, 2012
