// Lucene 2.3.x API
import java.io.File;
import java.sql.*;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;

final File INDEX_DIR = new File("index");
try {
    // Load the MySQL JDBC driver and connect to the "test" database
    Class.forName("com.mysql.jdbc.Driver").newInstance();
    Connection conn = DriverManager.getConnection("jdbc:mysql://localhost/test", "root", "password");
    StandardAnalyzer analyzer = new StandardAnalyzer();
    // true = create a new index, overwriting any existing one
    IndexWriter writer = new IndexWriter(INDEX_DIR, analyzer, true);
    System.out.println("Indexing to directory '" + INDEX_DIR + "'...");
    indexDocs(writer, conn);
    writer.optimize();
    writer.close();
} catch (Exception e) {
    e.printStackTrace();
}

void indexDocs(IndexWriter writer, Connection conn) throws Exception {
    String sql = "select id, name, color from pet";
    Statement stmt = conn.createStatement();
    ResultSet rs = stmt.executeQuery(sql);
    while (rs.next()) {
        Document d = new Document();
        // id is stored (to map hits back to rows) but not indexed, so it is not searchable
        d.add(new Field("id", rs.getString("id"), Field.Store.YES, Field.Index.NO));
        // name and color are tokenized for full-text search but not stored
        d.add(new Field("name", rs.getString("name"), Field.Store.NO, Field.Index.TOKENIZED));
        d.add(new Field("color", rs.getString("color"), Field.Store.NO, Field.Index.TOKENIZED));
        writer.addDocument(d);
    }
    rs.close();
    stmt.close();
}

I assumed that there is a table named pet in the "test" database with the fields "id", "name" and "color". After running this, a folder named "index" is created in the working directory containing the indexed content.
The following code (the Lucene searcher) shows how to search for records containing a particular keyword using the Lucene index created above.
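// Lucene 2.3.x API; assumes analyzer and conn from the indexing code are in scope
import java.sql.*;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.Searcher;

IndexReader reader = IndexReader.open("index");
Searcher searcher = new IndexSearcher(reader);
// Search the "color" field, using the same analyzer as at indexing time
Query query = new QueryParser("color", analyzer).parse("white");
Hits hits = searcher.search(query);
String sql = "select * from pet where id = ?";
PreparedStatement pstmt = conn.prepareStatement(sql);
for (int i = 0; i < hits.length(); i++) {
    // Read the stored database id back from each hit and fetch the full row
    String id = hits.doc(i).get("id");
    pstmt.setString(1, id);
    displayResults(pstmt);
}
pstmt.close();
searcher.close();
reader.close();

void displayResults(PreparedStatement pstmt) {
    try {
        ResultSet rs = pstmt.executeQuery();
        while (rs.next()) {
            System.out.println(rs.getString("name"));
            System.out.println(rs.getString("color") + "\n");
        }
        rs.close();
    } catch (SQLException e) {
        e.printStackTrace();
    }
}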
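The search loop above runs one SQL query per hit. A common alternative, sketched here and not taken from the original post, is to collect the ids first and fetch all matching rows in a single query with an IN list. The class IdLookup and its method are hypothetical; only the pet table and id column come from the example. Some databases cap the number of items in an IN list, so very large hit lists may need to be fetched in batches.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper (not part of the original post): builds a single
// parameterized query for all Lucene hit ids instead of running
// "select * from pet where id = ?" once per hit.
public final class IdLookup {
    private IdLookup() {}

    // table and idColumn are concatenated into the SQL, so they must be
    // trusted constants; only the id values are bound as parameters.
    public static String inClauseSql(String table, String idColumn, int idCount) {
        if (idCount <= 0) {
            throw new IllegalArgumentException("need at least one id");
        }
        // One "?" placeholder per id, separated by ", "
        String placeholders = String.join(", ", Collections.nCopies(idCount, "?"));
        return "select * from " + table + " where " + idColumn + " in (" + placeholders + ")";
    }

    public static void main(String[] args) {
        // e.g. the values of hits.doc(i).get("id") collected in the search loop
        List<String> ids = Arrays.asList("3", "7", "12");
        System.out.println(inClauseSql("pet", "id", ids.size()));
        // prints: select * from pet where id in (?, ?, ?)
        // then bind with pstmt.setString(i + 1, ids.get(i)) for each i
    }
}
```

Each id is then bound to its placeholder position (1-based), so the values still go through the PreparedStatement rather than being pasted into the SQL.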
70 comments:
Sorry, but is there any code written in C#?
I'm doing the database indexing and searching using C# and ASP.NET.
Please reply to me by mail at raypang20@yahoo.com if you can.
Thanks
Good! It's a really interesting document.
One question: I have to update and delete documents in the Lucene index when they are deleted from the database. How can I do that?
I'm very confused about it...
Best Regards
Of course you can. You should maintain a unique id (maybe the database primary key) for each document, and it must be indexed, not only stored, so that it can be looked up later. Indexing it untokenized keeps the whole id as a single exact term: document.add(new Field("id", id, Field.Store.YES, Field.Index.UN_TOKENIZED));. Before you delete the record from the database, you can call the IndexReader to delete the relevant document by giving it a "Term" created with the unique id, e.g. reader.deleteDocuments(new Term("id", id));. Make sure that the IndexWriter is closed before deleting.
If you need to know more, just drop a reply.
This is for raypang: you can check this out. Sorry, I am not very familiar with it.
Hmm... nice explanation!
I haven't used any searching/indexing technology like Lucene before.
But I have an architectural question. When we want to use a search framework like Lucene, we have to implement a separate layer (let's call it an 'Indexing layer') which connects with the database layer (say Hibernate). Is that so?
I was wondering, instead of searching through direct database access, what other advantages does a framework like Lucene offer (other than performance)? After all, we have to keep the 'Indexing Layer' and the 'Database Layer' in sync.
OK, I found the solution: Hibernate Search :) Anyway, thanks for the nice tutorial :)
Kalani,
Thank you for your post. It is really hard to find info about Lucene database indexing. I have a question about inserting and deleting records in the database.
So you're saying that I need to re-run the indexing on every insert and delete. If the table is huge, doesn't re-indexing take a long time? Is this the normal practice? How do you handle this in a web environment?
Your help is appreciated. Could you also please provide code snippets for these operations?
thanks
Hi,
I think you don't have to run it on every insert/delete. You can have a background thread that runs the indexing periodically.
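A background thread like the one suggested above can be sketched with the JDK's ScheduledExecutorService. This is a minimal sketch, not the author's code: PeriodicIndexer is a hypothetical name, and the reindex job itself (for example, indexing rows changed since the last run) is whatever Runnable you pass in.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical wrapper: runs a reindexing job on one background thread,
// so inserts/deletes are picked up in batches instead of on every change.
public final class PeriodicIndexer {
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "lucene-indexer");
                t.setDaemon(true); // don't keep the JVM alive just for indexing
                return t;
            });
    private final Runnable reindexJob;

    public PeriodicIndexer(Runnable reindexJob) {
        this.reindexJob = reindexJob;
    }

    // Runs the job immediately, then again `delay` after each run finishes,
    // so two runs never overlap even if one is slow.
    public void start(long delay, TimeUnit unit) {
        scheduler.scheduleWithFixedDelay(() -> {
            try {
                reindexJob.run();
            } catch (RuntimeException e) {
                // An uncaught exception would cancel all future runs.
                e.printStackTrace();
            }
        }, 0, delay, unit);
    }

    public void stop() throws InterruptedException {
        scheduler.shutdown();
        scheduler.awaitTermination(10, TimeUnit.SECONDS);
    }
}
```

Usage would look like new PeriodicIndexer(() -> reindexChangedRows()).start(5, TimeUnit.MINUTES), where reindexChangedRows is a hypothetical method of your own. A fixed delay (rather than a fixed rate) is used so that a slow reindex simply pushes the next run back.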
hey...
thanks Kalani....
very useful blog..... :)
Hi Kalani,
It's nice to see your code. I can index the database, but searching is a little bit confusing.
Database indexing link:
http://pastebin.com/f7b63a4cc
I am facing issues when I search the indexed DB.
Can you please look at my code?
Link: http://pastebin.com/f3ac3a18e
Regards,
Bharath
from India
bharath.ambat@gmail.com
Hi Kalani,
You said that we should maintain a unique id (maybe the database primary key) for a document. What if I have multiple tables JOINed in the query? What should I do in the case of JOINs?
Do you have an example, sort of?
Any help is greatly appreciated.
Thanks,
--Gaurav
Hi,
I'm wondering if you can adapt your approach to index a WSO2 registry. Do you have any examples for that?
Regards,
Lars
Kalani,
We need to index 20-30 million records daily. Right now Lucene is indexing a million records in 15 minutes, but that doesn't match the speed we want. Could you please let me know any optimization techniques to speed up my query? It's a straightforward selection from the database. I think the indexing itself takes very little time; the database access part is taking more time. Any inputs are highly appreciated. Could you please mail me at vijaykumar.ravva@gmail.com? Thanks, Vijay
Hello
I tried the code and I get this error:
"""
cannot find symbol symbol : constructor IndexWriter(java.io.File,org.apache.lucene.analysis.standard.StandardAnalyzer,boolean)
location: class org.apache.lucene.index.IndexWriter
IndexWriter writer = new IndexWriter(INDEX_DIR, analyzer, true"""
I'm a newbie... thank you
thanks
Sorry guys, I didn't get mail notifications for these last comments.
@Just for Time Pass: Did you figure out your problem? What error were you getting when searching? At the same time, can you make sure that the indexing part is done correctly?
@Gaurav: I don't think "indexing the query" has any meaning. We are indexing tables, not query results. Even if we did somehow, there's no mapping back.
What exactly did you want to do?
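@Marius: Hi, can you check this line in your code?
IndexWriter writer = new IndexWriter(INDEX_DIR, analyzer, true
It should be
IndexWriter writer = new IndexWriter(INDEX_DIR, analyzer, true);
If the closing parenthesis is already there, a "cannot find symbol ... constructor IndexWriter(java.io.File, ...)" error usually means you are compiling against a newer Lucene (3.x), where IndexWriter takes a Directory instead of a File, e.g. new IndexWriter(FSDirectory.open(INDEX_DIR), analyzer, true, IndexWriter.MaxFieldLength.UNLIMITED);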
Hello Kalani
How does it work with a BLOB?
Can I index it like this:
d.add(new Field("text", rs.getBlob( "text" ), Field.Store.NO, Field.Index.ANALYZED));??
How exactly should I proceed?
thanks
Hi Lucius, I guess you have to convert the BLOB to a string before indexing.
http://shitmores.blogspot.com/2007/06/convert-javasqlblob-to-string.html
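Along the lines of the reply above, a minimal BLOB-to-String conversion can be written with plain JDBC. It assumes the BLOB holds text in a known encoding (UTF-8, say); binary documents such as PDFs would need a text extractor instead. BlobText is a hypothetical helper name.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.sql.Blob;
import java.sql.SQLException;

// Hypothetical helper: reads a whole BLOB and decodes it as text,
// assuming the bytes were written in the given charset.
public final class BlobText {
    private BlobText() {}

    public static String blobToString(Blob blob, Charset charset) throws SQLException, IOException {
        if (blob == null) {
            return null; // SQL NULL column
        }
        try (InputStream in = blob.getBinaryStream()) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[8192];
            int n;
            // Copy the stream in chunks until end of stream (-1)
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
            return new String(out.toByteArray(), charset);
        }
    }
}
```

The resulting string can then be passed to the Field constructor, e.g. new Field("text", BlobText.blobToString(rs.getBlob("text"), StandardCharsets.UTF_8), Field.Store.NO, Field.Index.ANALYZED). Skip the field when the helper returns null, since Field does not accept a null value.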
Hi Kalani,
I have benefited a lot from this blog...
Thank you very much.
Can you please add me:
gowribts@gmail.com
Hi Kalani,
A good blog for the Lucene API.
It helps me a lot.
Thank you
Please refer to the link below for improving indexing and searching speed:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
Hi!
Sorry to bother you with this! I'm new to working with Lucene, and for experienced users this may be a silly question, but I can't figure out what is wrong: "Field.Store.YES" and "Field.Index.TOKENIZED" are not recognized. Am I missing a library or a class? Any answer will be greatly appreciated.
Thank you.
Hi Cornel, which Lucene version are you using? If it is newer than 2.3.2, the API may have changed; for example, Field.Index.TOKENIZED was replaced by Field.Index.ANALYZED.
Thank you, Kalani!
Indeed, I'm using v3, so that is probably why it is not working.
Hi,
I am new to Lucene. I have multiple tables and need to index all of them. Is there any way in Lucene to differentiate between these tables when indexing, and also to search based on that distinction? A small piece of code would be really helpful. Please help at the earliest.
Here I am attaching a sample program
in which I have described how to index data from 2 tables.
I have used static values here,
but you can take the values from the database and store them.
After that, I search for
"name" in the first table.
import java.io.File;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopScoreDocCollector;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.SimpleFSDirectory;
import org.apache.lucene.util.Version;
public class LuceneTest {
    public static void main(String[] args) {
        try {
            File tempFile = new File("E:\\temp");
            Directory INDEX_DIR = new SimpleFSDirectory(tempFile);
            StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_CURRENT);
            IndexWriter writer = new IndexWriter(INDEX_DIR, analyzer, true, IndexWriter.MaxFieldLength.UNLIMITED);
            writer.setRAMBufferSizeMB(48);
            writer.setUseCompoundFile(false);
            // Data for the first table, which contains id and name
            Document doc = new Document();
            doc.add(new Field("tab1_id", "1", Field.Store.YES, Field.Index.NOT_ANALYZED_NO_NORMS));
            doc.add(new Field("tab1_name", "jaydatt", Field.Store.YES, Field.Index.NOT_ANALYZED_NO_NORMS));
            writer.addDocument(doc);
            // Reuse the same Document for the next row by changing the field values
            doc.getField("tab1_id").setValue("2");
            doc.getField("tab1_name").setValue("jay");
            writer.addDocument(doc);
            // End of entries for the first table
            // Second table, in which we add the id and designation of the user
            doc = new Document();
            doc.add(new Field("tab2_id", "1", Field.Store.YES, Field.Index.NOT_ANALYZED_NO_NORMS));
            doc.add(new Field("tab2_field", "SE", Field.Store.YES, Field.Index.NOT_ANALYZED_NO_NORMS));
            writer.addDocument(doc);
            doc.getField("tab2_id").setValue("2");
            doc.getField("tab2_field").setValue("JSE");
            writer.addDocument(doc);
            // End of entries for the second table
            writer.optimize();
            writer.close();
            analyzer.close();
            new LuceneTest().search();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    public void search() {
        try {
            // Fetch the user's data from the first table
            File tempFile = new File("E:\\temp");
            Directory INDEX_DIR = new SimpleFSDirectory(tempFile);
            // tab1_name was indexed NOT_ANALYZED, but QueryParser runs the query text through
            // StandardAnalyzer (which lowercases it); this works here because "jaydatt" is lowercase
            Query q = new QueryParser(
                    Version.LUCENE_CURRENT, "tab1_name", new StandardAnalyzer(Version.LUCENE_CURRENT)).parse("jaydatt");
            TopScoreDocCollector collector = TopScoreDocCollector.create(10, true);
            IndexSearcher searcher = new IndexSearcher(INDEX_DIR, true);
            searcher.search(q, collector);
            ScoreDoc[] hits = collector.topDocs().scoreDocs;
            for (int i = 0; i < hits.length; ++i) {
                int docId = hits[i].doc;
                Document d = searcher.doc(docId);
                System.out.println(d.get("tab1_id"));
                System.out.println(d.get("tab1_name"));
            }
            searcher.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
Ms. Shweta, please refer to the above post.
Thanks, jaydatt...
Hi...
I am stuck on one scenario and just need help with how to go about it. There are two fields in my database, start date and end date, which I have indexed using Lucene. I need to filter records based on a date given by a user: the results should show those records in which this date lies between the start and end dates. Can anyone show me a sample program for how to accomplish this? This would be really helpful to me. Thanks.
shweta: just go through the BooleanQuery class in the Lucene API.
Jaydatt: using the BooleanQuery class I can combine the queries, but how do I compare the given date with the start and end dates, i.e. >= and <=? If you can provide a small piece of code, that would be really helpful. Please help at the earliest.
shweta: you can use NumericRangeQuery for that.
You can index your date field like this:
doc = new Document();
//System.out.println(new SimpleDateFormat("dd-MM-yyyy").parse("02-07-2010").getTime());
//long value of 02-07-2010
doc.add(new NumericField("date",Field.Store.YES, true)
.setLongValue(1278009000000L));
writer.addDocument(doc);
//System.out.println(new SimpleDateFormat("dd-MM-yyyy").parse("03-07-2010").getTime());
//long value of 03-07-2010
((NumericField)doc.getFieldable("date"))
.setLongValue(1278095400000L);
writer.addDocument(doc);
//System.out.println(new SimpleDateFormat("dd-MM-yyyy").parse("04-07-2010").getTime());
//long value of 04-07-2010
((NumericField)doc.getFieldable("date")).setLongValue(1278181800000L);
writer.addDocument(doc);
Here 3 dates are indexed. Now suppose you want to search between the first 2 dates; then search using NumericRangeQuery,
e.g.
NumericRangeQuery q = NumericRangeQuery.newLongRange("date",
1278009000000L,1278095400000L,true,true);
So with NumericRangeQuery,
if you want dates >= 1278009000000L,
you can use
NumericRangeQuery q = NumericRangeQuery.newLongRange("date",1278009000000L,null,true,true);
where null means there is no upper limit,
and it will give you all dates >= 1278009000000L (i.e. 02-07-2010).
In the same way, you can search for <= by passing null as the lower limit.
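The long values in the snippet above are midnight in the JVM's default time zone; they match India Standard Time (UTC+05:30), so the same date string gives different numbers on a machine in another zone. The sketch below, which assumes Java 8's java.time and a hypothetical helper name DateMillis, fixes the zone explicitly. It also spells out the condition from Shweta's question: a record matches when start <= date <= end. In Lucene 3.x terms that would presumably be two open-ended clauses, NumericRangeQuery.newLongRange("start", null, d, true, true) and NumericRangeQuery.newLongRange("end", d, null, true, true), both added to a BooleanQuery with BooleanClause.Occur.MUST; the field names "start" and "end" are hypothetical.

```java
import java.time.LocalDate;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;

// Hypothetical helper: converts "dd-MM-yyyy" strings to epoch millis
// in an explicit time zone instead of the JVM default.
public final class DateMillis {
    private static final DateTimeFormatter DAY = DateTimeFormatter.ofPattern("dd-MM-yyyy");

    private DateMillis() {}

    // Epoch millis of midnight at the start of the given day in the given zone
    public static long startOfDayMillis(String ddMMyyyy, ZoneId zone) {
        return LocalDate.parse(ddMMyyyy, DAY).atStartOfDay(zone).toInstant().toEpochMilli();
    }

    // True if start <= day <= end (both bounds inclusive), the same condition
    // the two range clauses express for a record's start and end fields
    public static boolean within(long day, long start, long end) {
        return start <= day && day <= end;
    }

    public static void main(String[] args) {
        ZoneId ist = ZoneId.of("Asia/Kolkata");
        System.out.println(startOfDayMillis("02-07-2010", ist)); // 1278009000000
        System.out.println(startOfDayMillis("03-07-2010", ist)); // 1278095400000
    }
}
```

Using the same zone at indexing and query time is what matters; whichever zone you pick, compute both the indexed values and the query bounds with it.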
Thanks, jaydatt... but I am not able to try this, as my database records have other fields as well which are not numeric. Can you please suggest how to add a document with both Field and NumericField? A small sample piece of code will suffice. Please help early...
Nice little tutorial,
Helped a lot!
Nice tutorial :)
Is there a way to add newly inserted records to the index, and to delete deleted records from the index?
Hi Rajgopal, I guess you have to call the indexing/deleting feature whenever you insert/delete a record through your program.
Thanks, Kalani Ruwanpathirana.
The sample code helped me a lot.
Do you have any Lucene web search sample code?
Please reply to me at baranikumar_v@hotmail.com.
Thanks in advance.
Hi Kalani, I'm a computer science student. I have to work with Lucene, but to index information in a database, and I was asking if you could help me by giving me step-by-step information. If this is OK, I will give you my email so you can answer me, and I will find a way to contact you. Thank you.
email:mioo2@hotmail.fr
Hey Kalani ma'am, I am a student of information technology and I have a project on searching.
I have 37.1 GB of data. The data file contains features of stars; it covers 300 million stars.
Each star is fully defined by 45 attributes, all numerical values (doubles).
I want to search the whole data for a specific star given two attributes; the result must contain all 45 attributes of that star.
I want to make a database for these stars and search it with Apache Lucene.
I want to search this data in a fraction of a millisecond, so I want to use the Apache Lucene API.
Am I on the right track? If yes,
then please suggest the necessary steps I should take to complete this project successfully,
like
1. you have to make a database first and then...
Please give me some basic idea about this. My email ID is santosh.iiita2007@gmail.com
Hi Kalani, I am a BTech student and want to implement Lucene in an application as my project.
I am very new to Lucene and don't even know what it is, except that it is a full-text search library.
Can you please help me with this? I have to complete this project in 1 month, so please help me out.
Your help will be fully appreciated.
My email ID is
pranavgoyal40341@gmail.com
Can anyone please tell me, when I make an index from my database, what my field values should be?
How should I decide which fields to keep?
Should the field names be the same as my table column headers, or something else?
Hi Kalani,
Thanks for your guide on creating Lucene indexes.
I followed your post and wrote a program for a desktop app which uses SQLite; the data is static.
My problem is that I am unable to search for any free text. For any text search I end up getting 0 hits :(
Here is my code...
File indexFile = new File("C:/IndLaw/index");
StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_CURRENT);
FSDirectory INDEX_DIR = new SimpleFSDirectory(indexFile);
if(!indexFile.exists() || !(indexFile.list().length>0) ) {
createIndexesFromDB(INDEX_DIR, analyzer);
}
System.out.println("Enter the content to be serached: ");
BufferedReader br = new BufferedReader(new InputStreamReader(System.in));
String s = br.readLine();
int hitsPerPage = 10000;
IndexSearcher searcher = new IndexSearcher(INDEX_DIR, true);
Query query = new QueryParser(Version.LUCENE_CURRENT, "c1judgmenttext", analyzer).parse(s);
TopScoreDocCollector collector = TopScoreDocCollector.create(hitsPerPage, true);
searcher.search(query, collector);
ScoreDoc[] hits = collector.topDocs().scoreDocs;
System.out.println("Found " + hits.length + " hits.");
String sql = "select C.docid, C.c0caseid, C.c1judgmenttext from casejudgments_content C where C.docid = ?";
PreparedStatement pstmt = connection.prepareStatement(sql);
for (int i = 0; i < hits.length; i++){
int id = hits[i].doc;
pstmt.setInt(1, id);
displayResults(pstmt);
}
private static void createIndexesFromDB(FSDirectory index_dir, StandardAnalyzer analyzer) throws Exception {
// Load the sqlite-JDBC driver
//Class.forName("SQLite.JDBCDriver");
Class.forName("org.sqlite.JDBC");
// create a database connection
connection = DriverManager.getConnection("jdbc:sqlite:C:/IndLaw/SysDb.db");
IndexWriter writer = new IndexWriter(index_dir, analyzer, true, IndexWriter.MaxFieldLength.UNLIMITED);
indexDocs(writer, connection);
writer.optimize();
writer.close();
}
private static void indexDocs(IndexWriter writer, Connection connection) throws Exception {
String sql = "select C.docid, C.c0caseid, C.c1judgmenttext from casejudgments_content C where C.docid<=5";
Statement stmt = connection.createStatement();
ResultSet rs = stmt.executeQuery(sql);
while (rs.next()) {
Document d = new Document();
d.add((Fieldable) new Field("docid", rs.getString("docid"), Field.Store.YES, Field.Index.NO));
d.add((Fieldable) new Field("c1judgementtext", rs.getString("c1judgmenttext"),Field.Store.NO, Field.Index.ANALYZED));
writer.addDocument(d);
}
}
Sorry for being late to reply to you all.
@Santu - As I understand it, a database is not a must. You can index the file directly and then search. In your case it looks like you have to do your own filtering too, because you are providing two attributes and searching for stars (again, if I understood correctly).
@Pranav - This may help you out. http://www.lucenetutorial.com/lucene-in-5-minutes.html
@Monty - Your field names need not be the same as the column headers.
This should work too (but use the same field name when searching):
d.add(new Field("namefield", rs.getString("name"), Field.Store.NO, Field.Index.TOKENIZED));
You should decide which columns you need to consider for searching and make index fields for those columns.
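@Sonu - One thing I noticed is that your field names in indexing and searching are different. When you index you use "c1judgementtext", but when you search you use "c1judgmenttext" (the 'e' is missing). That may be the cause. Just correct the spelling and see. Also note that hits[i].doc is Lucene's internal document number, not your database docid; read the stored field instead, e.g. searcher.doc(hits[i].doc).get("docid").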
I want to use Lucene with a Liferay portlet and connect to a MySQL database. Can you help me?
I want to search with Lucene from a portlet, connected to a MySQL database. Help me.
Hi experts,
There has been a change in versions. I am using Lucene v3.3. Do we have any updated code for this? Some of the method calls, like the doc.add() stuff, are generating errors. Kindly help.
Can you tell me how to create the IndexWriter object with Lucene 3.4?
Hi Kalani, I am Srikanth. I am new to Java and your posts are very helpful to me. I am facing a problem: I have to index XML files, but we will be doing a lot of manipulation on the XMLs, so I have to re-index the entire XML every time. Can you suggest a way to update the existing index instead of re-indexing? Please help me.
Hi Kalani, I am a complete novice at using Lucene to search things in a DB. Could you create a new post on how to install Lucene and integrate it with an Apache server and a MySQL DB?
Kalani,
Thanks for the above post.
It helped me a lot to understand Lucene DB indexing and searching.
Friends, also check http://ikaisays.com/2010/04/24/lucene-in-memory-search-example-now-updated-for-lucene-3-0-1/ to learn more about this topic.
Thx,
Rakesh M S
Do you have examples for Lucene 3.6?
Kalani Ruwanpathirana, please reply to me at 17sanjayverma@gmail.com.
My hits length is coming out as zero, so the loop is not running. Please, ma'am, have a look at my code below and reply soon. It's urgent.
package app;
import java.io.File;
import java.sql.*;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.Searcher;
import org.apache.lucene.search.TopDocCollector;
public class MyClass {
final File INDEX_DIR = new File("index");
StandardAnalyzer analyzer;
public Connection conn;
static String keyword=null;
JavaBean jb=new JavaBean();
public JavaBean getJb() {
return jb;
}
public void setJb(JavaBean jb) {
this.jb = jb;
}
boolean status;
String getvalue;
public boolean makeConnection(String ss )
{
getvalue=ss;
try{
Class.forName("com.mysql.jdbc.Driver");
conn = DriverManager.getConnection("jdbc:mysql://localhost:3307/java", "root", "password");
System.out.println("Connection has been made");
analyzer = new StandardAnalyzer();
IndexWriter writer = new IndexWriter(INDEX_DIR, analyzer, true);
System.out.println("Indexing to directory '" + INDEX_DIR + "'...");
indexDocs(writer, conn);
writer.optimize();
writer.close();
return status=true;
}
catch (Exception e)
{
e.printStackTrace();
System.out.println(e.getMessage());
return status=false;
}
}
public void indexDocs(IndexWriter writer, Connection conn) throws Exception
{
String sql = "select * from content";
Statement stmt = conn.createStatement();
ResultSet rs=stmt.executeQuery(sql);
while (rs.next())
{
System.out.println(rs.getString("news"));
Document d = new Document();
d.add(new Field("id", rs.getString("id"), Field.Store.YES, Field.Index.NO));
d.add(new Field("name", rs.getString("name"), Field.Store.NO, Field.Index.TOKENIZED));
d.add(new Field("news", rs.getString("news"),Field.Store.NO, Field.Index.TOKENIZED));
writer.addDocument(d);
}
Searcher searcher = new IndexSearcher(IndexReader.open("index"));
Query query = new QueryParser("news", analyzer).parse(getvalue);
Hits hits = searcher.search(query);
System.out.println(hits.length());
PreparedStatement pstmt = conn.prepareStatement("select * from content where id = ?");
for (int i = 0; i < hits.length(); i++)
{
String id = hits.doc(i).get("id");
pstmt.setString(1, id);
displayResults(pstmt);
}
}
void displayResults(PreparedStatement pstmt) {
try {
ResultSet rs = pstmt.executeQuery();
while (rs.next()) {
System.out.println(rs.getString("name"));
System.out.println(rs.getString("news")+"\n");
}
} catch (SQLException e) {
e.printStackTrace();
}
}
}
Fantastic post Kalani - thanks!
I didn't know much about indexes; here is another helpful post:
how do database indexes work
Hi,
How can I search an entire MySQL database using Apache Lucene?
Can I have an example related to this?
Thanks
Anush
Hi,
This is Anush.
How can I use index search over an entire database (MySQL) using Apache Lucene?
Can I have related examples?
Thanks
Anush
What version of Lucene did you use?
Hi Kalani, thank you for the kind information. I want to search text or data from different databases, so can you please help me out with code? Thank you so much in advance.
Hi Kalani, can you please upload screenshots of the project structure? I am new to Lucene and I am not clear about how and where to add which file. Can you please help me out here?
Hi Kalani. I am new to Lucene's features. Can we index a database containing 11 tables,
or do I have to index each table separately? What would be the most efficient approach, so that I can search for a particular entry in all the tables? I have to display every row where that entry is present.
Hi Kalani, I am new to Lucene. How do I index a database having multiple tables?