kalani's Tech blog: June 2008

Monday, June 30, 2008

Loading Lucene Index to the RAM and Flushing Lucene Updates Periodically - Apache Lucene

As my previous post, Creating Lucene Index in a Database, shows, storing the Lucene index in the database is one solution for applications that run in clustered environments. But there is a performance hit: every time the index is updated we have to read from and write to the database, which is more time consuming.

Therefore we can load the Lucene index into RAM (Lucene provides RAMDirectory for this) and flush the changes to the database periodically. It can be done as follows.

RAMDirectory ramDir = new RAMDirectory();

JdbcDirectory jdbcDir = new JdbcDirectory(dataSource, new MySQLDialect(), "indexTable");


byte [] buffer = new byte [100] ; //scratch buffer used while copying files between the two directories

LuceneUtils.copy(jdbcDir, ramDir, buffer); //Copying the JdbcDirectory to RAMDirectory

//After this point we can simply deal with RAMDirectory without bothering about index in the database


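//After a convenient time period we can flush the changes in the RAMDirectory to the database
//(here the first flush runs immediately and then every 10000 ms, i.e. every 10 seconds)

Timer timer = new Timer(); //java.util.Timer

timer.schedule(new FlushTimer(10000,ramDir,jdbcDir), 0, 10000);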



//A java.util.TimerTask that rewrites the database copy of the index from the RAMDirectory
public class FlushTimer extends TimerTask{

    private int interval;

    RAMDirectory ramDir;

    JdbcDirectory jdbcDir;


    byte [] buffer = new byte [100] ;


    public FlushTimer (int interval, RAMDirectory ramDir, JdbcDirectory jdbcDir){

        this.interval = interval;

        this.ramDir = ramDir;

        this.jdbcDir = jdbcDir;

    }


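    public void run() {

        try{

            //Replace the database copy with the current RAMDirectory content.
            //Between these two calls the index in the database is empty or incomplete,
            //so other nodes reading it during a flush may see an incomplete index.

            jdbcDir.deleteContent();

            LuceneUtils.copy(ramDir, jdbcDir, buffer);

        }catch(Exception e){

            e.printStackTrace();

        }

    }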


}
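
One thing the timer above does not do is check whether anything has changed since the last flush, so it rewrites the whole index in the database every 10 seconds even when it is idle. Here is a minimal sketch of a "flush only when dirty" variant using only the JDK. Note that WriteBehindStore and its two maps are hypothetical stand-ins I made up for RAMDirectory and JdbcDirectory; this is not Lucene or Compass API, just an illustration of the idea.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Timer;
import java.util.TimerTask;

// Hypothetical stand-in for the RAMDirectory/JdbcDirectory pair: a fast
// in-memory map plus a slow "durable" map that is only rewritten on flush.
class WriteBehindStore {
    private final Map<String, String> ram = new HashMap<String, String>();
    private final Map<String, String> durable = new HashMap<String, String>();
    private boolean dirty = false;

    // Load the durable copy into memory once, like the initial copy into RAM.
    public synchronized void load() {
        ram.clear();
        ram.putAll(durable);
        dirty = false;
    }

    // Updates touch only the in-memory copy and mark it as changed.
    public synchronized void put(String key, String value) {
        ram.put(key, value);
        dirty = true;
    }

    public synchronized String get(String key) {
        return ram.get(key);
    }

    // Meant to be called periodically. Skips the expensive rewrite when
    // nothing changed since the last flush; returns true if it flushed.
    public synchronized boolean flushIfDirty() {
        if (!dirty) {
            return false;
        }
        durable.clear();
        durable.putAll(ram);
        dirty = false;
        return true;
    }

    // Copy of what is currently persisted, for inspection.
    public synchronized Map<String, String> durableSnapshot() {
        return new HashMap<String, String>(durable);
    }

    public static void main(String[] args) throws InterruptedException {
        final WriteBehindStore store = new WriteBehindStore();
        store.load();
        store.put("segments", "generation-1");

        // Flush every 100 ms; only the first run has something to write.
        Timer timer = new Timer();
        timer.schedule(new TimerTask() {
            public void run() {
                System.out.println("flushed: " + store.flushIfDirty());
            }
        }, 0, 100);

        Thread.sleep(350);
        timer.cancel();
    }
}
```

Because put and flushIfDirty are both synchronized, a flush never copies a half-applied update. In the real FlushTimer the same effect would need some coordination with whoever is writing to the RAMDirectory, which I have left out here.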

Tuesday, June 24, 2008

Creating Lucene Index in a Database - Apache Lucene

My previous post, Indexing a database and searching the content using Lucene, shows how to index records (or stored files) in a database. In that case the index is created in the local file system. However, in real scenarios most applications run in clustered environments, which raises the question of where to create the search index.

Creating the index in the local file system does not work in this situation, because the index must be synchronized and shared by every node. One solution is to cluster the JVM and use a Lucene RAMDirectory (keep in mind it disappears after a node failure) instead of an FSDirectory. The Terracotta framework can be used to cluster the JVM. This blog entry shows a code snippet.

Anyway, I decided not to go that far and to create the index in the database instead, so that it can be shared by every node. Lucene's Directory abstraction allows the index to be stored somewhere other than the file system, but a JDBC-backed implementation (JdbcDirectory) is not shipped with Lucene itself. I found a third-party implementation: the Compass project provides JdbcDirectory. (There is no need to worry about Compass configuration etc. JdbcDirectory can be used with plain Lucene without bothering about the rest of Compass.)

Here is a simple example

//you need the Lucene, Compass (for JdbcDirectory) and MySQL JDBC driver jars on the classpath
import org.apache.lucene.store.jdbc.JdbcDirectory;

import org.apache.lucene.store.jdbc.dialect.MySQLDialect;

import com.mysql.jdbc.jdbc2.optional.MysqlDataSource;

//code snippet to create index
MysqlDataSource dataSource = new MysqlDataSource();

dataSource.setUser("root");

dataSource.setPassword("password");

dataSource.setDatabaseName("test");

dataSource.setEmulateLocators(true); //This is important because we are dealing with a blob type data field

JdbcDirectory jdbcDir = new JdbcDirectory(dataSource, new MySQLDialect(), "indexTable");

jdbcDir.create(); // creates the indexTable in the DB (test). No need to create it manually

//code snippet for indexing
StandardAnalyzer analyzer = new StandardAnalyzer();

IndexWriter writer = new IndexWriter(jdbcDir, analyzer, true);

indexDocs(writer, dataSource.getConnection());

System.out.println("Optimizing...");

writer.optimize();

writer.close();


static void indexDocs(IndexWriter writer, Connection conn)
throws Exception {
    String sql = "select id, name, color from pet";
    Statement stmt = conn.createStatement();
    try {
        ResultSet rs = stmt.executeQuery(sql);

        while (rs.next()) {
            // one Lucene document per row of the pet table
            Document d = new Document();
            // id is stored but not searchable; name and color are stored and tokenized for searching
            d.add(new Field("id", rs.getString("id"), Field.Store.YES, Field.Index.NO));
            d.add(new Field("name", rs.getString("name"), Field.Store.YES, Field.Index.TOKENIZED));
            d.add(new Field("color", rs.getString("color"), Field.Store.YES, Field.Index.TOKENIZED));
            writer.addDocument(d);
        }
    } finally {
        stmt.close(); // closing the Statement also closes its ResultSet
    }
}

This is the indexing part. The searching part is the same as in my previous post, except that the searcher has to be opened on jdbcDir instead of the local index folder.

Thursday, June 05, 2008

Apache Lucene - Indexing a Database and Searching the Content

Here is a Java code sample that uses Apache Lucene to create an index from a database. (I am using Lucene version 2.3.2 and MySQL.)

final File INDEX_DIR = new File("index");

try{
   Class.forName("com.mysql.jdbc.Driver").newInstance();
   Connection conn = DriverManager.getConnection("jdbc:mysql://localhost/test", "root", "password");
   StandardAnalyzer analyzer = new StandardAnalyzer();
   IndexWriter writer = new IndexWriter(INDEX_DIR, analyzer, true);
   System.out.println("Indexing to directory '" + INDEX_DIR + "'...");
   indexDocs(writer, conn);
   writer.optimize();
   writer.close();
} catch (Exception e) {
   e.printStackTrace();
}

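void indexDocs(IndexWriter writer, Connection conn) throws Exception {
  String sql = "select id, name, color from pet";
  Statement stmt = conn.createStatement();
  try {
     ResultSet rs = stmt.executeQuery(sql);
     while (rs.next()) {
        // one Lucene document per row of the pet table
        Document d = new Document();
        // id is stored (so it can be read back from a hit) but not searchable
        d.add(new Field("id", rs.getString("id"), Field.Store.YES, Field.Index.NO));
        // name and color are searchable but not stored in the index
        d.add(new Field("name", rs.getString("name"), Field.Store.NO, Field.Index.TOKENIZED));
        d.add(new Field("color", rs.getString("color"), Field.Store.NO, Field.Index.TOKENIZED));
        writer.addDocument(d);
     }
  } finally {
     stmt.close(); // closing the Statement also closes its ResultSet
  }
}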

I assumed that there is a table named pet in the "test" database with the fields "id", "name" and "color". After running this, a folder named index is created in the working directory, containing the indexed content.

The following code (the Lucene searcher) shows how to find the records containing a particular keyword using the Lucene index created above.

Searcher searcher = new IndexSearcher(IndexReader.open("index"));
Query query = new QueryParser("color",analyzer).parse("white"); //search the color field for "white"
Hits hits = searcher.search(query);
String sql = "select * from pet where id = ?";

PreparedStatement pstmt = conn.prepareStatement(sql);
for (int i = 0; i < hits.length(); i++){
   String id = hits.doc(i).get("id"); //id is the only stored field, so use it to fetch the full row
   pstmt.setString(1, id);
   displayResults(pstmt);
}
pstmt.close();
searcher.close();

void displayResults(PreparedStatement pstmt) {
   try {
      ResultSet rs = pstmt.executeQuery();
      try {
         while (rs.next()) {
            System.out.println(rs.getString("name"));
            System.out.println(rs.getString("color")+"\n");
         }
      } finally {
         rs.close(); //release the result set before the statement is reused for the next hit
      }
   } catch (SQLException e) {
      e.printStackTrace();
   }
}
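
To make the two-step pattern above a little more concrete (search the index for ids, then fetch the rows by id), here is a toy inverted index using only the JDK. This is only a sketch and not how Lucene is implemented; TinyIndex and its methods are hypothetical names of mine. Text is split into terms, each term remembers the ids of the rows it came from, and a search returns those ids.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Toy illustration only: maps "field:term" to the ids of matching rows.
class TinyIndex {
    private final Map<String, Set<String>> postings = new HashMap<String, Set<String>>();

    // Index one field of one row: lower-case the text, split it on anything
    // that is not a letter or digit, and remember the row id under each term.
    public void add(String id, String field, String text) {
        for (String term : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (term.length() == 0) {
                continue; // split can yield an empty leading token
            }
            String key = field + ":" + term;
            Set<String> ids = postings.get(key);
            if (ids == null) {
                ids = new LinkedHashSet<String>(); // keeps insertion order, no duplicates
                postings.put(key, ids);
            }
            ids.add(id);
        }
    }

    // Return the ids of all rows whose field contains the given term.
    public List<String> search(String field, String term) {
        Set<String> ids = postings.get(field + ":" + term.toLowerCase(Locale.ROOT));
        return ids == null ? new ArrayList<String>() : new ArrayList<String>(ids);
    }

    public static void main(String[] args) {
        TinyIndex index = new TinyIndex();
        index.add("1", "color", "white");
        index.add("2", "color", "black and white");
        index.add("3", "color", "brown");
        // These ids would then be fed into "select * from pet where id = ?"
        System.out.println(index.search("color", "white")); // prints [1, 2]
    }
}
```

Only the id comes back from the search, which mirrors the Lucene example: name and color are indexed with Field.Store.NO, so the full record still has to be read from the pet table.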