I was recently tasked with looking into upgrading the libraries we use for full text search on a site that I help maintain. Search has become a pretty integral part of our site, and its performance has gotten worse over time. We currently use Compass to power the full text searches, but it is no longer being maintained. At the core of Compass is Lucene, which is a very good search engine. Because of Compass, we’re stuck with an older version of Lucene and so can’t use all the goodies that are in newer releases.
In my search for an alternative I’ve reviewed Elasticsearch, Solr, Hibernate Search, and using Lucene directly. I eliminated Elasticsearch and Solr based on the integration that is necessary. While they can do what we need, integrating them would take more work than is justified. So I’ve spent the last couple of days poring over any documentation on Hibernate Search and Lucene. I already have a lot of experience with Lucene, and many of our searches already build Lucene queries directly and pass them to the Compass APIs. Hibernate Search has been on my radar for a while, but I knew very little about it.
While Hibernate Search is a very good framework, I’m going to end up recommending using Lucene directly. Hibernate Search is good for database-backed applications (sites) that want to provide full text search over their entities. However, many of the things we index are not stored in any database and have no entities that represent the searchable data. In my opinion it would take the same amount of work to make Hibernate Search fit our site as it would to work with Lucene directly.
Now I just need to figure out how we’re going to handle distributing the index between our various nodes. I’m still researching options such as Hadoop, Katta, and various strategies using JMS, NFS, and more. Hopefully I’ll have a plan mapped out relatively soon.