Lucene or Mysql Full text search

10,038

Solution 1

I had the same decision in November 2010. I'm a friend of mysql and tried to build an search application on mysql first - which works well... ...and fast (i thought it was fast): searching in 200.000 documents (in not more than 2-3 second)

I avoided spending time to lucene/solr, because i would like to use this time for developing the application. And: lucene was new for me... I don't know, if it is good enough, i don't know what it is.... Finally: You can't change the habits of a lifetime.

However, i run in different problems with fuzzy search (which is difficult to implement in mysql) or "more like this" (which have to be coded from scrat in an application using mysql or simple use that "more like this" solr-feature out of the box). Finally the number of documents rises up to a million and mysql needs now more than 15 seconds to search into the documents.

So i decided to start with lucene and it feels like i opened a door to a new world. Lot's of features (i hardly coded application-features) are now provided from solr and working out of the box. The fulltext searches are much, much faster: less than 50ms in 1 million Documents, less than 1ms, if it is cached.

So the invested time has paid off.

So if you think about to make an fulltext search: take lucene, if you have mor than a couple of data. By the way: i'm using an hybrid construct: holding the data in mysql and lucene is only an index with (nearly) no stored data (to keep that index small and fast).

Solution 2

generically speaking, if you are going to have full text searches, you will most surely need lucene or sphinx + mysql (or lucene + mysql, storing the indexable fields in lucene, and returning an id for a mysql row). either of them are excellent choices.

if you are going to do "normal" searches (i.e: integer or char columns or date), mysql partitoning will suffice.

you need to specify what are you going to search for. and how often you will be reindexing your db (if you are going to reindex a lot, i'd go with sphinx)

Solution 3

You are asking whether to go with Lucene or MySQL. But Lucene is a library, and MySQL is a server. You should really be deciding between SOLR search engine and MySQL. In that case, the right answer is likely to be both. Manage all the data in MySQL. Run processes to regularly extract changed data, transform it into SOLR search format, and load it into the search engine. Using SOLR is much more straightforward than using Lucene directly, and if you need to modify the behavior in some way, you can still write plugins for SOLR so there is no loss of flexibility.

But it would be the kiss of death to try and manage data with SOLR. The cycle of read-edit-update works great with SQL dbs but it is not what SOLR is all about. SOLR is fast flexible text search. You can stick image URLs in SOLR for convenience of preparing search results using a non-indexed field.

Share:
10,038
smaura777
Author by

smaura777

Updated on June 11, 2022

Comments

  • smaura777
    smaura777 almost 2 years

    Nowadays when starting a web/mobile app project in which search is going to be an important variable. Is it better to go with Lucene from the start or quickly deploy a MySQL based solution and hope for the best?