Choosing a stand-alone full-text search server: Sphinx or SOLR?

56,684

Solution 1

I've been using Solr successfully for almost 2 years now, and have never used Sphinx, so I'm obviously biased. However, I'll try to keep it objective by quoting the docs or other people. I'll also take patches to my answer :-)

Similarities:

  • Both Solr and Sphinx satisfy all of your requirements. They're fast and designed to index and search large bodies of data efficiently.
  • Both have a long list of high-traffic sites using them (Solr, Sphinx)
  • Both offer commercial support. (Solr, Sphinx)
  • Both offer client API bindings for several platforms/languages (Sphinx, Solr)
  • Both can be distributed to increase speed and capacity (Sphinx, Solr)

Here are some differences:

Solution 2

Unless you need to extend the search functionality in any proprietary way, Sphinx is your best bet.

Sphinx advantages:

  1. Development and setup are faster
  2. Much better (and faster) aggregation. This was the killer feature for us.
  3. Not XML. This is what ultimately ruled out Solr for us. We had to return rather large result sets (think hundreds of results) and then aggregate them ourselves, since Solr aggregation was lacking. The time spent serializing to and from XML killed performance. For small result sets, though, it was perfectly fine.
  4. Best documentation I've seen in an open source app
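
To get a feel for the serialization cost mentioned in point 3, here is a minimal sketch that round-trips the same result set through XML and through JSON and times both. It does not involve Solr or Sphinx at all; the 500-row result set, the field names `id` and `title`, and the element names `response` and `doc` are assumptions made up for illustration, and the timings will vary by machine.

```python
import json
import timeit
import xml.etree.ElementTree as ET

# Hypothetical result set: a few hundred small documents.
rows = [{"id": i, "title": f"doc {i}"} for i in range(500)]

def to_xml(rows):
    """Serialize rows as <response><doc><id>..</id><title>..</title></doc>...</response>."""
    root = ET.Element("response")
    for row in rows:
        doc = ET.SubElement(root, "doc")
        for key, value in row.items():
            ET.SubElement(doc, key).text = str(value)
    return ET.tostring(root)

def from_xml(payload):
    """Parse the XML back into a list of dicts (all values come back as strings)."""
    return [
        {child.tag: child.text for child in doc}
        for doc in ET.fromstring(payload)
    ]

def xml_round_trip():
    return from_xml(to_xml(rows))

def json_round_trip():
    return json.loads(json.dumps(rows))

# Time 50 round trips of each format on the same data.
for fn in (xml_round_trip, json_round_trip):
    seconds = timeit.timeit(fn, number=50)
    print(f"{fn.__name__}: {seconds:.4f}s for 50 round trips")
```

Running it on your own result sizes is more informative than any single number quoted here.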

Solr advantages:

  1. Can be extended.
  2. Can hit it directly from a web app, i.e., you can have autocomplete-like searches hit the Solr server directly via AJAX.
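
As a minimal sketch of what "hitting Solr directly" means over HTTP: Solr answers searches on its `/select` handler, and the `wt` parameter selects the response writer (for example `json` rather than XML). The host, port, core name `articles`, and field name `title` below are assumptions for illustration and do not come from the answer.

```python
from urllib.parse import urlencode

def solr_select_url(base_url, core, query, rows=10, wt="json"):
    """Build a URL for Solr's /select handler.

    base_url and core are hypothetical placeholders; q, rows and wt are
    standard Solr request parameters.
    """
    params = urlencode({"q": query, "rows": rows, "wt": wt})
    return f"{base_url.rstrip('/')}/solr/{core}/select?{params}"

# Prefix query such as an autocomplete box might send (assumed field name).
url = solr_select_url("http://localhost:8983", "articles", "title:sphin*", rows=5)
print(url)
```

An autocomplete widget would issue the same kind of request from the browser, using the prefix the user has typed so far.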

Solution 3

Note: There are many users with the same question in mind.

So, to answer to the point:

Which and why?

  • Use Solr if you intend to use it in a web app (for example, a site search engine). Its API makes it a great fit, and you will need that power for a web app.

  • Use Sphinx if you want to search through large numbers of documents or files quickly. It indexes quickly too. I would not recommend it for an app that has to parse JSON or XML to get the search results. Use it for direct DB searches. It works great with MySQL.

Alternatives

Although these are the giants, there are plenty more, and some people use these engines to power their own custom frameworks. So I would say that you really haven't missed any, although there is one, Elasticsearch, that has a good user base.

Solution 4

I have been using Sphinx for almost a year now, and it has been amazing. I can index 1.5 million documents in about a minute on my MacBook, and even quicker on the server. I am also using Sphinx to limit searches to places within specific latitudes and longitudes, and it is very fast. How results are ranked is also very tweakable. It is easy to install and set up if you read a tutorial or two. It is almost at 1.0 status, but the release candidates have been rock solid.
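
To illustrate the kind of latitude/longitude restriction described above, here is a plain-Python sketch of a bounding-box filter. It is not Sphinx's API or implementation; the `Place` type, the sample places, and the box limits are all assumptions made up for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Place:
    name: str
    lat: float  # degrees
    lon: float  # degrees

def within_box(places, min_lat, max_lat, min_lon, max_lon):
    """Keep places whose coordinates fall inside the box (inclusive).

    Assumes the box does not cross the 180th meridian.
    """
    return [
        p for p in places
        if min_lat <= p.lat <= max_lat and min_lon <= p.lon <= max_lon
    ]

# Hypothetical sample data.
places = [
    Place("Stockholm", 59.33, 18.07),
    Place("Berlin", 52.52, 13.40),
    Place("Lisbon", 38.72, -9.14),
]
hits = within_box(places, 50.0, 60.0, 10.0, 20.0)
print([p.name for p in hits])  # → ['Stockholm', 'Berlin']
```

A search server applies the same kind of range check to stored attributes, so it can be combined with the full-text match in a single query.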

Solution 5

Lucene/Solr appears to be more fully featured, has been around longer, and has a much stronger user community. IMHO, if you can get past the initial setup issues that some seem to have faced (we did not), then I would say Lucene/Solr is your best bet.

Author: knorv

Updated on September 13, 2020

Comments

  • knorv
    knorv over 3 years

    I'm looking for a stand-alone full-text search server with the following properties:

    • Must operate as a stand-alone server that can serve search requests from multiple clients
    • Must be able to do "bulk indexing" by indexing the result of an SQL query: say "SELECT id, text_to_index FROM documents;"
    • Must be free software and must run on Linux with MySQL as the database
    • Must be fast (rules out MySQL's internal full-text search)

    The alternatives I've found that have these properties are:

    • Solr (based on Lucene)
    • ElasticSearch (also based on Lucene)
    • Sphinx

    My questions:

    • How do they compare?
    • Have I missed any alternatives?
    • I know that each use case is different, but are there certain cases where I would definitely not want to use a certain package?
    • Dave
      Dave almost 15 years
      Have you ruled out using straight Lucene? Solr is a service on top of Lucene, so straight Lucene could still be a possibility.
    • knorv
      knorv almost 15 years
      Does Lucene have a stand-alone server mode? I thought that was one of the things SOLR added? I haven't ruled out anything - so feel free to advocate Lucene if that is the best choice given the requirements :-)
    • knorv
      knorv almost 15 years
      mausch: Mainly Java but also other languages.
    • pchap10k
      pchap10k almost 15 years
      Personally I like Sphinx. However, during a "large" project recently, the latest release candidate (0.9.9-rc2) had show-stopper bugs when using multi-value attributes (MVA). It would return random results! So we moved to SOLR to get around this. Once SOLR was up and running the performance was fine, and without the show-stopper bug.
    • FYA
      FYA about 13 years
      Have you looked at elasticsearch.com ?
    • FYA
      FYA over 12 years
      that said, we're piping data thru xml to sphinxsearch. rough and ugly but once done it is so freakin fast.
  • Mauricio Scheffer
    Mauricio Scheffer almost 15 years
    Geographical searching can be done in Solr with the LocalSolr plugin: gissearch.com/localsolr
  • Mauricio Scheffer
    Mauricio Scheffer almost 15 years
    Solr has many response writers other than xml, including JSON, PHP, Ruby, Python and a java binary format: lucene.apache.org/solr/api/org/apache/solr/request/…
  • larf311
    larf311 almost 15 years
    Did I mention how terrible the Solr/Lucene documentation is? Having to root through Javadocs to figure out functionality is not my idea of documentation.
  • Mauricio Scheffer
    Mauricio Scheffer almost 15 years
    I should have linked to the wiki: wiki.apache.org/solr/…
  • lkahtz
    lkahtz over 13 years
    I spent the whole day fixing some installation bug of Sphinx 0.9.9 on my Mac. So far it is still not working. It is so buggy. I tried every way suggested. I am giving up. Really frustrating...
  • jimmystormig
    jimmystormig over 13 years
    Talking about devs committing to both Solr and Lucene, it seems they have merged the two products making further development easier and faster - lucidimagination.com/blog/2010/03/26/….
  • mlissner
    mlissner about 13 years
    User community is an important point. There are a couple of VERY, VERY helpful people in the Sphinx forums, but there isn't a strong community otherwise.
  • Mauricio Scheffer
    Mauricio Scheffer over 12 years
    @Stann : how so? I've used Solr for nearly 5 years and never needed to write a single line of Java.
  • Stann
    Stann over 12 years
    @MauricioScheffer Do you really think that Java code will be faster than C++? Here's the comparison made by Bill Karwin, where Sphinx runs queries 10 times faster than Lucene (and Solr has got to be even slower than that). slideshare.net/billkarwin/…
  • Mauricio Scheffer
    Mauricio Scheffer over 12 years
    @Stann : do you really think you need more performance than whitehouse.gov, Netflix, The Guardian, digg, just to name a few websites using Solr? wiki.apache.org/solr/PublicServers
  • Mauricio Scheffer
    Mauricio Scheffer over 12 years
    @Stann : I also recommend checking out google.com/search?q=java+slow+myth
  • Mauricio Scheffer
    Mauricio Scheffer over 12 years
    @Stann : also, Solr can actually be faster than Lucene due to caching, in real-world scenarios (not contrived benchmarks like Karwin's...)
  • Tyler Liu
    Tyler Liu over 12 years
    Solr's documentation is not as good as Sphinx's, but the community is large. And I can always figure out everything by reading the Solr source code.
  • New Alexandria
    New Alexandria over 11 years
    Here is an answer on Sphinx that is a good pair to this answer on Solr
  • Augiwan
    Augiwan over 11 years
    that awkward moment when I read this answer after a year and a half and click on upvote and see that I wrote this answer myself. lol. :D A small addition to this though: After 18 months, elasticsearch has turned out to be a great alternative and has a decent community too. Cool, bonsai cool!
  • Mevin Babu
    Mevin Babu about 10 years
    Augustus! That awkward moment :D. So for a Python web app, what do you think is best now? Solr or Elasticsearch, based on performance, memory usage and ease of setup? Any idea?
  • Augiwan
    Augiwan about 10 years
    It doesn't matter what language the web app is written in. Choose based on your use case!
  • FastAl
    FastAl almost 7 years
    you can INDEX 1.5 million documents in a minute? I can't even come close to READING that many - directly from 7zip (not writing, outputting to the console) files on my SSD! And it's 2017! What kind of documents are these? That's pretty incredible. Note: I hope you didn't mean search the index of 1.5 million in a minute. Searches of an index w/ 1.5 million docs should still return in seconds (even in 2009).