Finding exact match using Lucene search API


Solution 1

You can use KeywordAnalyzer to index and search on this field. KeywordAnalyzer generates a single token for the entire string, so a query matches only when it equals the whole field value.
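
To make the single-token behaviour concrete, here is a plain-Java sketch with no Lucene dependency. keywordTokens() and exactMatches() are hypothetical helpers that only imitate the analyzer's effect. In Lucene itself, the usual counterpart is to index the field with KeywordAnalyzer (or as a StringField) and search it with a TermQuery.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java sketch, NOT Lucene code: it only mimics the effect of indexing a field
// with KeywordAnalyzer, where the whole field value becomes a single token.
class KeywordMatchSketch {

    // Mimics KeywordAnalyzer: one token containing the entire input, unchanged (no lowercasing).
    static List<String> keywordTokens(String value) {
        return List.of(value);
    }

    // Returns ids of documents that contain the query as a token. Because each document
    // has exactly one token, this only succeeds when the query equals the whole value.
    static List<Integer> exactMatches(Map<Integer, String> docs, String query) {
        List<Integer> hits = new ArrayList<>();
        for (Map.Entry<Integer, String> doc : docs.entrySet()) {
            if (keywordTokens(doc.getValue()).contains(query)) {
                hits.add(doc.getKey());
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        Map<Integer, String> docs = new TreeMap<>(); // TreeMap keeps ids in ascending order
        docs.put(1, "Abigail Adams National Bancorp, Inc.");
        docs.put(2, "National Bancorp");
        System.out.println(exactMatches(docs, "National Bancorp")); // prints [2]
        System.out.println(exactMatches(docs, "national bancorp")); // prints [] (case-sensitive)
    }
}
```

Because KeywordAnalyzer does not lowercase, the match is case-sensitive, as the second call shows; if that matters, one option is to normalize case the same way at index time and at query time.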

Solution 2

I googled a lot but found nothing that helped with the same problem. After scratching my head for a while, I found the solution: put the search string within double quotes. That will solve your problem.

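National Bancorp will return both #1 and #2, but "National Bancorp" will return only #2. Note that the quoted query excludes #1 only when the field is indexed as a single token (for example with KeywordAnalyzer, as in Solution 1). With a word-level analyzer such as StandardAnalyzer, the phrase "national bancorp" also occurs inside #1, so the quoted query returns both.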
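
A quoted query is parsed into a phrase query, which matches only when the terms occur next to each other and in the same order. The plain-Java sketch below (no Lucene dependency) imitates that check. wordTokens() is a hypothetical, rough stand-in for a word-level analyzer such as StandardAnalyzer, not its actual tokenization rules.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Plain-Java sketch, NOT Lucene code: contrasts phrase matching (quoted query)
// with any-term matching (unquoted query using the default OR operator).
class PhraseMatchSketch {

    // Rough stand-in for a word-level analyzer: lowercase, split on non-letters/non-digits.
    static List<String> wordTokens(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    // Phrase semantics: the query tokens must appear in order, with no gaps.
    static boolean phraseMatches(List<String> doc, List<String> phrase) {
        return Collections.indexOfSubList(doc, phrase) >= 0;
    }

    // Unquoted OR semantics: any single query token appearing anywhere is enough.
    static boolean anyTermMatches(List<String> doc, List<String> query) {
        for (String t : query) {
            if (doc.contains(t)) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> doc1 = wordTokens("Abigail Adams National Bancorp, Inc.");
        List<String> doc2 = wordTokens("National Bancorp");
        List<String> phrase = wordTokens("National Bancorp");
        List<String> reversed = wordTokens("Bancorp National");
        System.out.println("quoted, #1: " + phraseMatches(doc1, phrase));             // true
        System.out.println("quoted, #2: " + phraseMatches(doc2, phrase));             // true
        System.out.println("quoted reversed, #2: " + phraseMatches(doc2, reversed));  // false
        System.out.println("unquoted reversed, #2: " + anyTermMatches(doc2, reversed)); // true
    }
}
```

The reversed query shows the difference between the two forms: unquoted, the terms may appear anywhere and in any order; quoted, they must be adjacent and in order.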

Solution 3

This is something that may warrant the use of the shingle filter, which groups adjacent words together into single tokens. For example, Abigail Adams National Bancorp with a ShingleFilter whose maximum shingle size is 3 would produce (assuming a simple WhitespaceAnalyzer) [Abigail], [Abigail Adams], [Abigail Adams National], [Adams], [Adams National], [Adams National Bancorp], [National], [National Bancorp] and [Bancorp].
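
The token list above can be reproduced with a short plain-Java sketch (no Lucene dependency) that splits on whitespace, keeps the single words, and builds shingles of up to maxSize words at each start position. In Lucene the comparable chain would be a whitespace tokenizer followed by ShingleFilter with a maximum shingle size of 3; shingles() here is a hypothetical helper, not Lucene's implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java sketch, NOT Lucene code: whitespace tokenization followed by
// shingles of 1..maxSize adjacent words (unigrams are kept).
class ShingleSketch {

    static List<String> shingles(String text, int maxSize) {
        String[] words = text.trim().split("\\s+");
        List<String> out = new ArrayList<>();
        // For each start position, emit the 1-word, 2-word, ... up to maxSize-word shingle.
        for (int start = 0; start < words.length; start++) {
            StringBuilder shingle = new StringBuilder();
            for (int size = 1; size <= maxSize && start + size <= words.length; size++) {
                if (size > 1) shingle.append(' ');
                shingle.append(words[start + size - 1]);
                out.add(shingle.toString());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [Abigail, Abigail Adams, Abigail Adams National, Adams, Adams National,
        //         Adams National Bancorp, National, National Bancorp, Bancorp]
        System.out.println(shingles("Abigail Adams National Bancorp", 3));
    }
}
```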

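If a user queries for National Bancorp, you will get an exact match on National Bancorp itself, and a lower-scored exact match on the [National Bancorp] shingle of Abigail Adams National Bancorp. It scores lower because that field has many more tokens, and Lucene's length normalization reduces the score of longer fields (idf is not the cause: it depends on how many documents contain a term, not on field length). I think it makes sense to return both documents on such a query.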
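
Why the longer name scores lower can be seen in the length-normalization term of Okapi BM25, Lucene's default similarity since Lucene 6.0. The numbers below are a hedged illustration only: they assume the BM25Similarity defaults k1 = 1.2 and b = 0.75, a field length counted in words (4 for #1, 2 for #2, so avgdl = 3), and the term "national bancorp" occurring once in each document. They ignore Lucene's lossy norm encoding and other implementation details, so real scores will differ. Both documents contain the term, so idf(t) is the same for both and only the length factor separates them.

```latex
\mathrm{score}(t, d) = \mathrm{idf}(t) \cdot
  \frac{f_{t,d}\,(k_1 + 1)}{f_{t,d} + k_1\left(1 - b + b\,\frac{|d|}{\mathrm{avgdl}}\right)}

% Assumed: k_1 = 1.2, b = 0.75, f_{t,d} = 1, |d_1| = 4, |d_2| = 2, avgdl = 3
\#1:\quad \frac{1 \cdot 2.2}{1 + 1.2\,(0.25 + 0.75 \cdot 4/3)} = \frac{2.2}{1 + 1.2 \cdot 1.25} = \frac{2.2}{2.5} = 0.88
\#2:\quad \frac{1 \cdot 2.2}{1 + 1.2\,(0.25 + 0.75 \cdot 2/3)} = \frac{2.2}{1 + 1.2 \cdot 0.75} = \frac{2.2}{1.9} \approx 1.16
```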

You may want to apply the shingle filter at query time as well, depending on the use case.
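
Query-time shingling matters because an indexed shingle such as [National Bancorp] can only be hit by a query token that is itself that shingle. The plain-Java sketch below (no Lucene dependency; shingles(), words() and matchedTokens() are hypothetical helpers) compares which indexed tokens of company #2 a plain word-split query reaches versus a shingled query. In Lucene, one way to apply the same shingling on both sides is ShingleAnalyzerWrapper.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java sketch, NOT Lucene code: compares query tokens with and without
// shingling against a document that was indexed with shingles.
class QueryShingleSketch {

    // Whitespace split plus shingles of up to maxSize adjacent words (unigrams kept).
    static List<String> shingles(String text, int maxSize) {
        String[] words = text.trim().split("\\s+");
        List<String> out = new ArrayList<>();
        for (int start = 0; start < words.length; start++) {
            StringBuilder shingle = new StringBuilder();
            for (int size = 1; size <= maxSize && start + size <= words.length; size++) {
                if (size > 1) shingle.append(' ');
                shingle.append(words[start + size - 1]);
                out.add(shingle.toString());
            }
        }
        return out;
    }

    // Plain whitespace split, i.e. no shingling at query time.
    static List<String> words(String text) {
        return Arrays.asList(text.trim().split("\\s+"));
    }

    // Query tokens that also occur among the document's indexed tokens.
    static List<String> matchedTokens(List<String> indexed, List<String> queryTokens) {
        List<String> hits = new ArrayList<>();
        for (String token : queryTokens) {
            if (indexed.contains(token)) hits.add(token);
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> indexed = shingles("National Bancorp", 3);
        System.out.println(matchedTokens(indexed, words("National Bancorp")));       // [National, Bancorp]
        System.out.println(matchedTokens(indexed, shingles("National Bancorp", 3))); // [National, National Bancorp, Bancorp]
    }
}
```

Without query-time shingling the [National Bancorp] token in the index is never consulted, so the shingles add nothing to the score for this query.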

Author: Steve Chapman

Updated on June 08, 2022

Comments

  • Steve Chapman, almost 2 years

    I'm working on a company search API using Lucene. My Lucene company index contains two companies:

    1. Abigail Adams National Bancorp, Inc.
    2. National Bancorp

    If the user types in National Bancorp, then only company #2 (i.e. National Bancorp) should be returned, not #1; that is, only exact matches should be returned. How do I achieve this functionality?

    Thanks for reading.

  • Steve Chapman, almost 15 years
    Can you please answer this one? stackoverflow.com/questions/899542/…