Searching a single MySQL text column with fuzzy matching

10,602

Solution 1

Full Text Search (FTS) is the term for the database functionality you want. There's native MySQL support, and there's third-party support:

Solution 2

Here is a SO question that comes very close to what you want. While the answer is for PHP and MySQL, the general principle still applies:

How do I do a fuzzy match of company names in MYSQL with PHP for auto-complete?

Basically, you would use SOUNDEX to get what you want. If you need more power, support for longer strings, etc., you might want to look into Double Metaphone, which improves on both Metaphone and SOUNDEX:

http://aspell.net/metaphone/

http://www.atomodo.com/code/double-metaphone
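
To make the SOUNDEX idea concrete, here is a minimal Python sketch of the standard four-character American Soundex. It is an illustration only, not MySQL's implementation: MySQL's SOUNDEX() returns an arbitrarily long string and collapses repeated codes slightly differently, so its output will not match this exactly. The function name `soundex` and the sample names are my own choices.

```python
import string

# Map each consonant to its Soundex digit; vowels, h, w and y have no code.
CODES = {}
for digit, letters in enumerate(["bfpv", "cgjkqsxz", "dt", "l", "mn", "r"], start=1):
    for ch in letters:
        CODES[ch] = str(digit)


def soundex(text):
    """Return the standard 4-character Soundex code (sketch, ASCII letters only)."""
    letters = [c for c in text.lower() if c in string.ascii_lowercase]
    if not letters:
        return ""
    result = letters[0].upper()          # the first letter is always kept as-is
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue                     # h and w do not separate equal codes
        code = CODES.get(ch, "")
        if code and code != prev:
            result += code
        prev = code                      # a vowel resets prev, so repeats are coded again
    return (result + "000")[:4]          # pad or truncate to four characters


print(soundex("Robert"), soundex("Rupert"))                       # R163 R163
print(soundex("Rose and Crown"), soundex("The Rose and Crown"))   # R253 T625
```

Because the first letter is always kept, a leading 'The' changes the whole code, which is worth knowing before relying on SOUNDEX for names like these.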

Author by

pwaring

Updated on June 09, 2022

Comments

  • pwaring
    pwaring almost 2 years

    I have a MySQL InnoDB table with a 'name' column (VARCHAR(255)) which I want users to be able to search against, returning all the matching rows. However, I can't just use a LIKE query because the search needs to allow for users typing in names which are similar to the available names (e.g. prefixing with 'The', or not knowing that the correct name includes an apostrophe).

    Two examples are:

    Name in DB: 'Rose and Crown'

    Example possible searches which should match: 'Rose & Crown', 'Rose and Crown', 'rose and crown', 'The Rose and Crown'

    Name in DB: 'Diver's Inn'

    Example possible searches which should match: 'Divers' Inn', 'The Diver's Inn', 'Divers Inn'

    I also want to be able to rank the results by a 'closest match' relevance, although I'm not sure how that would be done (edit distance perhaps?).

    It's unlikely that the table will ever grow beyond a few thousand rows, so a method which doesn't scale to millions of rows is fine. Once entered, the name value for a given row will not change, so if an expensive indexing operation is required that's not a problem.

    Is there an existing tool which will perform this task? I've looked at Zend_Search_Lucene but that seems to focus on documents, whereas I'm only interested in searching a single column.

    Edit: On SOUNDEX searching, this doesn't produce the results I want. For example:

    SELECT SOUNDEX('the rose & crown') AS soundex1, SOUNDEX('rose and crown') AS soundex2;

    soundex1    soundex2
    T6265       R253265

    Solution: In the end I've used Zend_Search_Lucene and just pretended that every name is in fact a document, which seems to achieve the result I want. I guess it's full text search in a way, even though each string is at most 3-4 words.

  • pwaring
    pwaring almost 13 years
    The drawbacks of SOUNDEX seem a bit too great for me - especially the requirement that the first letter be the same ('The Rose and Crown' and 'Rose & Crown' don't have the same first letter).
  • pwaring
    pwaring almost 13 years
    Native MySQL support won't work - as I said in the question my tables are InnoDB. Also, the user won't specify their query as 'Rose', 'Crown', it will be 'Rose & Crown' (for example).
  • OMG Ponies
    OMG Ponies almost 13 years
    @pwaring: That's why I mentioned 3rd party support. Knowing the common terminology should make finding more information easier.
  • ypercubeᵀᴹ
    ypercubeᵀᴹ almost 13 years
    @pwaring: You may bypass that by first stripping small common words like a, and, the from your strings, along with apostrophes, quotes, commas, etc., and then using Soundex.
  • pwaring
    pwaring almost 13 years
    I could do, but that requires writing code to strip common words, punctuation etc. whereas I really want to say "here's the user's query, search against this column and return the results ordered by relevance". If I have to strip things from the query you can guarantee I'll miss something. :)
  • OMG Ponies
    OMG Ponies almost 13 years
    @pwaring: Stripping common words happens automatically (and is configurable, to a degree) when using Full Text Search (FTS) functionality.
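
For a table of only a few thousand names, the "strip common words and punctuation, then rank by closeness" idea from the comments can be sketched in a few lines of Python. This is a rough illustration under my own assumptions, not what the asker ended up using (that was Zend_Search_Lucene): the stop-word list, the 0.6 cutoff, and the function names `normalize` and `search` are all hypothetical, and `difflib.SequenceMatcher` is used as a stand-in similarity score rather than a true edit distance.

```python
import difflib
import re

# Hypothetical stop-word list; extend it to suit the data.
STOP_WORDS = {"a", "an", "and", "the"}


def normalize(name):
    """Lower-case, drop apostrophes, turn other punctuation into spaces, remove stop words."""
    text = name.lower()
    text = re.sub(r"['\u2019]", "", text)       # "Diver's" and "Divers'" both become "divers"
    text = re.sub(r"[^a-z0-9]+", " ", text)     # '&', commas, etc. become word separators
    words = [w for w in text.split() if w not in STOP_WORDS]
    return " ".join(words)


def search(query, names, cutoff=0.6):
    """Return the names whose normalized form is similar to the query, best match first."""
    wanted = normalize(query)
    scored = []
    for name in names:
        score = difflib.SequenceMatcher(None, wanted, normalize(name)).ratio()
        if score >= cutoff:
            scored.append((score, name))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored]


names = ["Rose and Crown", "Diver's Inn", "Red Lion"]
print(search("The Rose & Crown", names))
print(search("Divers' Inn", names))
```

Since the names never change once entered, the normalized form could be computed once and stored alongside each row, leaving only the query to normalize at search time.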