How to make a multiple column mysql fulltext search where partial words are matched

11,518

Solution 1

In boolean mode, requiring strings to be present (instead of just scoring higher), is done with +. prefix matching is done with an ending *. This seems to be what you want, so search for:

+John* +S*
+John* +Smith*
+Smith* +J*
+Jo* +S*

Note that Full Text indexes cannot help you searching 'anywhere in a word'. so something like *mith* is bound to fail: they're meant to match from character 1 in an index.

If you also want to order them by match values, and for instance, need John Smith before Johnny Smithson, you'd do this:

 SELECT * FROM user 
 WHERE MATCH(..fields..) AGAINST ('match' IN BOOLEAN MODE)
 ORDER BY MATCH(..fields..) AGAINST ('match' IN BOOLEAN MODE) DESC;

Which you will see will get you nowhere unless you add all the words >= ft_min_word_len again separately:

+John* +S* John
+John* +Smith* John Smith
+Smith* +J* Smith
+Jo* +S*

For the last one, both are < the default 4 characters, so we can't add sorting params for that in default mysql, but you could set ft_min_world_len differently is desired.

Solution 2

IN BOOLEAN MODE you can use the +-modifier to force AND or the --modifier to force NOT. No operator, your case, mean optional.

And you need to check the minimal word length in your mysql configuration to make the FULLTEXT INDEX index words smaller than a certain length.

I had to set

ft_min_word_len = 2

in my.cnf and had to rebuild the index to make this effective. By default it is 3.

To find out your min_word_len check (and upvote) this question

Solution 3

See http://dev.mysql.com/doc/refman/5.5/en//fulltext-boolean.html

You may place a "+", "-", or no operator before a word to make it search for "AND contains this word", "NOT contains this word", and no operator is "OR contains this word"

If I type in "John S", only the first result shows which is the desired behavior.

There's only one John, so this works, S is below the minimum word length and is discarded

If I type in "John Smith", only the first result shows which is the desired behavior.

There's only one John so this works

If I type "Smith J", both results show even though Bob is not a match.

J is below the minimum word length, so its only matching smith which is both rows

If I type "Smith John", both results show even though Bob is not a match.

Since you're in BOOLEAN MODE MySQL interprets this as Smith OR John... Smith matches both.

Lastly, if I type "Jo S", no results are returned despite the partial match on "Jo" and "S".

Jo and S are below the minimum word length - I believe MySQL treats this as searching for nothing

You'll want to add a "+" before your search parameters to turn them into an AND search... +Smith +John

Share:
11,518
Max
Author by

Max

Updated on July 19, 2022

Comments

  • Max
    Max almost 2 years

    I currently have a single search field searching against multiple columns using this code:

    $searchArray = explode(" ", $searchVal);
    $query="SELECT * FROM users WHERE ";
    $i=0;
    foreach ($searchArray as $word) {
        if ($i != 0) $query .= " OR ";
        $query .= " MATCH (`first_name`, `last_name`, `email`) AGAINST ('".$word."*'  IN BOOLEAN MODE)";
        $i++;
    }
    

    Lets say I have these two rows in the table:

    id | last_name | first_name | email
    1  | Smith     | John       | [email protected]
    2  | Smith     | Bob        | [email protected]
    

    If I type in "John S", only the first result shows which is the desired behavior.

    If I type in "John Smith", only the first result shows which is the desired behavior.

    If I type "Smith J", both results show even though Bob is not a match.

    If I type "Smith John", both results show even though Bob is not a match.

    Lastly, if I type "Jo S", no results are returned despite the partial match on "Jo" and "S".

    Can anyone help me fix my query to deal with the desired functionality of the order not being important and partial results matching? If it can be sorted by the best matches (i.e. the longest part of the word, starting from the first letter only not a section in the middle, in the highest number of columns), that would be a huge help also.

    UPDATE:

    Just wanted to post the final code that worked based on the solution. My loop creating multiple match statements was incorrect as was my ft_min_word_len.

    My code is now:

    $searchArray = explode(" ", $searchVal);
    $query="SELECT * FROM users WHERE  MATCH (`first_name`, `last_name`, `email`) AGAINST ('";
    $i=0;
    foreach ($searchArray as $word) {
        $query .= "+".$word."* ";
    }
    $query .= "' IN BOOLEAN MODE)";
    
  • Patrick B.
    Patrick B. about 11 years
    The field ft_min_word_len is not for the MATCH-query, it is for the index which has been created and the query is now matching to. So, someone called Jo Smith would not match a +Jo* match.
  • Wrikken
    Wrikken about 11 years
    Already deleted my first comment (as it was false, boolean mode does use min word length, my apologies for that).
  • Max
    Max about 11 years
    There still seems to be an issue with the order of the terms though. I added Pablo Picasso to the DB to get a longer name for testing. The term "Pablo Pica" returns a result. "Pica Pablo" does not. Should I be passing the full set of terms with a + and a * into a single MATCH statement or doing multiple MATCH statements with an OR as I have above?
  • Max
    Max about 11 years
    There still seems to be an issue with the order of the terms. I added Pablo Picasso to the DB. The term "Pablo Pica" returns a result. "Pica Pablo" does not. Should I be passing the full set of terms with a + and a * into a single MATCH statement or doing multiple MATCH statements with an OR as I have above?
  • Wrikken
    Wrikken about 11 years
    @Max: I cannot reproduce this, both Pica Pablo as Pablo Pica (or +Pica* +Pablo* / +Pablo* +Pica*) return the same user for me. Yes, in a single MATCH() AGAINST() statement. Every single term in the MATCH() would most likely require the +term* format. Do read @PatrickB's comment though: names < 4 characters will never match.
  • Max
    Max about 11 years
    Thanks! The combination of the ft_min_word_len and changing it to a single MATCH statement fixed it. I will update the question with my final code.
  • Alex Lacayo
    Alex Lacayo almost 9 years
    So i have a users table with a name column. I have a row with the name "James DeQuan" and ... SELECT * FROM users` WHERE MATCH(name) AGAINST('+james* +de*' in BOOLEAN MODE) returns nothing. Works fine for all other entries without an apostrophe. Any clue why?