Best way to do a weighted search over multiple fields in mysql?

11,256

Solution 1

Probably this approach of doing a weighted search / results is suitable for you:

SELECT *,
    IF(
            `name` LIKE "searchterm%",  20, 
         IF(`name` LIKE "%searchterm%", 10, 0)
      )
      + IF(`description` LIKE "%searchterm%", 5,  0)
      + IF(`url`         LIKE "%searchterm%", 1,  0)
    AS `weight`
FROM `myTable`
WHERE (
    `name` LIKE "%searchterm%" 
    OR `description` LIKE "%searchterm%"
    OR `url`         LIKE "%searchterm%"
)
ORDER BY `weight` DESC
LIMIT 20

It uses a select subquery to provide the weight for ordering the results. In this case three fields searched over, you can specify a weight per field. It's probably less expensive than unions and probably one of the faster ways in plain MySQL only.

If you've got more data and need results faster, you can consider using something like Sphinx or Lucene.

Solution 2

you can add multiple mysql MATCH() values together, first multiplying each one by their weight.

simplified of course...

'(MATCH(column1) AGAINST(\''.$_GET['search_string'].'\') * '.$column1_weight.')
 + (MATCH(column2) AGAINST(\''.$_GET['search_string'].'\') * '.$column2_weight.')
 + (MATCH(column3) AGAINST(\''.$_GET['search_string'].'\') * '.$column3_weight.')
 AS relevance'

then

'ORDER BY relevance'

Solution 3

There is a native and clean way to do this using MySQL's CASE function (https://dev.mysql.com/doc/refman/5.7/en/case.html).

Example (untested):

SELECT * FROM `myTable` 
WHERE (`name` LIKE "%searchterm%" OR `description` LIKE %searchterm%" OR `url` LIKE "%searchterm%")
ORDER BY CASE
WHEN `name`        LIKE "searchterm%"  THEN 20
WHEN `name`        LIKE "%searchterm%" THEN 10
WHEN `description` LIKE "%searchterm%" THEN 5
WHEN `url`         LIKE "%searchterm%" THEN 1
ELSE 0
END
LIMIT 20

Have used this for many weighted searches of my own and works an absolute treat!

Solution 4

I had this exact same question and it was fully answered on one of the MySQL forums. Here's the thread. Kind of a long thread (because I'm kind of long-winded) but the payoff is just what you're looking for.

Solution 5

You should use a dedicated indexer to prefetch all of the data into an optimized, searchable index. Sphinx and similar products do this very well.

Share:
11,256

Related videos on Youtube

Hugo Mota
Author by

Hugo Mota

Software engineer, full stack developer, investor, framework bakery.

Updated on June 25, 2020

Comments

  • Hugo Mota
    Hugo Mota almost 4 years

    Here's what i want to do:

    • match a search subject against multiple fields of my table
    • order the results by importance of the field and relevance of the matching (in that order)

    Ex: let's assume I have a blog. Then someone searches for "php". The results would appear that way:

    • first, the matches for the field 'title', ordered by relevance
    • then, the matches for the field 'body', ordered by relevance too
    • and so on with the specified fields...

    I actually did this with a class in PHP but it uses a lot of UNIONS (a lot!) and grows with the size of the search subject. So I'm worried about performance and DOS issues. Does anybody has a clue on this?

  • dqhendricks
    dqhendricks almost 13 years
    if your weights are correctly set, it will place them in the desired order.
  • Hugo Mota
    Hugo Mota almost 13 years
    can you give me an example of how to set the weights for this?
  • dqhendricks
    dqhendricks almost 13 years
    @hugo_leonardo the weights depend on how much more important a match in the title is compared to the body. so if the title match is 5x more important than a body match, the weights would be 5 and 1 respectively. does this make sense?
  • dqhendricks
    dqhendricks almost 13 years
    @hugo_leonardo if there is a severly weak match in the title, and a very strong match in the body, it is still possible that a body matched item could appear before a title matched item. this is a better way to do it however since, not all title match should be more valueable than all body matches, depending on the match scores. also, keep in mind that the fields being matched against need to have a full text index on them.
  • Hugo Mota
    Hugo Mota almost 13 years
    @dqhendricks well, it makes sense! but, not all the fields will be TEXT and only MYISAM tables support text indexes =/
  • Hugo Mota
    Hugo Mota almost 13 years
    i like this approach! can you explain me what's going on in the IFs before the FROM? i'm not used with complex querys =/
  • hakre
    hakre almost 13 years
    Basically it's an IF function, if the condition (first argument) is true, second argument will be used (the weight) otherwise third argument will be used (0-weight). The manual has all the details: dev.mysql.com/doc/refman/5.0/en/…
  • Hugo Mota
    Hugo Mota almost 13 years
    that's very nice! so if it appears in name (not at the start), description and url, it will get an weight of 16? (10+5+1)
  • hakre
    hakre almost 13 years
    Exactly. So you can make an individual weight per field.
  • hakre
    hakre almost 13 years
    +1 - nice answer to make use of that with match() for fulltext indices. - however certain preconditions must be met.
  • Hugo Mota
    Hugo Mota almost 13 years
    thank you! this solves my problem! next step is to make similar searches to match, like "gogle" finding "google". but this is another question. haha
  • Pete Wilson
    Pete Wilson almost 13 years
    @hugo_leonardo -- yes, the two guys who responded gave very thoughtful answers.
  • David Yell
    David Yell over 10 years
    But obviously you'd escape your $_GET variables before using them in SQL.
  • dqhendricks
    dqhendricks over 10 years
    @DavidYell Yes, or better yet, use PDO prepared statements.
  • Stephan L
    Stephan L over 9 years
    For this kind a "similarity" search you ought to use Lucene / SolR.
  • Rooster242
    Rooster242 about 8 years
    Excellent solution! It's simple and it supports multiple search terms and multiple searchable fields. Thank you!
  • jor
    jor over 7 years
    @PeteWilson: please add relavant quotes to your post because external links may break.