MySQL Fulltext Search Score Explained

Solution 1

Generally, relevance is based on how many of the search words each row matches. The exact value depends on many factors, but it is only meaningful when compared with other relevance values from the same query.

If you really want the math behind it, you can find it in the MySQL internals manual.

Solution 2

Take the query "word1 word2" as an example.

In BOOLEAN MODE, the score essentially indicates whether the document satisfies the boolean expression (e.g. with both words marked as required, '+word1 +word2', it contains both word1 AND word2). Boolean mode is a strict match.

The formula normally used is based on the Vector Space Model of searching. Put very simply, it computes two measures to determine how important a word is to a query: the term frequency (terms that occur often in a document are more important than other terms) and the inverse document frequency (a term that occurs in many documents is weighted lower than a term that occurs in few documents). Together these are known as tf-idf, and they form the basis for the Vector Space Model, which someone else can explain thoroughly. :)
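To make the two measures concrete, here is a minimal sketch of textbook tf-idf scoring. It is an illustration only: the function name `tf_idf_scores` and the sample documents are made up for this example, and MySQL's real formula differs in its details, so do not expect these numbers to match what MATCH ... AGAINST returns.

```python
import math


def tf_idf_scores(query, documents):
    """Score each document against the query with textbook tf-idf.

    Assumption: this is the classic formula, not MySQL's exact one.
    """
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    scores = []
    for words in tokenized:
        score = 0.0
        for term in query.lower().split():
            # Term frequency: how often the term occurs in this document.
            tf = words.count(term)
            # Document frequency: how many documents contain the term.
            df = sum(1 for other in tokenized if term in other)
            if tf == 0 or df == 0:
                continue
            # Inverse document frequency: rare terms weigh more.
            idf = math.log(n_docs / df)
            score += tf * idf
        scores.append(score)
    return scores


# Hypothetical documents, for illustration only.
docs = [
    "mysql fulltext search",
    "mysql fulltext fulltext index",
    "mysql replication",
]
scores = tf_idf_scores("mysql fulltext", docs)
for doc, score in zip(docs, scores):
    print(f"{score:.4f}  {doc}")
```

In this sketch "mysql" occurs in every document, so its inverse document frequency is log(1) = 0 and it adds nothing to any score. Only "fulltext" separates the rows, and the document that repeats it scores twice as high. Because the weights come from logarithms, the scores are arbitrary-looking floating point values rather than round numbers or percentages.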

Author by

Eric Lamb

I live in Los Angeles and am a freelance programmer.

Updated on July 16, 2022

Comments

  • Eric Lamb
    Eric Lamb almost 2 years

    I've been experimenting with fulltext search lately and am curious about the meaning of the Score value. For example I have the following query:

    SELECT table.*,
           MATCH (col1, col2, col3) AGAINST ('+(Term1) +(Term1)') AS Score
    FROM table
    WHERE MATCH (col1, col2, col3) AGAINST ('+(Term1) +(Term1)')
    

    For a single query, I've seen Score values ranging from 0.4667041301727 to 11.166275978088. I get that it's MySQL's idea of relevance (the higher the score, the more weight).

    What I don't get is how MySQL comes up with that score. Why is the number not returned as a decimal or something besides ?

    Why does the score always come back as 1 or 0 when I run the query "IN BOOLEAN MODE"? Wouldn't all the results be a 1?

    Just hoping for some enlightenment. Thanks.

  • se_pavel
    se_pavel almost 15 years
    May I display the value 11.166275978088 to the client as "relevance 11%"?
  • johnnietheblack
    johnnietheblack over 14 years
    That would be a bad idea... it's not accurate that way... no.
  • Ihsan
    Ihsan over 4 years
    @se_pavel Rather, I think what you could do is take the sum of all the scores, divide the row's score (11.1662xx..) by that sum, and multiply by 100. If my math is not haywire, that gives you a relevance percentage easily. Example: 11/159.399*100 = 6.90092158671%
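The calculation in the last comment (score / sum of scores * 100, as in its 11/159.399 example) can be sketched in application code as follows. The function name `relevance_percentages` and the sample scores are hypothetical, chosen so that they add up to the 159.399 used in that example. The result is each row's share of the total score within one result set, not an absolute measure of relevance.

```python
def relevance_percentages(scores):
    """Convert raw scores from one result set into shares of their total."""
    total = sum(scores)
    if total == 0:
        # No row scored anything, so avoid dividing by zero.
        return [0.0 for _ in scores]
    return [score / total * 100 for score in scores]


# Hypothetical scores from a single query; they sum to roughly 159.399.
scores = [11.0, 100.0, 48.399]
percentages = relevance_percentages(scores)
for score, pct in zip(scores, percentages):
    print(f"score {score} -> {pct:.2f}% of the total")
```

Since the percentages depend on which other rows the query returned, the same row can get a different percentage in a different query, so they are only suitable for comparing rows within a single result set.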