Where can I learn more about the Google search "did you mean" algorithm?

18,514

Solution 1

You should check out Peter Norvigs article about implementing the spell checker in a few lines of python: How to Write a Spelling Corrector It also has links for implementations in other languages (i.e. C#)

Solution 2

I attended a seminar by a Google engineer a year and a half ago, where they talked about their approach to this. The presenter was saying that (at least part of) their algorithm has little intelligence at all; but rather, utilises the huge amounts of data they have access to. They determined that if someone searches for "Brittany Speares", clicks on nothing, and then does another search for "Britney Spears", and clicks on something, we can have a fair guess about what they were searching for, and can suggest that in future.

Disclaimer: This may have just been part of their algorithm

Solution 3

Python has a module called difflib. It provides a functionality called get_close_matches. From the Python Documentation:

get_close_matches(word, possibilities[, n][, cutoff])

Return a list of the best "good enough" matches. word is a sequence for which close matches are desired (typically a string), and possibilities is a list of sequences against which to match word (typically a list of strings).

Optional argument n (default 3) is the maximum number of close matches to return; n must be greater than 0.

Optional argument cutoff (default 0.6) is a float in the range [0, 1]. Possibilities that don't score at least that similar to word are ignored.

The best (no more than n) matches among the possibilities are returned in a list, sorted by similarity score, most similar first.

  >>> get_close_matches('appel', ['ape', 'apple', 'peach', 'puppy'])
  ['apple', 'ape']
  >>> import keyword
  >>> get_close_matches('wheel', keyword.kwlist)
  ['while']
  >>> get_close_matches('apple', keyword.kwlist)
  []
  >>> get_close_matches('accept', keyword.kwlist)
  ['except']

Could this library help you?

Solution 4

You can use http://developer.yahoo.com/search/web/V1/spellingSuggestion.html which would give a similar functionality.

Solution 5

I am not sure if it serves your purpose but a String Edit distance Algorithm with a dictionary might suffice for a small Application.

Share:
18,514
vidhi
Author by

vidhi

Updated on July 12, 2022

Comments

  • vidhi
    vidhi almost 2 years

    Possible Duplicate:
    How do you implement a “Did you mean”?

    I am writing an application where I require functionality similar to Google's "did you mean?" feature used by their search engine:

    alt text

    Is there source code available for such a thing or where can I find articles that would help me to build my own?

  • ebneter
    ebneter over 13 years
    No, it guesses alternatives based on misspellings. If you search for "katie sachoff" it comes up with "Did you mean katee sackhoff?"
  • JAL
    JAL over 13 years
    I recently read an article in which a Google employee expounded upon how they have the world's most advanced spellchecker, since it will take the context of a word into account in ways that few others do.
  • Dominic K
    Dominic K over 13 years
    @Alex JL- And they're probably right.
  • Aziz Sharipov
    Aziz Sharipov over 13 years
    Yes, I think it learns from what other people had corrected certain searches to. For instance if you search for 'hunrgy man dinner' and then click on nothing and change it to 'hungry man dinner', Google takes note of that next time it gets the first search. I'm sure they have more tricks than that, too, such as a traditional spellcheck in there somewhere.
  • Colin Hebert
    Colin Hebert over 13 years
    @Alex JL, But it still doesn't check anything. Proposing an alternative with a common mistake, or a really similar but more common word isn't a spelling correction it's just guessing.
  • Gumbo
    Gumbo over 13 years
    Side fact: Peter Norvig is Director of Research at Google.
  • JAL
    JAL over 13 years
    @Colin Not sure what you mean - isn't that what every spell checker does? Detect a mispelled word, and use heuristics to guess what you mean instead? I mean, I misspelled 'misspelled' and Firefox is suggesting misspelled, dispelled, respelled, etc. It's not like they're artificial intelligence or something. I agree with Google that theirs works very well.
  • Colin Hebert
    Colin Hebert over 13 years
    @Alex JL, I mean that if "everybody" write a word the wrong way, google will find the wrong word more important that the good one. It's isn't based on spelling rules but on most common use.
  • Colin Hebert
    Colin Hebert over 13 years
    @Alex JL, for example (in french) the word "Obtue" is a common mistake, the correct spelling is "Obtuse", but as the mistake is really common, Google won't say anything about this word. Or in english if you search for "alterior" instead of "ulterior" it's considered as okay because it's used frequently.
  • ibz
    ibz over 13 years
    This answer should be marked as accepted. Norvig's algorithm solves OP's problem, is pretty awesome, and it comes from Google. :)
  • Admin
    Admin over 13 years
    RE Disclaimer: I assume it was/is. It's a very safe way to go about it. I couldn't imagine anybody coming up with an algorithm that searches a database full of english words, then trying to determine whether or not the query is similar to existing data.
  • Martín Schonaker
    Martín Schonaker over 13 years
    An N-Gram index is the only sound solution I've seen among the answers, why is this tumbled down? Well... aside of Peter Norvig's. But N-Grams can do it quite good.
  • hugo24
    hugo24 over 13 years
    Thank u :) N-Grams are the prefered way at google... as far as i know.