Where can I learn more about the Google search "did you mean" algorithm?
Solution 1
You should check out Peter Norvigs article about implementing the spell checker in a few lines of python: How to Write a Spelling Corrector It also has links for implementations in other languages (i.e. C#)
Solution 2
I attended a seminar by a Google engineer a year and a half ago, where they talked about their approach to this. The presenter was saying that (at least part of) their algorithm has little intelligence at all; but rather, utilises the huge amounts of data they have access to. They determined that if someone searches for "Brittany Speares", clicks on nothing, and then does another search for "Britney Spears", and clicks on something, we can have a fair guess about what they were searching for, and can suggest that in future.
Disclaimer: This may have just been part of their algorithm
Solution 3
Python has a module called difflib
. It provides a functionality called get_close_matches
. From the Python Documentation:
get_close_matches(word, possibilities[, n][, cutoff])
Return a list of the best "good enough" matches. word is a sequence for which close matches are desired (typically a string), and possibilities is a list of sequences against which to match word (typically a list of strings).
Optional argument n (default
3
) is the maximum number of close matches to return; n must be greater than0
.Optional argument cutoff (default
0.6
) is a float in the range [0, 1]. Possibilities that don't score at least that similar to word are ignored.The best (no more than n) matches among the possibilities are returned in a list, sorted by similarity score, most similar first.
>>> get_close_matches('appel', ['ape', 'apple', 'peach', 'puppy'])
['apple', 'ape']
>>> import keyword
>>> get_close_matches('wheel', keyword.kwlist)
['while']
>>> get_close_matches('apple', keyword.kwlist)
[]
>>> get_close_matches('accept', keyword.kwlist)
['except']
Could this library help you?
Solution 4
You can use http://developer.yahoo.com/search/web/V1/spellingSuggestion.html which would give a similar functionality.
Solution 5
I am not sure if it serves your purpose but a String Edit distance Algorithm with a dictionary might suffice for a small Application.
vidhi
Updated on July 12, 2022Comments
-
vidhi almost 2 years
Possible Duplicate:
How do you implement a “Did you mean”?I am writing an application where I require functionality similar to Google's "did you mean?" feature used by their search engine:
Is there source code available for such a thing or where can I find articles that would help me to build my own?
-
ebneter over 13 yearsNo, it guesses alternatives based on misspellings. If you search for "katie sachoff" it comes up with "Did you mean katee sackhoff?"
-
JAL over 13 yearsI recently read an article in which a Google employee expounded upon how they have the world's most advanced spellchecker, since it will take the context of a word into account in ways that few others do.
-
Dominic K over 13 years@Alex JL- And they're probably right.
-
Aziz Sharipov over 13 yearsYes, I think it learns from what other people had corrected certain searches to. For instance if you search for 'hunrgy man dinner' and then click on nothing and change it to 'hungry man dinner', Google takes note of that next time it gets the first search. I'm sure they have more tricks than that, too, such as a traditional spellcheck in there somewhere.
-
Colin Hebert over 13 years@Alex JL, But it still doesn't check anything. Proposing an alternative with a common mistake, or a really similar but more common word isn't a spelling correction it's just guessing.
-
Gumbo over 13 yearsSide fact: Peter Norvig is Director of Research at Google.
-
JAL over 13 years@Colin Not sure what you mean - isn't that what every spell checker does? Detect a mispelled word, and use heuristics to guess what you mean instead? I mean, I misspelled 'misspelled' and Firefox is suggesting misspelled, dispelled, respelled, etc. It's not like they're artificial intelligence or something. I agree with Google that theirs works very well.
-
Colin Hebert over 13 years@Alex JL, I mean that if "everybody" write a word the wrong way, google will find the wrong word more important that the good one. It's isn't based on spelling rules but on most common use.
-
Colin Hebert over 13 years@Alex JL, for example (in french) the word "Obtue" is a common mistake, the correct spelling is "Obtuse", but as the mistake is really common, Google won't say anything about this word. Or in english if you search for "alterior" instead of "ulterior" it's considered as okay because it's used frequently.
-
ibz over 13 yearsThis answer should be marked as accepted. Norvig's algorithm solves OP's problem, is pretty awesome, and it comes from Google. :)
-
Admin over 13 yearsRE Disclaimer: I assume it was/is. It's a very safe way to go about it. I couldn't imagine anybody coming up with an algorithm that searches a database full of english words, then trying to determine whether or not the query is similar to existing data.
-
Martín Schonaker over 13 yearsAn N-Gram index is the only sound solution I've seen among the answers, why is this tumbled down? Well... aside of Peter Norvig's. But N-Grams can do it quite good.
-
hugo24 over 13 yearsThank u :) N-Grams are the prefered way at google... as far as i know.