Using Python, find anagrams for a list of words
Solution 1
In order to do this for 2 strings you can do this:
def isAnagram(str1, str2):
str1_list = list(str1)
str1_list.sort()
str2_list = list(str2)
str2_list.sort()
return (str1_list == str2_list)
As for the iteration on the list, it is pretty straight forward
Solution 2
Create a dictionary of (sorted word, list of word). All the words that are in the same list are anagrams of each other.
from collections import defaultdict
def load_words(filename='/usr/share/dict/american-english'):
with open(filename) as f:
for word in f:
yield word.rstrip()
def get_anagrams(source):
d = defaultdict(list)
for word in source:
key = "".join(sorted(word))
d[key].append(word)
return d
def print_anagrams(word_source):
d = get_anagrams(word_source)
for key, anagrams in d.iteritems():
if len(anagrams) > 1:
print(key, anagrams)
word_source = load_words()
print_anagrams(word_source)
Or:
word_source = ["car", "tree", "boy", "girl", "arc"]
print_anagrams(word_source)
Solution 3
One solution is to sort the word you're searching anagrams for (for example using sorted
), sort the alternative and compare those.
So if you would be searching for anagrams of 'rac' in the list ['car', 'girl', 'tofu', 'rca']
, your code could look like this:
word = sorted('rac')
alternatives = ['car', 'girl', 'tofu', 'rca']
for alt in alternatives:
if word == sorted(alt):
print alt
Solution 4
There are multiple solutions to this problem:
-
Classic approach
First, let's consider what defines an anagram: two words are anagrams of each other if they consist of the same set of letters and each letter appears exactly the same number or time in both words. This is basically a histogram of letters count of each word. This is a perfect use case for
collections.Counter
data structure (see docs). The algorithms is as follows:- Build a dictionary where keys would be histograms and values would be lists of words that have this histogram.
- For each word build it's histogram and add it to the list that corresponds to this histogram.
- Output list of dictionary values.
Here is the code:
from collections import Counter, defaultdict def anagram(words): anagrams = defaultdict(list) for word in words: histogram = tuple(Counter(word).items()) # build a hashable histogram anagrams[histogram].append(word) return list(anagrams.values()) keywords = ("hi", "hello", "bye", "helol", "abc", "cab", "bac", "silenced", "licensed", "declines") print(anagram(keywords))
Note that constructing
Counter
isO(l)
, while sorting each word isO(n*log(l))
where l is the length of the word. -
Solving anagrams using prime numbers
This is a more advanced solution, that relies on the "multiplicative uniqueness" of prime numbers. You can refer to this SO post: Comparing anagrams using prime numbers, and here is a sample python implementation.
Solution 5
Since you can't import anything, here are two different approaches including the for loop you asked for.
Approach 1: For Loops and Inbuilt Sorted Function
word_list = ["percussion", "supersonic", "car", "tree", "boy", "girl", "arc"]
# initialize a list
anagram_list = []
for word_1 in word_list:
for word_2 in word_list:
if word_1 != word_2 and (sorted(word_1)==sorted(word_2)):
anagram_list.append(word_1)
print(anagram_list)
Approach 2: Dictionaries
def freq(word):
freq_dict = {}
for char in word:
freq_dict[char] = freq_dict.get(char, 0) + 1
return freq_dict
# initialize a list
anagram_list = []
for word_1 in word_list:
for word_2 in word_list:
if word_1 != word_2 and (freq(word_1) == freq(word_2)):
anagram_list.append(word_1)
print(anagram_list)
If you want these approaches explained in more detail, here is an article.
Related videos on Youtube
user1040563
Updated on March 25, 2022Comments
-
user1040563 over 2 years
If I have a list of strings for example:
["car", "tree", "boy", "girl", "arc"...]
What should I do in order to find anagrams in that list? For example
(car, arc)
. I tried using for loop for each string and I usedif
in order to ignore strings in different lengths but I can't get the right result.How can I go over each letter in the string and compare it to others in the list in different order?
I have read several similar questions, but the answers were too advanced. I can't import anything and I can only use basic functions.
-
M S over 12 yearsbefore comparing, try sorting each string.
-
Trufa over 12 yearsI don't really understand what you want to do if you find an anagram? Do you want another list, only with the one that has anagrams? is one anagram enough? Or do you want to know if one word has an anagram in that list?
-
hughdbrown over 12 yearsAre you trying to find all tthe combinations that you can make with a set of letters or the actual anagrams? For the former, look at itertools.combinations(). For the latter, try my code.
-
Matthew over 10 yearsDon't you mean a palindrome?
-
John White almost 4 yearsHugh, where is your code?...?
-
-
user1040563 over 12 yearsThank you, never thought about sorting its much easier this way
-
Admin over 12 yearsThe body of this function can be reduced to
return sorted(str1) == sorted(str2)
-
Kracekumar over 12 yearsI haven't used reversed() because it yields generator.
-
Rajeev about 12 yearsI was going through this post and this was very easy and cleanest way of finding an anagram
-
Teepeemm almost 11 yearsThis looks for palindromes, not anagrams.
-
Dennis over 10 yearsIn addition to
return sorted(str1) == sorted(str2)
being less code to type, it should also be slightly more efficient. But sorting is still O(n lg n), you can find anagrams in O(n) with a dictionary. -
starcodex over 9 years@Dennis Yea but it depends on the constraints. A dictionary would also use O(n) space. There are almost always tradeoffs to be made.
-
gmo over 9 yearsVery (very!) old question... also, you're giving a
java
solution for apython
question... Recommended reading -> stackoverflow.com/help/how-to-answer -
Anil over 7 years
def anagrams(word, words): return filter(lambda x: sorted(word) == sorted(x), words)
-
Asclepius over 7 yearsThe
defaultdict
instance must typically be frozen after it has been fully populated. This can be done by setting itsdefault_factory = None
. Failing this, the dict will be updated by mere lookups - this is not what most users want. -
Asclepius over 7 yearsThe answer is totally wrong and abominable. Consider:
ord('a') + ord('d') == ord('b') + ord('c')
-
Asclepius over 7 yearsUsing
vector_anagram
doesn't work, if only because 26 is not sufficient to represent non-[a-z] characters, and there are too large a number of them. The solution must most definitely be generic enough to work with non-[a-z] characters as well. -
Asclepius over 7 yearsAny solution that works only for the letters a-z is a bad solution because it is not sufficiently generic at all. There are many other characters that exist besides a-z, and they must most definitely be supported.
-
Admin almost 7 yearsThe solution is incomplete. Because you are not considering phrases. This link: en.wikipedia.org/wiki/Anagram puts this point clear. E.G : "ANAGRAM" could be transform in "NAG A RAM" with the code provided you wont it. You need to : 1) Remove spaces 2) Check that Str1 !== Str2
-
Admin almost 7 yearsdef isAnagram(str1, str2): if (str1 == str2): return false str1_list = list(str1) str1_list.sort() str2_list = list(str2) str2_list.sort() str1_list(filter(lambda a: a != ' ',str1_list)) str2_list(filter(lambda a: a != ' ',str2_list)) return (str1_list == str2_list)
-
Daniel Gibson almost 7 yearsIt wouldn't be hard to expand the function to include all ASCII characters. The array would need a length of 256 and subtraction of 97 would be dropped. Beyond that (ie: Unicode), I wonder what point running that vector through the hash function becomes too slow.
-
Dov Benyomin Sohacheski over 6 yearsYou should include an explanation as to why this solves the issue
-
Brad Solomon about 6 yearsThis will fail for
['bat', 'bat', 'rats', 'god', 'dog', 'cat', 'arts', 'star', 'rats']
, returningbat
as part of the result. -
Daniel Gibson almost 6 yearsThis is known as the texas sharpshooter fallacy. Anyway, if we're playing by your rules, why not input a set?
-
Harshith Thota about 5 yearsThis is the best solution. I've tested it on a file containing 400000+ words and the Counter method is computed almost instantly while the classic sorting method took me a heck load of time.
-
C. Fennell about 4 yearsThis has answer has already been posted by other users in shorter, more efficient ways. What does your answer add?
-
Nidhi Donuru about 4 yearsI wanted to show a code which doesn't need to import anything
-
Marco Castanho about 4 yearsThis didn't work for me because tuple is ordered, so the histograms take order into account. E.g. 'abc' has an histogram of ((a: 1), (b:1), (c:1)) and 'cab' has an histogram of ((c:1), (a:1), (b:1)). Because the histogram is a tuple, and tuple is ordered, the histograms are different and the logic doesn't consider 'abc' and 'cab' as anagrams. Replacing tuple with frozenset made it work for me.