Using Python, find anagrams for a list of words

python anagram

125,373

Solution 1

In order to do this for 2 strings you can do this:

def isAnagram(str1, str2):
    str1_list = list(str1)
    str1_list.sort()
    str2_list = list(str2)
    str2_list.sort()

    return (str1_list == str2_list)

As for the iteration on the list, it is pretty straight forward

Solution 2

Create a dictionary of (sorted word, list of word). All the words that are in the same list are anagrams of each other.

from collections import defaultdict

def load_words(filename='/usr/share/dict/american-english'):
    with open(filename) as f:
        for word in f:
            yield word.rstrip()

def get_anagrams(source):
    d = defaultdict(list)
    for word in source:
        key = "".join(sorted(word))
        d[key].append(word)
    return d

def print_anagrams(word_source):
    d = get_anagrams(word_source)
    for key, anagrams in d.iteritems():
        if len(anagrams) > 1:
            print(key, anagrams)

word_source = load_words()
print_anagrams(word_source)

Or:

word_source = ["car", "tree", "boy", "girl", "arc"]
print_anagrams(word_source)

Solution 3

One solution is to sort the word you're searching anagrams for (for example using sorted), sort the alternative and compare those.

So if you would be searching for anagrams of 'rac' in the list ['car', 'girl', 'tofu', 'rca'], your code could look like this:

word = sorted('rac')
alternatives = ['car', 'girl', 'tofu', 'rca']

for alt in alternatives:
    if word == sorted(alt):
        print alt

Solution 4

There are multiple solutions to this problem:

Classic approach

First, let's consider what defines an anagram: two words are anagrams of each other if they consist of the same set of letters and each letter appears exactly the same number or time in both words. This is basically a histogram of letters count of each word. This is a perfect use case for collections.Counter data structure (see docs). The algorithms is as follows:
- Build a dictionary where keys would be histograms and values would be lists of words that have this histogram.
- For each word build it's histogram and add it to the list that corresponds to this histogram.
- Output list of dictionary values.
Here is the code:
```
from collections import Counter, defaultdict

def anagram(words):
    anagrams = defaultdict(list)
    for word in words:
        histogram = tuple(Counter(word).items()) # build a hashable histogram
        anagrams[histogram].append(word)
    return list(anagrams.values())

keywords = ("hi", "hello", "bye", "helol", "abc", "cab", 
                "bac", "silenced", "licensed", "declines")

print(anagram(keywords))
```
Note that constructing Counter is O(l), while sorting each word is O(n*log(l)) where l is the length of the word.
Solving anagrams using prime numbers

This is a more advanced solution, that relies on the "multiplicative uniqueness" of prime numbers. You can refer to this SO post: Comparing anagrams using prime numbers, and here is a sample python implementation.

Solution 5

Since you can't import anything, here are two different approaches including the for loop you asked for.

Approach 1: For Loops and Inbuilt Sorted Function

word_list = ["percussion", "supersonic", "car", "tree", "boy", "girl", "arc"]

# initialize a list
anagram_list = []
for word_1 in word_list: 
    for word_2 in word_list: 
        if word_1 != word_2 and (sorted(word_1)==sorted(word_2)):
            anagram_list.append(word_1)
print(anagram_list)

Approach 2: Dictionaries

def freq(word):
    freq_dict = {}
    for char in word:
        freq_dict[char] = freq_dict.get(char, 0) + 1
    return freq_dict

# initialize a list
anagram_list = []
for word_1 in word_list: 
    for word_2 in word_list: 
        if word_1 != word_2 and (freq(word_1) == freq(word_2)):
            anagram_list.append(word_1)
print(anagram_list)

If you want these approaches explained in more detail, here is an article.

View more solutions

125,373

user1040563

Updated on March 25, 2022

Comments

user1040563 over 2 years
If I have a list of strings for example:
```
["car", "tree", "boy", "girl", "arc"...]
```
What should I do in order to find anagrams in that list? For example (car, arc). I tried using for loop for each string and I used if in order to ignore strings in different lengths but I can't get the right result.

How can I go over each letter in the string and compare it to others in the list in different order?

I have read several similar questions, but the answers were too advanced. I can't import anything and I can only use basic functions.
- M S over 12 years
  
  before comparing, try sorting each string.
- Trufa over 12 years
  
  I don't really understand what you want to do if you find an anagram? Do you want another list, only with the one that has anagrams? is one anagram enough? Or do you want to know if one word has an anagram in that list?
- hughdbrown over 12 years
  
  Are you trying to find all tthe combinations that you can make with a set of letters or the actual anagrams? For the former, look at itertools.combinations(). For the latter, try my code.
- Matthew over 10 years
  
  Don't you mean a palindrome?
- John White almost 4 years
  
  Hugh, where is your code?...?
user1040563 over 12 years

Thank you, never thought about sorting its much easier this way
Admin over 12 years

The body of this function can be reduced to return sorted(str1) == sorted(str2)
Kracekumar over 12 years

I haven't used reversed() because it yields generator.
Rajeev about 12 years

I was going through this post and this was very easy and cleanest way of finding an anagram
Teepeemm almost 11 years

This looks for palindromes, not anagrams.
Dennis over 10 years

In addition to return sorted(str1) == sorted(str2) being less code to type, it should also be slightly more efficient. But sorting is still O(n lg n), you can find anagrams in O(n) with a dictionary.
starcodex over 9 years

@Dennis Yea but it depends on the constraints. A dictionary would also use O(n) space. There are almost always tradeoffs to be made.
gmo over 9 years

Very (very!) old question... also, you're giving a java solution for a python question... Recommended reading -> stackoverflow.com/help/how-to-answer
Anil over 7 years

def anagrams(word, words): return filter(lambda x: sorted(word) == sorted(x), words)
Asclepius over 7 years

The defaultdict instance must typically be frozen after it has been fully populated. This can be done by setting its default_factory = None. Failing this, the dict will be updated by mere lookups - this is not what most users want.
Asclepius over 7 years

The answer is totally wrong and abominable. Consider: ord('a') + ord('d') == ord('b') + ord('c')
Asclepius over 7 years

Using vector_anagram doesn't work, if only because 26 is not sufficient to represent non-[a-z] characters, and there are too large a number of them. The solution must most definitely be generic enough to work with non-[a-z] characters as well.
Asclepius over 7 years

Any solution that works only for the letters a-z is a bad solution because it is not sufficiently generic at all. There are many other characters that exist besides a-z, and they must most definitely be supported.
Admin almost 7 years

The solution is incomplete. Because you are not considering phrases. This link: en.wikipedia.org/wiki/Anagram puts this point clear. E.G : "ANAGRAM" could be transform in "NAG A RAM" with the code provided you wont it. You need to : 1) Remove spaces 2) Check that Str1 !== Str2
Admin almost 7 years

def isAnagram(str1, str2): if (str1 == str2): return false str1_list = list(str1) str1_list.sort() str2_list = list(str2) str2_list.sort() str1_list(filter(lambda a: a != ' ',str1_list)) str2_list(filter(lambda a: a != ' ',str2_list)) return (str1_list == str2_list)
Daniel Gibson almost 7 years

It wouldn't be hard to expand the function to include all ASCII characters. The array would need a length of 256 and subtraction of 97 would be dropped. Beyond that (ie: Unicode), I wonder what point running that vector through the hash function becomes too slow.
Dov Benyomin Sohacheski over 6 years

You should include an explanation as to why this solves the issue
Brad Solomon about 6 years

This will fail for ['bat', 'bat', 'rats', 'god', 'dog', 'cat', 'arts', 'star', 'rats'], returning bat as part of the result.
Daniel Gibson almost 6 years

This is known as the texas sharpshooter fallacy. Anyway, if we're playing by your rules, why not input a set?
Harshith Thota about 5 years

This is the best solution. I've tested it on a file containing 400000+ words and the Counter method is computed almost instantly while the classic sorting method took me a heck load of time.
C. Fennell about 4 years

This has answer has already been posted by other users in shorter, more efficient ways. What does your answer add?
Nidhi Donuru about 4 years

I wanted to show a code which doesn't need to import anything
Marco Castanho about 4 years

This didn't work for me because tuple is ordered, so the histograms take order into account. E.g. 'abc' has an histogram of ((a: 1), (b:1), (c:1)) and 'cab' has an histogram of ((c:1), (a:1), (b:1)). Because the histogram is a tuple, and tuple is ordered, the histograms are different and the logic doesn't consider 'abc' and 'cab' as anagrams. Replacing tuple with frozenset made it work for me.