Using Python, find anagrams for a list of words

125,373

Solution 1

In order to do this for 2 strings you can do this:

def isAnagram(str1, str2):
    str1_list = list(str1)
    str1_list.sort()
    str2_list = list(str2)
    str2_list.sort()

    return (str1_list == str2_list)

As for the iteration on the list, it is pretty straight forward

Solution 2

Create a dictionary of (sorted word, list of word). All the words that are in the same list are anagrams of each other.

from collections import defaultdict

def load_words(filename='/usr/share/dict/american-english'):
    with open(filename) as f:
        for word in f:
            yield word.rstrip()

def get_anagrams(source):
    d = defaultdict(list)
    for word in source:
        key = "".join(sorted(word))
        d[key].append(word)
    return d

def print_anagrams(word_source):
    d = get_anagrams(word_source)
    for key, anagrams in d.iteritems():
        if len(anagrams) > 1:
            print(key, anagrams)

word_source = load_words()
print_anagrams(word_source)

Or:

word_source = ["car", "tree", "boy", "girl", "arc"]
print_anagrams(word_source)

Solution 3

One solution is to sort the word you're searching anagrams for (for example using sorted), sort the alternative and compare those.

So if you would be searching for anagrams of 'rac' in the list ['car', 'girl', 'tofu', 'rca'], your code could look like this:

word = sorted('rac')
alternatives = ['car', 'girl', 'tofu', 'rca']

for alt in alternatives:
    if word == sorted(alt):
        print alt

Solution 4

There are multiple solutions to this problem:

  1. Classic approach

    First, let's consider what defines an anagram: two words are anagrams of each other if they consist of the same set of letters and each letter appears exactly the same number or time in both words. This is basically a histogram of letters count of each word. This is a perfect use case for collections.Counter data structure (see docs). The algorithms is as follows:

    • Build a dictionary where keys would be histograms and values would be lists of words that have this histogram.
    • For each word build it's histogram and add it to the list that corresponds to this histogram.
    • Output list of dictionary values.

    Here is the code:

    from collections import Counter, defaultdict
    
    def anagram(words):
        anagrams = defaultdict(list)
        for word in words:
            histogram = tuple(Counter(word).items()) # build a hashable histogram
            anagrams[histogram].append(word)
        return list(anagrams.values())
    
    keywords = ("hi", "hello", "bye", "helol", "abc", "cab", 
                    "bac", "silenced", "licensed", "declines")
    
    print(anagram(keywords))
    

    Note that constructing Counter is O(l), while sorting each word is O(n*log(l)) where l is the length of the word.

  2. Solving anagrams using prime numbers

    This is a more advanced solution, that relies on the "multiplicative uniqueness" of prime numbers. You can refer to this SO post: Comparing anagrams using prime numbers, and here is a sample python implementation.

Solution 5

Since you can't import anything, here are two different approaches including the for loop you asked for.

Approach 1: For Loops and Inbuilt Sorted Function

word_list = ["percussion", "supersonic", "car", "tree", "boy", "girl", "arc"]

# initialize a list
anagram_list = []
for word_1 in word_list: 
    for word_2 in word_list: 
        if word_1 != word_2 and (sorted(word_1)==sorted(word_2)):
            anagram_list.append(word_1)
print(anagram_list)

Approach 2: Dictionaries

def freq(word):
    freq_dict = {}
    for char in word:
        freq_dict[char] = freq_dict.get(char, 0) + 1
    return freq_dict

# initialize a list
anagram_list = []
for word_1 in word_list: 
    for word_2 in word_list: 
        if word_1 != word_2 and (freq(word_1) == freq(word_2)):
            anagram_list.append(word_1)
print(anagram_list)

If you want these approaches explained in more detail, here is an article.

Share:
125,373

Related videos on Youtube

user1040563
Author by

user1040563

Updated on March 25, 2022

Comments

  • user1040563
    user1040563 over 2 years

    If I have a list of strings for example:

    ["car", "tree", "boy", "girl", "arc"...]
    

    What should I do in order to find anagrams in that list? For example (car, arc). I tried using for loop for each string and I used if in order to ignore strings in different lengths but I can't get the right result.

    How can I go over each letter in the string and compare it to others in the list in different order?

    I have read several similar questions, but the answers were too advanced. I can't import anything and I can only use basic functions.

    • M S
      M S over 12 years
      before comparing, try sorting each string.
    • Trufa
      Trufa over 12 years
      I don't really understand what you want to do if you find an anagram? Do you want another list, only with the one that has anagrams? is one anagram enough? Or do you want to know if one word has an anagram in that list?
    • hughdbrown
      hughdbrown over 12 years
      Are you trying to find all tthe combinations that you can make with a set of letters or the actual anagrams? For the former, look at itertools.combinations(). For the latter, try my code.
    • Matthew
      Matthew over 10 years
      Don't you mean a palindrome?
    • John White
      John White almost 4 years
      Hugh, where is your code?...?
  • user1040563
    user1040563 over 12 years
    Thank you, never thought about sorting its much easier this way
  • Admin
    Admin over 12 years
    The body of this function can be reduced to return sorted(str1) == sorted(str2)
  • Kracekumar
    Kracekumar over 12 years
    I haven't used reversed() because it yields generator.
  • Rajeev
    Rajeev about 12 years
    I was going through this post and this was very easy and cleanest way of finding an anagram
  • Teepeemm
    Teepeemm almost 11 years
    This looks for palindromes, not anagrams.
  • Dennis
    Dennis over 10 years
    In addition to return sorted(str1) == sorted(str2) being less code to type, it should also be slightly more efficient. But sorting is still O(n lg n), you can find anagrams in O(n) with a dictionary.
  • starcodex
    starcodex over 9 years
    @Dennis Yea but it depends on the constraints. A dictionary would also use O(n) space. There are almost always tradeoffs to be made.
  • gmo
    gmo over 9 years
    Very (very!) old question... also, you're giving a java solution for a python question... Recommended reading -> stackoverflow.com/help/how-to-answer
  • Anil
    Anil over 7 years
    def anagrams(word, words): return filter(lambda x: sorted(word) == sorted(x), words)
  • Asclepius
    Asclepius over 7 years
    The defaultdict instance must typically be frozen after it has been fully populated. This can be done by setting its default_factory = None. Failing this, the dict will be updated by mere lookups - this is not what most users want.
  • Asclepius
    Asclepius over 7 years
    The answer is totally wrong and abominable. Consider: ord('a') + ord('d') == ord('b') + ord('c')
  • Asclepius
    Asclepius over 7 years
    Using vector_anagram doesn't work, if only because 26 is not sufficient to represent non-[a-z] characters, and there are too large a number of them. The solution must most definitely be generic enough to work with non-[a-z] characters as well.
  • Asclepius
    Asclepius over 7 years
    Any solution that works only for the letters a-z is a bad solution because it is not sufficiently generic at all. There are many other characters that exist besides a-z, and they must most definitely be supported.
  • Admin
    Admin almost 7 years
    The solution is incomplete. Because you are not considering phrases. This link: en.wikipedia.org/wiki/Anagram puts this point clear. E.G : "ANAGRAM" could be transform in "NAG A RAM" with the code provided you wont it. You need to : 1) Remove spaces 2) Check that Str1 !== Str2
  • Admin
    Admin almost 7 years
    def isAnagram(str1, str2): if (str1 == str2): return false str1_list = list(str1) str1_list.sort() str2_list = list(str2) str2_list.sort() str1_list(filter(lambda a: a != ' ',str1_list)) str2_list(filter(lambda a: a != ' ',str2_list)) return (str1_list == str2_list)
  • Daniel Gibson
    Daniel Gibson almost 7 years
    It wouldn't be hard to expand the function to include all ASCII characters. The array would need a length of 256 and subtraction of 97 would be dropped. Beyond that (ie: Unicode), I wonder what point running that vector through the hash function becomes too slow.
  • Dov Benyomin Sohacheski
    Dov Benyomin Sohacheski over 6 years
    You should include an explanation as to why this solves the issue
  • Brad Solomon
    Brad Solomon about 6 years
    This will fail for ['bat', 'bat', 'rats', 'god', 'dog', 'cat', 'arts', 'star', 'rats'], returning bat as part of the result.
  • Daniel Gibson
    Daniel Gibson almost 6 years
    This is known as the texas sharpshooter fallacy. Anyway, if we're playing by your rules, why not input a set?
  • Harshith Thota
    Harshith Thota about 5 years
    This is the best solution. I've tested it on a file containing 400000+ words and the Counter method is computed almost instantly while the classic sorting method took me a heck load of time.
  • C. Fennell
    C. Fennell about 4 years
    This has answer has already been posted by other users in shorter, more efficient ways. What does your answer add?
  • Nidhi Donuru
    Nidhi Donuru about 4 years
    I wanted to show a code which doesn't need to import anything
  • Marco Castanho
    Marco Castanho about 4 years
    This didn't work for me because tuple is ordered, so the histograms take order into account. E.g. 'abc' has an histogram of ((a: 1), (b:1), (c:1)) and 'cab' has an histogram of ((c:1), (a:1), (b:1)). Because the histogram is a tuple, and tuple is ordered, the histograms are different and the logic doesn't consider 'abc' and 'cab' as anagrams. Replacing tuple with frozenset made it work for me.