Best solution for an anagram check?

16,536

Solution 1

There are several ways to check whether two strings are anagrams or not . Your question is , which one is better solution . Your first solution has sorting logic. Sorting has worst case complexity of (nlogn) . Your second logic is only using one loop which has complexity O(n) .

So out of this two , your second solution which is having only O(n) complexity will be a better solution than first one .

One possible solution :

private boolean checkAnagram(String stringOne , String stringTwo){
        char[] first = stringOne.toLowerCase().toCharArray(); 
        char[] second = stringTwo.toLowerCase().toCharArray();
        // if length of strings is not same 
        if (first.length != second.length)
            return false;
        int[] counts = new int[26]; 
        for (int i = 0; i < first.length; i++){
            counts[first[i]-97]++;  
            counts[second[i]-97]--;   
        }
        for (int i = 0; i<26; i++)
            if (counts[i] != 0)
                return false;
        return true;
    }

Solution 2

Here is a very simple implementation.

public boolean isAnagram(String strA, String strB) {
  // Cleaning the strings (remove white spaces and convert to lowercase)
  strA = strA.replaceAll("\\s+","").toLowerCase();
  strB = strB.replaceAll("\\s+","").toLowerCase();

  // Check every char of strA and removes first occurence of it in strB
  for (int i = 0; i < strA.length(); i++ ) {
    if (strB.equals("")) return false;  // strB is already empty : not an anagram
    strB = strB.replaceFirst(Pattern.quote("" + strA.charAt(i)), "");
  }

  // if strB is empty we have an anagram
  return strB.equals("");
}

And finally :

System.out.println(isAnagram("William Shakespeare", "I am a weakish speller")); // true

Solution 3

This is a much simpler, easy-to-read solution I was able to compile...

    static boolean isAnagram(String a, String b) {
    if (a.length() == b.length()){
        char[] arr1 = a.toLowerCase().toCharArray();
        char[] arr2 = b.toLowerCase().toCharArray();
        Arrays.sort(arr1);
        Arrays.sort(arr2);
        if (Arrays.equals(arr1, arr2)) return true;
        else return false;
    }else return false;
}

Best, Justin

Solution 4

My solution : Time Complexity = O(n)

public static boolean isAnagram(String str1, String str2) {
    if (str1.length() != str2.length()) {
        return false;
    }

    for (int i = 0; i < str1.length(); i++) {
        char ch = str1.charAt(i);

        if (str2.indexOf(ch) == -1) 
            return false;
        else
            str2 = str2.replaceFirst(String.valueOf(ch), " ");
    }

    return true;
}

Test case :

@Test
public void testIsPernutationTrue() {
    assertTrue(Anagram.isAnagram("abc", "cba"));
    assertTrue(Anagram.isAnagram("geeksforgeeks", "forgeeksgeeks"));
    assertTrue(Anagram.isAnagram("anagram", "margana"));
}

@Test
public void testIsPernutationFalse() {
    assertFalse(Anagram.isAnagram("abc", "caa"));
    assertFalse(Anagram.isAnagram("anagramm", "marganaa"));
}

Solution 5

I tried a few solutions using Sets, and made each one run 10 million times to test using your example array of:

private static String[] input = {"tea", "ate", "eat", "apple", "java", "vaja", "cut", "utc"};

Firstly, the method i used to call these algotirhms:

public static void main(String[] args) {
    long startTime = System.currentTimeMillis();
    for (int x = 0; x < 10000000; x++) {
        Set<String> confirmedAnagrams = new HashSet<>();
        for (int i = 0; i < (input.length / 2) + 1; i++) {
            if (!confirmedAnagrams.contains(input[i])) {
                for (int j = i + 1; j < input.length; j++) {
                        if (isAnagrams1(input[i], input[j])) {
                            confirmedAnagrams.add(input[i]);
                            confirmedAnagrams.add(input[j]);
                        }
                }
            }
        }
        output = confirmedAnagrams.toArray(new String[confirmedAnagrams.size()]);
    }
    long endTime = System.currentTimeMillis();
    System.out.println("Total time: " + (endTime - startTime));
    System.out.println("Average time: " + ((endTime - startTime) / 10000000D));
}

I then used algorithms based on a HashSet of characters. I add each character of each word to the HashSet, and should the HashSet not be the length of the initials words, it would mean they are not anagrams.

My algorithms and their runtimes:

Algorithm 1:

    private static boolean isAnagrams1(String x, String y) {
    if (x.length() != y.length()) {
        return false;
    } else if (x.equals(y)) {
        return true;
    }

    Set<Character> anagramSet = new HashSet<>();
    for (int i = 0; i < x.length(); i++) {
        anagramSet.add(x.charAt(i));
        anagramSet.add(y.charAt(i));
    }

    return anagramSet.size() != x.length();
}

This has the runtime of:

Total time: 6914
Average time: 6.914E-4

Algorithm 2

private static boolean isAnagrams2(String x, String y) {
    if (x.length() != y.length()) {
        return false;
    } else if (x.equals(y)) {
        return true;
    }

    Set<Character> anagramSet = new HashSet<>();
    char[] xAr = x.toCharArray();
    char[] yAr = y.toCharArray();
    for (int i = 0; i < xAr.length; i++) {
        anagramSet.add(xAr[i]);
        anagramSet.add(yAr[i]);
    }

    return anagramSet.size() != x.length();
}

Has the runtime of:

Total time: 8752
Average time: 8.752E-4

Algorithm 3

For this algorithm, I decided to send the Set through, therefore I only create it once for every cycle, and clear it after each test.

    private static boolean isAnagrams3(Set<Character> anagramSet, String x, String y) {
    if (x.length() != y.length()) {
        return false;
    } else if (x.equals(y)) {
        return true;
    }

    for (int i = 0; i < x.length(); i++) {
        anagramSet.add(x.charAt(i));
        anagramSet.add(y.charAt(i));
    }

    return anagramSet.size() != x.length();
}

Has the runtime of:

Total time: 8251
Average time: 8.251E-4

Algorithm 4

This algorithm is not mine, it belongs to Pratik Upacharya which answered the question as well, in order for me to compare:

    private static boolean isAnagrams4(String stringOne, String stringTwo) {
    char[] first = stringOne.toLowerCase().toCharArray();
    char[] second = stringTwo.toLowerCase().toCharArray();
    // if length of strings is not same 
    if (first.length != second.length) {
        return false;
    }
    int[] counts = new int[26];
    for (int i = 0; i < first.length; i++) {
        counts[first[i] - 97]++;
        counts[second[i] - 97]--;
    }
    for (int i = 0; i < 26; i++) {
        if (counts[i] != 0) {
            return false;
        }
    }
    return true;
}

Has the runtime of:

Total time: 5707
Average time: 5.707E-4

Of course, these runtimes do differ for every test run, and in order to do proper testing, a larger example set is needed, and maybe more iterations thereof.

*Edited, as I made a mistake in my initial method, Pratik Upacharya's algorithm does seem to be the faster one

Share:
16,536
Drew L. Facchiano
Author by

Drew L. Facchiano

Updated on July 29, 2022

Comments

  • Drew L. Facchiano
    Drew L. Facchiano almost 2 years

    I’m going through a permutation/anagram problem and wanted input on the most efficient means of checking. Now, I’m doing this in Java land, and as such there is a library for EVERYTHING including sorting. The first means of checking if two string are anagrams of each other is to check length, sort them in some manner, then compare each index of said string. Code below:

    private boolean validAnagram(String str, String pair) {
    if(str.length() != pair.length()){
        return false;
    }
    
    char[] strArr = str.toCharArray();
    char[] pairArr = pair.toCharArray();
    
    
    Arrays.sort(strArr);
    str = new String(strArr);
    
    Arrays.sort(pairArr);
    pair = new String(pairArr);
    
    for(int i = 0; i<str.length(); i++){
        if(str.charAt(i) != pair.charAt(i)){
            return false;
        }
    }
    return true;
    }
    

    Alternatively, I figured it would be easier to check based on ascii value and avoid a check on every possible character. Code below:

    private boolean validAnagram(String str, String pair) {
    if(str.length() != pair.length()){
        return false;
    }
    
    char[] strArr = str.toCharArray();
    char[] pairArr = pair.toCharArray();
    
    
    
    int strValue = 0;
    int pairValue = 0;
    
    for(int i =0; i < strArr.length; i++){
        strValue+= (int) strArr[i];
        pairValue+= (int) pairArr[i];
    }
    
    if(strValue != pairValue){
        return false;
    }
    return true;
    }
    

    So, which is a better solution? I don’t know much about the sort that Arrays is giving me, however that’s the more common answer when I look around the old internets. Makes me wonder if I’m missing something.