Permutation of string as substring of another

20,066

Solution 1

There is a simpler solution to this problem which can be done in linear time.

Here: n = A.size (), m = B.size ()

The idea is to use hashing.

First we hash the characters of string B.

Suppose: B = "dep"

  • hash_B ['d'] = 1;
  • hash_B ['e'] = 1;
  • hash_B ['p'] = 1;

Now we run a loop over the string 'A' for each window of size 'm'.

Suppose: A = "encyclopedia"

First window of size 'm' will have characters {e, n, c}. We will hash them now.

  • win ['e'] = 1
  • win ['n'] = 1
  • win ['c'] = 1

Now we check if the frequency of each character from both the arrays (hash_B [] and win []) are same. Note: Maximum size of hash_B [] or win [] is 26.

If they are not same we shift our window.

After shifting the window we decrease the count of win ['e'] by 1 and increase the count of win ['y'] by 1.

  • win ['n'] = 1
  • win ['c'] = 1
  • win ['y'] = 1

During the seventh shift, the status of your win array is:

  • win ['p'] = 1;
  • win ['e'] = 1;
  • win ['d'] = 1;

which is same as the hash_B array. So, Print "SUCCESS" and exit.

Solution 2

If I only have to worry about ASCII characters, it can be done in O(n) time with O(1) space. My code also prints the permutations out, but can be easily modified to simply return true at the first instance instead. The main part of the code is located in the printAllPermutations() method. Here is my solution:

Some Background

This is a solution that I came up with, it is somewhat similar to the idea behind the Rabin Karp Algorithm. Before I understanding the algorithm, I will explain the math behind it as follows:

Let S = {A_1, ..., A_n} be a multiset list of size N that contains only prime numbers. Let the sum of the numbers in S equal some integer Q. Then S is the only possible entirely prime multiset of size N, whose elements can sum to Q.

Because of this, we know we can map every character to a prime number. I propose a map as follows:

1 -> 1st prime
2 -> 2nd prime
3 -> 3rd prime
...
n -> nth prime

If we do this (which we can because ASCII only has 256 possible characters), then it becomes very easy for us to find each permutation in the larger string B.

The Algorithm:

We will do the following:

1: calculate the sum of the primes mapped to by each of the characters in A, let's call it smallHash.

2: create 2 indices (righti and lefti). righti is initialized to zero, and lefti is initialzed to the size of A.

ex:     |  |
        v  v
       "abcdabcd"
        ^  ^
        |  |

3: Create a variable currHash, and initialize it to the sum of the corresponding prime numbers mapped to by each of the characters in B, between (inclusive) righti, and lefti - 1.

4: Iterate both righti and lefti by 1, each time updating currHash by subtracting the prime mapped from the character that is no longer in the range (lefti - 1) and adding the prime corresponding to the character just added to the range (righti)

5: Each time currHash is equal to smallHash, the characters in the range must be a permutation. So we print them out.

6: Continue until we have reached the end of B. (When righti is equal to the length of B)

This solution runs in O(n) time complexity and O(1) space.

The Actual Code:

public class FindPermutationsInString {
    //This is an array containing the first 256 prime numbers
    static int primes[] = 
          {
            2,     3,     5,     7,    11,    13,    17,    19,    23,    29,
            31,    37,    41,    43,    47,    53,    59,    61,    67,    71,
            73,    79,    83,    89,    97,   101,   103,   107,   109,   113,
            127,   131,   137,   139,   149,   151,   157,   163,   167,   173,
            179,   181,   191,   193,   197,   199,   211,   223,   227,   229,
            233,   239,   241,   251,   257,   263,   269,   271,   277,   281,
            283,   293,   307,   311,   313,   317,   331,   337,   347,   349,
            353,   359,   367,   373,   379,   383,   389,   397,   401,   409,
            419,   421,   431,   433,   439,   443,   449,   457,   461,   463,
            467,   479,   487,   491,   499,   503,   509,   521,   523,   541,
            547,   557,   563,   569,   571,   577,   587,   593,   599,   601,
            607,   613,   617,   619,   631,   641,   643,   647,   653,   659,
            661,   673,   677,   683,   691,   701,   709,   719,   727,   733,
            739,   743,   751,   757,   761,   769,   773,   787,   797,   809,
            811,   821,   823,   827,   829,   839,   853,   857,   859,   863,
            877,   881,   883,   887,   907,   911,   919,   929,   937,   941,
            947,   953,   967,   971,   977,   983,   991,   997,  1009,  1013,
           1019,  1021,  1031,  1033,  1039,  1049,  1051,  1061,  1063,  1069,
           1087,  1091,  1093,  1097,  1103,  1109,  1117,  1123,  1129,  1151,
           1153,  1163,  1171,  1181,  1187,  1193,  1201,  1213,  1217,  1223,
           1229,  1231,  1237,  1249,  1259,  1277,  1279,  1283,  1289,  1291,
           1297,  1301,  1303,  1307,  1319,  1321,  1327,  1361,  1367,  1373,
           1381,  1399,  1409,  1423,  1427,  1429,  1433,  1439,  1447,  1451,
           1453,  1459,  1471,  1481,  1483,  1487,  1489,  1493,  1499,  1511,
           1523,  1531,  1543,  1549,  1553,  1559,  1567,  1571,  1579,  1583,
           1597,  1601,  1607,  1609,  1613,  1619
          };

    public static void main(String[] args) {
        String big = "abcdabcd";
        String small = "abcd";
        printAllPermutations(big, small);
    }

    static void printAllPermutations(String big, String small) {

        // If the big one is smaller than the small one,
        // there can't be any permutations, so return
        if (big.length() < small.length()) return;

        // Initialize smallHash to be the sum of the primes
        // corresponding to each of the characters in small.
        int smallHash = primeHash(small, 0, small.length());

        // Initialize righti and lefti.
        int lefti = 0, righti = small.length();

        // Initialize smallHash to be the sum of the primes
        // corresponding to each of the characters in big.
        int currentHash = primeHash(small, 0, righti);

        while (righti <= big.length()) {
            // If the current section of big is a permutation
            // of small, print it out.
            if (currentHash == smallHash)
                System.out.println(big.substring(lefti, righti));

            // Subtract the corresponding prime value in position
            // lefti. Then increment lefti
            currentHash -= primeHash(big.charAt(lefti++));

            if (righti < big.length()) // To prevent index out of bounds
                // Add the corresponding prime value in position righti.
                currentHash += primeHash(big.charAt(righti));

            //Increment righti.
            righti++;
        }

    }

    // Gets the sum of all the nth primes corresponding
    // to n being each of the characters in str, starting
    // from position start, and ending at position end - 1.
    static int primeHash(String str, int start, int end) {
        int value = 0;
        for (int i = start; i < end; i++) {
            value += primeHash(str.charAt(i));
        }
        return value;
    }

    // Get's the n-th prime, where n is the ASCII value of chr
    static int primeHash(Character chr) {
        return primes[chr];
    }
}

Keep in mind, however, that this solution only works when the characters can only be ASCII characters. If we are talking about unicode, we start getting into prime numbers that exceed the maximum size of an int, or even a double. Also, I'm not sure that there are 1,114,112 known primes.

Solution 3

Building a little on the algorithm presented by j_random_hacker in comments, it is possible to find the match in O(|A|+|B|), as follows: (Note: throughout, we use |A| to mean "the length of A".)

  1. Create an integer array count whose domain is the size of the alphabet, initialized to all 0s.
  2. Set distance to 0
  3. For each character Bi in B:
    • Decrement count[Bi].
    • If the previous count of count[Bi] was 0, also increment distance.
  4. For each character Ai in A:
    • Increment count[Ai]
    • If i is greater than |B| decrement count[Ai-|B|].
    • For each of the two count values modified, if the previous value was 0, then increment distance and if the new value is 0 then decrement distance.
    • If the result is that distance is 0 then a match has been found.

Note: The algorithm presented by j_random_hacker is also O(|A|+|B]) because the cost of comparing freqA with freqB is O(|alphabet|), which is a constant. However, the above algorithm reduces the comparison cost to a small constant. In addition, it is theoretically possible to make this work even if the alphabet is not a constant size by using the standard trick for uninitialized arrays.

Solution 4

This answer proposes a fixed implementation of the idea proposed by @Ephraim in his own answer.

The original answer confuses the multiplication property of a given set of primes with addition. The fixed statement would be:

Let S = {A_1, ..., A_n} be a multiset list of size N that contains only prime numbers. Let the product of the numbers in S equal some integer Q. Then S is the only possible entirely prime multiset of size N, whose elements can multiply to Q.

In order to avoid numerical overflows, the implementation uses infinite precision arithmetic based on the C++ library libgmpxx.

Under the assumption that the comparison between two numbers is in O(1), then the solution is in O(|A| + |B|). The previous assumption might actually not be the case for inputs where |B| is large enough. When |B| > |A| the function returns in O(1).

Example:

#include <iostream>
#include <string>
#include <gmpxx.h>

static int primes[] =
          {
            2,     3,     5,     7,    11,    13,    17,    19,    23,    29,
            31,    37,    41,    43,    47,    53,    59,    61,    67,    71,
            73,    79,    83,    89,    97,   101,   103,   107,   109,   113,
            127,   131,   137,   139,   149,   151,   157,   163,   167,   173,
            179,   181,   191,   193,   197,   199,   211,   223,   227,   229,
            233,   239,   241,   251,   257,   263,   269,   271,   277,   281,
            283,   293,   307,   311,   313,   317,   331,   337,   347,   349,
            353,   359,   367,   373,   379,   383,   389,   397,   401,   409,
            419,   421,   431,   433,   439,   443,   449,   457,   461,   463,
            467,   479,   487,   491,   499,   503,   509,   521,   523,   541,
            547,   557,   563,   569,   571,   577,   587,   593,   599,   601,
            607,   613,   617,   619,   631,   641,   643,   647,   653,   659,
            661,   673,   677,   683,   691,   701,   709,   719,   727,   733,
            739,   743,   751,   757,   761,   769,   773,   787,   797,   809,
            811,   821,   823,   827,   829,   839,   853,   857,   859,   863,
            877,   881,   883,   887,   907,   911,   919,   929,   937,   941,
            947,   953,   967,   971,   977,   983,   991,   997,  1009,  1013,
           1019,  1021,  1031,  1033,  1039,  1049,  1051,  1061,  1063,  1069,
           1087,  1091,  1093,  1097,  1103,  1109,  1117,  1123,  1129,  1151,
           1153,  1163,  1171,  1181,  1187,  1193,  1201,  1213,  1217,  1223,
           1229,  1231,  1237,  1249,  1259,  1277,  1279,  1283,  1289,  1291,
           1297,  1301,  1303,  1307,  1319,  1321,  1327,  1361,  1367,  1373,
           1381,  1399,  1409,  1423,  1427,  1429,  1433,  1439,  1447,  1451,
           1453,  1459,  1471,  1481,  1483,  1487,  1489,  1493,  1499,  1511,
           1523,  1531,  1543,  1549,  1553,  1559,  1567,  1571,  1579,  1583,
           1597,  1601,  1607,  1609,  1613,  1619
          };

mpz_class prime_hash(std::string const &str, size_t start, size_t end)
{
    mpz_class hash(1);
    for (size_t i = start; i < end; ++i) {
        hash *= primes[(size_t) str.at(i)];
    }
    return hash;
}

void print_all_permutations(std::string const &big, std::string const &small)
{
    const size_t big_len = big.length();
    const size_t small_len = small.length();

    if (small_len <= 0 || big_len < small_len) {
        // no possible permutation!
        return;
    }

    // O(small_len)
    mpz_class small_hash = prime_hash(small, 0, small_len);
    mpz_class curr_hash = prime_hash(big, 0, small_len);

    // O(big_len)
    size_t left_idx = 0;
    do {

        if (curr_hash == small_hash) {
            std::cout << "Permutation `" << big.substr(left_idx, small_len)
                      << "' of `" << small
                      << "' at index " << left_idx
                      << " of `" << big
                      << "'." << std::endl;
        }

        curr_hash /= primes[(size_t) big.at(left_idx)];

        if (left_idx + small_len < big_len) {
            curr_hash *= primes[(size_t) big.at(left_idx + small_len)];
        }

        ++left_idx;

    } while (left_idx + small_len < big_len);
}

int main(int argc, char *argv[])
{
    // @user2826957's input
    print_all_permutations("encyclopedia", "dep");

    // @Ephraim's input
    print_all_permutations("abcdabcd", "abcd");

    // @Sam's input
    print_all_permutations("cbabadcbbabbc", "abbc");

    // @butt0s input
    print_all_permutations("", "");
    print_all_permutations("", "a");
    print_all_permutations("a", "");
    print_all_permutations("a", "a");

    return 0;
}

The example is compiled with:

~$ g++ permstr.cpp -lgmpxx -lgmp -o run
~$ ./run
Permutation `ped' of `dep' at index 7 of `encyclopedia'.
Permutation `abcd' of `abcd' at index 0 of `abcdabcd'.
Permutation `bcda' of `abcd' at index 1 of `abcdabcd'.
Permutation `cdab' of `abcd' at index 2 of `abcdabcd'.
Permutation `dabc' of `abcd' at index 3 of `abcdabcd'.
Permutation `cbab' of `abbc' at index 0 of `cbabadcbbabbc'.
Permutation `cbba' of `abbc' at index 6 of `cbabadcbbabbc'.
Permutation `a' of `a' at index 0 of `a'.

Solution 5

The below function will return true if the String B is a permuted substring of String A.

public boolean isPermutedSubstring(String B, String A){
    int[] arr = new int[26];

    for(int i = 0 ; i < A.length();++i){
        arr[A.charAt(i) - 'a']++;
    }
    for(int j=0; j < B.length();++j){
        if(--arr[B.charAt(j)-'a']<0) return false;
    }
    return true;
}
Share:
20,066
user2826957
Author by

user2826957

Updated on December 10, 2020

Comments

  • user2826957
    user2826957 over 3 years

    Given a string A and another string B. Find whether any permutation of B exists as a substring of A.

    For example,

    if A = "encyclopedia"

    if B="dep" then return true as ped is a permutation of dep and ped is a substring of A.

    My solution->
    
    if length(A)=n and length(B)=m
    
    I did this in 0((n-m+1)*m) by sorting B and then checking A 
    with window size of m each time.
    

    I need to find a better and a faster solution.