Computing All The Possible Substrings of a Given String

16,728

Solution 1

Just use two for-loops:

generate substrings(string):
    for start in [0,1,...,string.length-1]:
        for end in [start,...,string.length-1]:
            yield string[start...end]

You can also do it this way with two for-loops:

generate substrings(string):
    for substringLength in [1,2,...,string.length]:
        for start in range [0,1,...,string.length-substringLength]:
            yield string[start...(start+substringLength-1)]
    yield ""

You probably want to include the empty string "" in the sequence you return as well, as it is a substring of all strings.

You also need to consider if it is valid to yield a duplicate string multiple times (e.g. do you return "ABA" twice as a substring of "ABABA"?). If the answer is no, merely make a hashtable called alreadyYielded, and whenever you yield, abort if you've already yielded the string, otherwise add the value to the hashtable in case you see it again. For example:

seen = new HashTable()
...
        substring = string[...]
        if substring not in seen:
            seen.add(substring)
            yield substring
...

Solution 2

Here's a 2-cent answer:

for (indexOfFirstLetterOfString = 0; indexOfFirstLetterOfString < string.length; indexOfFirstLetterOfString++) {

   for (indexOfLastLetterOfString = indexOfFirstLetterOfString + 1; indexOfLastLetterOfString < string.length; indexOfLastLetterOfString++) {

        addToArrayOfStrings ( string.substring (indexOfFirstLetterOfString, indexOfLastLetterOfString - indexOfFirstLetterOfString))
        incrementCounter();

    }
}

To get the number of combinations, simply add a counter to the inner loop.

For instance, in perl, this might look like:

$a = "ABCDE";

$numberOfSubstrings = 0;

for ($indexOfFirstLetter = 0; $indexOfFirstLetter <= length($a); $indexOfFirstLetter++) {

    for ($indexOfLastLetter = $indexOfFirstLetter + 1; $indexOfLastLetter <= length($a); $indexOfLastLetter++)  {
        print substr($a, $indexOfFirstLetter, $indexOfLastLetter - $indexOfFirstLetter) . "\n";

        $numberOfSubStrings++;
    }
}

print "Number of substrings: " . $numberOfSubStrings;
Share:
16,728

Related videos on Youtube

neilmarion
Author by

neilmarion

Updated on June 04, 2022

Comments

  • neilmarion
    neilmarion almost 2 years

    Possible Duplicate:
    How to find all substrings of a string in PHP
    Find all subsets of a list

    How can I compute all the possible substrings of a string? For example given a string ABCDE. All its possible substrings will be

    A, B, C, D, E, AB, BC, CD, DE, ABC, BCD, CDE, ABCD, BCDE, ABCDE

    Thanks! A pseudocode will be highly appreciated. :D

    • bobbymcr
      bobbymcr over 12 years
      This type of question has been asked many times before: stackoverflow.com/questions/728972 , stackoverflow.com/questions/1592039 , stackoverflow.com/questions/5023081 , stackoverflow.com/questions/6780935 , etc. etc. Just search on "powerset" or "subset".
    • ninjagecko
      ninjagecko over 12 years
      I disagree with both Ken and bobbymcr. While it is true that the linked question is "how to find substrings of a string?" it seems to involve some heavy PHP with minimal explanation, as do the answers. All of the links bobbymcr linked are completely separate problems, though related (substrings are not subsets). Upon cursory inspection, I cannot find a similar language-agnostic question with clear answers given in pseudocode.
    • bobbymcr
      bobbymcr over 12 years
      @ninjagecko: On second reading, I see that you're right -- the OP seems to want something more specialized than pure subsets, but something like substrings with fully connected elements (e.g. from ABC -> AB and BC, but not AC).
  • ninjagecko
    ninjagecko over 12 years
    neilmarion: if you look at the number of strings the second loop returns, it looks like this: LENGTH+(LENGTH-1)+(LENGTH-2)+...+1, which is equal to LENGTH*(LENGTH+1)/2
  • DyingIsFun
    DyingIsFun over 7 years
    I get a syntax error for .... What am I missing?