how to find the number of distinct subsequences of a string?

29,912

Solution 1

It's a classic dynamic programming problem.

Let:

dp[i] = number of distinct subsequences ending with a[i]
sum[i] = dp[1] + dp[2] + ... + dp[i]. So sum[n] will be your answer.
last[i] = last position of character i in the given string.

A null string has one subsequence, so dp[0] = 1.

read a
n = strlen(a)
for i = 1 to n
  dp[i] = sum[i - 1] - sum[last[a[i]] - 1]
  sum[i] = sum[i - 1] + dp[i]
  last[a[i]] = i

return sum[n]

Explanation

dp[i] = sum[i - 1] - sum[last[a[i]] - 1]

Initially, we assume we can append a[i] to all subsequences ending on previous characters, but this might violate the condition that the counted subsequences need to be distinct. Remember that last[a[i]] gives us the last position a[i] appeared on until now. The only subsequences we overcount are those that the previous a[i] was appended to, so we subtract those.

sum[i] = sum[i - 1] + dp[i]
last[a[i]] = i

Update these values as per their definition.

If your indexing starts from 0, use a[i - 1] wherever I used a[i]. Also remember to wrap your computations in a mod function if you're going to submit code. This should be implemented like this:

mod(x) = (x % m + m) % m

In order to correctly handle negative values in some languages (such as C/C++).

Solution 2

There exists an easier solution to this problem.

The idea is: If all character of the string are distinct, total number of subsequences is 2^n. Now, if we find any character that have already occurred before, we should consider its last occurrence only (otherwise sequence won't be distinct). So we have to subtract the number of subsequences due to its previous occurrence.

My implementation is like this:

read s
dp[0] = 1
len = strlen(s)
last[s.length()] = {-1} //declaring `last` array with same as length of string `s` and all elements initialized with -1.

for (i = 1; i <= len; i++) 
{
    dp[i] = (dp[i - 1] * 2)
    if (last[s[i]] > 0) dp[i] = (dp[i] - dp[last[s[i]] - 1])
    last[s[i]] = i
}
Share:
29,912
Admin
Author by

Admin

Updated on July 09, 2022

Comments

  • Admin
    Admin almost 2 years

    Here is another spoj problem that asks how to find the number of distinct subsequences of a string ?

    For example,

    Input
    AAA
    ABCDEFG
    CODECRAFT

    Output
    4
    128
    496

    How can I solve this problem ?

  • Paul Draper
    Paul Draper over 11 years
    The dp array is actually unnecessary here, but it does help the explanation of this.
  • Minkesh Jain
    Minkesh Jain over 6 years
    Thanks mostafiz. geeksforgeeks.org/count-distinct-subsequences it has complete solution/
  • user3797829
    user3797829 over 5 years
    @mostafiz why do we subtract -1 from last[s[i]] - 1], i.e. why do we do this dp[last[s[i]] - 1] and not just dp[last[s[i]]]
  • ajaysinghnegi
    ajaysinghnegi over 5 years
    @MostafizRahman Can you explain what do you mean by last occurrence in your answer?
  • ajaysinghnegi
    ajaysinghnegi over 5 years
    @AjaySinghNegi It means no. of subsequences when dp[i] was calculated with the present character appearing previously at that time. Hence, that many no of subsequences at that time should be deducted from the present no. of subsequences so that they are not calculated redundantly again at present calculation of dp[i] = 2*dp[i-1].
  • Mostafiz Rahman
    Mostafiz Rahman over 5 years
    as I've just accepted your edit, I seems like you got your answer, right?