Split a string to a string of valid words using Dynamic Programming

28,192

Solution 1

To formalize what @MinhPham suggested.

This is a dynammic programming solution.

Given a string str, let

b[i] = true if the substring str[0...i] (inclusive) can be split into valid words.

Prepend some starting character to str, say !, to represent the empty word. str = "!" + str

The base case is the empty string, so

b[0] = true.

For the iterative case:

b[j] = true if b[i] == true and str[i..j] is a word for all i < j

Solution 2

A dp solution in c++:

int main()
{
    set<string> dict;
    dict.insert("12");
    dict.insert("123");
    dict.insert("234");
    dict.insert("12345");
    dict.insert("456");
    dict.insert("1234");
    dict.insert("567");
    dict.insert("123342");
    dict.insert("42");
    dict.insert("245436564");
    dict.insert("12334");

    string str = "123456712334245436564";

    int size = str.size();
    vector<int> dp(size+1, -1);
    dp[0] = 0;
    vector<string > res(size+1);
    for(int i = 0; i < size; ++i)
    {
        if(dp[i] != -1)
        {
            for(int j = i+1; j <= size; ++j)
            {
                const int len = j-i;
                string substr = str.substr(i, len);
                if(dict.find(substr) != dict.end())
                {
                    string space = i?" ":"";
                    res[i+len] = res[i] + space + substr;
                    dp[i+len] = dp[i]+1;
                }
            }
        }
    }
    cout << *dp.rbegin() << endl;
    cout << *res.rbegin() << endl;

    return 0;
}

Solution 3

The O(N^2) Dp is clear but if you know the words of the dictionary, i think you can use some precomputations to get it even faster in O(N). Aho-Corasick

Share:
28,192
Pet
Author by

Pet

Updated on July 09, 2022

Comments

  • Pet
    Pet almost 2 years

    I need to find a dynamic programming algorithm to solve this problem. I tried but couldn't figure it out. Here is the problem:

    You are given a string of n characters s[1...n], which you believe to be a corrupted text document in which all punctuation has vanished (so that it looks something like "itwasthebestoftimes..."). You wish to reconstruct the document using a dictionary, which is available in the form of a Boolean function dict(*) such that, for any string w, dict(w) has value 1 if w is a valid word, and has value 0 otherwise.

    1. Give a dynamic programming algorithm that determines whether the string s[*] can be reconstituted as a sequence of valid words. The running time should be at most O(n^2), assuming that each call to dict takes unit time.
    2. In the event that the string is valid, make your algorithm output the corresponding sequence of words.
  • Minh Pham
    Minh Pham over 11 years
    Why don't you give the description what what you have done and explain why you have done so?
  • EmptyData
    EmptyData about 10 years
    @tobias could you please explain its algo
  • adijo
    adijo about 10 years
    Just some random code doesn't help anyone. You should provide an explanation.
  • Phil Glau
    Phil Glau about 5 years
    Your dictionary doesn't include all possible words in the string. For example "a", "as" and "he" are all valid words that can be found in this substring.