Split a string to a string of valid words using Dynamic Programming
Solution 1
To formalize what @MinhPham suggested.
This is a dynammic programming solution.
Given a string str, let
b[i] = true if the substring str[0...i] (inclusive) can be split into valid words.
Prepend some starting character to str, say !, to represent the empty word. str = "!" + str
The base case is the empty string, so
b[0] = true.
For the iterative case:
b[j] = true if b[i] == true and str[i..j] is a word for all i < j
Solution 2
A dp solution in c++:
int main()
{
set<string> dict;
dict.insert("12");
dict.insert("123");
dict.insert("234");
dict.insert("12345");
dict.insert("456");
dict.insert("1234");
dict.insert("567");
dict.insert("123342");
dict.insert("42");
dict.insert("245436564");
dict.insert("12334");
string str = "123456712334245436564";
int size = str.size();
vector<int> dp(size+1, -1);
dp[0] = 0;
vector<string > res(size+1);
for(int i = 0; i < size; ++i)
{
if(dp[i] != -1)
{
for(int j = i+1; j <= size; ++j)
{
const int len = j-i;
string substr = str.substr(i, len);
if(dict.find(substr) != dict.end())
{
string space = i?" ":"";
res[i+len] = res[i] + space + substr;
dp[i+len] = dp[i]+1;
}
}
}
}
cout << *dp.rbegin() << endl;
cout << *res.rbegin() << endl;
return 0;
}
Solution 3
The O(N^2)
Dp is clear but if you know the words of the dictionary, i think you can use some precomputations to get it even faster in O(N)
.
Aho-Corasick
Pet
Updated on July 09, 2022Comments
-
Pet almost 2 years
I need to find a dynamic programming algorithm to solve this problem. I tried but couldn't figure it out. Here is the problem:
You are given a string of n characters s[1...n], which you believe to be a corrupted text document in which all punctuation has vanished (so that it looks something like "itwasthebestoftimes..."). You wish to reconstruct the document using a dictionary, which is available in the form of a Boolean function dict(*) such that, for any string w, dict(w) has value 1 if w is a valid word, and has value 0 otherwise.
- Give a dynamic programming algorithm that determines whether the string s[*] can be reconstituted as a sequence of valid words. The running time should be at most O(n^2), assuming that each call to dict takes unit time.
- In the event that the string is valid, make your algorithm output the corresponding sequence of words.
-
Minh Pham over 11 yearsWhy don't you give the description what what you have done and explain why you have done so?
-
EmptyData about 10 years@tobias could you please explain its algo
-
adijo about 10 yearsJust some random code doesn't help anyone. You should provide an explanation.
-
Phil Glau about 5 yearsYour dictionary doesn't include all possible words in the string. For example "a", "as" and "he" are all valid words that can be found in this substring.