Finding shortest repeating cycle in word?
Solution 1
O(n) solution. Assumes that the entire string must be covered. The key observation is that we generate the pattern and test it, but if we find something along the way that doesn't match, we must include the entire string that we already tested, so we don't have to reobserve those characters.
def pattern(inputv):
pattern_end =0
for j in range(pattern_end+1,len(inputv)):
pattern_dex = j%(pattern_end+1)
if(inputv[pattern_dex] != inputv[j]):
pattern_end = j;
continue
if(j == len(inputv)-1):
print pattern_end
return inputv[0:pattern_end+1];
return inputv;
Solution 2
Here is a correct O(n) algorithm. The first for loop is the table building portion of KMP. There are various proofs that it always runs in linear time.
Since this question has 4 previous answers, none of which are O(n) and correct, I heavily tested this solution for both correctness and runtime.
def pattern(inputv):
if not inputv:
return inputv
nxt = [0]*len(inputv)
for i in range(1, len(nxt)):
k = nxt[i - 1]
while True:
if inputv[i] == inputv[k]:
nxt[i] = k + 1
break
elif k == 0:
nxt[i] = 0
break
else:
k = nxt[k - 1]
smallPieceLen = len(inputv) - nxt[-1]
if len(inputv) % smallPieceLen != 0:
return inputv
return inputv[0:smallPieceLen]
Solution 3
This is an example for PHP:
<?php
function getrepeatedstring($string) {
if (strlen($string)<2) return $string;
for($i = 1; $i<strlen($string); $i++) {
if (substr(str_repeat(substr($string, 0, $i),strlen($string)/$i+1), 0, strlen($string))==$string)
return substr($string, 0, $i);
}
return $string;
}
?>
Solution 4
Most easiest one in python:
def pattern(self, s):
ans=(s+s).find(s,1,-1)
return len(pat) if ans == -1 else ans
Related videos on Youtube
jack44
Updated on September 24, 2021Comments
-
jack44 over 2 years
I'm about to write a function which, would return me a shortest period of group of letters which would eventually create the given word.
For example word abkebabkebabkeb is created by repeated abkeb word. I would like to know, how efficiently analyze input word, to get the shortest period of characters creating input word.
-
jack44 almost 13 years@Tony The Tiger, the result (shortest period) does not have to be a real word.
-
-
pimvdb almost 13 yearsThis returns 'abkeb' which should be correct but I'm not sure in what way the OP is asking for 'kebab' rather than 'abkeb'.
-
jack44 almost 13 yearsThis is what I'm looking for. But it runs in O(n). Any ideas if this can be speeded up?
-
mellamokb almost 13 years@jack44: You can't know if you have the shortest cycle until you've examined the entire string. Unless you have other knowledge, like what the largest possible cycle might be. It might be that the last character in the string throws the whole cycle off, you don't know.
-
dfb almost 13 yearsI don't know PHP, but this looks like it's O(n^2).
-
Ishtar almost 13 yearsIs
for pattern_end in range(len(inputv)/2)
necessary? I don't think it is. -
dfb almost 13 years@Ishtar - sorry I'm not following. Do you mean the look of the len()/2 part
-
Ishtar almost 13 yearsI mean, replacing that line with
pattern_end = 0
. -
dfb almost 13 years@Richard86 - String comparison is going to O(n), though, isn't it?
-
AndersTornkvist almost 13 years@jack44 The speed could be improved. One solution is to start at pos
strlen($string)/2
. Then, if it is a match, you would go tostrlen($string)/4
,strlen($string)/8
, and so on, until it is not a match, after that, you would check each char from the current position until match. If it isn't a match at posstrlen($string)/2
, you would check at posstrlen($string)-strlen($string)/2
, if not a match, continue withstrlen($string)-strlen($string)/4
,strlen($string)-strlen($string)/8
and so on until a match, and then, check from one step back. I'll use PHP if no one tries with pseudo. -
AndersTornkvist almost 13 years@jack44 I believe the O(n) solution by @spin is to prefer, and there is no need to improve this answer as it will not be able to be as fast as the O(n) solution is.
-
Eyal Schneider about 11 yearsI'm afraid the algorithm is incorrect. Please consider the input: "BCBDBCBCBDBC". The smallest repeating pattern is "BCBDBC", but the algorithm above will miss it. Also, I think it doesn't deal correctly with the case "HELLOHELL" (where it returns "HELLO" instead of the complete string).
-
Buge over 8 yearsJust looking at the first for loop this is O(n^2), because each .equals() can take n time.
-
Buge over 8 yearsThis is actually O(n^2). That is because you reset
i
to be smaller withi = currentRepeat.length-1;
. So with a 10 character string ling 'aaaaaaaaab' it takes 46 iterations. With a 20 character string it takes 191 iterations. -
davidbuttar over 8 yearsSo is this a solution you have come up with or is this a known algorithm?
-
Buge over 8 yearsWell KMP is a known algorithm. This question was very similar to a homework problem I had, and this is the answer I came up with for the homework. The instructor's solution was a bit different, but also used KMP.
-
Charlie Fish over 7 yearsWould be good if you gave some detail regarding why exactly this works. Providing more detail helps the entire community and helps to get more up votes.
-
Lin Ma over 7 yearsHi Buge, love your solution, and vote up. but confused by this line
smallPieceLen = len(inputv) - nxt[-1]
, andnxt[-1]
means if the whole string does not match, what index we will be used to compare next.smallPieceLen
represents the differences total length of string andnxt[-1]
, how it could be represented as shortest repetitive string? -
greybeard over 7 years@LinMa: (Buge wasn't active lately)
nxt[-1] means if the whole string does not match, what index we will be used to compare next
no (contorted grammar, btw.). It is the index to compare next when all of the pattern matched and you want to find its next occurrence in a longer text.nxt[i] = p
meanspattern[i+1-p:i+1]
equalspattern[0:p]
(& != forp+1
).nxt[-1]
is the index to compare next if the "first" mismatch was "atlen
+1". (In many a presentation/implementation of KMP, there is a special value of -1 at index 0, with the n values as above "shifted to an index higher by one".) -
Lin Ma over 7 yearsHi @greybeard, spend time today to debug your comments and Buge's code carefully, and you are correct, thanks for clarifying the meaning of
nxt[i]=j
. There is one thing I still mis-understand,len(inputv) - nxt[-1]
, this line checks the longest prefix, which matches suffix of the same string,smallPieceLen
is the remaining part length, why iflen(inputv)
could be divided bysmallPieceLen
, then it is shortest repetitive string? Is there a simple prove or some explanation? Is it too aggressive to make the conclusion? -
Lin Ma over 7 yearsMaybe this is a question to both @Buge and greybeard. :)
-
greybeard over 7 years@LinMa: (
both
are notified, anyway) Let me calllen(inputv)
len andnxt[-1]
matchLen. If matchLen < smallPieceLen, the only chance for smallPieceLen to divide len is to be equal to it. If smallPieceLen ≤ matchLen,inputv[0:smallPieceLen]
equalsinputv[smallPieceLen:2*smallPieceLen]
, andk
never got reset (again): inputv is made up of repetitions ofinputv[0:smallPieceLen]
- the divisibility check just ensures that it ends with a full repetition. -
Lin Ma over 7 years@greybeard, nice answer, I think about your reply for more than one day, but not sure how do you get this conclusion
inputv[0:smallPieceLen] equals inputv[smallPieceLen:2*smallPieceLen]
, would you elaborate a bit more? -
Buge over 7 yearsThe way I think about it is
nxt[-1]
is the max amount that the beginning ofinputv
can overlap with the end of itself (without overlapping the entire thing). For example inabcabcabc
nxt[-1]==6
because the last 6 letters are the same as the first 6. I specifically like to think of it as the length of the last characters. So in the example it can be split into 2 parts:abc|abcabc
where the first part has lengthsmallPieceLen
and the second part has lengthnxt[-1]
. Because of the overlap knowledge, we knowabc
overlaps at the beginning withabcabc
as far as the overlap or `abc -
Axel over 6 yearsCan someone translate this answer to a known programming language like java or C? It would be much appreciated to someone like me who doesn't understand this one. Thank you.
-
Axel over 6 yearswhat does: nxt = [0]*len(inputv) do?
-
Buge over 6 yearsIt's python which I consider a known programming language, the same language as the accepted answer.
[0]*len(inputv)
means create a python list (array) with the same length asinputv
and with all elements being0
. Anyway, here it is in Java. C would be a little annoying because of its less flexible strings and arrays. -
Alfonso Fernando Alvarez over 6 years@ward
(defn find-pattern-string [string] (let [pattern "" working-str string] (reduce #(if (not (or (empty? (clojure.string/split string (re-pattern %1))) (empty? (second (clojure.string/split string (re-pattern %1)))))) (str %1 %2) %1) pattern working-str)))
-
Boris Verkhovskiy about 5 yearsThis doesn't work the way I expected if the pattern ends abruptly. For example
[1, 2, 1]
should return[1, 2]
but this code returns[1, 2, 1]
. -
Boris Verkhovskiy about 5 years@EyalSchneider returning "HELLO" for "HELLOHELL" makes perfect sense. Otherwise you could be using the length of the string to determine which patterns are even allowed. You only need to check patterns whose lengths are combinations of the prime factors (and 1) of the length of the input. For example if you get a string that's 101 characters long what's the point of even trying a 5 letter subsequence? It's not going to fit perfectly into 101 characters. If your input's length is 13 there's no point in even checking, no pattern (except a 1 letter pattern) repeated n times can ever add up to 13.
-
Eyal Schneider about 5 years@Boris: The problem is finding the smallest sub-sequence of S such that K>=1 repetitions of it would result in S itself. The input "HELLOHELL" has no repeating subsequence with K>1, so "HELLOHELL" should be returned.
-
Buge about 5 years@Boris I consider
[1, 2]
wrong. The question says "For example wordabkebabkebabkeb
is created by repeatedabkeb
word." But it's not possible to create[1, 2, 1]
by repeating[1, 2]
. You need to repeat it, then chop off the end to get[1, 2, 1]
. But yes, the question is somewhat ambiguous about what the correct answer to[1, 2, 1]
is. -
Anikeev Gennadiy almost 4 yearsI think this would fail for the string "ababcababc"
-
thisisjaymehta over 3 yearsIt will be helpful if you explain what you did
-
Himanshuman almost 2 yearsThis won't work for the string
abaa
where expected output isa
-
Kevin Cruijssen almost 2 years@HimanshuPoddar If I read the question correctly, the full word is always build from multiple repeating substrings without any other characters, so
abab
(2xab
) oraaa
(3xa
) are both valid inputs, butabaa
not (unless you count it as 1xabaa
). -
Himanshuman almost 2 yearsOh yeah, my bad I confused between questions, was reading similar question on the same lines. Thanks for the reply tho