How to find the period of a string
Solution 1
You can do this in linear time and constant additional space by inductively computing the period of each prefix of the string. I can't recall the details (there are several things to get right), but you can find them in Section 13.6 of "Text algorithms" by Crochemore and Rytter under function Per(x)
.
Solution 2
Let me assume that the length of the string n
is at least twice greater than the period p
.
Algorithm
- Let
m
= 1, andS
the whole string - Take
m
= m*2- Find the next occurrence of the substring S[:m]
- Let
k
be the start of the next occurrence - Check if S[:k] is the period
- if not go to 2.
Example
Suppose we have a string
CDCDFBFCDCDFDFCDCDFBFCDCDFDFCDC
For each power m
of 2 we find repetitions of first 2^m
characters. Then we extend this sequence to it's second occurrence. Let's start with 2^1 so CD
.
CDCDFBFCDCDFDFCDCDFBFCDCDFDFCDC
CDCD CDCD CDCD CDCD CD
We don't extend CD
since the next occurrence is just after that. However CD
is not the substring we are looking for so let's take the next power: 2^2 = 4
and substring CDCD
.
CDCDFBFCDCDFDFCDCDFBFCDCDFDFCDC
CDCD CDCD
Now let's extend our string to the first repetition. We get
CDCDFBF
we check if this is periodic. It is not so we go further. We try 2^3 = 8, so CDCDFBFC
CDCDFBFCDCDFDFCDCDFBFCDCDFDFCDC
CDCDFBFC CDCDFBFC
we try to extend and we get
CDCDFBFCDCDFDF
and this indeed is our period.
I expect this to work in O(n log n) with some KMP-like algorithm for checking where a given string appears. Note that some edge cases still should be worked out here.
Intuitively this should work, but my intuition failed once on this problem already so please correct me if I'm wrong. I will try to figure out a proof.
A very nice problem though.
Solution 3
You can build a suffix tree for the entire string in linear time (suffix tree is easy to look up online), and then recursively compute and store the number of suffix tree leaves (occurences of the suffix prefix) N(v) below each internal node v of the suffix tree. Also recursively compute and store the length of each suffix prefix L(v) at each node of the tree. Then, at an internal node v in the tree, the suffix prefix encoded at v is a repeating subsequence that generates your string if N(v) equals the total length of the string divided by L(v).
Related videos on Youtube
Aditya
Updated on October 22, 2022Comments
-
Aditya over 1 year
I take a input from the user and its a string with a certain substring which repeats itself all through the string. I need to output the substring or its length AKA period.
Say
S1 = AAAA // substring is A S2 = ABAB // Substring is AB S3 = ABCAB // Substring is ABC S4 = EFIEFI // Substring is EFI
I could start with a Single char and check if it is same as its next character if it is not, I could do it with two characters then with three and so on. This would be a O(N^2) algo. I was wondering if there is a more elegant solution to this.
-
takendarkk over 10 yearsWhy is S2 substring not AB?
-
Evgeny Kluev over 10 yearsFor string "abababxxxxabababxxxx" tortoise and hare gives you period 2 while actual period is 10.
-
-
takendarkk over 10 yearsCould you show a code example of how this would work for a couple inputs?
-
Aditya over 10 yearsDude. Read the paragraph of text below the example in my question :|
-
Mehmet Sedat Güngör over 10 yearsOh I didn't see that. But since you said it has to include a cycle, that method's complexity will be O(N) because you are just comparing first character with the whole string in the worst case scenario. @csmckelvey Pseudocode will be like this:
char first = input[0]; String repeatingString = first; int i = 1; char nextChar = input[i]; while(first!=nextChar) { repeatingString += nextChar; i++; nextChar = input[i]; }
-
Aditya over 10 yearsIts O(N^2) in the worst case cause it would first check with 1 character then 2 characters than 3 .. till N/2 so it is 1+2+3+..N/2 => (N/2)(N/2+1)/2 Dropping all constants you have N^2
-
Mehmet Sedat Güngör over 10 years@Aditya No, with the above method, it will check first character with the second one, then third one etc. So it will be 1+1+1.. and in the worst case scenerio, it will be N(length of input) times character comparing. You are not comparing A with AB, and then A with ABC. You are comparing A with B and then A with C. So it will be O(N). Just look at above pseudocode I provided. It will loop maximumly for input string's length.
-
Łukasz Kidziński over 10 yearsSure, the pseudocode is O(n) but it returns just a string of repeated first character.
-
tmyklebu over 10 yearsThe point of KMP preprocessing is that it computes suffix-prefix matches. You can read the period of the string off from the last entry of the failure table.
-
tmyklebu over 10 yearsYou certainly can do this with a suffix tree, but that will be orders of magnitude slower than doing KMP preprocessing.
-
Łukasz Kidziński over 10 yearsWell, here I meant KMP as an O(n) algorithm for finding a pattern in a string.
-
Łukasz Kidziński over 10 years+1 for a good reference. The algorithm is really there, page 340. Doesn't seem like an interview solution though :)
-
user2566092 over 10 years@tmyklebu How do you figure that the constants for suffix trees are orders of magnitude larger than other algorithms? That means like constants of 1000. A suffix tree certainly does not have hidden constants of 1000.
-
Mehmet Sedat Güngör over 10 years@ŁukaszKidziński No, next characters until the match is found will be appended to String. So, it will have repeated cycle String value in it. Look at the code, it appends the nextChar of input to String until a match found.
-
tmyklebu over 10 yearsI haven't actually seen an implementation that doesn't. You're welcome to link me to your favourite implementation and I can compare against a good implementation of the linear-time, constant-space algorithm I referenced, but suffix trees being a factor of 1000 slower really wouldn't surprise me.
-
Bernhard Barker over 10 yearsThe pseudo-code doesn't work, consider
ababacababac
(the correct substring isababac
, notab
). And if you modify it so that it does work, it will no longer be O(n). -
Mehmet Sedat Güngör over 10 years@Dukeling I repeat, this is not a general solution for finding a substring that repeats. OP mentioned that "every character in the input string is part of the repeating substring" and gave some examples, so that was a specific solution. Tortoise and Hare will also give wrong output for that case. So it has to be bigger than O(N), no different thoughts.
-
Peter Taylor over 6 yearsA reference to a list of errata would also help, because there are at least three errors in the subroutine
Next_MS
and at least two more inPer
. -
R.. GitHub STOP HELPING ICE over 5 years@PeterTaylor: If you're aware of those errors, can you clarify what they are? Leaving a comment on an old answer to an old question saying that the answer has errors but without any clue what those errors might be is mildly helpful but also extremely frustrating.
-
Peter Taylor over 5 years@R, looking briefly at the text, in the signature of
Next_MS
I suspectx[1…m]
should bex[1...n]
(sincem
is not used andn
is used without definition);then end
should almost certainly bethen begin
;ms := j; j := ms+1;
looks suspicious; inPer
there's athen
which should probably bethen begin
and anelse end
which should probably beelse begin
. But I'm not certain that those are the errors I saw 15 months ago, and an official errata list would in any case be far more valuable than my opinion.