Why increase pointer by two while finding loop in linked list, why not 3,4,5?

algorithm data-structures linked-list cycle floyd-cycle-finding

23,195

Solution 1

From a correctness perspective, there is no reason that you need to use the number two. Any choice of step size will work (except for one, of course). However, choosing a step of size two maximizes efficiency.

To see this, let's take a look at why Floyd's algorithm works in the first place. The idea is to think about the sequence x₀, x₁, x₂, ..., x_n, ... of the elements of the linked list that you'll visit if you start at the beginning of the list and then keep on walking down it until you reach the end. If the list does not contain a cycle, then all these values are distinct. If it does contain a cycle, though, then this sequence will repeat endlessly.

Here's the theorem that makes Floyd's algorithm work:

The linked list contains a cycle if and only if there is a positive integer j such that for any positive integer k, x_j = x_jk.

Let's go prove this; it's not that hard. For the "if" case, if such a j exists, pick k = 2. Then we have that for some positive j, x_j = x_2j and j ≠ 2j, and so the list contains a cycle.

For the other direction, assume that the list contains a cycle of length l starting at position s. Let j be the smallest multiple of l greater than s. Then for any k, if we consider x_j and x_jk, since j is a multiple of the loop length, we can think of x_jk as the element formed by starting at position j in the list, then taking j steps k-1 times. But each of these times you take j steps, you end up right back where you started in the list because j is a multiple of the loop length. Consequently, x_j = x_jk.

This proof guarantees you that if you take any constant number of steps on each iteration, you will indeed hit the slow pointer. More precisely, if you're taking k steps on each iteration, then you will eventually find the points x_j and x_kj and will detect the cycle. Intuitively, people tend to pick k = 2 to minimize the runtime, since you take the fewest number of steps on each iteration.

We can analyze the runtime more formally as follows. If the list does not contain a cycle, then the fast pointer will hit the end of the list after n steps for O(n) time, where n is the number of elements in the list. Otherwise, the two pointers will meet after the slow pointer has taken j steps. Remember that j is the smallest multiple of l greater than s. If s ≤ l, then j = l; otherwise if s > l, then j will be at most 2s, and so the value of j is O(s + l). Since l and s can be no greater than the number of elements in the list, this means than j = O(n). However, after the slow pointer has taken j steps, the fast pointer will have taken k steps for each of the j steps taken by the slower pointer so it will have taken O(kj) steps. Since j = O(n), the net runtime is at most O(nk). Notice that this says that the more steps we take with the fast pointer, the longer the algorithm takes to finish (though only proportionally so). Picking k = 2 thus minimizes the overall runtime of the algorithm.

Hope this helps!

Solution 2

Let us suppose the length of the list which does not contain the loop be s, length of the loop be t and the ratio of fast_pointer_speed to slow_pointer_speed be k.

Let the two pointers meet at a distance j from the start of the loop.

So, the distance slow pointer travels = s + j. Distance the fast pointer travels = s + j + m * t (where m is the number of times the fast pointer has completed the loop). But, the fast pointer would also have traveled a distance k * (s + j) (k times the distance of the slow pointer).

Therefore, we get k * (s + j) = s + j + m * t.

s + j = (m / k-1)t.

Hence, from the above equation, length the slow pointer travels is an integer multiple of the loop length.

For greatest efficiency , (m / k-1) = 1 (the slow pointer shouldn't have traveled the loop more than once.)

therefore , m = k - 1 => k = m + 1

Since m is the no.of times the fast pointer has completed the loop , m >= 1 . For greatest efficiency , m = 1.

therefore k = 2.

if we take a value of k > 2 , more the distance the two pointers would have to travel.

Hope the above explanation helps.

Solution 3

Consider a cycle of size L, meaning at the kth element is where the loop is: x_k -> x_k+1 -> ... -> x_k+L-1 -> x_k. Suppose one pointer is run at rate r₁=1 and the other at r₂. When the first pointer reaches x_k the second pointer will already be in the loop at some element x_k+s where 0 <= s < L. After m further pointer increments the first pointer is at x_{k+(m mod L)} and the second pointer is at x_{k+((m*r₂+s) mod L)}. Therefore the condition that the two pointers collide can be phrased as the existence of an m satisfying the congruence

m = m*r₂ + s (mod L)

This can be simplified with the following steps

m(1-r₂) = s (mod L)

m(L+1-r₂) = s (mod L)

This is of the form of a linear congruence. It has a solution m if s is divisible by gcd(L+1-r₂,L). This will certainly be the case if gcd(L+1-r₂,L)=1. If r₂=2 then gcd(L+1-r₂,L)=gcd(L-1,L)=1 and a solution m always exists.

Thus r₂=2 has the good property that for any cycle size L, it satisfies gcd(L+1-r₂,L)=1 and thus guarantees that the pointers will eventually collide even if the two pointers start at different locations. Other values of r₂ do not have this property.

Solution 4

If the fast pointer moves 3 steps and slow pointer at 1 step, it is not guaranteed for both pointers to meet in cycles containing even number of nodes. If the slow pointer moved at 2 steps, however, the meeting would be guaranteed.

In general, if the hare moves at H steps, and tortoise moves at T steps, you are guaranteed to meet in a cycle iff H = T + 1.

Consider the hare moving relative to the tortoise.

Hare's speed relative to the tortoise is H - T nodes per iteration.
Given a cycle of length N =(H - T) * k, where k is any positive integer, the hare would skip every H - T - 1 nodes (again, relative to the tortoise), and it would be impossible to for them to meet if the tortoise was in any of those nodes.
The only possibility where a meeting is guaranteed is when H - T - 1 = 0.

Hence, increasing the fast pointer by x is allowed, as long as the slow pointer is increased by x - 1.

Solution 5

Here is a intuitive non-mathematical way to understand this:

If the fast pointer runs off the end of the list obviously there is no cycle.

Ignore the initial part where the pointers are in the initial non-cycle part of the list, we just need to get them into the cycle. It doesn't matter where in the cycle the fast pointer is when the slow pointer finally reaches the cycle.

Once they are both in the cycle, they are circling the cycle but at different points. Imagine if they were both moving by one each time. Then they would be circling the cycle but staying the same distance apart. In other words, making the same loop but out of phase. Now by moving the fast pointer by two each step they are changing their phase with each other; Decreasing their distance apart by one each step. The fast pointer will catch up to the slow pointer and we can detect the loop.

To prove this is true, that they will meet each other and the fast pointer will not somehow overtake and skip over the slow pointer just hand simulate what happens when the fast pointer is three steps behind the slow, then simulate what happens when the fast pointer is two steps behind the slow, then when the fast pointer is just one step behind the slow pointer. In every case they meet at the same node. Any larger distance will eventually become a distance of three, two or one.

View more solutions

23,195

Author by

GG.

Undergraduate in computer science from ITBHU.

Updated on January 20, 2022

Comments

GG. over 2 years

I had a look at question already which talk about algorithm to find loop in a linked list. I have read Floyd's cycle-finding algorithm solution, mentioned at lot of places that we have to take two pointers. One pointer( slower/tortoise ) is increased by one and other pointer( faster/hare ) is increased by 2. When they are equal we find the loop and if faster pointer reaches null there is no loop in the linked list.

Now my question is why we increase faster pointer by 2. Why not something else? Increasing by 2 is necessary or we can increase it by X to get the result. Is it necessary that we will find a loop if we increment faster pointer by 2 or there can be the case where we need to increment by 3 or 5 or x.
Mike Tunnicliffe about 13 years

Doesn't your proof presuppose that you know the length of the cycle that you are trying to find, so that you can choose an appropriate speed for the hare. Whilst this will produce a hare that will always work for that length of cycle, it would not be guaranteed to work for a cycle of a different length (unless you chose speed 2).
Nikita Rybak about 13 years

@fd This algorithm shows that for every k there's a j such that after j steps both animals meet.
templatetypedef about 13 years

@fd- The proof itself doesn't assume that you know the cycle length; it just says that for any cycle length and cycle starting position there is some position j that has the desired property. If you think about how the modified tortise/hare algorithm would work, it would start advancing the two pointers at rates 1 and k. After taking j steps, the two pointers would be at positions j and jk, which are coincident. You don't need to know what j is in order to reach it. Does this make sense?
Nikita Rybak about 13 years

+1 Good job. It's worth noting that although you can jump by 100 pointers, there's little point in that: jumping by 100 pointers requires 100 operations in linked list.
templatetypedef about 13 years

@Nikita Rybak- That's true. The runtime of this algorithm is proportional to the step size, which is why we usually pick 2.
Lasse V. Karlsen about 13 years

If k=3, and the sequence is 1-2-3-4-5-6-7-8-3 (8 loops back to 3), if one starts at 1 (with a step of 1), and the other at 2 (with a step of 3), can you show how this would detect a cycle then?
templatetypedef about 13 years

@Lasse V. Karlsen- it wouldn't, I think, but that's because you're not starting the pointers at the same position. The algorithm only works if the two pointers begin at the first spot in the list.
Lasse V. Karlsen about 13 years

But if they start at the same point in the list, wouldn't the algorithm short-circuit before it has even begun, isn't the whole point that if "hare == tortoise", then you have found a cycle? Note, I'm not saying you're wrong, I'm still looking for the explanation of this that I can understand.
Lasse V. Karlsen about 13 years

And again, note that the OP has asked why the algorithm works. I can show, by experimentation, that it works for step=3 when they start at the same spot (and the start isn't counted as a "cycle found" incident), but I'm no closer to actually understanding how the algorithm works.
templatetypedef about 13 years

@Lasse V. Karlsen- in the algorithm, you start the pointers off at the same location, then continuously advance them by their respective amounts and only then check whether they've met, so the algorithm won't short-circuit. I've tried to show by the above proof why the algorithm works. The idea is that if there is a loop, there ilk be some point in which advancing the pointers at different rates will hit the same element. The algorithm works by advancing the pointers either until this element is found or until the fast pointer hits the end of the list, precluding it's existence.
Mike Tunnicliffe about 13 years

@templatetypedef +1 Got it. The hole I thought I saw was closed by your comment on my answer (which I deleted 'cos it was wrong wrong wrong ;) ).
templatetypedef about 13 years

To whoever downvoted- can you explain what's wrong with this answer?
ninjagecko about 12 years

Also going to -1, this is unfortunately somewhat incorrect; see stackoverflow.com/a/9870363/711085 for counterexamples. It would only be correct if you mention that you have to modify Floyd's algorithm such that you check every for equality every time you follow a pointer, not every time you skip ahead by k. You might intuitively be doing this in your head (you mentioned "continuously" in your comments, which I take to mean increase the indices "like floats"... as odd as that sounds), but it's not hinted at in the answer. I'd be happy to upvote if this was clarified.
templatetypedef about 12 years

@ninjagecko- I believe that your counterexamples are incorrect because you're positioning the tortoise and the hare incorrectly at the start of the algorithm. The tortoise and hare start at positions 1 and 2 because we're using step sizes of 1 and 2. If you change the step sizes to something else (say, 2 and 4), then the positions of the tortoise and hare at the start of the algorithm should be 2 and 4, not 1 and 2. With this change, the algorithm still continues to work correctly even if the step sizes are coprime to one another. Does that make sense?
ninjagecko about 12 years

@templatetypedef: Yes I'm well aware that the starting position must be the same; the counterexamples were snapshots in time. However it seems that you are indeed correct; these snapshots were impossible. Sorry, upvoting.
ninjagecko about 12 years

To elaborate, I can generate a proof of why any number will work: If you consider relative speeds, we reduce the problem to the hare running in a circle, accidentally hopping over a fence at position=0. This can only happen if the relative speed is coprime to the length AND the hare is hopping on a separate coset (e.g. algorithm would not work if hare started at position tortoise+1 AND relative speed was coprime w/length). But the hare must be in the same coset (visual proof: coil the non-loop around the loop). QED.
Ankit Roy over 11 years

Beautiful explanation. After staring at "Let j be the smallest multiple of l greater than s" for a minute, it clicked: this means that if you take j steps from the start, you're inside the loop (since j > s), and if you take another j steps from there you'll wind up back in the same place (since j is a multiple of l). So the same must hold for any multiple of j steps. And although we don't know what j is a priori, we know it must exist, and we effectively ask "Is this j?" after each move, so we can't miss it.
Ankit Roy over 11 years

Very interesting that a double-speed hare has this additional "start-anywhere" property. I need to understand modular arithmetic better (I understood everything except for "It has a solution m if s is divisible by gcd(L+1-r2,L)")
Ankit Roy over 11 years

Even if the loop length is L, it's OK to increment the fast pointer by L+1. It will wind up in the same place each time, but that's not a problem because the slow pointer will catch it.
ajayg over 11 years

@j_random_hacker .... how can slow pointer ever catch the fast pointer ?? the difference between the two will always be constant ... since it will be like both are incremented by 1.
ivymike almost 10 years

Can't help but comment on this old thread :) They both catch each other the same way seconds and minutes hands have to eventually meet each other on a clock face.
ultimate cause about 8 years

@Sumit: If you take ratio of speeds of pointers is not it possible that slower one also may have traversed the loop more than once hence the distance traveled by slower may not be just s+j. Lets say slower one moves 2 steps once and faster one moves 5 steps. Am I missing something?
Sumit Das about 8 years

Yes . That's true . If you take the ratio of 2 , then the shorter pointer does not need to traverse the loop more than once and hence is optimal . That's what I tried to prove . Other ratios , as you pointed out , are not optimal and the shorter pointer may traverse the loop more than once.
gaurav jain almost 8 years

Can you tell why in this equation: s + j = (m / k-1)t , (m/k-1) should necessarily be an integer?
septerr almost 7 years

Thank you, this finally clarified the algorithm for me.
Quinn Mortimer about 4 years

While this works as an explanation of cycle detection, it only addresses the question of "Why 2?" in comparison to 1, not 3, 4, 5, etc. To that point, while this isn't a bad answer I don't think it's actually answering the question.
Andes Lam about 2 years

The sequence you are proposing starts with a 0, which means that if the cycle starts right at 0, $jk=0\forall k\in N$, which means both fast and slow ptr staying at the same place, but a cycle can indeed start at the beginning. The way I see it is that modifying to 1-indexed can solve the problem. What do you think?