How to find a duplicate element in an array of shuffled consecutive integers?

arrays algorithm duplicates

65,472

Solution 1

Just add them all up, and subtract the total you would expect if only 1001 numbers were used from that.

Eg:

Input: 1,2,3,2,4 => 12
Expected: 1,2,3,4 => 10

Input - Expected => 2

Solution 2

Update 2: Some people think that using XOR to find the duplicate number is a hack or trick. To which my official response is: "I am not looking for a duplicate number, I am looking for a duplicate pattern in an array of bit sets. And XOR is definitely suited better than ADD to manipulate bit sets". :-)

Update: Just for fun before I go to bed, here's "one-line" alternative solution that requires zero additional storage (not even a loop counter), touches each array element only once, is non-destructive and does not scale at all :-)

printf("Answer : %d\n",
           array[0] ^
           array[1] ^
           array[2] ^
           // continue typing...
           array[999] ^
           array[1000] ^
           1 ^
           2 ^
           // continue typing...
           999^
           1000
      );

Note that the compiler will actually calculate the second half of that expression at compile time, so the "algorithm" will execute in exactly 1002 operations.

And if the array element values are know at compile time as well, the compiler will optimize the whole statement to a constant. :-)

Original solution: Which does not meet the strict requirements of the questions, even though it works to find the correct answer. It uses one additional integer to keep the loop counter, and it accesses each array element three times - twice to read it and write it at the current iteration and once to read it for the next iteration.

~~Well, you need at least one additional variable (or a CPU register) to store the index of the current element as you go through the array.~~

Aside from that one though, here's a destructive algorithm that can safely scale for any N up to MAX_INT.

for (int i = 1; i < 1001; i++)
{
   array[i] = array[i] ^ array[i-1] ^ i;
}

printf("Answer : %d\n", array[1000]);

I will leave the exercise of figuring out why this works to you, with a simple hint :-):

a ^ a = 0
0 ^ a = a

Solution 3

A non destructive version of solution by Franci Penov.

This can be done by making use of the XOR operator.

Lets say we have an array of size 5: 4, 3, 1, 2, 2
Which are at the index: 0, 1, 2, 3, 4

Now do an XOR of all the elements and all the indices. We get 2, which is the duplicate element. This happens because, 0 plays no role in the XORing. The remaining n-1 indices pair with same n-1 elements in the array and the only unpaired element in the array will be the duplicate.

int i;
int dupe = 0;
for(i = 0; i < N; i++) {
    dupe = dupe ^ arr[i] ^ i;
}
// dupe has the duplicate.

The best feature of this solution is that it does not suffer from overflow problems that is seen in the addition based solution.

Since this is an interview question, it would be best to start with the addition based solution, identify the overflow limitation and then give the XOR based solution :)

This makes use of an additional variable so does not meet the requirements in the question completely.

Solution 4

Add all the numbers together. The final sum will be the 1+2+...+1000+duplicate number.

Solution 5

To paraphrase Francis Penov's solution.

The (usual) problem is: given an array of integers of arbitrary length that contain only elements repeated an even times of times except for one value which is repeated an odd times of times, find out this value.

The solution is:

acc = 0
for i in array: acc = acc ^ i

Your current problem is an adaptation. The trick is that you are to find the element that is repeated twice so you need to adapt solution to compensate for this quirk.

acc = 0
for i in len(array): acc = acc ^ i ^ array[i]

Which is what Francis' solution does in the end, although it destroys the whole array (by the way, it could only destroy the first or last element...)

But since you need extra-storage for the index, I think you'll be forgiven if you also use an extra integer... The restriction is most probably because they want to prevent you from using an array.

It would have been phrased more accurately if they had required O(1) space (1000 can be seen as N since it's arbitrary here).

View more solutions

65,472

Author by

SysAdmin

Updated on July 05, 2022

Comments

SysAdmin almost 2 years

I recently came across a question somewhere:

Suppose you have an array of 1001 integers. The integers are in random order, but you know each of the integers is between 1 and 1000 (inclusive). In addition, each number appears only once in the array, except for one number, which occurs twice. Assume that you can access each element of the array only once. Describe an algorithm to find the repeated number. If you used auxiliary storage in your algorithm, can you find an algorithm that does not require it?

What I am interested in to know is the second part, i.e., without using auxiliary storage. Do you have any idea?
Brian Rasmussen about 14 years

Wouldn't that require auxiliary storage?
leppie about 14 years

@Brian Rasmussen: where is the extra storage?
Brian Rasmussen about 14 years

@leppie: To hold the calculated sum, but honestly I don't know exactly what the OP meant by extra storage. In any case, I like your answer.
Michael Aaron Safyan about 14 years

@Brian, the interviewer probably meant "don't use a hash table or an array"... I'm pretty sure O(1) storage, especially a single variable, would be satisfactory.
Justin Ardini about 14 years

I'm guessing auxilliary storage would be storage that takes more than O(1) space.
leppie about 14 years

You are assuming the array is sorted. Bad assumption.
Brian Rasmussen about 14 years

@Michael: Thanks, that makes sense. Please disregard by question then.
SysAdmin about 14 years

the methord works perfectly fine. but the example should have been something like ( 1,3,2,4,2=>12 ) - (1+2+3+4 => 10) = 2
N 1.1 about 14 years

@leppie: how come? I didn't assume anything. And it actually uses any extra space like other answers suggest.
Franci Penov about 14 years

This solution does not scale for bigger arrays. :-)
Brian Rasmussen about 14 years

@Franci Penov: I am not sure interview questions are supposed to scale :)
Dennis Zickefoose about 14 years

Although I do have issue with the premise. It requires an extra two ints.
N 1.1 about 14 years

@Dennis: well loop variable has to be there, and length to make it generic.
Dennis Zickefoose about 14 years

This requires linear stack space, so that is definitely cheating.
cobbal about 14 years

Throw in another argument and you can tail call optimize it.
Matthieu M. about 14 years

A non-destructive method would be to maintain an accumulator on the side... would also make it more readable I think.
Dennis Zickefoose about 14 years

@nvl: Even with academic exercises like this, a destructive algorithm that saves a single variable isn't particularly beneficial.
Franci Penov about 14 years

@Matthiey M. - but a non-destructive solution would require additional storage and thus violate the requirements of the problem.
SysAdmin about 14 years

"I am not sure interview questions are supposed to scale"...but I am almost certain it would be the next question.
Dennis Zickefoose about 14 years

And your solution requires additional storage to hold the loop counter, plus possibly some intermediate values depending on how all that gets compiled. However, chosing a destructive algorithm does prevent overflow issues without introducing a separate type to hold the sum, which gives it an actual purpose.
Franci Penov about 14 years

@Dennis Zickefoose - I am not arguing that a non-destructive solution with an additional integer variable is not better. :-) but it does violate the problem requirement, that's why I choose the destructive algorithm. as for the loop counter - there's no way to avoid this one, and it is implicitly allowed because the problem states that the code is allowed to iterate over the array once, which is not possible without loop counter.
leppie about 14 years

@Aistina & @SysAdmin: Thanks, didn't notice that ;P
Franci Penov about 14 years

@leppie - scale in terms of array size that can be safely processed without overflow. :-)
leppie about 14 years

@Franci Penov: It's an algorithm, the size of integers and arrays are unbound. Most languages could deal with big ints, the array size might be a problem, but you would have a problem before you even start ;P
Unreason about 14 years

When you look at XOR of 1 to 1000, XOR of the whole array and compare this procedure with adding 1 to 1000 and comparing with the sum of the whole array you notice similarity - addition and XOR operation are related in binary. The only difference that XOR is more natural here because we know that the final difference can not be bigger then 1000.
Robert Gowland about 14 years

Not that this isn't a brilliant solution, but doesn't your first solution, the loop, access each member of the array twice?
Franci Penov about 14 years

@Robert - technically, it does. Hence the second solution. :-)
fa. about 14 years

It still have additional storage in the form of temporary values, no ?
Franci Penov about 14 years

@Pavel Shved - there's no trick to XOR, it's a mathematical operation with well known properties, just as addition, multiplication and others.
Franci Penov about 14 years

@fa - I don't see any temporary values. For all I know, the compiler I've been given targets a CPU that has special instruction that does XOR on 1002 values at once. :-)
Franci Penov about 14 years

@Pavel - also, you and I look at the problem in different ways - for I am not searching for duplicate number, I am searching for a duplicate pattern in sets of flags. when you state the problem in this way, using addition now becomes the "dirty trick" :-)
P Shved about 14 years

@Franci: Wow, the stuff about flags makes sense. Perhaps, you might want to edit it into your answer. Then I can change the vote :-)
jfs about 14 years

I've posted Python one-liner based on your answer stackoverflow.com/questions/2605766/…
Matthieu M. about 14 years

+1, well done: even though it's not a code golf, using python's built-in loops is faster :)
jfs almost 14 years

sum-based algorithm requires O(log(n)) additional storage to hold the sum that is worse than O(1) for xor-based solution.
Greg A. Woods over 11 years

Simple sums will actually work work. Integer overflow is not a problem, provided the variable tallying the sum is unsigned.
Prabhjot about 10 years

Frankly, I'm not getting these XOR based solutions. Basically, we're trying to match the "index" with the value of the element. In case of match, the result will be zero and for repeated value, the xor result will be non zero. For a simple array --> {1,2,2} we will xor 1(element value)^1(index)^0 (previous xor result) --> 0; 2^2^0 --> 0; 3^2^0 --> 1. Here 1 is the final result value as per XOR solutions. I don't see how this is valid answer unless I'm missing something very obvious.
Raman Singh almost 10 years

@codaddict I think the loop should start with i initialized to 1.
legends2k over 9 years

@codaddict +1 for the clear illustration and mentioning about the overflow (also for being non-destructive). The same will work, with a few changes, even if the integers have an offset, say { 1043, 1042, 1044, 1042 } by XOR-ing with { 0, 1042, 1043, 1044 }.
Paul Hankin about 9 years

@J.F.Sebastian why does addition need more storage than xor? In both cases you need an accumulator with O(log(n)) bits.
jfs about 9 years

@Anonymous: you are right: the sum requires only twice as much bits as xor and constant factors do not matter for Big O. I don't remember what I meant: a very generous interpretation could be: one number (for the result) you have for free and using xor it is all you need but with the sum you need one more additional number (the sum is square) ~log n bits.
Paul Hankin about 9 years

@J.F.Sebastian in both cases you need an accumulator with just enough bits to contain the result. With addition you can just throw away the overflow bits (and perform subtraction with two's complement addition).
jfs about 9 years

@Anonymous: it is interesting. Though "sum with specific overflow behavior" is sufficiently different from just "sum" (there is no language tag on the question: overflow behaviour is unknown -- I would assume either infinite precision integers (such as in Python) or just "twice as wide" fixed integer for the sum). Could you post it as an answer with more details how the overflow works in this case?