Algorithm to determine if array contains n...n+m?

30,196

Solution 1

Under the assumption numbers less than one are not allowed and there are no duplicates, there is a simple summation identity for this - the sum of numbers from 1 to m in increments of 1 is (m * (m + 1)) / 2. You can then sum the array and use this identity.

You can find out if there is a dupe under the above guarantees, plus the guarantee no number is above m or less than n (which can be checked in O(N))

The idea in pseudo-code:
0) Start at N = 0
1) Take the N-th element in the list.
2) If it is not in the right place if the list had been sorted, check where it should be.
3) If the place where it should be already has the same number, you have a dupe - RETURN TRUE
4) Otherwise, swap the numbers (to put the first number in the right place).
5) With the number you just swapped with, is it in the right place?
6) If no, go back to step two.
7) Otherwise, start at step one with N = N + 1. If this would be past the end of the list, you have no dupes.

And, yes, that runs in O(N) although it may look like O(N ^ 2)

Note to everyone (stuff collected from comments)

This solution works under the assumption you can modify the array, then uses in-place Radix sort (which achieves O(N) speed).

Other mathy-solutions have been put forth, but I'm not sure any of them have been proved. There are a bunch of sums that might be useful, but most of them run into a blowup in the number of bits required to represent the sum, which will violate the constant extra space guarantee. I also don't know if any of them are capable of producing a distinct number for a given set of numbers. I think a sum of squares might work, which has a known formula to compute it (see Wolfram's)

New insight (well, more of musings that don't help solve it but are interesting and I'm going to bed):

So, it has been mentioned to maybe use sum + sum of squares. No one knew if this worked or not, and I realized that it only becomes an issue when (x + y) = (n + m), such as the fact 2 + 2 = 1 + 3. Squares also have this issue thanks to Pythagorean triples (so 3^2 + 4^2 + 25^2 == 5^2 + 7^2 + 24^2, and the sum of squares doesn't work). If we use Fermat's last theorem, we know this can't happen for n^3. But we also don't know if there is no x + y + z = n for this (unless we do and I don't know it). So no guarantee this, too, doesn't break - and if we continue down this path we quickly run out of bits.

In my glee, however, I forgot to note that you can break the sum of squares, but in doing so you create a normal sum that isn't valid. I don't think you can do both, but, as has been noted, we don't have a proof either way.


I must say, finding counterexamples is sometimes a lot easier than proving things! Consider the following sequences, all of which have a sum of 28 and a sum of squares of 140:

[1, 2, 3, 4, 5, 6, 7]
[1, 1, 4, 5, 5, 6, 6] 
[2, 2, 3, 3, 4, 7, 7]

I could not find any such examples of length 6 or less. If you want an example that has the proper min and max values too, try this one of length 8:

[1, 3, 3, 4, 4, 5, 8, 8]

Simpler approach (modifying hazzen's idea):

An integer array of length m contains all the numbers from n to n+m-1 exactly once iff

  • every array element is between n and n+m-1
  • there are no duplicates

(Reason: there are only m values in the given integer range, so if the array contains m unique values in this range, it must contain every one of them once)

If you are allowed to modify the array, you can check both in one pass through the list with a modified version of hazzen's algorithm idea (there is no need to do any summation):

  • For all array indexes i from 0 to m-1 do
    1. If array[i] < n or array[i] >= n+m => RETURN FALSE ("value out of range found")
    2. Calculate j = array[i] - n (this is the 0-based position of array[i] in a sorted array with values from n to n+m-1)
    3. While j is not equal to i
      1. If list[i] is equal to list[j] => RETURN FALSE ("duplicate found")
      2. Swap list[i] with list[j]
      3. Recalculate j = array[i] - n
  • RETURN TRUE

I'm not sure if the modification of the original array counts against the maximum allowed additional space of O(1), but if it doesn't this should be the solution the original poster wanted.

Solution 2

By working with a[i] % a.length instead of a[i] you reduce the problem to needing to determine that you've got the numbers 0 to a.length - 1.

We take this observation for granted and try to check if the array contains [0,m).

Find the first node that's not in its correct position, e.g.

0 1 2 3 7 5 6 8 4 ;     the original dataset (after the renaming we discussed)
        ^
        `---this is position 4 and the 7 shouldn't be here

Swap that number into where it should be. i.e. swap the 7 with the 8:

0 1 2 3 8 5 6 7 4 ; 
        |     `--------- 7 is in the right place.
        `--------------- this is now the 'current' position

Now we repeat this. Looking again at our current position we ask:

"is this the correct number for here?"

  • If not, we swap it into its correct place.
  • If it is in the right place, we move right and do this again.

Following this rule again, we get:

0 1 2 3 4 5 6 7 8 ;     4 and 8 were just swapped

This will gradually build up the list correctly from left to right, and each number will be moved at most once, and hence this is O(n).

If there are dupes, we'll notice it as soon is there is an attempt to swap a number backwards in the list.

Solution 3

Any one-pass algorithm requires Omega(n) bits of storage.

Suppose to the contrary that there exists a one-pass algorithm that uses o(n) bits. Because it makes only one pass, it must summarize the first n/2 values in o(n) space. Since there are C(n,n/2) = 2^Theta(n) possible sets of n/2 values drawn from S = {1,...,n}, there exist two distinct sets A and B of n/2 values such that the state of memory is the same after both. If A' = S \ A is the "correct" set of values to complement A, then the algorithm cannot possibly answer correctly for the inputs

A A' - yes

B A' - no

since it cannot distinguish the first case from the second.

Q.E.D.

Solution 4

Why do the other solutions use a summation of every value? I think this is risky, because when you add together O(n) items into one number, you're technically using more than O(1) space.

Simpler method:

Step 1, figure out if there are any duplicates. I'm not sure if this is possible in O(1) space. Anyway, return false if there are duplicates.

Step 2, iterate through the list, keep track of the lowest and highest items.

Step 3, Does (highest - lowest) equal m ? If so, return true.

Solution 5

boolean determineContinuousArray(int *arr, int len)
{
    // Suppose the array is like below:
    //int arr[10] = {7,11,14,9,8,100,12,5,13,6};
    //int len = sizeof(arr)/sizeof(int);

    int n = arr[0];

    int *result = new int[len];
    for(int i=0; i< len; i++)
            result[i] = -1;
    for (int i=0; i < len; i++)
    {
            int cur = arr[i];
            int hold ;
            if ( arr[i] < n){
                    n = arr[i];
            }
            while(true){
                    if ( cur - n >= len){
                            cout << "array index out of range: meaning this is not a valid array" << endl;
                            return false;
                    }
                    else if ( result[cur - n] != cur){
                            hold = result[cur - n];
                            result[cur - n] = cur;
                            if (hold == -1) break;
                            cur = hold;

                    }else{
                            cout << "found duplicate number " << cur << endl;
                            return false;
                    }

            }
    }
    cout << "this is a valid array" << endl;
    for(int j=0 ; j< len; j++)
            cout << result[j] << "," ;
    cout << endl;
    return true;
}
Share:
30,196

Related videos on Youtube

Kyle Cronin
Author by

Kyle Cronin

Updated on August 10, 2021

Comments

  • Kyle Cronin
    Kyle Cronin almost 3 years

    I saw this question on Reddit, and there were no positive solutions presented, and I thought it would be a perfect question to ask here. This was in a thread about interview questions:

    Write a method that takes an int array of size m, and returns (True/False) if the array consists of the numbers n...n+m-1, all numbers in that range and only numbers in that range. The array is not guaranteed to be sorted. (For instance, {2,3,4} would return true. {1,3,1} would return false, {1,2,4} would return false.

    The problem I had with this one is that my interviewer kept asking me to optimize (faster O(n), less memory, etc), to the point where he claimed you could do it in one pass of the array using a constant amount of memory. Never figured that one out.

    Along with your solutions please indicate if they assume that the array contains unique items. Also indicate if your solution assumes the sequence starts at 1. (I've modified the question slightly to allow cases where it goes 2, 3, 4...)

    edit: I am now of the opinion that there does not exist a linear in time and constant in space algorithm that handles duplicates. Can anyone verify this?

    The duplicate problem boils down to testing to see if the array contains duplicates in O(n) time, O(1) space. If this can be done you can simply test first and if there are no duplicates run the algorithms posted. So can you test for dupes in O(n) time O(1) space?

    • Mark Ransom
      Mark Ransom over 15 years
      Did you really mean an array of size m (not n)? Seems like it from your example.
    • Kent Fredric
      Kent Fredric over 15 years
      heres a problem array for the challengers: [1,1,4,4,5]. should = false. summation thinks its fine.
    • akien_ghie09
      akien_ghie09 over 15 years
      For the given problem, you could make a case that it could be done in O(1) space, since int array was specified. I have submitted a possible solution in that case. However, for an unbounded input, I don't believe O(1) space is possible. (Though I do think we could do better than O(n) space)
    • jfs
      jfs over 15 years
      A O(m) (single pass) and O(1) in space solution has no counter-examples -- uniqueSet stackoverflow.com/questions/177118/…
    • oz10
      oz10 over 15 years
      Um, you say that {1,3,1} should return false, but m here is 3, n = 1, all the numbers in the array are in the range 1..3, so I argue that this should return true according to the description of the problem.
    • b3.
      b3. over 15 years
      @austirg: The problem states that true is returned only if the vector contains all of the numbers in the range. In the {1, 3, 1} example the 2 is missing.
    • Derek Park
      Derek Park over 15 years
      J.F. Sebastian, as you pointed out below, the "uniqueSet" solution is not O(1) space. It's O(m), because it requires m extra bits of storage.
    • jfs
      jfs over 15 years
      @Derek: It requires additional storage only in Ruby version, where It always works, but in C version it doesn't require any additional storage, but It could fail on some sequences, but I've not seen a counter example yet.
    • jfs
      jfs over 15 years
      I've added counter-example for "uniqueSet" C version stackoverflow.com/questions/177118/…
    • Marcin
      Marcin over 15 years
      This is easy if you assume that factorial is available with the desired performance characteristic. If you want linear time, then you're golden.
    • jfs
      jfs over 15 years
      @Marcin: factorial counter-example: [1, 2, 4, 4, 4, 5, 7, 9, 9]. Product (9! = 362880) and sum (45) are the same with [1, 2, 3, 4, 5, 6, 7, 8, 9].
    • jfs
      jfs over 15 years
      I've found O(m) time, O(1) space solution (for int m) stackoverflow.com/questions/177118/#311497
    • AnT stands with Russia
      AnT stands with Russia over 14 years
      It is not clear from your statement of the problem whether n is the input parameter of of the problem.
    • Kyle Cronin
      Kyle Cronin over 14 years
      @AndreyT: no, it's not, it's just used as a way to describe the sequence
  • Smashery
    Smashery over 15 years
    But that's O(n) storage - the question asks for O(1) storage.
  • Smashery
    Smashery over 15 years
    "I'll leave that as an exercise to the reader unless you really want to know." - I really want to know - seems to me like that's the challenging part about this problem
  • Kyle Cronin
    Kyle Cronin over 15 years
    As would I. I thought I had a solution at one point, but it didn't hold up.
  • Hugh Allen
    Hugh Allen over 15 years
    You can solve the similar-sounding problem of finding the one non-duplicate in an array of duplicates using XOR... perhaps that's what hazzen was thinking of?
  • hazzen
    hazzen over 15 years
    XOR doesn't work; consider 1..6 which is 21, and the XOR of all digits is 7. But 6 + 6 + 5 + 1 + 1 + 2 also sums to 21 and also has a XOR of 7.
  • vrbl
    vrbl over 15 years
    Maybe I'm misunderstanding your algorithm, but (5,3,3,3,1) seems to swap 5 and 1 in steps 1-6, then get 6 for step 7, calculating that there are no duplicates.
  • Kent Fredric
    Kent Fredric over 15 years
    [2,2,4,6,6] => ? ( assuming you know m and not n )
  • hazzen
    hazzen over 15 years
    You have to keep track of where you started last time, and start one after that. So you would start at the second number. I'll make the description more clear.
  • hazzen
    hazzen over 15 years
    It is sorting using the ideas behind radix sort, which under these conditions is O(N) time and space, and we already have the O(N) space if we do it in place.
  • Kyle Cronin
    Kyle Cronin over 15 years
    You could do an initial pass to determine the n and subtract every element by that. The question is whether or not a sequence can be constructed to pass both the sum and product tests, which I don't know. But this is the most promising lead we have.
  • hazzen
    hazzen over 15 years
    The expected product does not scale very well unless you allow infinite precision numbers; if you do, you are essentially using more than O(N) space to represent those numbers. That does give me the idea of trying something like summing n^2 to m^2 (which has a known formula). Not sure of props on it
  • vrbl
    vrbl over 15 years
    Yeah, after tracing it on a few examples, I think you're right. Under the constraints imposed by the problem, you're always going to know the correct location for each element, so you're going to have to check each element exactly once if there's no dupes, and you'll stop early if there are.
  • andy
    andy over 15 years
    There's definitely a contradiction somewhere. The problem says "use O(1) space", and you are trickily using O(n) space. I mean, if we are allowed to use O(n) space, there are much simpler ways to check for duplicates.
  • hazzen
    hazzen over 15 years
    I agree, my solution only works if you can modify the array, then I use a trick with radix sort to get O(N) sorting. If you can't modify the array, there may be some math tricks we are missing.
  • David Hergert
    David Hergert over 15 years
    You're right, hazzen, this doesn't scale very well. Maybe instead of factorial, if we use the sum of squares of the first n natural numbers n(n + 1)(2n + 1)/6, that would work better.
  • Kyle Cronin
    Kyle Cronin over 15 years
    Or it could be that the problem is unsolvable. I make no guarantees about that. :) Your solution doesn't use any extra storage so it could be argued to be O(1) space, but it's not O(1) in the truest sense. However, if you gave this answer in an interview, the interviewer would be a fool not to hire.
  • nickf
    nickf over 15 years
    Big-O notation doesn't describe the space required to store information. Summing an array is an O(n) algorithm because you have to iterate over all the elements once each, therefore the amount of time required to do that calculation increases linearly with the number of elements..
  • Mark Ransom
    Mark Ransom over 15 years
    Big-O notation can be used for both time and space complexity. True, when not specified it means time, but both are still valid.
  • Kevin Day
    Kevin Day over 15 years
    This is quite elegant - but is it mathematically provable? The implication is that for a given m, there is only one series of length m that will result in a given combination of sum and sum-of-squares of that series. I googled, but could find no proof or corollary reference...
  • Kevin Day
    Kevin Day over 15 years
    This is effectively that is going on here [stackoverflow.com/questions/177118/…. It's that 'likely' in your comment that is bugging me...
  • Greg Hewgill
    Greg Hewgill over 15 years
    See my counterexamples above.
  • Greg Hewgill
    Greg Hewgill over 15 years
    See my counterexamples above.
  • David Hergert
    David Hergert over 15 years
    Oh well, so much for using the sum of squares. :p I see a number of counterexamples have been provided above. Does anyone have any other mathematical tricks we haven't tried yet?
  • nickf
    nickf over 15 years
    On your first point about the sum of the array. Change your equation to "m * (m + 2*n - 1) / 2" and that works for negative numbers as well.
  • Skizz
    Skizz over 15 years
    See below for explanation (300 chars not enough!)
  • EvilTeach
    EvilTeach over 15 years
    When one of the numbers is zero, your expected result would be zero.
  • jfs
    jfs over 15 years
    I've made more visible that the answer is irrelevant for the current vesion of the question.
  • jfs
    jfs over 15 years
    Summation is needless if we can check for duplicates. In this case n==min(array), (n+m-1)==max(array) will suffice. In other words inplace-bucket-sort + min + max == solution.
  • jfs
    jfs over 15 years
    As I understand it, you solution calculates some kind of 32-bit hash. But as we know hash functions are prone to collisions e.g., md5sum. Can you prove that there are no collisions possible under the constraints of the question?
  • jfs
    jfs over 15 years
    Your solution reminds me: "THEN A MIRACLE OCCURS" "I think you should be more explicit here in step two" (step 1 in your example) cartoon. :) sciencecartoonsplus.com/gallery/math/math07.gif
  • jfs
    jfs over 15 years
    Summation could either overflow or use additional memory. An array could be read-only.
  • jfs
    jfs over 15 years
    Phone company had used just a simple bucket sort.
  • jfs
    jfs over 15 years
    Actually it is O(n+m) in time and O(m) in space.
  • nickf
    nickf over 15 years
    There's no such thing as O(n+m). The "n" referred to here is not the same "n" in the solution. O(n) means that the amount of time/resources required to solve it is linear to the size of the set. en.wikipedia.org/wiki/Big_o_notation#Orders_of_common_functi‌​ons
  • nickf
    nickf over 15 years
    ...though I will admit that doing the sum at the start isn't actually necessary. I just added it because it'd be a very quick way to tell if the array fails.
  • jfs
    jfs over 15 years
    I've used general estimation of complexity. Consider case: n' records have keys in range [0, m). In your case n == m, therefore notation O(n+n) does not make sense indeed, but in general case n` could differ from m, then notation O(n+m) does make sense.
  • jfs
    jfs over 15 years
    Counter example: {0, 2, 7, 5,}. See my answer.
  • Kevin Day
    Kevin Day over 15 years
    Yup - and my algorithm above does a radix sort (which Hewgill says is O(n)) - so I'd say the above is close to optimal (unless someone can come up with a proof for one of the statistical approaches)...
  • nickf
    nickf over 15 years
    When using Big O notation, you merely describe how the resource usage (time/memory) goes up as your set size increases. Because this solution increases at a linear scale (that is: the length of time required to complete this function is t = k * n, where t is time, k is a constant and n is ... cont..
  • nickf
    nickf over 15 years
    ...and n is the size of the set. If you made the set size 5, an O(n) function might take 2 seconds to complete. If you doubled the set size to 10, it'll take 4 seconds. You don't go into the specifics with Big-O, just describe the relationship. This is O(n).
  • leo7r
    leo7r over 15 years
    I think that there is a problem with sorting the array in one pass, without knowing the n in advance. I have posted a workaround that in my answer.
  • Stephen Denne
    Stephen Denne over 15 years
    This algorithm fails for [1,3,3,4,4,5,8,8]
  • jfs
    jfs over 15 years
    What about [4, 2, 1, 3]? It meets the OP requirements, but It is not a cyclical shift of a sorted sequence.
  • jfs
    jfs over 15 years
    In addition your solution doesn't work for read-only sequences (it is not O(1) in space for such cases).
  • leo7r
    leo7r over 15 years
    [4,2,1,3] will be 'sorted' to [4,1,2,3] by the algorithm. I don't need the original array to be sorted this way. For read-only sequences it is probably impossible to check for dupes in O(m) time and constant memory.
  • jfs
    jfs over 15 years
    Counter example: {1, 1, 1, 2, 2}
  • jfs
    jfs over 15 years
    Your solution requires additional m bits, therefore it is not true O(1) in space. [m is the size (number of elements) of an array]
  • b3.
    b3. over 15 years
    Thanks for the counter example. I had implemented the algorithm incorrectly. It's now fixed.
  • jfs
    jfs over 15 years
    I've always thought that O(n) is defined via limit[n->+inf]( O(n)/n ) <= C, where C is a constant.
  • jfs
    jfs over 15 years
    ceil((n+m+1/2)) is it: (n+m+1)//2 + ((n+m+1) % 2), where // integer division and % is a modulo division?
  • jfs
    jfs over 15 years
    This version doesn't pass {1,2,3, 4, 5, 6, 7}. I will post version in C to avoid misinterpretation of pseudo-code.
  • jfs
    jfs over 15 years
  • b3.
    b3. over 15 years
    Ahhh, this is what happens when I don't have my morning coffee. It should have read "n+m-1" not "n+m+1". Your interpretation of the ceil function is correct.
  • jfs
    jfs over 15 years
  • jfs
    jfs over 15 years
    I've posted adapted (for the OP question) version stackoverflow.com/questions/177118/…
  • b3.
    b3. over 15 years
    You need to change (mina + m) to (mina + m - 1).
  • jfs
    jfs over 15 years
    What are the advantages of this version compared to in-place bucket sort e.g., stackoverflow.com/questions/177118/…
  • jfs
    jfs over 15 years
    I've changed (mina+m) -> (mina+m-1). Now It breaks on {1, 1, 2, 2, }
  • jfs
    jfs over 15 years
    Counter example: {1, 1, 2, 2, }
  • b3.
    b3. over 15 years
    This line is incorrect and not stated in my algorithm. We can't make the assumption that [n, n+m) -> [1,m]: const int v = a[i] - n + 1; // [n, n+m) -> [1, m]
  • b3.
    b3. over 15 years
    {1,1,2,2} is not a counter example. The algorithm produces the expected response for this input.
  • jfs
    jfs over 15 years
    @b3: it is not an assumption, e.g. {10, 12, 11} (where n=10, m=3) -> {1, 3, 2} (where n=1, m=3). Any algorithm that meets the task requirements must return the same answers for both ranges.
  • b3.
    b3. over 15 years
    The task requirements don't state that n is given. The only inputs are the length of the data vector and the data vector itself. The algorithm I presented doesn't allow for n as an input (which, I believe, is a more flexible solution than having to know n beforehand).
  • jfs
    jfs over 15 years
    btw, the a[i]-n+1 is noop for {1,1,2,2} i.e., even without v=a[i]-n+1 line the algorithm fails.
  • b3.
    b3. over 15 years
    You're right. I had removed an additional check of the upper and lowere bounds after the loop that I thought was unnecessary but in fact it's required. See the updated pseudocode: stackoverflow.com/questions/177118/…
  • jfs
    jfs over 15 years
    Updated algorithm produce valid result for {1,1,2,2} but fails on {0, 1, 2}. New counter example {0,1,2}. BTW, it might be impossible (from the mathematical point of view) to detect duplicates in O(N) time and O(1) in space.
  • jfs
    jfs over 15 years
    I've updated the C version to coincide with the pseudo-code. stackoverflow.com/questions/177118/…
  • b3.
    b3. over 15 years
    As you've noticed, this algorithm is only valid for inputs >= 1. I'm thinking about how to deal with < 1 without knowing n beforehand. The trick to staying in O(1) space is to deal with differences between numbers instead of sums of numbers...
  • jfs
    jfs over 15 years
    I've noticed that my 1st and 2nd comments seems like contradicting each other. To clarify: 1st comment refers to implementation based on finite numbers (like int in C). 2nd comments refers to integer with infinite precision (like integers in Ruby). BTW, as it implemented above it takes O(m+n) bits.
  • jfs
    jfs over 15 years
    Updated C version to deal with n=0, new counter example {1, 1, 2, 4, 6, 7, 7}. Now the C version is out of sync with pseudo-code
  • jfs
    jfs over 15 years
    I've updated C version to deal with {0, ...}. Now it fails on {1, 1, 2, 4, 6, 7, 7}, btw popopome's XOR version breaks on it too.
  • oz10
    oz10 over 15 years
    In-place radix sort only works if NO KEY IS ALREADY IN THE CORRECT PLACE. It would seem from the example, {2,3,4} that you CANNOT use an in-place radix sort here.
  • Charles Ma
    Charles Ma over 15 years
    Step one either requires > O(1) space or takes O(n) time to compute, if keeping track of a sum is technically using > O(1) space, then so is keeping track of a highest and lowest item...
  • Tim_K
    Tim_K over 15 years
    A sum doesn't take more than O(1) space. As problem size grows (in this case the size of the input array) the sum still occupies the same space and is constant.
  • jfs
    jfs over 15 years
    What are the advantages compared to in-place bucket sort? See stackoverflow.com/questions/177118/…
  • jfs
    jfs over 15 years
    Example: "division of two n-bit numbers takes time O(n(m + 1)), where m is the length of the quotient." en.wikipedia.org/wiki/Euclidean_algorithm#Running_time It shows that we can use more then one parameter while using Big O notation.
  • jfs
    jfs over 15 years
    A sum does take space. Addition of two n-digit numbers results in (n+1)-digit number. Otherwise we could perform computations with infinite precision using fixed-width number representation. en.wikipedia.org/wiki/…
  • jfs
    jfs over 15 years
    Sum could overflow. After you've done sorting it is sufficient to check max-min == m-1
  • nickf
    nickf over 15 years
    okay, well in your initial comment, that this is O(n + m), what's n and m? The only variable which has any impact on the running time of this equation is the size of the range. It doesn't matter whether it's 0-100 or 100-200, it'll still run in the same time.
  • jfs
    jfs over 15 years
    If we add two n-digit numbers we can get (n+1)-digit sum. This additional digit requires additional space. The more numbers we add the more space could be required.
  • Derek Park
    Derek Park over 15 years
    J.F. Sebastian, two n-digit numbers can still result in an n-digit number. 2+3=5. Nonetheless, dealing with overflow is really a different issue. No one asking a question like this in an interview is going to be worried about overflow, at least not until the basic algorithm is conquered.
  • Derek Park
    Derek Park over 15 years
    You can't call this a working solution when it breaks as soon a you have sequences longer than 32. Artificially limiting the solution doesn't make it O(1).
  • leo7r
    leo7r over 15 years
    The bucketsort assumes you know the range of integers in advance. This version works in one pass without this assumption. I re-read the question, it doesn't actually specify whether the range is a part of the input data or not.
  • jfs
    jfs over 15 years
    I've added counter-example for C version.
  • jfs
    jfs over 15 years
    @Derek: 2+3 can overflow. It depends on representation. We could represent numbers less then 4 in two bits, but we can't represent all numbers < 6 in two bits, it requires 3-bits. Therefore 2+3=1 if we use 2-bits width unsigned numbers. Substitute 2-bits by 32-bits and you'll get common case.
  • jfs
    jfs over 15 years
    @nickf: in your example n==m (size of the array is equal to size of all possible keys space), therefore O(n + m) -> O(n+n) -> O(2*n) -> O(n).
  • jfs
    jfs over 15 years
    @austrig: The in-place sort works. See stackoverflow.com/questions/177118/…
  • jfs
    jfs over 15 years
    I've constructed a counter-example. See above link to C version.
  • jfs
    jfs over 15 years
    About complexity: Operations you described will be O(1) for fixed-width representations e.g., 1 << 100 in C, but it is not O(1) for calculation with infinite precision e.g. 1 << 100 in Ruby.
  • Derek Park
    Derek Park over 15 years
    Yes, it depends on representation. This isn't what the question is looking for, though. Overflow is an issue that affects many algorithms. It is not, however, something that is generally taken into account when measuring runtime complexity.
  • jfs
    jfs over 15 years
    Counter-example for both the sum and product tests: {1, 2, 4, 4, 4, 5, 7, 9, 9} (sum=45, product=362880)
  • CaptSolo
    CaptSolo over 15 years
    In most programming languages a number would take fixed amount of bytes regardless of the number of digits. Even if it were to increase (e.g., as a result of switching from int to longit), it would increase so rarely that it would not matter.
  • Skizz
    Skizz over 15 years
    No, for that set, the expected (i.e. the set [1,2,3,4,5,6,7,8]) and actual mean are both 4.5 but the variance is 4.67 and 5.25 respectively.
  • Skizz
    Skizz over 15 years
    Oops, my bad there. I note the forumla is wrong in the main post, the variance for a uniform distribution is (n-1)(n+1)/12 and the first comment does indeed bugger it up.
  • jfs
    jfs over 15 years
    Just a side-note: under your assumptions m is not int, but unsigned int.
  • jfs
    jfs over 15 years
    I've implemented nickf's method in C. It works fine. I don't know whether his implementation is correct, but at least the method is.
  • Renze de Waal
    Renze de Waal about 15 years
    The title states that if you have m consecutive numbers, then the product is divisible by m!. This does not allow you to conclude that you have consecutive numbers if the product is divisible by m!. [9,5,8,7,6] results in 1, in your example. So does [18,5,8,7,6], but the numbers are not consecutive.
  • Liran Orevi
    Liran Orevi almost 15 years
    Regarding sum, I've noticed that adding two numbers with x digits, creates a number with no more than x+1 digits, thus sum increases with log(numbers_to_be_summed), for example 8 numbers, 4 bits each, form 4 groups of couples of 4 bits numbers, each group is summed to 5 bit number, the 4 numbers of of 5 bits, create two groups, which summed to two 6 bits numbers, which are summed to one 7 bit number that is (2^4 * 2^3) thus for practical uses, it can be considered as O(1) (adding 2^64 numbers each of 64 bits will require 128 bits)
  • ignoramous
    ignoramous almost 14 years
    The above code fails for a[] = {6, 5, 7, 9, 10}; With an array out of bounds after it encounters a swap of '9' and '10'. Possibly a problem with the original algorithm too?
  • Aaron McDaid
    Aaron McDaid over 12 years
    This answer is a total mess. There are a number of different ideas that should be split into separate answers. I don't care if they're good answers, I'm downvoting this until it's tidied up!
  • BetweenTwoTests
    BetweenTwoTests about 12 years
    See the counterexample under @Skizz's post
  • Admin
    Admin almost 10 years
    How do you figure out a without passing over the array once?