O(log n) algorithm to find best insert position in sorted array

arrays algorithm sorting linked-list

14,717

Solution 1

There's several problems with your code:

mid = abs(end - start) / 2

This is not the middle between start and end, it's half the distance between them (rounded down to an integer). Later you use it like it was indeed a valid index:

findBestIndex(target, start, mid - 1)

Which it is not. You probably meant to use mid = (start + end) // 2 or something here. You also miss a few indices because you skip over the mid:

return findBestIndex(target, start, mid - 1)
 ...
return findBestIndex(target, mid + 1, end)

Your base case must now be expressed a bit differently as well. A good candidate is the condition

if start == end

Because now you definitely know you're finished searching. Note that you also should consider the case where all the array elements are smaller than target, so you need to insert it at the end.

I don't often search binary, but if I do, this is how

Binary search is something that is surprisingly hard to get right if you've never done it before. I usually use the following pattern if I do a binary search:

lo, hi = 0, n // [lo, hi] is the search range, but hi will never be inspected.
while lo < hi:
    mid = (lo + hi) // 2
    if check(mid): hi = mid
    else:          lo = mid + 1

Under the condition that check is a monotone binary predicate (it is always false up to some point and true from that point on), after this loop, lo == hi will be the first number in the range [0..n] with check(lo) == true. check(n) is implicitely assumed to be true (that's part of the magic of this approach).

So what is a monotone predicate that is true for all indices including and after our target position and false for all positions before?

If we think about it, we want to find the first number in the array that is larger than our target, so we just plug that in and we're good to go:

lo, hi = 0, n
while lo < hi:
    mid = (lo + hi) // 2
    if (a[mid] > target): hi = mid
    else:                 lo = mid + 1
return lo;

Solution 2

this is the code I have used:

int binarySearch( float arr[] , float x , int low , int high )
{
    int mid;
    while( low < high ) {
        mid = ( high + low ) / 2;
        if( arr[mid]== x ) {
            break;
        }
        else if( arr[mid] > x ) {
            high=mid-1;
        }
        else {
            low= mid+1;
        }
    }
    mid = ( high + low ) / 2;
    if (x<=arr[mid])
        return mid;
    else 
        return mid+1;
}

the point is that even when low becomes equal to high you have to check.

see this example for instance: 0.5->0.75 and you are looking for true position of 0.7 or 1.

in both cases when going out of while loop: low=high=1 but one of them should be placed in position 1 and the other in position 2.

Solution 3

I solved this by counting the number of elements that are strictly smaller (<) than the key to insert. The retrieved count is the insert position. Here is a ready to use implementation in Java:

int binarySearchCount(int array[], int left, int right, int key) {
    if(left > right) {
        return -1; // or throw exception
    }
    int mid = -1;   //init with arbitrary value 

    while (left <= right) {
        // Middle element
        mid = (left + right) / 2;

        // If the search key on the left half
        if (key < array[mid]) {
            right = mid - 1;
        }
        // If the search key on the right half
        else if (key > array[mid]) {
            left = mid + 1;
        }
        // We found the key
        else {
            // handle duplicates
            while(mid > 0 && array[mid-1] == array[mid]) {
                --mid;
            }
            break;
        }
    }

    // return the number of elements that are strictly smaller (<) than the key
    return key <= array[mid] ? mid : mid + 1;
}

Solution 4

You are on the right track.

First, you do not need abs in mid = abs(end + start) / 2

Assume abs here means absolute value, because end should always be no less than start, unless there is some mistake in your code. So here abs never helps but may be potentially hiding your problem make it hard to debug.

You do not need if (mid < 2) section either , nothing special about mid smaller than two.

array = generateSomeArrayOfOrderedNumbers()

int start = 0;
int end = array.size(); 

int findBestIndex(target, start, end){

if (start == end){   //you already searched entire array, return the position to insert
  if (stat == 0) return 0; // if it's  the beginning of the array just return 0.
  if(array[start] > target) return start -1; //if last searched index is bigger than target return the position before it.
else return start;
}
mid = (end - start) / 2

// find correct position 
if(target == array[mid]) return mid;

if (target < array[mid])
{
 // The target belongs on the left side of our list //
return findBestIndex(target, start, mid - 1)
}
else
{
 // The target belongs on the right side of our list //
 return findBestIndex(target, mid + 1, end)
}
}

Solution 5

Below is the code that is used to search a target value (which is a list of an array) from the sorted array (It contains duplicate values).

It returns the array of positions where we can insert the target values.

Hope this code helps you in any way.

Any suggestions are welcome.

static int[] climbingLeaderboard(int[] scores, int[] alice) {
    int[] noDuplicateScores = IntStream.of(scores).distinct().toArray();
    int[] rank = new int[alice.length];

    for (int k = 0; k < alice.length; k++) {
        int i=0;
        int j = noDuplicateScores.length-1;
        int pos=0;
        int target = alice[k];
        while(i<=j) {
            int mid = (j+i)/2;
            if(target < noDuplicateScores[mid]) {
                i = mid +1;
                pos = i;
            }else if(target > noDuplicateScores[mid]) {
                j = mid-1;
                pos = j+1;
            }else {
                pos = mid;
                break;
            }
        }
        
        rank[k] = pos+1;
    }

    return rank;
 }

View more solutions

14,717

Author by

Brayden

Updated on June 05, 2022

Comments

Brayden almost 2 years

I'm trying to make an algorithm that finds the best position to insert the target into the already sorted array.

The goal is to either return the position of the item if it exists in the list, else return the position it would go into to keep the list sorted.

So say I have a list:

   0   1   2   3   4    5    6
 ---------------------------------
 | 1 | 2 | 4 | 9 | 10 | 39 | 100 |
 ---------------------------------

And my target item is 14 It should return an index position of 5

Pseudo-code I currently have:

array = generateSomeArrayOfOrderedNumbers()

number findBestIndex(target, start, end)
    mid = abs(end - start) / 2

    if (mid < 2) 
        // Not really sure what to put here
        return start + 1 // ??

    if (target < array[mid])
        // The target belongs on the left side of our list //
        return findBestIndex(target, start, mid - 1)
    else
        // The target belongs on the right side of our list //
        return findBestIndex(target, mid + 1, end)

I not really sure what to put at this point. I tried to take a binary search approach to this, but this is the best I could come up with after 5 rewrites or so.

pasztorpisti about 10 years

+1, some people are usually struggling because of using closed intervals, in most cases half open intervals are much easier to use and result in more elegant code like in this case.
Niklas B. about 10 years

@pasztorpisti: I have to disagree with you on that point, this code actually does not work with half-open intervals and that's what makes it elegant. The loop invariant is that the first element is in the closed range [lo..hi], with the special case where hi == n. Otherwise the loop condition would be while (lo + 1 < hi) or something.
pasztorpisti about 10 years

I didn't say that your code doesn't work with half open intervals.
Niklas B. about 10 years

@pasztorpisti: yeah I just wanted to make you aware that using closed intervals is what makes this algorithm elegant. It's a bit less elegant with exclusive right border.
Brayden about 10 years

This may be a dumb question, but say I have a list of 7 integers. They are added in the following order 1, 6, 3, 5, 4, 7, 2. In my implementation, everything goes great, until I reach 7. The order after inserting them all is: 1, 2, 3, 7, 4, 5, 6. Is there anything im missing in regards to this code. This is the code: gist.github.com/headdetect/b8a889075198578ccb46
Niklas B. about 10 years

@Brayden: Well looking at your function, you "forget" some indices, because you recurse into [start, mid - 1] and [mid + 1, end] and just throw away the mid position. I also think that your base case is wrong. This is one of the things where I just sticked to a certain pattern after some time and never made a mistake since, so I encourage you to understand the problems in your code and maybe compare it to a slightly smaller and a bit more elegant variant that at least to me is very intuitive.
Niklas B. about 10 years

@Brayden: I added a bit of analysis of your code, which is pretty broken to be honest.
Niklas B. about 10 years

@Brayden: I added a bit of an analysis of your code, which seems a bit broken
pasztorpisti about 10 years

Yea, so you are finally back to the algorithm that works with half open interval as you copy pasted initially...
Niklas B. about 10 years

@pasztorpisti: No, it works with closed intervals... The loop invariant is that the first i with check(i) == true is in the range [lo..hi].
pasztorpisti about 10 years

OK, try this: a=[0, 1, 2, 3] lo=0 hi=3 target=4 The result should be 4 I think.
Niklas B. about 10 years

let us continue this discussion in chat
Erti-Chris Eelmaa about 10 years

might or might not apply to this case:googleresearch.blogspot.co.uk/2006/06/…
Niklas B. about 10 years

@Erti well this particular implementation is not broken. I've used it dozens of times in programming competitions :)
Niklas B. about 10 years

@Erti The overflow part is a big technical, I wouldn't consider that a bug even in C or Java, because we can just use a larger integer type (I rarely encounter arrays with 2^63 elements). Since Python has bigints, overflow is never a concern w.r.t. correctness