First occurrence in a binary search

31,965

Solution 1

Having found a matching value, you basically need to walk up the collection until you find an entry which doesn't match.

You could potentially make it faster by fetching the index of a key immediately lower than the one you were looking for, then do a binary chop between the two - but I'd probably go for the simpler version, which is likely to be "efficient enough" unless you've got a really large number of equal entries.

Solution 2

An addition to Jon Skeets post:

The potential faster implementation is actually not hard to implement and adds only 2 lines of code, here is how I'd do it:

    if (midVal < key)
        low = mid + 1;
    else if (midVal > key)
        high = mid - 1;
    else if (low != mid) //Equal but range is not fully scanned
        high = mid; //Set upper bound to current number and rescan
    else //Equal and full range is scanned
        return mid;

Solution 3

If your data is all integral, then this hack can help. It uses a float array to store values.

float array[];    //contains all integral values
int searchValue;

int firstIndex = -(binarySearch(array, (float)searchValue - 0.5F) + 1);

Basically what it does is find the insertion index of a value in between your search value and the integer before it. Since all values are integral, it finds the first occurrence of the search value.

Also this runs is log(n) time.

Example:

import java.util.Arrays;

public class BinarySearch {
    // considering array elements are integers
    float ar[] = new float[] { 1, 2, 3, 3, 4, 4, 5, 9, 9, 12, 12 };

    public void returnFirstOccurrence(int key) {
        int firstIndex = -(Arrays.binarySearch(ar, key - 0.5F) + 1);
        if (ar[firstIndex] != key)
            System.out.println("Key doesn't exist");
        else
            System.out.println("First index of key is " + firstIndex);
    }

    public static void main(String Args[]) throws Exception {
        new BinarySearch().returnFirstOccurrence(9);
    }

}

OUTPUT: 7

p.s: I've used this trick on several coding contests and it nicely worked everytime.

Solution 4

You could implement "lower bound" algorithm instead of binary search. This algorithm is used e.g. in C++/STL and its transcript into Java is straightforward. The algorithmic complexity of lower bound is also O(log n) as the binary search. This is better than to use binary search first and than linearly search for the first matching element - this would have worst case behaviour O(n).

Solution 5

You can an adapt your existing search algorithm just by having a sharper definition of matching. You can tell that the highlighted 5 in the sequence 1,3,5,5,5,9 is the first one because the number before it (3) is smaller than 5. So if mid lands on array element equal to the the key you only treat it as a match if a[mid-1] is less than key, other equal array elements are treated like greater than elements. Now you algorithm becomes (after including Jon Skeet's suggestion to return negatives for insertion points):

public static int binarySearch(int[] a, int key) {
    int low=0,high=a.length-1;
    while (low<=high) {
        int mid=(low+high) >>> 1;
        int midVal=a[mid];
        if (midVal < key) 
            low=mid+1;
        else if (mid>0 && a[mid-1]>=key) //we already know midval>=key here
            high=mid-1;
        else if (midVal==key) //found the 1st key 
             return mid;
        else
            return ~mid;      //found insertion point
    }
    return ~(a.length);       //insertion point after everything
}

It uses more comparisons but went faster than Stev314's version in my benchmarks probably because of cache effects.

Share:
31,965

Related videos on Youtube

Amir Afghani
Author by

Amir Afghani

https://www.youtube.com/user/amir650

Updated on July 09, 2022

Comments

  • Amir Afghani
    Amir Afghani almost 2 years

    I'm tinkering with some code and I realized something I never knew. A normal binary search will return a random index in a data set for a key that occurs more than once. How can I modify this code below to return the first occurrence? Is this something people do?

    //ripped from the JDK
    public static int binarySearchValue(InvertedContainer.InvertedIndex[] a, long key) {
        return bSearchVal(a, 0, a.length, key);
    }
    
    private static int bSearchVal(InvertedContainer.InvertedIndex[] a, int fromIndex,
                                     int toIndex, long key) {
        int low = fromIndex;
        int high = toIndex - 1;
    
        while (low <= high) {
            int mid = (low + high) >>> 1;
            long midVal = a[mid].val;
    
            if (midVal < key)
                low = mid + 1;
            else if (midVal > key)
                high = mid - 1;
            else
                return mid; // key found
        }
        return (low); // key not found. return insertion point
    }
    
    • Admin
      Admin almost 13 years
      Cool - I get to rep-whore my own question and answer... stackoverflow.com/questions/4948162/… explains a form of the binary search that can find the first item > or >=, or the last item < or <=.
    • Amir Afghani
      Amir Afghani almost 13 years
      Hah, thanks! I'll take a look. Every once in a while I notice something like this and I think 'you know nothing'.
  • Amir Afghani
    Amir Afghani almost 13 years
    Hi Jon - so I need a while loop in the else clause? I understand the idea, but it seems like it would look kludgish, no?
  • Amir Afghani
    Amir Afghani almost 13 years
    BTW, I'm a big fan. Thanks for your response so far.
  • Jon Skeet
    Jon Skeet almost 13 years
    @Amir: That's certainly one way of doing it. Another is to leave the method as it is (preferably with a name change :) but offer another method which will find the first one, by calling the original method and then performing the while loop (if there was no match). Note that in APIs I've used, the insertion point is returned as a negative value (using ~) to indicate a failure to find the value, but also indicating the insertion point at the same time.
  • Admin
    Admin almost 13 years
    Although this is called a lower bound in the C++ library, it's still a binary search - at least according to my old copy of Algorithms and Data Structures (Modula 2 Edition) by Niklaus Wirth. Maybe it's a matter of opinion, though, where the boundary is between "different algorithm" and "variant of the same algorithm".
  • Jiri Kriz
    Jiri Kriz almost 13 years
    "Binary search" as implemented by many (most?) libraries (e.g. C, C++/STL, Java) does not specify which index is returned when multiple keys are present. This was also the problem in the posed question. "Lower bound" specifies exactly the result.
  • Admin
    Admin almost 13 years
    Agreed, but good names for library functions aren't necessarily the same as in the algorithms textbook, especially when the library may have several variants. BTW, I didn't mean to claim you're wrong about anything, and I upvoted your answer.
  • Amir Afghani
    Amir Afghani almost 13 years
    How would you change this to work for a last index of instead of a first index of?
  • bezmax
    bezmax almost 13 years
    Simply invert the high and low (didnt test if that works): .... else if (high != mid) low = mid; else return mid;
  • Vaibhav Bajpai
    Vaibhav Bajpai over 11 years
    first item: mid = (unsigned int) floor((low + high) / 2.0); last item: mid = (unsigned int) ceil((low + high) / 2.0);
  • nawfal
    nawfal almost 10 years
    while it's trivial to extend, just want to point out that OP wants the first occurrence "equal to", not "greater than or equal to".
  • nawfal
    nawfal almost 10 years
    Could you explain? How does this get the first index of occurrence?
  • Arnab
    Arnab almost 10 years
    Binary search implementation in java Collections either returns the index of the number OR if the number is not present it returns the index of position where the number can be inserted. see link. Also I've edited to include an example.
  • nawfal
    nawfal almost 10 years
    got it. It's not just hacky but extremely hacky :) Not only that it works only for integers, but still you would want a float[] to hold them all. In case the client originally has an int[], then creating a new float[] might have some cost. You might want to make the two conditions written in bold :)
  • Arnab
    Arnab almost 10 years
    As I said, I only use it in contests. I'm a student and don't have any industrial experience but I agree it would be very inappropriate and utterly confusing to use it in a production code.
  • Juan Martinez
    Juan Martinez over 9 years
    This solution has O(N) time complexity as there can be up to N elements with the value you are searching for.
  • Jon Skeet
    Jon Skeet over 9 years
    @JuanMartinez: Hence the "efficient enough unless you've got a really large number of equal entries" bit at the end of the answer.
  • Ashwani Kumar Rahul
    Ashwani Kumar Rahul over 8 years
    @bezmax i need a little help of yours... i want to know whether this code will work faster than the code in this link hackerearth.com/notes/searching-code-monk please see the code for first occurrence
  • jedierikb
    jedierikb almost 8 years
    @VaibhavBajpai where / how does that fit in?
  • jedierikb
    jedierikb almost 8 years
  • mcvkr
    mcvkr almost 3 years
    no to me, using this kind of solution in production is better than using some ad hoc algo on an int[] ... there is an O(n) conversion cost if you do this once but if you create the array as float[] from the beginning and maintain it during lifetime of your program and run the search many times, there will be no significant performance diff.