First occurrence in a binary search
Solution 1
Having found a matching value, you basically need to walk up the collection until you find an entry which doesn't match.
You could potentially make it faster by fetching the index of a key immediately lower than the one you were looking for, then do a binary chop between the two - but I'd probably go for the simpler version, which is likely to be "efficient enough" unless you've got a really large number of equal entries.
Solution 2
An addition to Jon Skeets post:
The potential faster implementation is actually not hard to implement and adds only 2 lines of code, here is how I'd do it:
if (midVal < key)
low = mid + 1;
else if (midVal > key)
high = mid - 1;
else if (low != mid) //Equal but range is not fully scanned
high = mid; //Set upper bound to current number and rescan
else //Equal and full range is scanned
return mid;
Solution 3
If your data is all integral, then this hack can help. It uses a float array to store values.
float array[]; //contains all integral values
int searchValue;
int firstIndex = -(binarySearch(array, (float)searchValue - 0.5F) + 1);
Basically what it does is find the insertion index of a value in between your search value and the integer before it. Since all values are integral, it finds the first occurrence of the search value.
Also this runs is log(n) time.
Example:
import java.util.Arrays;
public class BinarySearch {
// considering array elements are integers
float ar[] = new float[] { 1, 2, 3, 3, 4, 4, 5, 9, 9, 12, 12 };
public void returnFirstOccurrence(int key) {
int firstIndex = -(Arrays.binarySearch(ar, key - 0.5F) + 1);
if (ar[firstIndex] != key)
System.out.println("Key doesn't exist");
else
System.out.println("First index of key is " + firstIndex);
}
public static void main(String Args[]) throws Exception {
new BinarySearch().returnFirstOccurrence(9);
}
}
OUTPUT: 7
p.s: I've used this trick on several coding contests and it nicely worked everytime.
Solution 4
You could implement "lower bound" algorithm instead of binary search. This algorithm is used e.g. in C++/STL and its transcript into Java is straightforward. The algorithmic complexity of lower bound is also O(log n) as the binary search. This is better than to use binary search first and than linearly search for the first matching element - this would have worst case behaviour O(n).
Solution 5
You can an adapt your existing search algorithm just by having a sharper definition of matching. You can tell that the highlighted 5 in the sequence 1,3,5,5,5,9 is the first one because the number before it (3) is smaller than 5. So if mid lands on array element equal to the the key you only treat it as a match if a[mid-1] is less than key, other equal array elements are treated like greater than elements. Now you algorithm becomes (after including Jon Skeet's suggestion to return negatives for insertion points):
public static int binarySearch(int[] a, int key) {
int low=0,high=a.length-1;
while (low<=high) {
int mid=(low+high) >>> 1;
int midVal=a[mid];
if (midVal < key)
low=mid+1;
else if (mid>0 && a[mid-1]>=key) //we already know midval>=key here
high=mid-1;
else if (midVal==key) //found the 1st key
return mid;
else
return ~mid; //found insertion point
}
return ~(a.length); //insertion point after everything
}
It uses more comparisons but went faster than Stev314's version in my benchmarks probably because of cache effects.
Related videos on Youtube
Comments
-
Amir Afghani almost 2 years
I'm tinkering with some code and I realized something I never knew. A normal binary search will return a random index in a data set for a key that occurs more than once. How can I modify this code below to return the first occurrence? Is this something people do?
//ripped from the JDK public static int binarySearchValue(InvertedContainer.InvertedIndex[] a, long key) { return bSearchVal(a, 0, a.length, key); } private static int bSearchVal(InvertedContainer.InvertedIndex[] a, int fromIndex, int toIndex, long key) { int low = fromIndex; int high = toIndex - 1; while (low <= high) { int mid = (low + high) >>> 1; long midVal = a[mid].val; if (midVal < key) low = mid + 1; else if (midVal > key) high = mid - 1; else return mid; // key found } return (low); // key not found. return insertion point }
-
Admin almost 13 yearsCool - I get to rep-whore my own question and answer... stackoverflow.com/questions/4948162/… explains a form of the binary search that can find the first item > or >=, or the last item < or <=.
-
Amir Afghani almost 13 yearsHah, thanks! I'll take a look. Every once in a while I notice something like this and I think 'you know nothing'.
-
-
Amir Afghani almost 13 yearsHi Jon - so I need a while loop in the else clause? I understand the idea, but it seems like it would look kludgish, no?
-
Amir Afghani almost 13 yearsBTW, I'm a big fan. Thanks for your response so far.
-
Jon Skeet almost 13 years@Amir: That's certainly one way of doing it. Another is to leave the method as it is (preferably with a name change :) but offer another method which will find the first one, by calling the original method and then performing the while loop (if there was no match). Note that in APIs I've used, the insertion point is returned as a negative value (using
~
) to indicate a failure to find the value, but also indicating the insertion point at the same time. -
Admin almost 13 yearsAlthough this is called a lower bound in the C++ library, it's still a binary search - at least according to my old copy of Algorithms and Data Structures (Modula 2 Edition) by Niklaus Wirth. Maybe it's a matter of opinion, though, where the boundary is between "different algorithm" and "variant of the same algorithm".
-
Jiri Kriz almost 13 years"Binary search" as implemented by many (most?) libraries (e.g. C, C++/STL, Java) does not specify which index is returned when multiple keys are present. This was also the problem in the posed question. "Lower bound" specifies exactly the result.
-
Admin almost 13 yearsAgreed, but good names for library functions aren't necessarily the same as in the algorithms textbook, especially when the library may have several variants. BTW, I didn't mean to claim you're wrong about anything, and I upvoted your answer.
-
Amir Afghani almost 13 yearsHow would you change this to work for a last index of instead of a first index of?
-
bezmax almost 13 yearsSimply invert the high and low (didnt test if that works):
.... else if (high != mid) low = mid; else return mid;
-
Vaibhav Bajpai over 11 yearsfirst item:
mid = (unsigned int) floor((low + high) / 2.0);
last item:mid = (unsigned int) ceil((low + high) / 2.0);
-
nawfal almost 10 yearswhile it's trivial to extend, just want to point out that OP wants the first occurrence "equal to", not "greater than or equal to".
-
nawfal almost 10 yearsCould you explain? How does this get the first index of occurrence?
-
Arnab almost 10 yearsBinary search implementation in java Collections either returns the index of the number OR if the number is not present it returns the index of position where the number can be inserted. see link. Also I've edited to include an example.
-
nawfal almost 10 yearsgot it. It's not just hacky but extremely hacky :) Not only that it works only for integers, but still you would want a
float[]
to hold them all. In case the client originally has anint[]
, then creating a newfloat[]
might have some cost. You might want to make the two conditions written in bold :) -
Arnab almost 10 yearsAs I said, I only use it in contests. I'm a student and don't have any industrial experience but I agree it would be very inappropriate and utterly confusing to use it in a production code.
-
Juan Martinez over 9 yearsThis solution has O(N) time complexity as there can be up to N elements with the value you are searching for.
-
Jon Skeet over 9 years@JuanMartinez: Hence the "efficient enough unless you've got a really large number of equal entries" bit at the end of the answer.
-
Ashwani Kumar Rahul over 8 years@bezmax i need a little help of yours... i want to know whether this code will work faster than the code in this link hackerearth.com/notes/searching-code-monk please see the code for first occurrence
-
jedierikb almost 8 years@VaibhavBajpai where / how does that fit in?
-
jedierikb almost 8 years
-
mcvkr almost 3 yearsno to me, using this kind of solution in production is better than using some ad hoc algo on an int[] ... there is an O(n) conversion cost if you do this once but if you create the array as float[] from the beginning and maintain it during lifetime of your program and run the search many times, there will be no significant performance diff.