Faster than binary search for ordered list


Solution 1

You can do better than O(log n) if the values are integers, in which case the best worst-case running time you can achieve, in terms of n, is O(sqrt(log n)). Otherwise, there is no way to beat O(log n) unless there are patterns in the input sequence. There are two approaches used to beat O(log n) in the case of integers.

First, you can use y-fast trees, which work by storing in a hash table all prefixes for which you are storing at least one integer with that prefix. This enables you to perform a binary search on prefix length to find the longest matching prefix, and from there to find the successor of the element you are searching for in time O(log w), where w is the number of bits in a word. There are some details to work through to make this work and use only linear space, but they aren't too bad (see the link below).
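
To make the prefix-search idea concrete, here is a toy sketch in C (my own illustration, not code from the lectures): it assumes 8-bit keys and uses plain boolean tables per prefix length where a real x/y-fast trie would use hash tables, and it only computes the longest matching prefix length rather than the full successor. The names xfast_insert and longest_prefix are invented for this sketch.

```c
#include <stdbool.h>
#include <stdint.h>

#define W 8  /* toy word width; a real trie would use w = 32 or 64 */

/* level[l][p] is true iff some stored key has the l-bit prefix p.
 * Plain arrays stand in for the hash tables of a real x/y-fast trie. */
static bool level[W + 1][1 << W];

static void xfast_insert(uint8_t key)
{
    for (int l = 0; l <= W; l++)
        level[l][key >> (W - l)] = true;
}

/* Binary search on prefix length: returns the length of the longest
 * prefix of key shared with some stored key, in O(log w) probes. */
static int longest_prefix(uint8_t key)
{
    int lo = 0, hi = W;  /* length 0 matches once anything is stored */
    while (lo < hi) {
        int mid = (lo + hi + 1) / 2;
        if (level[mid][key >> (W - mid)])
            lo = mid;    /* this prefix is present: try longer ones */
        else
            hi = mid - 1;
    }
    return lo;
}
```

With w = 32 or 64 the per-level tables must be hash tables, since arrays of size 2^w are out of the question; that bookkeeping is where the linear-space details come in.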

Second, you can use fusion trees, which use bit tricks to enable you to perform w^O(1) comparisons in just a constant number of instructions, yielding a running time of O(log n / log w).
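
The central bit trick behind fusion trees is "parallel comparison": with a sentinel bit per field, one subtraction compares a query against several packed keys at once. A minimal sketch (my own, with invented names, assuming 7-bit keys, four per 32-bit word, and GCC/Clang's __builtin_popcount):

```c
#include <stdint.h>

/* Parallel comparison: compare a 7-bit query against four 7-bit keys
 * in O(1) word operations. Each key occupies an 8-bit field whose top
 * bit is a sentinel. */

#define SENTINELS 0x80808080u  /* top bit of each 8-bit field */

/* Pack four 7-bit keys into one 32-bit word, sentinel bits set. */
static uint32_t pack_keys(const uint8_t k[4])
{
    uint32_t w = 0;
    for (int i = 0; i < 4; i++)
        w |= (uint32_t)(k[i] | 0x80) << (8 * i);
    return w;
}

/* How many packed keys are >= q (q < 128), via a single subtraction:
 * in each field, (0x80 + key) - q keeps its sentinel bit iff key >= q.
 * If the node's keys are sorted, this count is q's rank. */
static int count_ge(uint32_t packed, uint8_t q)
{
    uint32_t qrep = q * 0x01010101u;  /* broadcast q into every field */
    return __builtin_popcount((packed - qrep) & SENTINELS);  /* GCC/Clang */
}
```

A real fusion tree adds key compression so that many more keys fit in a word, which is where the w^O(1) fan-out comes from.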

The optimum tradeoff between these two data structures occurs when log w = sqrt(log n), giving a running time of O(sqrt(log n)).

For details on the above, see lectures 12 and 13 of Erik Demaine's course: http://courses.csail.mit.edu/6.851/spring07/lec.html

Solution 2

One possibility is to treat it like finding the roots of a function. Basically, if x is the value you are searching for, finding

a[i] <= x <= a[i + 1]

is equivalent to finding a root of f(i) = a[i] - x:

a[i] - x <= 0 <= a[i + 1] - x

Then you could try something like Newton's method and so on. These kinds of algorithms frequently converge faster than a binary search when they work, but I don't know of one that is guaranteed to converge for all input.

http://en.wikipedia.org/wiki/Root-finding_algorithm

Solution 3

What about the following algorithm? It is called exponential search and is one of the variations of binary search. http://en.m.wikipedia.org/wiki/Exponential_search

Searching for element k in a sorted array A of size n: look up A[2^i] for i = 0, 1, 2, ... until you go past k's position in A, then do a binary search on the part of the array between the last two probes.

int binary_search(int A[], int key, int lo, int hi)
{
  // standard binary search on the inclusive range [lo, hi]
  while (lo <= hi) {
    int mid = lo + (hi - lo) / 2;
    if (A[mid] == key) return mid;
    if (A[mid] < key) lo = mid + 1;
    else hi = mid - 1;
  }
  return -1; // not found
}

int exponential_search(int A[], int n, int key)
{
  // double the upper bound until it passes key or the end of the array
  int lower_bound = 0;
  int upper_bound = 1;
  while (upper_bound < n && A[upper_bound] < key) {
    lower_bound = upper_bound;
    upper_bound = upper_bound * 2;
  }
  if (upper_bound > n - 1)
    upper_bound = n - 1;
  return binary_search(A, key, lower_bound, upper_bound);
}

This algorithm runs in O(log idx), where idx is the index of k in A (both steps take O(log idx)). In the worst case, when k is among the largest elements of A or larger than all of them, this degenerates to O(log n). The multiplicative constant is larger than for binary search, but the algorithm runs faster for very large arrays when the sought value is near the beginning.

I'd like to have some idea of the minimal size n at which this algorithm becomes preferable to binary search, but I don't know it.

Solution 4

Yes and no. Yes there are searches that are faster, on average, than a bisection search. But I believe that they are still O(lg N), just with a lower constant.

You want to minimize the time taken to find your element. Generally it is desirable to use fewer steps, and one way to approach this is to maximize the expected number of elements that will be eliminated at each step. With bisection, always exactly half the elements are eliminated. You can do better than this, IF you know something about the distribution of the elements. But, the algorithm for choosing the partition element is generally more complicated than choosing the midpoint, and this extra complexity may overwhelm any time savings you expected to get from using fewer steps.

Really, in a problem like this it's better to attack second-order effects like cache locality, than the search algorithm. For example, when doing a repeated binary search, the same few elements (first, second, and third quartiles) are used VERY frequently, so putting them in a single cache line could be far superior to random access into the list.
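
One concrete layout in this spirit is the Eytzinger (breadth-first) order: store the implicit search tree with the root in slot 1 and slot k's children in slots 2k and 2k+1, so the heavily probed top levels sit together in a few cache lines. A sketch (my own code, assuming GCC/Clang for __builtin_ffsll; the asymptotics are still O(lg N)):

```c
#include <stddef.h>

/* Copy sorted[0..n-1] into eytz[1..n] in breadth-first (Eytzinger)
 * order: slot 1 is the root, slot k's children are slots 2k and 2k+1.
 * Call with i = 0, k = 1; eytz[0] is unused. */
static size_t eytz_build(const int *sorted, int *eytz, size_t n,
                         size_t i, size_t k)
{
    if (k <= n) {
        i = eytz_build(sorted, eytz, n, i, 2 * k);      /* left subtree */
        eytz[k] = sorted[i++];                          /* this node */
        i = eytz_build(sorted, eytz, n, i, 2 * k + 1);  /* right subtree */
    }
    return i;
}

/* Slot in eytz[] of the smallest element >= key, or 0 if none. */
static size_t eytz_lower_bound(const int *eytz, size_t n, int key)
{
    size_t k = 1;
    while (k <= n)
        k = 2 * k + (eytz[k] < key);  /* branch-free descent */
    /* cancel the trailing right turns to recover the answer's slot */
    return k >> __builtin_ffsll(~(unsigned long long)k);
}
```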

Dividing each level into say 4 or 8 equal sections (instead of 2) and doing a linear search through those could also be quicker than the bisection search, because a linear search doesn't require calculating the partition and also has fewer data dependencies that can cause cache stalls.
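
A sketch of such a scheme with four sections per level (the name and details are my own; the pivots are scanned linearly, and the running time is still O(lg N)):

```c
/* Quaternary search: narrow [lo, hi) to roughly a quarter each round
 * by linearly scanning three pivots, then finish with a linear scan.
 * Returns an index of key in A[0..n-1], or -1 if absent. */
static int quaternary_search(const int A[], int n, int key)
{
    int lo = 0, hi = n;  /* half-open search range [lo, hi) */
    while (hi - lo > 4) {
        int step = (hi - lo) / 4;
        int p1 = lo + step, p2 = lo + 2 * step, p3 = lo + 3 * step;
        if (key < A[p1])       hi = p1;
        else if (key < A[p2])  { lo = p1; hi = p2; }
        else if (key < A[p3])  { lo = p2; hi = p3; }
        else                   lo = p3;
    }
    for (int i = lo; i < hi; i++)  /* tiny range: plain linear scan */
        if (A[i] == key)
            return i;
    return -1;
}
```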

But all of these are still O(lg N).

Solution 5

If the values in the list are evenly distributed then you could try a weighted split instead of a binary split, e.g. if the desired value is a third of the way from the current lower limit to the current value then you could try the element that is also a third of the way. This could suffer badly on lists where values are bunched up though.
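
This weighted split is essentially interpolation search. A sketch of it (my own code): each probe lands where the key would sit if the values were evenly spread over [A[lo], A[hi]], giving O(log log n) expected time on uniformly distributed data but O(n) in the worst case, matching the bunched-up caveat above.

```c
/* Interpolation search: probe at the key's expected position under a
 * uniform spread of values, instead of at the midpoint.
 * Returns an index of key in A[0..n-1], or -1 if absent. */
static int interpolation_search(const int A[], int n, int key)
{
    int lo = 0, hi = n - 1;
    while (lo <= hi && key >= A[lo] && key <= A[hi]) {
        if (A[hi] == A[lo])  /* flat range: avoid dividing by zero */
            return A[lo] == key ? lo : -1;
        /* weighted split: key's fractional position in [A[lo], A[hi]] */
        int pos = lo + (int)((long long)(key - A[lo]) * (hi - lo)
                             / (A[hi] - A[lo]));
        if (A[pos] == key) return pos;
        if (A[pos] < key)  lo = pos + 1;
        else               hi = pos - 1;
    }
    return -1;
}
```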

Author by

uray

I'm a programmer

Updated on October 14, 2021

Comments

  • uray
    uray over 2 years

    Is there an algorithm that is faster than binary search for searching in sorted values of an array?

    In my case, I have sorted values (they could be of any type) in an array A, and I need to return n if the value I am looking for is in the range of A[n] and A[n+1].

  • xscott
    xscott over 13 years
    It's possible that his array does not contain the value n, but does contain two values that bracket n. It's not obvious that hashing is applicable here.
  • tchrist
    tchrist over 13 years
    On a single ordered list, no. But there are much faster searches; you just need a different data structure than an ordered list. A hash would be virtually constant in lookup time, at the cost of a great deal more memory. A hybrid approach could work the way a dictionary does.
  • Ben Voigt
    Ben Voigt over 13 years
    Some more optimization is necessary. You don't want to choose the element closest to where you guess the answer to be, you want to test a point between the guessed location and the center of list, so that with p > .5 you eliminate more than half the list. The exact optimal partition point depends on the distribution of values in the list.
  • Ben Voigt
    Ben Voigt over 13 years
    @tchrist: The problem requires finding the pair of elements that tightly bound a sought entry that is not in the list at all. Hashing only finds exact matches.
  • srean
    srean over 13 years
    Oh I missed that. But you could still hash first and fall back to binary search if the value is not in the key set. But this is an added complexity. In general you cannot do better than the entropy of the distribution of the values. If you knew the distribution you can use a Huffman tree to decide where you partition.
  • srean
    srean over 13 years
    Newton's method requires a differentiable function, so one would have to fit an interpolating spline first. If the values are unimodal it's quite well behaved; otherwise it might diverge and act totally bizarre.
  • xscott
    xscott over 13 years
    Yes. You can use a linear spline, and the derivative at any point is: f'(i) = a[i+1] - a[i]
  • srean
    srean over 13 years
    Linear splines are piecewise linear, so the derivative won't be continuous. One has to go for quadratic at least, which is no biggie. This will turn out to be similar to en.wikipedia.org/wiki/Interpolation_search
  • srean
    srean over 13 years
    What you describe is exactly interpolation search. @Ben an efficient way to implement your suggestion is through a Huffman tree
  • tchrist
    tchrist over 13 years
    Whoops, you're right. Somehow I only read the first sentence, not the second one.
  • xscott
    xscott over 13 years
    I don't believe the derivative needs to be continuous in Newton's method. Thanks for the link on interpolation search.
  • xscott
    xscott over 13 years
    I'd like to know more about fusion trees. Maybe you'd be willing to explain them: stackoverflow.com/questions/3878320/understanding-fusion-trees
  • srean
    srean over 13 years
    Just to clarify, by "this" I meant your suggestion of using linear interpolation.
  • Barney Szabolcs
    Barney Szabolcs over 11 years
    @xscott I'm not sure you're not over-optimizing unless you are writing a professional numerics library.
  • harry
    harry almost 7 years
    Note that the multiplication here can be replaced with a simple binary shift; it's really cheap.
  • Chris McCowan
    Chris McCowan over 3 years
    The compiler is likely to make that optimization for you.