PHP, in_array and fast searches (by the end) in arrays

15,431

Solution 1

I assume that in_array is a linear search from 0 to n-1.

The fastest search will be to store the values as the keys and use array_key_exists.

$a['foo'] = true;
$a['bar'] = true;

if (array_key_exists('foo', $a)) ...

But if that's not an option, you can make your own for indexed arrays quite easily:

function in_array_i($needle, array $a, $i = 0);
{
  $c = count($a);
  for (;$i < $c; ++$i)
    if ($a[$i] == $needle) return true;
  return false;
}

It will start at $i, which you can keep track of yourself in order to skip the first elements.

Or alternatively...

function in_array_i($needle, array $a, $i = 0);
{
  return in_array($needle, $i ? array_slice($a, $i) : $a);
}

You can benchmark to see which is faster.

Solution 2

How works internally the in_array function?

Internally the in_array() searches from the beginning to the end of the array. So in your case this is slow.

Depending of the nature of your data you can change the search strategy. If you only have non-duplicate values and all values are either string or integer (not NULL), a common trick is to array_flip() the array which works pretty fast and then check if there is an entry for your value as key in the array hash via isset():

  $array = array( ... non-duplicate string and integer values ... );
  $needle = 'find me!';
  $lookup = array_flip($array);
  $found = isset($lookup[$needle]) ? $lookup[$needle] : false;
  if (false === $found) {
    echo "Not found!\n";
  } else {
    echo "Found at {$found}!\n";
  }

If these pre-conditions are not met, you can do that what konforce suggested.

If you have really much data and it's not only that you're looking at either from the beginning or end, you might want to implement one search algorithm on your own, like neither starting from the beginning nor end, but wrapping and/or starting at a random position to distribute the search time.

Additionally you can keep elements sorted while adding to the array probably which can then be searched much faster with a fitting algorithm.

Solution 3

Tweaking an extensive comparative test between

for numerical and string searches, by Kasim Kochkin posted on GitHub, I find the following results

using php 7.3.11

using array_flip once and multiple searches,

  • for single to few searches, in_array and array_search are faster.

  • for string searches, flip (once) + isset becomes faster above 200 searches.

  • for numerical searches, flip (once) + isset becomes faster above 10 searches.

results for String search (in seconds)

N (array size) in_array flip isset array_search array_key_exists
1,000,000 0.00845003 0.17343211 2.86E-6 0.00835395 5.01E-6
100,000 0.00854707 0.12469196 7.15E-6 0.00861216 6.2E-6
10,000 0.00854087 0.10549212 6.91E-6 0.00846505 4.05E-6

Numerical search results (in seconds),

N (array size) in_array flip isset array_search array_key_exists
1,000,000 0.01197696 0.06217289 6.2E-6 0.01673698 4.05E-6
100,000 0.01191092 0.06582093 6.91E-6 0.01637983 4.05E-6
10,000 0.01375008 0.07185006 5.01E-6 0.01485705 4.05E-6
Share:
15,431
castarco
Author by

castarco

I'm a mix of mathematician and computer scientist who loves to spend his time on programming.

Updated on July 19, 2022

Comments

  • castarco
    castarco almost 2 years

    I have a doubt about what's the better way to make a fast search in arrays (I'm talking about an specific case).

    Supose that I have an array L = [A, B, C] (when I start). While the program is running, may be L will grow (but by the end), one possible case when I'll do the search is that L = [A, B, C, D, E].

    The fact is that when I'm searching, the values that I want find could be only D and E. Now I'm using find_array(elem, array), but this function can't be "tweaked" to search starting at the end and decreasing the index, and I'm "afraid" that for all the searches the function in_array will examine all the elements with lower indexes before will find the value that I'm searching.

    ¿There is another search function wich fits better to my problem? ¿How works internally the in_array function?

    Thanks in advance

  • Jordan Arseno
    Jordan Arseno almost 11 years
  • Matthew
    Matthew almost 11 years
    @JordanArseno, yes isset() is faster than array_key_exists(), but it does return false for null values. (Not that it matters in this case.) That said, they are both essentially constant time, while in_array() is O(n) and performs noticeably worse as you reach the end of a large array. So I prefer to use isset() when null isn't an issue, but the main takeaway should be that in_array() is absolutely the wrong way to enforce uniqueness.
  • Basj
    Basj over 3 years
    Your post looks very useful @Aurovrata, but currently it's very hard to read. Could you format it with maybe a table which would show the benchmark in a more readable way?
  • Aurovrata
    Aurovrata over 3 years
    sure, I'll try to find some time to improve it
  • Basj
    Basj over 3 years
    Here is a Markdown table generator @Aurovrata: tablesgenerator.com/markdown_tables
  • Aurovrata
    Aurovrata over 3 years
    wow, that's really handy. I will update my answer leverage this table format
  • allan.simon
    allan.simon over 2 years
    something looks off, the time does not increase between 1 million and 10 thousand for in_array ?
  • Aurovrata
    Aurovrata over 2 years
    did you run it on your machine?