Why is array_key_exists 1000x slower than isset on referenced arrays?

11,833

Solution 1

At work I've got a VM instance of PHP that includes a PECL extension called VLD. This lets you execute PHP code from the commandline and rather than execute it, it returns the generated opcode instead.

It's brilliant at answering questions like this.

http://pecl.php.net/package/vld

Just in case you go this route (and if you're generally curious about how PHP works internally, i think you should) you should definitely install it on a virtual machine (that is, i wouldn't install it on a machine i'm trying to develop on or deploy to). And this is the command you'll use to make it sing:

php -d vld.execute=0 -d vld.active=1 -f foo.php

Looking at the opcodes will tell you a more complete story, however, I have a guess.... Most of PHP's built-ins make a copy of an Array/Object and act upon that copy (and not a copy-on-write either, an immediate copy). The most widely known example of this is foreach(). When you pass an array into foreach(), PHP is actually making a copy of that array and iterating on the copy. Whis is why you'll see a significant performance benefit by passing an array as a reference into foreach like this:

foreach($someReallyBigArray as $k => &$v)

But this behavior -- that passing in an explicit reference like that -- is unique to foreach(). So I would be very surprised if it made an array_key_exists() check any faster.

Ok, back to what I was getting at..

Most the built-ins take a copy of an array and act upon that copy. I am going to venture a completely unqualified guess that isset() is highly optimized and that one of those optimizations is perhaps to not do an immediate copy of an Array when its passed-in.

I'll try to answer any other questions you may have but you could probably read a lot of you google for "zval_struct" (which is the data structure in the PHP internals which stores each variable. It's a C struct (think.. an associative array) that has keys like "value", "type", "refcount".

Solution 2

Here is the source of the array_key_exists function for 5.2.17. You can see that even if the key is null, PHP attempts to compute a hash. Although it's interesting that if you remove

// $my_array_ref[$i] = NULL;

then it performs better. There must be multiple hash lookups occuring.

/* {{{ proto bool array_key_exists(mixed key, array search)
   Checks if the given key or index exists in the array */
PHP_FUNCTION(array_key_exists)
{
    zval **key,                 /* key to check for */
         **array;               /* array to check in */

    if (ZEND_NUM_ARGS() != 2 ||
        zend_get_parameters_ex(ZEND_NUM_ARGS(), &key, &array) == FAILURE) {
        WRONG_PARAM_COUNT;
    }

    if (Z_TYPE_PP(array) != IS_ARRAY && Z_TYPE_PP(array) != IS_OBJECT) {
        php_error_docref(NULL TSRMLS_CC, E_WARNING, "The second argument should be either an array or an object");
        RETURN_FALSE;
    }

    switch (Z_TYPE_PP(key)) {
        case IS_STRING:
            if (zend_symtable_exists(HASH_OF(*array), Z_STRVAL_PP(key), Z_STRLEN_PP(key)+1)) {
                RETURN_TRUE;
            }
            RETURN_FALSE;
        case IS_LONG:
            if (zend_hash_index_exists(HASH_OF(*array), Z_LVAL_PP(key))) {
                RETURN_TRUE;
            }
            RETURN_FALSE;
        case IS_NULL:
            if (zend_hash_exists(HASH_OF(*array), "", 1)) {
                RETURN_TRUE;
            }
            RETURN_FALSE;

        default:
            php_error_docref(NULL TSRMLS_CC, E_WARNING, "The first argument should be either a string or an integer");
            RETURN_FALSE;
    }

}

Solution 3

Not array_key_exists, but the removal of the reference (= NULL) causes this. I commented it out from your script and this is the result:

array_key_exists( $my_array ) 0.0059430599212646
isset( $my_array ) 0.0027170181274414
array_key_exists( $my_array_ref ) 0.0038740634918213
isset( $my_array_ref ) 0.0025200843811035

Only removed the unsetting from the array_key_exists( $my_array_ref ) part, this is the modified part for reference:

$my_array = array();
$my_array_ref = &$my_array;
$start = microtime( TRUE );
for( $i = 1; $i < 10000; $i++ ) {
    array_key_exists( $i, $my_array_ref );
    // $my_array_ref[$i] = NULL;
}
$stop = microtime( TRUE );
print "array_key_exists( \$my_array_ref ) ".($stop-$start).PHP_EOL;
unset( $my_array, $my_array_ref, $start, $stop, $i );
Share:
11,833

Related videos on Youtube

Kendall Hopkins
Author by

Kendall Hopkins

Hire Me

Updated on February 24, 2022

Comments

  • Kendall Hopkins
    Kendall Hopkins about 2 years

    I have found that array_key_exists is over 1000x slower than isset at check if a key is set in an array reference. Does anyone that has an understanding of how PHP is implemented explain why this is true?

    EDIT: I've added another case that seems to point to it being overhead required in calling functions with a reference.

    Benchmark Example

    function isset_( $key, array $array )
    {
        return isset( $array[$key] );
    }
    
    $my_array = array();
    $start = microtime( TRUE );
    for( $i = 1; $i < 10000; $i++ ) {
        array_key_exists( $i, $my_array );
        $my_array[$i] = 0;
    }
    $stop = microtime( TRUE );
    print "array_key_exists( \$my_array ) ".($stop-$start).PHP_EOL;
    unset( $my_array, $my_array_ref, $start, $stop, $i );
    
    $my_array = array();
    $start = microtime( TRUE );
    for( $i = 1; $i < 10000; $i++ ) {
        isset( $my_array[$i] );
        $my_array[$i] = 0;
    }
    $stop = microtime( TRUE );
    print "isset( \$my_array ) ".($stop-$start).PHP_EOL;
    unset( $my_array, $my_array_ref, $start, $stop, $i );
    
    $my_array = array();
    $start = microtime( TRUE );
    for( $i = 1; $i < 10000; $i++ ) {
        isset_( $i, $my_array );
        $my_array[$i] = 0;
    }
    $stop = microtime( TRUE );
    print "isset_( \$my_array ) ".($stop-$start).PHP_EOL;
    unset( $my_array, $my_array_ref, $start, $stop, $i );
    
    $my_array = array();
    $my_array_ref = &$my_array;
    $start = microtime( TRUE );
    for( $i = 1; $i < 10000; $i++ ) {
        array_key_exists( $i, $my_array_ref );
        $my_array_ref[$i] = 0;
    }
    $stop = microtime( TRUE );
    print "array_key_exists( \$my_array_ref ) ".($stop-$start).PHP_EOL;
    unset( $my_array, $my_array_ref, $start, $stop, $i );
    
    $my_array = array();
    $my_array_ref = &$my_array;
    $start = microtime( TRUE );
    for( $i = 1; $i < 10000; $i++ ) {
        isset( $my_array_ref[$i] );
        $my_array_ref[$i] = 0;
    }
    $stop = microtime( TRUE );
    print "isset( \$my_array_ref ) ".($stop-$start).PHP_EOL;
    unset( $my_array, $my_array_ref, $start, $stop, $i );
    
    $my_array = array();
    $my_array_ref = &$my_array;
    $start = microtime( TRUE );
    for( $i = 1; $i < 10000; $i++ ) {
        isset_( $i, $my_array_ref );
        $my_array_ref[$i] = 0;
    }
    $stop = microtime( TRUE );
    print "isset_( \$my_array_ref ) ".($stop-$start).PHP_EOL;
    unset( $my_array, $my_array_ref, $start, $stop, $i );
    

    Output

    array_key_exists( $my_array ) 0.0056459903717
    isset( $my_array ) 0.00234198570251
    isset_( $my_array ) 0.00539588928223
    array_key_exists( $my_array_ref ) 3.64232587814 // <~ what on earth?
    isset( $my_array_ref ) 0.00222992897034
    isset_( $my_array_ref ) 4.12856411934 // <~ what on earth?
    

    I'm on PHP 5.3.6.

    Codepad example.

    • hakre
      hakre almost 13 years
      Search the internet for isset vs. array_key_exists and you will find more resources than you're probably able to read.
    • Kendall Hopkins
      Kendall Hopkins almost 13 years
      @hakre I understand the differences between isset and array_key_exists (how they work), my question is why this performance hit is happening when they are used on references.
    • hakre
      hakre almost 13 years
      array_key_exists must check for a concrete key value, isset must not. If you must look for the concrete value of a reference, the reference must be resolved additionally. So there is more work to do which takes more time. Why it's so much more time I can not tell you, but you're as well unsetting references, and this will influence the amount of work.
    • Kendall Hopkins
      Kendall Hopkins almost 13 years
      @hakre I've updated my example with testing function( $key, $array ) { return isset( $array[$key] ); }. It is also slow, this seems to point to overhead required in calling a function w/ a reference.
    • Shane H
      Shane H almost 13 years
      I don't think your test is really showing you what you think. Userspace functions (like the one you created) is always going to be significantly slower than in-built functions.
  • Kendall Hopkins
    Kendall Hopkins almost 13 years
    Then it's useless. The = NULL is just in place a larger value. I'll replace it w/ 0 in my example to make that more clear.
  • Kendall Hopkins
    Kendall Hopkins almost 13 years
    By removing that line, that function can probably return early (since there is no keys in the array). Besides it doesn't reflect the real-life problem I ran into (an ref array with keys in it).
  • hakre
    hakre almost 13 years
    Hmm, probably each time a reference changes, the hash is triggered to re-index the keys.
  • hakre
    hakre almost 13 years
    Hmm, why do you need a reference at all?
  • Kendall Hopkins
    Kendall Hopkins almost 13 years
    I think your right about the = <something> is doing something to kill the performance of the reference. I don't understand how it can require 1000x the work to do it though.
  • Kendall Hopkins
    Kendall Hopkins almost 13 years
    I found this bug while passing an array into a closure by reference because I wanted to change it in the closure. I simplified the case down so that closures wouldn't confuse people.
  • hakre
    hakre almost 13 years
    Look inside the PHP source code what is triggered by this. Then you probably find out. I do not know that, but I think this is related to memory management and copy on write plus the hash array structure.
  • Kendall Hopkins
    Kendall Hopkins almost 13 years
    I know there is ways to get around this, but I'm more interested in the WHY that this is happening, not how to fix it (though that would be nice also).
  • Kendall Hopkins
    Kendall Hopkins almost 13 years
    Hence why I was hoping with someone w/ knowledge of PHP internals would shed some light on this problem.
  • hakre
    hakre almost 13 years
    I think you can fix this by not using a reference. There is not much you can do else in an assignment on the PHP level I think, so no clue next to not using a reference for the how to fix. The why lies within the source, probably konforce knows that.
  • Kendall Hopkins
    Kendall Hopkins almost 13 years
    PHP references have been the source of much pain. I've had problems with segfaults, memory corruption and now this...
  • hakre
    hakre almost 13 years
    Why are you making use of those? You have not written much to that, normally those are not needed at all.
  • Kendall Hopkins
    Kendall Hopkins almost 13 years
    Sometimes they are nice to use when you need to temporally treat an array like an object or if you need to make an array reference itself. But I'll agree, most of the time there is a better way.
  • hakre
    hakre almost 13 years
    The part about temporarily treat an array like an object sounds interesting. How do you do that with references?
  • Kendall Hopkins
    Kendall Hopkins almost 13 years
    I just meant that arrays default to pass by value and objects pass by references and sometimes it's preferable to switch their behavior.
  • hakre
    hakre almost 13 years
    You should checkout ArrayObject and probably make use of the ArrayAccess interface. I think that's worth to try out. An object that behaves similar to an array.
  • Kendall Hopkins
    Kendall Hopkins almost 13 years
    I don't think this really answers the question. While that tool looks very interesting and I'll have to give it a try, you answer is a guess at best. BTW isset isn't a function, it's a language construct (why it probably doesn't have the overhead showing up here, until you wrap it in a function).
  • Shane H
    Shane H almost 13 years
    Kendall, this is only a "guess" because I didn't grep the php-src tree to answer your question. It's a very informed guess. I've lately worked on more PHP Extenstion development (in C) than PHP code itself. And your comment about "language construct"... You just got that from the PHP manual? But you seem to not understand what that really even means. All it means is that, in the C code, it's not implemented as a global ZEND_FUNCTION. The "overhead" for calling isset() is no different than the overhead of calling, say, "is_a" which IS a ZEND_FUNCTION.
  • Shane H
    Shane H almost 13 years
    Anyhow, believe what you'd like. You seem to have a hypothesis that you're attached to regardless of facts. And more than once you seem to think that a userspace function (that is, a function you create in PHP code) is somehow comparable to a language built-in. The only thing they have in common is that they're both named "function."
  • Kendall Hopkins
    Kendall Hopkins almost 13 years
    I don't think a few more lookups could cause a 1000x slow down. I'm pretty sure the problem is a bit more than just the array_key_exists function anyway. As you can see in the example, it happens for any function (see isset_).
  • Andras
    Andras over 9 years
    I think it's internal to zend_hash_index_exists() et al; accessing an unset element in general is slower than a set element. I've optimized production code in the past by testing first for truthy (!!$hash->$value) and if still not sure only then with array_key_exists(), it ran much faster.
  • bwoebi
    bwoebi almost 8 years
    the issue is the separation (aka duplication of the array…)