MySQL OR vs IN performance

91,117

Solution 1

The accepted answer doesn't explain the reason.

The following is quoted from High Performance MySQL, 3rd Edition:

In many database servers, IN() is just a synonym for multiple OR clauses, because the two are logically equivalent. Not so in MySQL, which sorts the values in the IN() list and uses a fast binary search to see whether a value is in the list. This is O(log n) in the size of the list, whereas an equivalent series of OR clauses is O(n) in the size of the list (i.e., much slower for large lists).
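To make the complexity claim concrete, here is a minimal Python sketch of the two strategies the quote describes. It illustrates the idea only and is not MySQL's actual implementation. The hypothetical `or_chain_match` tests each value in turn, as a naive series of OR comparisons would. `prepare_in_list` and `in_list_match` (also hypothetical names) sort the constant list once and then binary-search it. Both matchers count comparisons so the O(n) vs O(log n) difference is visible.

```python
def or_chain_match(value, candidates):
    """Naive OR chain: compare against each candidate in order, O(n) per row."""
    comparisons = 0
    for c in candidates:
        comparisons += 1
        if value == c:
            return True, comparisons
    return False, comparisons


def prepare_in_list(candidates):
    """Sort the constant IN() list once, before any rows are examined."""
    return sorted(candidates)


def in_list_match(value, sorted_candidates):
    """Binary search (same logic as bisect_left) over the pre-sorted list, O(log n)."""
    lo, hi = 0, len(sorted_candidates)
    steps = 0
    while lo < hi:
        steps += 1
        mid = (lo + hi) // 2
        if sorted_candidates[mid] < value:
            lo = mid + 1
        else:
            hi = mid
    found = lo < len(sorted_candidates) and sorted_candidates[lo] == value
    return found, steps


if __name__ == "__main__":
    values = list(range(1, 10001))
    found, steps = or_chain_match(10000, values)
    print(found, steps)  # True 10000
    # Binary search over the same 10,000 values needs at most 14 steps
    print(in_list_match(10000, prepare_in_list(values)))
```

The sort happens once per query, while the per-row lookup cost is what separates the two approaches as the list grows.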

Solution 2

I needed to know this for sure, so I benchmarked both methods. I consistently found IN to be much faster than OR.

Do not believe people who give their "opinion"; science is all about testing and evidence.

I ran each of the equivalent queries 1000 times in a loop (for consistency, I used SQL_NO_CACHE):

IN: 2.34969592094s

OR: 5.83781504631s

Update:
I no longer have the source code for the original test, as it was 6 years ago, but the test below returns results in the same range.

In response to requests for sample code to test this, here is the simplest possible use case. I used Laravel's query builder (DB::table) for syntactic simplicity; the raw SQL equivalent performs the same.

// OR version: 20 chained orWhere() conditions on the primary key, 10,000 iterations
$t = microtime(true);
for ($i = 0; $i < 10000; $i++):
    $q = DB::table('users')->where('id', 1)
        ->orWhere('id', 2)
        ->orWhere('id', 3)
        ->orWhere('id', 4)
        ->orWhere('id', 5)
        ->orWhere('id', 6)
        ->orWhere('id', 7)
        ->orWhere('id', 8)
        ->orWhere('id', 9)
        ->orWhere('id', 10)
        ->orWhere('id', 11)
        ->orWhere('id', 12)
        ->orWhere('id', 13)
        ->orWhere('id', 14)
        ->orWhere('id', 15)
        ->orWhere('id', 16)
        ->orWhere('id', 17)
        ->orWhere('id', 18)
        ->orWhere('id', 19)
        ->orWhere('id', 20)->get();
endfor;
$t2 = microtime(true);
// Prints the start time, end time and elapsed seconds
echo $t."\n".$t2."\n".($t2-$t)."\n";

1482080514.3635
1482080517.3713
3.0078368186951

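// IN version: the same 20 ids in a single whereIn(), 10,000 iterations
$t = microtime(true);
for ($i = 0; $i < 10000; $i++):
    $q = DB::table('users')->whereIn('id', [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20])->get();
endfor;
$t2 = microtime(true);
// Prints the start time, end time and elapsed seconds
echo $t."\n".$t2."\n".($t2-$t)."\n";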

1482080534.0185
1482080536.178
2.1595389842987
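For readers who want a self-contained, reproducible variant of the benchmark above, here is a minimal Python sketch. It swaps in SQLite (the standard-library `sqlite3` module, in memory) for MySQL. The absolute numbers are therefore not comparable to the MySQL figures above, and SQLite's optimizer may treat OR and IN differently from MySQL's. The `users` schema and the helper names are assumptions for illustration. The point is the method: build both predicate forms, check that they return the same rows, then time each in a loop.

```python
import sqlite3
import time


def setup(n_rows=10000):
    """Create an in-memory users table with an integer primary key (assumed schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO users (id, name) VALUES (?, ?)",
        ((i, f"user{i}") for i in range(1, n_rows + 1)),
    )
    return conn


def or_query(ids):
    # id = ? OR id = ? OR ... with one placeholder per value
    return "SELECT * FROM users WHERE " + " OR ".join("id = ?" for _ in ids)


def in_query(ids):
    # id IN (?, ?, ...) with one placeholder per value
    return "SELECT * FROM users WHERE id IN (" + ", ".join("?" for _ in ids) + ")"


def time_query(conn, sql, params, loops=10000):
    """Run the same query `loops` times and return the elapsed wall-clock seconds."""
    start = time.perf_counter()
    for _ in range(loops):
        conn.execute(sql, params).fetchall()
    return time.perf_counter() - start


if __name__ == "__main__":
    conn = setup()
    ids = list(range(1, 21))  # the same 20 ids as the PHP example
    # Sanity check: both forms must return the same rows before timing means anything
    assert sorted(conn.execute(or_query(ids), ids).fetchall()) == sorted(
        conn.execute(in_query(ids), ids).fetchall()
    )
    print("OR:", time_query(conn, or_query(ids), ids))
    print("IN:", time_query(conn, in_query(ids), ids))
```

Publishing the schema and script together with the timings is what makes figures like these verifiable by others.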

Solution 3

I also ran a test for future Googlers. The query returns 7,264 rows out of 10,000.

SELECT * FROM item WHERE id = 1 OR id = 2 OR ... OR id = 10000

This query took 0.1239 seconds.

SELECT * FROM item WHERE id IN (1, 2, 3, ..., 10000)

This query took 0.0433 seconds.

In this test, IN was roughly 3 times faster than OR.
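Part of the difference in a test like this may come from the sheer size of the statement the server has to receive and parse, a point also raised in the comments. This minimal Python sketch builds both statements for ids 1 to 10000 and compares their lengths. `build_or_sql` and `build_in_sql` are hypothetical helper names. The ids are inlined as integer literals only because they are generated locally; user-supplied values should go through placeholders instead.

```python
def build_or_sql(table, column, ids):
    """SELECT ... WHERE col = 1 OR col = 2 OR ... (one comparison per id)."""
    return f"SELECT * FROM {table} WHERE " + " OR ".join(f"{column} = {int(i)}" for i in ids)


def build_in_sql(table, column, ids):
    """SELECT ... WHERE col IN (1, 2, ...) (a single predicate)."""
    return f"SELECT * FROM {table} WHERE {column} IN (" + ", ".join(str(int(i)) for i in ids) + ")"


if __name__ == "__main__":
    ids = range(1, 10001)
    or_sql = build_or_sql("item", "id", ids)
    in_sql = build_in_sql("item", "id", ids)
    # The OR statement is more than twice as long as the IN statement
    print(len(or_sql), len(in_sql))
```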

Solution 4

I think BETWEEN will be faster, since it should be converted into:

Field >= 0 AND Field <= 5

It is my understanding that an IN will be converted to a bunch of OR conditions anyway. The value of IN is ease of use: it saves typing the column name multiple times, and it is easier to combine with existing logic. You don't have to worry about AND/OR precedence, because the IN is a single predicate. With a series of OR conditions, you have to surround them with parentheses to make sure they are evaluated as one condition.
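Both points can be checked with plain boolean logic. The sketch below uses Python, whose `and` binds more tightly than `or`, just as SQL's AND binds more tightly than OR. The predicate functions and the `status` field are hypothetical. The sketch shows that BETWEEN is the inclusive range `>= low AND <= high`. It also shows how an unparenthesized OR chain changes meaning when combined with another condition, while an IN-style membership test does not.

```python
def between(x, low, high):
    # SQL: x BETWEEN low AND high is equivalent to x >= low AND x <= high (inclusive)
    return low <= x and x <= high


def or_chain_unparenthesized(x, status):
    # SQL: status = 'active' AND x = 1 OR x = 2 OR x = 3
    # AND binds tighter, so this means (status = 'active' AND x = 1) OR x = 2 OR x = 3
    return status == "active" and x == 1 or x == 2 or x == 3


def or_chain_parenthesized(x, status):
    # SQL: status = 'active' AND (x = 1 OR x = 2 OR x = 3)
    return status == "active" and (x == 1 or x == 2 or x == 3)


def in_list(x, status):
    # SQL: status = 'active' AND x IN (1, 2, 3); IN is one predicate, no parentheses needed
    return status == "active" and x in (1, 2, 3)


if __name__ == "__main__":
    print(between(0, 0, 5), between(5, 0, 5), between(6, 0, 5))  # True True False
    print(or_chain_unparenthesized(2, "inactive"))  # True: the status check is bypassed for x = 2
    print(or_chain_parenthesized(2, "inactive"))    # False
    print(in_list(2, "inactive"))                   # False
```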

The only real answer to your question is PROFILE YOUR QUERIES. Then you will know what works best in your particular situation.

Solution 5

It depends on what you are doing: how wide the range is and what the data type is (I know your example uses a numeric data type, but your question can also apply to many other data types).

This is an instance where you want to write the query both ways, get it working, and then use EXPLAIN to figure out the execution differences.

I'm sure there is a concrete answer to this, but this is how I would, practically speaking, figure out the answer to a given question.
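This minimal Python sketch illustrates the "write it both ways and EXPLAIN it" workflow. It writes the same filter three ways (IN, OR and BETWEEN, chosen so they are equivalent for integers) and prints the row count and plan for each. It uses SQLite's `EXPLAIN QUERY PLAN` through the standard-library `sqlite3` module as a stand-in because it runs without a server. With MySQL you would run `EXPLAIN SELECT ...` instead, and the plans and their wording will differ. The table `t` and its index on `some_field` are assumptions.

```python
import sqlite3


def setup():
    """In-memory table with an index on the filtered column (assumed schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, some_field INTEGER)")
    conn.execute("CREATE INDEX idx_some_field ON t (some_field)")
    # 1000 rows; some_field cycles through 0..99, so each value appears 10 times
    conn.executemany("INSERT INTO t (some_field) VALUES (?)", ((i % 100,) for i in range(1000)))
    return conn


QUERIES = {
    "IN": "SELECT * FROM t WHERE some_field IN (1, 2, 3, 4)",
    "OR": "SELECT * FROM t WHERE some_field = 1 OR some_field = 2 OR some_field = 3 OR some_field = 4",
    "BETWEEN": "SELECT * FROM t WHERE some_field BETWEEN 1 AND 4",
}


def plan(conn, sql):
    """Return the 'detail' text (last column) of each EXPLAIN QUERY PLAN row."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


if __name__ == "__main__":
    conn = setup()
    for name, sql in QUERIES.items():
        rows = conn.execute(sql).fetchall()
        print(f"{name}: {len(rows)} rows; plan: {plan(conn, sql)}")
```

Checking that all variants return the same rows before comparing plans guards against comparing queries that are not actually equivalent.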

This might be of some help: http://forge.mysql.com/wiki/Top10SQLPerformanceTips

Regards,
Frank


Author: Scott

Updated on February 28, 2022

Comments

  • Scott (about 2 years ago)

    I am wondering whether there is any performance difference between the following:

    SELECT ... FROM ... WHERE someFIELD IN(1,2,3,4)
    
    SELECT ... FROM ... WHERE someFIELD between 0 AND 5
    
    SELECT ... FROM ... WHERE someFIELD = 1 OR someFIELD = 2 OR someFIELD = 3 ... 
    

    Or will MySQL optimize the SQL in the same way compilers optimize code?

    EDIT: Changed the ANDs to ORs for the reason stated in the comments.

    • Jānis Gruzis (about 12 years ago)
      I'm also researching this. Contrary to some statements that IN will be converted to a series of ORs, I could say that it can also be converted to UNIONs, which are recommended as a replacement for ORs to optimize a query.
    • Rick James (over 3 years ago)
      There have been a few Optimization changes in this area, so some of the following answers may be "out of date".
    • Rick James (almost 3 years ago)
      In particular, the number of items may matter. How "clumped" the numbers are may matter (BETWEEN 1 AND 4 perfectly matches, and may be faster). The version of MySQL/MariaDB may matter.
  • Savageman (about 14 years ago)
    Statistically, BETWEEN has a chance to trigger the range index; IN() doesn't have this privilege. But yes, beach is right: you NEED to profile your query to know whether an index is used and which one. It's really hard to predict what the MySQL optimizer will choose.
  • dabest1 (over 12 years ago)
    What MySQL engine was it and did you clear MySQL buffers and OS file caches in between the two queries?
  • Jon z (over 11 years ago)
    This should be the selected answer.
  • ilasno (over 11 years ago)
    The link is stale; I think this may be the equivalent: wikis.oracle.com/pages/viewpage.action?pageId=27263381 (thanks Oracle ;-P)
  • eggyal (over 11 years ago)
    What indexes were used in these tests?
  • jorisw (over 10 years ago)
    On the equivalent page, it says: "Avoid using IN(...) when selecting on indexed fields, It will kill the performance of SELECT query." - Any idea why that is?
  • Ztyx (about 10 years ago)
    "It is my understanding that an IN will be converted to a bunch of OR statements anyway." Where did you read this? I would expect it to put it in a hashmap to make O(1) lookups.
  • Timo002 (about 10 years ago)
    I was also optimizing queries and found out that the IN statement was about 30% faster than an OR.
  • RichardAtHome (over 9 years ago)
    INs being converted to ORs is how SQL Server handles it (or at least it did; it might have changed, as I haven't used it in years). I've been unable to find any evidence that MySQL does this.
  • gosukiwi (over 9 years ago)
    Wow, this is both awesome and lame. MySQL never ceases to amaze me with this kind of stuff.
  • Morgan Tocker (almost 9 years ago)
    This answer is correct, between is converted to "1 <= film_id <= 5". The other two solutions are not folded into a single range condition. I have a blog post which demonstrates this using OPTIMIZER TRACE here: tocker.ca/2015/05/25/…
  • elipoultorak (over 8 years ago)
    Re "Do not believe people who give their opinion": you're 100% right. Stack Overflow is unfortunately full of them.
  • jave.web (almost 8 years ago)
    Performance reason (quoting the docs of MariaDB, a free fork of MySQL): "Returns 1 if expr is equal to any of the values in the IN list, else returns 0. If all values are constants, they are evaluated according to the type of expr and sorted. The search for the item then is done using a binary search. This means IN is very quick if the IN value list consists entirely of constants. Otherwise, type conversion takes place according to the rules described at Type Conversion, but applied to all the arguments." So if your column is an integer, pass integers to IN too.
  • Disillusioned (over 7 years ago)
    As a corollary to 'Do not believe people who give their "opinion"': Providing performance figures without including the scripts, tables and indexes used to obtain those figures makes them unverifiable. As such, the figures are as good as an "opinion".
  • Disillusioned (over 7 years ago)
    Your test is a narrow use-case. The query returns 72% of the data, and is unlikely to benefit from indexes.
  • Cyril Graze (over 7 years ago)
    @CraigYoung, you could just go ahead and set up your own experiment and test. The results provided are definitely verifiable: just replicate the simple experiment described by running the query in a loop, one query using IN and another using OR. It's so simple, you don't actually need me to write the code for you, do you?
  • Cyril Graze (over 7 years ago)
    @eggyal, in this case, the primary key. As mentioned by jave.web, IN will be faster for integer constants. I added a simple example to the answer showing the base loop for a benchmark. You could modify that test with more complex queries and/or with non-constant indexes and see what results you get. For most straightforward cases, IN proves to be faster.
  • Rick James (almost 7 years ago)
    In newer versions of MySQL, OR is turned into IN. But this takes compile time effort. The runtime is then the same.
  • bishop (about 6 years ago)
    I bet most of that time was spent receiving the query, parsing it, and planning it. That's certainly a consideration: if you're going to have 10k OR conditions, you're going to have a lot of redundant text just to express them with OR, so it's best to use the most compact expression possible.
  • Joshua Pinter (over 4 years ago)
    Fantastic reference to the specific database reason. Nice!
  • Joshua Pinter (over 4 years ago)
    NOTE: If you are doing anything expensive in the condition, such as using SUBSTRING_INDEX( name, '.', -1 ), then using IN becomes orders of magnitude faster than using LIKE because you only have to do the SUBSTRING_INDEX once and just compare it to the list of values, instead of running each time for every LIKE that you have.
  • Steve Jiang (over 4 years ago)
    The URL has expired.
  • Rick James (almost 4 years ago)
    I, too, have heard that the list is sorted.