Why does MySQL not use an index for a greater than comparison?

Solution 1

MyISAM tables are not clustered: the PRIMARY KEY index behaves like a secondary index and requires an additional table lookup to get the other column values.

It is several times more expensive to traverse the index and do the lookups than to read the same rows sequentially. If your condition is not very selective (it yields a large share of the total records), MySQL will consider a table scan cheaper.
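The trade-off described above can be sketched as a toy cost model. This is only an illustration, not MySQL's actual cost formula: the function names and the default `lookup_cost_ratio` of 4.0 (one reading of "several times more expensive") are assumptions made for the example.

```python
def break_even_selectivity(lookup_cost_ratio: float) -> float:
    """Share of matching rows at which both access paths cost the same.

    lookup_cost_ratio is an assumed figure: how many times more expensive
    it is to fetch one row through the index than to read it during a scan.
    """
    if lookup_cost_ratio <= 0:
        raise ValueError("lookup_cost_ratio must be positive")
    return 1.0 / lookup_cost_ratio


def index_is_cheaper(selectivity: float, lookup_cost_ratio: float = 4.0) -> bool:
    """Return True when the modelled index path beats a full table scan.

    selectivity is the fraction of rows matching the condition (0.0 to 1.0).
    """
    if not 0.0 <= selectivity <= 1.0:
        raise ValueError("selectivity must be between 0 and 1")
    scan_cost = 1.0  # cost of reading every row once, normalised to 1
    index_cost = selectivity * lookup_cost_ratio  # only matching rows, but pricier
    return index_cost < scan_cost


# Try a very selective, a borderline, and a non-selective condition.
for share in (0.01, 0.25, 0.90):
    print(share, index_is_cheaper(share))
```

Under these assumptions the index only pays off while the condition matches less than `1 / lookup_cost_ratio` of the table; the real optimizer works from table statistics and its own cost constants.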

To prevent it from doing a table scan, you could add a hint:

SELECT  *
FROM    userapplication FORCE INDEX (PRIMARY)
WHERE   application_id > 1025

This will not necessarily be more efficient, though.

Solution 2

You'd probably be better off letting MySQL decide on the query plan. There is a good chance that an index scan would be less efficient than a full table scan.

There are two data structures on disk for this table:

  1. The table itself; and
  2. The primary key B-Tree index.

When you run the query below, the optimizer has two options for how to access the data:

SELECT * FROM userapplication WHERE application_id > 1025;

Using the index

  1. Scan the B-Tree index to find the address of all the rows where application_id > 1025
  2. Read the appropriate pages of the table to get the data for these rows.

Not using the index

Scan the entire table, and pick the appropriate records.

Choosing the best strategy

The job of the query optimizer is to choose the most efficient strategy for getting the data you want. If a lot of rows have an application_id > 1025, it can actually be less efficient to use the index. For example, if 90% of the records have an application_id > 1025, the query optimizer would have to scan around 90% of the leaf nodes of the B-tree index and then read at least 90% of the table as well to get the actual data. The index pages are read on top of the table pages, so this would typically involve more disk I/O than just scanning the table.
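The 90% example can be turned into rough page-read arithmetic. The sketch below is a back-of-the-envelope model, not how MySQL computes costs. The row count of 784422 is taken from the EXPLAIN output in the question; `rows_per_page` and `entries_per_leaf` are made-up figures chosen only for illustration.

```python
import math


def pages_read(total_rows, rows_per_page, entries_per_leaf, matching_fraction):
    """Rough page-read counts for the two access paths.

    Returns (full_scan, index_best_case, index_worst_case).
    """
    table_pages = math.ceil(total_rows / rows_per_page)
    leaf_pages = math.ceil(total_rows / entries_per_leaf)
    matching_rows = math.ceil(total_rows * matching_fraction)

    # Full scan: every table page is read exactly once.
    full_scan = table_pages

    # Index path, step 1: walk the matching share of the index leaf pages.
    leaf_reads = math.ceil(leaf_pages * matching_fraction)

    # Step 2, best case: matching rows sit together, so each needed
    # table page is read only once.
    index_best_case = leaf_reads + math.ceil(table_pages * matching_fraction)

    # Step 2, worst case: nothing is cached and every row lookup
    # costs a page read of its own.
    index_worst_case = leaf_reads + matching_rows

    return full_scan, index_best_case, index_worst_case


# 784422 rows as in the question; page capacities are assumptions.
print(pages_read(784422, rows_per_page=20, entries_per_leaf=100, matching_fraction=0.9))
print(pages_read(784422, rows_per_page=20, entries_per_leaf=100, matching_fraction=0.01))
```

With these assumed page capacities, even the best case for the index path reads more pages than the full scan at 90% selectivity, while at 1% the index path reads far fewer. With a much smaller index the best case could dip below the scan, which is why the optimizer relies on statistics rather than a fixed rule.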

Solution 3

MySQL definitely considers a full table scan cheaper than using the index; you can, however, force it to use your primary key as the preferred index with:

mysql> EXPLAIN SELECT * FROM userapplication FORCE INDEX (PRIMARY) WHERE application_id > 10;

+----+-------------+-----------------+-------+---------------+---------+---------+------+------+-------------+
| id | select_type | table           | type  | possible_keys | key     | key_len | ref  | rows | Extra       |
+----+-------------+-----------------+-------+---------------+---------+---------+------+------+-------------+
|  1 | SIMPLE      | userapplication | range | PRIMARY       | PRIMARY | 4       | NULL |   24 | Using where |
+----+-------------+-----------------+-------+---------------+---------+---------+------+------+-------------+


Note that "USE INDEX", unlike "FORCE INDEX", only tells MySQL which indexes it may consider and does not rule out a table scan, so MySQL still prefers a full table scan:

mysql> EXPLAIN SELECT * FROM userapplication USE INDEX (PRIMARY) WHERE application_id > 10;
+----+-------------+-----------------+------+---------------+------+---------+------+------+-------------+
| id | select_type | table           | type | possible_keys | key  | key_len | ref  | rows | Extra       |
+----+-------------+-----------------+------+---------------+------+---------+------+------+-------------+
|  1 | SIMPLE      | userapplication | ALL  | PRIMARY       | NULL | NULL    | NULL |   34 | Using where |
+----+-------------+-----------------+------+---------------+------+---------+------+------+-------------+

Author: Robin

Updated on June 04, 2022

Comments

  • Robin
    Robin over 1 year

    I am trying to optimize a bigger query and ran into this wall when I realized this part of the query was doing a full table scan, which in my mind does not make sense considering the field in question is a primary key. I would assume that the MySQL Optimizer would use the index.

    Here is the table:

    
    CREATE TABLE userapplication (
      application_id int(11) NOT NULL auto_increment,
      userid int(11) NOT NULL default '0',
      accountid int(11) NOT NULL default '0',
      resume_id int(11) NOT NULL default '0',
      coverletter_id int(11) NOT NULL default '0',
      user_email varchar(100) NOT NULL default '',
      account_name varchar(200) NOT NULL default '',
      resume_name varchar(255) NOT NULL default '',
      resume_modified datetime NOT NULL default '0000-00-00 00:00:00',
      cover_name varchar(255) NOT NULL default '',
      cover_modified datetime NOT NULL default '0000-00-00 00:00:00',
      application_status tinyint(4) NOT NULL default '0',
      application_created datetime NOT NULL default '0000-00-00 00:00:00',
      application_modified timestamp NOT NULL default CURRENT_TIMESTAMP on update CURRENT_TIMESTAMP,
      publishid int(11) NOT NULL default '0',
      application_visible int(11) default '1',
      PRIMARY KEY  (application_id),
      KEY publishid (publishid),
      KEY application_status (application_status),
      KEY userid (userid),
      KEY accountid (accountid),
      KEY application_created (application_created),
      KEY resume_id (resume_id),
      KEY coverletter_id (coverletter_id)
     ) ENGINE=MyISAM ;
    

    This simple query seems to do a full table scan:

    SELECT * FROM userapplication WHERE application_id > 1025;

    This is the output of the EXPLAIN:

    +----+-------------+-----------------+------+---------------+------+---------+------+--------+-------------+
    | id | select_type | table           | type | possible_keys | key  | key_len | ref  | rows   | Extra       |
    +----+-------------+-----------------+------+---------------+------+---------+------+--------+-------------+
    |  1 | SIMPLE      | userapplication | ALL  | PRIMARY       | NULL | NULL    | NULL | 784422 | Using where |
    +----+-------------+-----------------+------+---------------+------+---------+------+--------+-------------+
    

    Any ideas how to prevent this simple query from doing a full table scan? Or am I out of luck?

  • hangy
    hangy almost 13 years
    Since application_id is an int(11) the opposite should be true.
  • Kajetan Abt
    Kajetan Abt almost 13 years
    The link explicitly explains that putting quotes around a non-text field is wrong, actually.
  • Quassnoi
    Quassnoi almost 13 years
    @hangy: in MySQL it will work both ways, since it would cast the string to an integer, not vice versa.
  • Robin
    Robin almost 13 years
    application_id is an INT(11), adding quotes treats it like a text field, which will not optimize.
  • Robin
    Robin almost 13 years
    Do you have a reference? I would be interested in further reading
  • Quassnoi
    Quassnoi almost 13 years
    @Robin: whether a constant number is enclosed in quotes or not does not matter in this case.
  • Robin
    Robin almost 13 years
    From Andy's explanation below I suspect you are correct in that the full table scan is more efficient in this case. The logic did not jive with me, although I do have a better understanding of the internals of the optimizer.
  • Madbreaks
    Madbreaks about 9 years
    Concise and very well explained! +1
  • Vincent Jia
    Vincent Jia almost 5 years
    I want to ask why not use 'FORCE INDEX'. If we let the query optimizer choose by itself, it takes 43s in one case (just my one local test); but with 'FORCE INDEX' it takes only 12s. So it seems the index does help a lot.
  • Andrew Skirrow
    Andrew Skirrow almost 5 years
    @VIncentJia Not sure where you're getting the 43s vs 12s from?
  • Vincent Jia
    Vincent Jia almost 5 years
    @AndySkirrow Sorry for the confusion. I didn't post my detailed test steps/results here; I only mentioned the big difference before and after using 'FORCE INDEX'. In brief, my test case is: create 730K rows, then try to delete 580K rows with 'delete from t_event_source_archive_73w where updated_time > '2019-01-17 18:00:00';'. It takes 43.196s without 'FORCE INDEX', but only 12.440s with 'delete from t_event_source_archive_73w using t_event_source_archive_73w force index(idx_update_time) where updated_time > '2019-01-17 18:00:00';'.
  • Andrew Skirrow
    Andrew Skirrow almost 5 years
    @VincentJia SQL is likely choosing not to use the index for the reasons outlined in my answer, but it's not clear why SQL is making the wrong decision. It's difficult to investigate exactly what's happening in your case using comments. I'd recommend posting a separate question with a minimal reproducible test case.
  • haneulkim
    haneulkim over 3 years
    The same thing is happening to me, and I am using InnoDB, version 5.7.24. So MySQL is deciding that a full table scan is faster? But it may be wrong?
  • Quassnoi
    Quassnoi over 3 years
    @Ambleu: please post your setup and query in a separate question