Does Order of Fields of Multi-Column Index in MySQL Matter

20,028

Solution 1

When discussing multi-column indexes, I use an analogy to a telephone book. A telephone book is basically an index on last name, then first name. So the sort order is determined by which "column" is first. Searches fall into a few categories:

  1. If you look up people whose last name is Smith, you can find them easily because the book is sorted by last name.

  2. If you look up people whose first name is John, the telephone book doesn't help because the Johns are scattered throughout the book. You have to scan the whole telephone book to find them all.

  3. If you look up people with a specific last name Smith and a specific first name John, the book helps because you find the Smiths sorted together, and within that group of Smiths, the Johns are also found in sorted order.

If you had a telephone book sorted by first name then by last name, the sorting of the book would assist you in the above cases #2 and #3, but not case #1.

That explains cases for looking up exact values, but what if you're looking up by ranges of values? Say you wanted to find all people whose first name is John and whose last name begins with 'S' (Smith, Saunders, Staunton, Sherman, etc.). The Johns are sorted under 'J' within each last name, but if you want all Johns for all last names starting with 'S', the Johns are not grouped together. They're scattered again, so you end up having to scan through all the names with last name starting with 'S'. Whereas if the telephone book were organized by first name then by last name, you'd find all the Johns together, then within the Johns, all the 'S' last names would be grouped together.

So the order of columns in a multi-column index definitely matters. One type of query may need a certain column order for the index. If you have several types of queries, you might need several indexes to help them, with columns in different orders.

You can read my presentation How to Design Indexes, Really for more information.

Solution 2

The two indexes are different. This is true in MySQL and in other databases. MySQL does a pretty good job of explaining the different in the documentation.

Consider the two indexes:

create index idx_lf on name(last_name, first_name);
create index idx_fl on name(first_name, last_name);

Both of these should work equally well on:

where last_name = XXX and first_name = YYY

idx_lf will be optimal for the following conditions:

where last_name = XXX
where last_name like 'X%'
where last_name = XXX and first_name like 'Y%'
where last_name = XXX order by first_name

idx_fl will be optimal for the following:

where first_name = YYY
where first_name like 'Y%'
where first_name = YYY and last_name like 'X%'
where first_name = XXX order by last_name

For many of these cases, both indexes could possibly be used, but one is optimal. For instance, consider idx_lf with the query:

where first_name = XXX order by last_name

MySQL could read the entire table using idx_lf and then do the filtering after the order by. I don't think this is an optimization option in practice (for MySQL), but that can happen in other databases.

Solution 3

The general rule is that in a multi-column index, you want to put the most selective -- that is, the one that will give you fewest results -- first. So if you are creating a multiple-column index on a table with a status column of say 10 possible values, and also a dateAdded column, and you're typically writing queries like

SELECT * FROM myTable WHERE status='active' and dateAdded='2010-10-01'

...then you'd want dateAdded first, because that would limit the scan to just a few rows rather than 10% (or whatever proportion are 'active') of your rows.

This takes a fair bit of thought and tuning; you should check out the Lahdenmaki and Leach book.

Share:
20,028
James Oravec
Author by

James Oravec

I like egg rolls.

Updated on December 05, 2020

Comments

  • James Oravec
    James Oravec over 3 years

    I know the importance of indexes and how order of joins can change performance. I've done a bunch of reading related to multi-column indexes and haven't found the answer to my question.

    I'm curious if I do a multi-column index, if the order that they are specified matters at all. My guess is that it would not, and that the engine would treat them as a group, where ordering doesn't matter. But I wish to verify.

    For example, from mysql's website (http://dev.mysql.com/doc/refman/5.0/en/multiple-column-indexes.html)

    CREATE TABLE test (
        id         INT NOT NULL,
        last_name  CHAR(30) NOT NULL,
        first_name CHAR(30) NOT NULL,
        PRIMARY KEY (id),
        INDEX name (last_name,first_name)
    );
    

    Would there be any benifit in any cases where the following would be better, or is it equivalent?

    CREATE TABLE test (
        id         INT NOT NULL,
        last_name  CHAR(30) NOT NULL,
        first_name CHAR(30) NOT NULL,
        PRIMARY KEY (id),
        INDEX name (first_name,last_name)
    );
    

    Specificially:

    INDEX name (last_name,first_name)
    

    vs

    INDEX name (first_name,last_name)
    
  • James Oravec
    James Oravec almost 10 years
    I agree with the order when doing selects, but my question is related to the order of the columns in the multi-column index.
  • Graham Charles
    Graham Charles almost 10 years
    No, I was talking about index creation, too. But you create indexes to support the kinds of queries your database will be tasked with -- hence the example.
  • Graham Charles
    Graham Charles almost 10 years
    In your specific example, it's likely that column order in the index wouldn't matter in most places, because First Name and Last Name distribution are roughly equal. But if you're in, say, Vietnam, where there are very few different surnames, then you'd want First Name first in the index.
  • Phil DD
    Phil DD about 8 years
    +1 for the link to the MySQL docs and your example further reinforcing that the order matters in the context of the queries you are making.
  • Pascal Klein
    Pascal Klein over 7 years
    I like the telepone book analogy a lot
  • CMCDragonkai
    CMCDragonkai about 7 years
    Would a multicolumn index help a multicolumn order-by sort? Or does it only help mutlicolumn where constraints?
  • Bill Karwin
    Bill Karwin about 7 years
    @CMCDragonkai, again, think of the telephone book analogy. It is sorted by a multi-column key: lastname, firstname. If you do a query asking for data with ORDER BY lastname, firstname, then the query optimizer says "Hey! It's already stored in that order! I can just read it in its natural order and send it to the user, I don't have to re-sort it!"
  • CMCDragonkai
    CMCDragonkai about 7 years
    But it wouldn't work if the ordering was ASC and DESC or DESC and ASC right? It would only work for ASC and ASC or DESC and DESC.
  • Bill Karwin
    Bill Karwin about 7 years
    @CMCDragonkai, Yes, that's an issue. Well done for making that connection so quickly, by the way. Many developers would not have been able to anticipate that. MySQL 8.0 is developing a feature to deal with that. When you create the index, you can declare which columns are ascending and which are descending. Then later if you search with the same combination of ASC and DESC matching the "direction" of the columns in that index, the query can be optimized with that index. See mysqlserverteam.com/mysql-8-0-labs-descending-indexes-in-mys‌​ql
  • Rohanil
    Rohanil almost 5 years
    How does order of multiple-column index affect in Inner Join queries like Inner Join x on x.FieldA = some_col and x.FieldB = other_col? I have observed a query where the performance is dependant on order of indexes i.e. using Index(FieldA, FieldB) was slower than using Index(FieldB, FieldA). What could be the reason?
  • Bill Karwin
    Bill Karwin almost 5 years
    In general, MySQL only uses one index per table. If you told it to USE INDEX(FieldA, FieldB) that does not mean use both, it means pick one of those two and ignore all other indexes.
  • Oleg
    Oleg over 4 years
    Thank you for order by mention
  • Jack Zhu
    Jack Zhu over 3 years
    the order of column conditions does not matter in where clause, your case is totally wrong
  • Graham Charles
    Graham Charles over 3 years
    I'll repeat, or re-repeat. I was not discussing column order in the WHERE clause. (The parser doesn't care.) Column order a multi-column index does matter.
  • FBB
    FBB about 3 years
    @Rohanil the reason is given in another answer below: stackoverflow.com/a/24315330/1768736
  • yasir shah
    yasir shah almost 2 years
    @BillKarwin you explain each and every think very good. but i have some type of confusion about name searching what if i search with (first name or last name) combination and some time search with (last and first name) combination and also search with (first middle or last) combination name. so what can i make three different combination indexes or can only one combination work for three different search scenario?
  • Bill Karwin
    Bill Karwin almost 2 years
    @yasirshah, You may need multiple indexes to support different query forms. Also, a query for (firstname=? OR lastname=?) is a problem. See my answer on this question: stackoverflow.com/questions/13750475/…