MySQL performance on a 6 million row table

Solution 1

You want to make sure the query can be answered from the index ALONE, so the index must cover all the fields you are selecting. Also, since a range condition is involved, venid needs to come first in the index, because it is compared against a constant. I would therefore create an index like so:

ALTER TABLE events ADD INDEX indexNameHere (venid, date, time);

With this index, all the information needed to complete the query is in the index. This means that, hopefully, the storage engine can fetch the information without actually seeking inside the table itself. However, MyISAM might not be able to do this, since it doesn't store the data in the leaves of the indexes, so you might not get the speed increase you desire. If that's the case, try creating a copy of the table that uses the InnoDB engine. Repeat the same steps there and see if you get a significant speed increase. InnoDB does store the field values in the index leaves, and allows covering indexes.
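A minimal sketch of the "copy the table to InnoDB" experiment, in case it helps. The copy name events_innodb is made up for illustration; events and indexNameHere come from the statements above. It assumes the original table has no features that the InnoDB version on your server rejects.

```sql
-- events_innodb is a hypothetical name for the experimental copy.
-- CREATE TABLE ... LIKE copies the column and index definitions,
-- so indexNameHere comes along if it already exists on events.
CREATE TABLE events_innodb LIKE events;

-- Switch the (still empty) copy to InnoDB before loading data.
ALTER TABLE events_innodb ENGINE=InnoDB;

-- Copy the rows across; on ~6M rows this can take a while.
INSERT INTO events_innodb SELECT * FROM events;

-- Compare the plan on the copy with the plan on the original.
EXPLAIN SELECT date, time
FROM events_innodb
WHERE venid='47975' AND date>='2009-07-11'
ORDER BY date;
```

If you created the copy before adding the index to events, run the same ALTER TABLE ... ADD INDEX statement against events_innodb as well.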

Now, hopefully you'll see the following when you explain the query:

mysql> EXPLAIN SELECT date, time FROM events WHERE venid='47975' AND date>='2009-07-11' ORDER BY date;

id  select_type  table   type   possible_keys            key            [..]  Extra
1   SIMPLE       events  range  date_idx, indexNameHere  indexNameHere  [..]  Using index, Using where

Solution 2

I would imagine that a 6M row table should be able to be optimised with quite normal techniques.

I assume that you have a dedicated database server, and that it has a sensible amount of RAM (say 8 GB minimum).

You will want to ensure you've tuned MySQL to use your RAM efficiently. If you're running a 32-bit OS, don't. If you are using MyISAM, tune your key buffer to use a significant proportion, but not too much, of your RAM.
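As a rough sketch of what tuning the key buffer can look like: the 1 GiB figure below is purely illustrative, not a recommendation, and changing a global variable needs an account with sufficient privileges.

```sql
-- Current size of the MyISAM key cache, in bytes.
SHOW VARIABLES LIKE 'key_buffer_size';

-- Index_length in this output shows how much index data the table has,
-- which gives a feel for how large the key cache needs to be.
SHOW TABLE STATUS LIKE 'events';

-- Illustrative value only: 1 GiB. Pick a size that fits your own RAM.
SET GLOBAL key_buffer_size = 1073741824;

-- Key_reads (reads from disk) should stay small compared to
-- Key_read_requests (reads served via the cache).
SHOW GLOBAL STATUS LIKE 'Key_read%';
```

A value set with SET GLOBAL is lost on restart, so once you have a size that works, put it in the server configuration file too.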

In any case you want to run repeated performance testing on production-grade hardware.

Solution 3

Try adding a key that spans venid and date (or the other way around, or both...)
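Something like the following, where both index names are made up for illustration. Adding both orders lets you see in EXPLAIN which one the optimizer picks for the query from the question.

```sql
-- Hypothetical index names; events, venid and date are from the question.
-- venid first: the equality on venid narrows the rows, then date is a range.
ALTER TABLE events ADD INDEX venid_date_idx (venid, date);

-- date first: "the other way around".
ALTER TABLE events ADD INDEX date_venid_idx (date, venid);

-- Check which key is chosen and how many rows are estimated.
EXPLAIN SELECT date, time
FROM events
WHERE venid='47975' AND date>='2009-07-11'
ORDER BY date;
```

Drop whichever index ends up unused, since every extra index slows down inserts and updates.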

Solution 4

Try putting an index on the venid column.
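For example (the index name venid_idx is just a placeholder):

```sql
-- List the indexes that already exist, so you don't add a duplicate.
SHOW INDEX FROM events;

-- venid_idx is a hypothetical name for a single-column index on venid.
ALTER TABLE events ADD INDEX venid_idx (venid);
```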

Author by

pedalpete

Originally from Whistler, Canada, now living in Bondi Beach, Aus. I like building interesting things, algorithms, UX/UI, getting into hardware and RaspberryPi.

Updated on July 09, 2022

Comments

  • pedalpete
    pedalpete almost 2 years

    One day I suspect I'll have to learn hadoop and transfer all this data to a non-structured database, but I'm surprised to find the performance degrade so significantly in such a short period of time.

    I have a MySQL table with just under 6 million rows. I am doing a very simple query on this table, and believe I have all the correct indexes in place.

    The query is:

    SELECT date, time FROM events WHERE venid='47975' AND date>='2009-07-11' ORDER BY date
    

    The EXPLAIN returns:

    id  select_type  table        type   possible_keys  key       key_len  ref   rows    Extra
    1   SIMPLE       updateshows  range  date_idx       date_idx  7        NULL  648997  Using where
    

    So I am using the correct index as far as I can tell, but this query is taking 11 seconds to run.

    The database is MyISAM, and phpMyAdmin says the table is 1.0GiB.

    Any ideas here?

    Edited: The date_idx index covers both the date and venid columns. Should those be two separate indexes?

  • pedalpete
    pedalpete almost 15 years
    When you say 'add a key', do you mean an index? I edited my entry to state that the date_idx is on both the date and venid fields.
  • Michael Haren
    Michael Haren almost 15 years
    +1: covering indexes are essential. With careful indexes and careful queries, 6mm rows is no big deal.
  • pedalpete
    pedalpete almost 15 years
    AWESOME!! thank you. I didn't realize that I needed to cover the SELECTED fields with the index. I thought it was just the WHERE fields which needed to be indexed.
  • pedalpete
    pedalpete almost 15 years
    Thanks Michael, I didn't realize that the SELECT fields should be indexed too. Cheers.
  • Justin
    Justin about 12 years
    If you remember, what was the execution time on the new query with the index?
  • David Bélanger
    David Bélanger almost 12 years
    @pedalpete I ask the same question as Justin.
  • Franklin
    Franklin almost 12 years
    What needs to be done when you have a COUNT(*) in the select clause?
  • Franklin
    Franklin almost 12 years
    Doesn't having the SELECT fields in the index as well make the system more rigid? Any new projections will have to be added to the index. Is this the right way to go about it?
  • pedalpete
    pedalpete over 10 years
    Sorry for the late reply @JustinKrause (and others), your comment did come in a few years after the initial question. After fixing up the indexes, the query time came to just under 0.4 seconds I believe. It was AMAZING how fast it was, and it wasn't on a dedicated server either. It was a medium sized hosted box, at the time, nothing huge. I can't remember if it was linode or I switched to linode shortly after.
  • pedalpete
    pedalpete over 10 years
    Thanks @MarkR, and sorry for the very late reply. This was the second website I had ever built, so had no idea of dedicated db servers or anything like that. I ran it for a few years with all processes on the same box. No issues, I was amazed how well MySQL scaled to 8 million+ rows. I'd archive older data when it reached that point.
  • codefreaK
    codefreaK about 10 years
    Hey, I have half a million rows now, and by the end of the year it will be six million. An inner join for summing takes 2.345 secs on average. I have added the index exactly as above, with no change. What should I do?
  • DarbyM
    DarbyM over 5 years
    @pedalpete 4 years later and your late wrap up to post your results is STILL being helpful!