GROUP BY query optimization


Solution 1

The EXPLAIN verifies the (game, finish, user) index was used in the query. That seems like the best possible index to me. Could it be a hardware issue? What is your system RAM and CPU?

Solution 2

Get rid of the 'game' key: it is redundant with 'i_gfu', because 'game' is the leftmost column of that index. As 'id' is unique and NOT NULL, COUNT(id) just returns the number of rows in each group, so you can replace it with COUNT(*). Try it that way and paste the output of EXPLAIN:

SELECT user AS player, COUNT(*) AS times
FROM matches
WHERE finish = 1
AND game = 19
GROUP BY user
ORDER BY times DESC
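
For reference, a minimal sketch of both steps against the table definition from the question. It assumes the single-column key is still named game, and it should be tried on a copy of the table first. With COUNT(*), the query reads only columns contained in i_gfu, which EXPLAIN reports as "Using index" in the Extra column.

```sql
-- The single-column `game` key is a left prefix of `i_gfu` (game, finish, user),
-- so the composite index can serve the same lookups.
ALTER TABLE matches DROP INDEX game;

-- Re-check the plan with COUNT(*) instead of COUNT(id).
EXPLAIN
SELECT user AS player, COUNT(*) AS times
FROM matches
WHERE finish = 1
  AND game = 19
GROUP BY user
ORDER BY times DESC;
```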

Solution 3

Eh, tough. Try reordering your index: put the user column first (so make the index (user, finish, game)), as that increases the chance the GROUP BY can use the index. However, in general the loose index scan for GROUP BY is only available if you limit the aggregate functions used to MIN and MAX (see http://dev.mysql.com/doc/refman/5.0/en/group-by-optimization.html and http://dev.mysql.com/doc/refman/5.5/en/loose-index-scan.html). Your ORDER BY isn't really helping either.
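
A sketch of how the reordered index could be tried, assuming a hypothetical index name i_ufg (not part of the original schema). FORCE INDEX is only there to compare plans and timings. Note that the asker reports in the comments that this column order turned out slower on their data.

```sql
-- Hypothetical index name; column order as suggested above.
ALTER TABLE matches ADD INDEX i_ufg (user, finish, game);

-- Compare the plan (and timing) when the optimizer is pushed onto the new index.
EXPLAIN
SELECT user AS player, COUNT(*) AS times
FROM matches FORCE INDEX (i_ufg)
WHERE finish = 1
  AND game = 19
GROUP BY user
ORDER BY times DESC;

-- Remove the index again if it does not help.
ALTER TABLE matches DROP INDEX i_ufg;
```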

Solution 4

One of the shortcomings of this query is that you order by an aggregate. That means that you can't return any rows until the full result set has been generated; no index can exist (for MySQL MyISAM, anyway) to fix that.

You can denormalize your data fairly easily to overcome this, though. You could, for instance, add insert/update triggers that maintain a count value in an indexed summary table, so that you can start returning rows immediately.
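
A minimal sketch of such a summary table, under stated assumptions: the table name match_counts, the index name i_game_times and the trigger names are all hypothetical, and only inserts and updates are handled (deletes would need a third trigger). DELIMITER is a mysql command-line client directive, not SQL.

```sql
-- Hypothetical summary table: one row per (game, user) holding the
-- number of finished matches.
CREATE TABLE match_counts (
  game  INT NOT NULL,
  user  INT NOT NULL,
  times INT NOT NULL DEFAULT 0,
  PRIMARY KEY (game, user),
  KEY i_game_times (game, times)
) ENGINE=MyISAM;

-- Seed it once from the existing rows.
INSERT INTO match_counts (game, user, times)
SELECT game, user, COUNT(*)
FROM matches
WHERE finish = 1
GROUP BY game, user;

DELIMITER //

-- Count a match that is inserted already finished.
CREATE TRIGGER matches_ai AFTER INSERT ON matches
FOR EACH ROW
BEGIN
  IF NEW.finish = 1 THEN
    INSERT INTO match_counts (game, user, times)
    VALUES (NEW.game, NEW.user, 1)
    ON DUPLICATE KEY UPDATE times = times + 1;
  END IF;
END//

-- Move the count when a row's finish, game or user changes.
CREATE TRIGGER matches_au AFTER UPDATE ON matches
FOR EACH ROW
BEGIN
  IF OLD.finish = 1 THEN
    UPDATE match_counts
    SET times = times - 1
    WHERE game = OLD.game AND user = OLD.user;
  END IF;
  IF NEW.finish = 1 THEN
    INSERT INTO match_counts (game, user, times)
    VALUES (NEW.game, NEW.user, 1)
    ON DUPLICATE KEY UPDATE times = times + 1;
  END IF;
END//

DELIMITER ;

-- The report becomes a range read on (game, times); no GROUP BY needed.
SELECT user AS player, times
FROM match_counts
WHERE game = 19
ORDER BY times DESC;
```

The trade-off is extra work on every write to matches in exchange for a cheap read, which fits a report that is read far more often than matches are recorded.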

Solution 5

As others have noted, you may have reached the limit of your ability to tune the query itself. You should next check the settings of the max_heap_table_size and tmp_table_size variables on your server. The default is 16MB, which may be too small for your table.
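
A sketch of how these settings can be inspected and raised for one session. The 64MB value is only an illustration, not a recommendation. The in-memory temporary table limit is the smaller of the two variables, so both need to be raised together.

```sql
-- Current values, in bytes.
SHOW VARIABLES LIKE 'tmp_table_size';
SHOW VARIABLES LIKE 'max_heap_table_size';

-- Counters showing how many temporary tables were created, and how many
-- of them went to disk.
SHOW GLOBAL STATUS LIKE 'Created_tmp%tables';

-- Raise both limits for the current session only (64MB = 67108864 bytes).
SET SESSION tmp_table_size = 67108864;
SET SESSION max_heap_table_size = 67108864;
```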

Author by

ypercubeᵀᴹ


Updated on June 05, 2022

Comments

  • ypercubeᵀᴹ
    ypercubeᵀᴹ almost 2 years

    Database is MySQL with MyISAM engine.

    Table definition:

    CREATE TABLE IF NOT EXISTS  matches  (
       id  int(11) NOT NULL AUTO_INCREMENT,
       game  int(11) NOT NULL,
       user  int(11) NOT NULL,
       opponent  int(11) NOT NULL,
       tournament  int(11) NOT NULL,
       score  int(11) NOT NULL,
       finish  tinyint(4) NOT NULL,
      PRIMARY KEY ( id ),
      KEY  game  ( game ),
      KEY  user  ( user ),
      KEY  i_gfu ( game , finish , user )
    ) ENGINE=MyISAM  DEFAULT CHARSET=latin1 AUTO_INCREMENT=3149047 ;
    

    I have set an index on (game, finish, user) but this GROUP BY query still needs 0.4 - 0.6 seconds to run:

    SELECT user AS player
         , COUNT( id ) AS times
    FROM matches
    WHERE finish = 1
      AND game = 19
    GROUP BY user
    ORDER BY times DESC
    

    The EXPLAIN output:

    | id | select_type | table   | type | possible_keys | key   | key_len | 
    |  1 |  SIMPLE     | matches |  ref | game,i_gfu    | i_gfu |    5    | 
    
    |  ref        |   rows |   Extra                                      |
    | const,const | 155855 | Using where; Using temporary; Using filesort |
    

    Is there any way I can make it faster? The table has about 800K records.


    EDIT: I changed COUNT(id) into COUNT(*) and the time dropped to 0.08 - 0.12 seconds. I think I've tried that before making the index and forgot to change it again after.

    In the EXPLAIN output, the added Using index explains the speedup:

    |   rows |   Extra                                                   |
    | 168029 | Using where; Using index; Using temporary; Using filesort |
    

    (Side question: is a drop by a factor of 5 normal?)

    There are about 2000 users, so the final sort doesn't hurt performance, even if it uses filesort. I tried without ORDER BY and it still takes almost the same time.

  • ypercubeᵀᴹ
    ypercubeᵀᴹ almost 13 years
    Extracting, yes. Sorting no, it doesn't spend time sorting.
  • Denis de Bernardy
    Denis de Bernardy almost 13 years
    That's not what your query plan is suggesting. Nor your query, for that matter. They both say at least one sort is needed. :-)
  • ypercubeᵀᴹ
    ypercubeᵀᴹ almost 13 years
    I've tried that index and also (user, game, finish) and forcing the use of it but it's even slower.
  • ypercubeᵀᴹ
    ypercubeᵀᴹ almost 13 years
    I mean, the time it spends in sorting is very short compared to the time spent on grouping.
  • Femi
    Femi almost 13 years
    Odd. I get the sense you're not going to be able to do better with the combination of GROUP BY and ORDER BY: you might want to create an explicit aggregate table if that query speed is too slow. The fact that Using filesort shows up indicates that the ORDER BY couldn't be done from any index: maybe try adding the id to the index?
  • Denis de Bernardy
    Denis de Bernardy almost 13 years
    I can't blame it for doing so, either... it's grouping many many rows (half of your table?) into 150k rows according to your query plan. :-)
  • ypercubeᵀᴹ
    ypercubeᵀᴹ almost 13 years
    Memory is 1GB. CPU is (I think) an AMD Opteron quad-core 3.5GHz.
  • Denis de Bernardy
    Denis de Bernardy almost 13 years
    In point of fact, I'm 99% sure you're wasting your time trying to optimize it: your current three-column index lets the planner go straight for the jugular, as in fetch the relevant rows and group them as is. They then need to be sorted, which also takes time. I very honestly see nothing else that you can do. If anything, I'm actually surprised that the planner decides to use an index at all, since you're retrieving 20% of your table.
  • ic3b3rg
    ic3b3rg almost 13 years
    I would guess your bottleneck is the RAM. I would suggest bumping that to 4GB.
  • ypercubeᵀᴹ
    ypercubeᵀᴹ almost 13 years
    thnx for the advice, both settings are at 64M.
  • matt
    matt almost 13 years
    4GB to process a table with 900k rows of ~30 bytes each? ;) That's not even 30 MB ;)
  • ypercubeᵀᴹ
    ypercubeᵀᴹ almost 13 years
    You mean a (game, finish, user, id) index?
  • ic3b3rg
    ic3b3rg almost 13 years
    @lucek Your math is correct but OS overhead eats up a lot of RAM these days. Also any other running applications will be consuming RAM. 4GB is pretty much standard these days.
  • ypercubeᵀᴹ
    ypercubeᵀᴹ almost 13 years
    @lucek and @ic3b3rg: For the record, the table has other fields too. Total size is about 80MB. But the machine is used as a MySQL server only.
  • Femi
    Femi almost 13 years
    Well, I'd have said try that on for size, but if using COUNT(*) helped then that probably won't do much good.
  • ic3b3rg
    ic3b3rg almost 13 years
    @ypercube there may be a software-based suggestion here that will speed things up for you. Your table, index and SQL structure seem to be fine to me, so I doubt any tweaks there will help. The suggestion about server variables by @Thomas Jones-Low might help. If nothing seems to help, a few extra GBs of RAM is pretty cheap.