GROUP BY query optimization
Solution 1
The EXPLAIN verifies that the (game, finish, user) index was used in the query. That seems like the best possible index to me. Could it be a hardware issue? What are your system RAM and CPU?
Solution 2
Get rid of the 'game' key: it's redundant with 'i_gfu'. As 'id' is the primary key (unique and never NULL), COUNT(id) just returns the number of rows in each group, so you can replace it with COUNT(*). Try it that way and paste the output of EXPLAIN:
SELECT user AS player, COUNT(*) AS times
FROM matches
WHERE finish = 1
AND game = 19
GROUP BY user
ORDER BY times DESC
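A minimal sketch of that suggestion in MySQL syntax, assuming the table and index names from the question. Dropping the single-column key should be safe in principle because i_gfu has game as its leftmost column, but it is worth checking that no other query names that key in an index hint before running this on a live server.

```sql
-- i_gfu (game, finish, user) already starts with game,
-- so the single-column key `game` adds nothing for lookups on game.
ALTER TABLE matches DROP INDEX game;

-- Re-run the plan for the COUNT(*) form of the query.
-- If "Using index" appears in the Extra column, the query is
-- answered from the index alone, without reading the table rows.
EXPLAIN
SELECT user AS player, COUNT(*) AS times
FROM matches
WHERE finish = 1
  AND game = 19
GROUP BY user
ORDER BY times DESC;
```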
Solution 3
Eh, tough. Try reordering your index: put the user column first (so make the index (user, finish, game)), as that increases the chance the GROUP BY can use the index. However, in general GROUP BY can only use a loose index scan if the only aggregate functions used are MIN and MAX (see http://dev.mysql.com/doc/refman/5.0/en/group-by-optimization.html and http://dev.mysql.com/doc/refman/5.5/en/loose-index-scan.html). Your ORDER BY isn't really helping either.
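One way to try the reordered index, sketched in MySQL syntax. The index name i_ufg is hypothetical, chosen only for this example. The FORCE INDEX hint is there so the plan can be compared against the existing i_gfu index. Note that the asker reports in the comments that a user-first index turned out slower on this data.

```sql
-- Hypothetical index name; column order as suggested in this answer.
ALTER TABLE matches ADD INDEX i_ufg (user, finish, game);

-- Force the new index so its plan can be compared with i_gfu.
EXPLAIN
SELECT user AS player, COUNT(*) AS times
FROM matches FORCE INDEX (i_ufg)
WHERE finish = 1
  AND game = 19
GROUP BY user
ORDER BY times DESC;

-- Drop it again if it does not help, since every extra index slows writes.
ALTER TABLE matches DROP INDEX i_ufg;
```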
Solution 4
One of the shortcomings of this query is that you order by an aggregate. That means that you can't return any rows until the full result set has been generated; no index can exist (for MySQL MyISAM, anyway) to fix that.
You can denormalize your data fairly easily to overcome this, though: you could, for instance, add an insert/update trigger to stick a count value in a summary table, with an index, so that you can start returning rows immediately.
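A rough sketch of such a summary table in MySQL syntax. All new names here (match_counts, i_game_times, matches_ai, matches_au) are hypothetical, and the sketch assumes rows are never deleted from matches; a delete trigger would be needed otherwise. DELIMITER is a directive of the mysql command-line client, not part of SQL itself.

```sql
-- Hypothetical summary table: one row per (game, user) with the finished count.
CREATE TABLE match_counts (
  game  INT NOT NULL,
  user  INT NOT NULL,
  times INT NOT NULL DEFAULT 0,
  PRIMARY KEY (game, user),
  KEY i_game_times (game, times)   -- lets the sorted read come from the index
) ENGINE=MyISAM;

-- One-time backfill from the existing data.
INSERT INTO match_counts (game, user, times)
SELECT game, user, COUNT(*)
FROM matches
WHERE finish = 1
GROUP BY game, user;

DELIMITER //

-- Keep the count current when a finished match is inserted.
CREATE TRIGGER matches_ai AFTER INSERT ON matches
FOR EACH ROW
BEGIN
  IF NEW.finish = 1 THEN
    INSERT INTO match_counts (game, user, times)
    VALUES (NEW.game, NEW.user, 1)
    ON DUPLICATE KEY UPDATE times = times + 1;
  END IF;
END//

-- Keep the count current when finish, game or user changes on a row.
CREATE TRIGGER matches_au AFTER UPDATE ON matches
FOR EACH ROW
BEGIN
  IF OLD.finish = 1 THEN
    UPDATE match_counts SET times = times - 1
    WHERE game = OLD.game AND user = OLD.user;
  END IF;
  IF NEW.finish = 1 THEN
    INSERT INTO match_counts (game, user, times)
    VALUES (NEW.game, NEW.user, 1)
    ON DUPLICATE KEY UPDATE times = times + 1;
  END IF;
END//

DELIMITER ;

-- The original report then becomes a plain indexed read.
SELECT user AS player, times
FROM match_counts
WHERE game = 19
  AND times > 0
ORDER BY times DESC;
```

The trade-off is extra work on every write to matches in exchange for a read that no longer has to group about 150K rows.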
Solution 5
As others have noted, you may have reached the limit of your ability to tune the query itself. Next, check the settings of the max_heap_table_size and tmp_table_size variables on your server. The default is 16MB, which may be too small for your table.
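A quick way to inspect these settings, sketched in MySQL syntax. The 64 MB value is only illustrative, not a recommendation. MySQL uses the smaller of the two variables as the size limit for in-memory temporary tables, so both need to be raised together.

```sql
-- Current values, in bytes.
SHOW VARIABLES LIKE 'tmp_table_size';
SHOW VARIABLES LIKE 'max_heap_table_size';

-- Raise both for the current session only (67108864 bytes = 64 MB).
SET SESSION tmp_table_size = 67108864;
SET SESSION max_heap_table_size = 67108864;

-- After running the query, compare these counters:
-- a growing Created_tmp_disk_tables means the temporary table spilled to disk.
SHOW SESSION STATUS LIKE 'Created_tmp%tables';
```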
ypercubeᵀᴹ
Updated on June 05, 2022
Comments
-
ypercubeᵀᴹ almost 2 years
Database is MySQL with MyISAM engine.
Table definition:
CREATE TABLE IF NOT EXISTS matches (
  id int(11) NOT NULL AUTO_INCREMENT,
  game int(11) NOT NULL,
  user int(11) NOT NULL,
  opponent int(11) NOT NULL,
  tournament int(11) NOT NULL,
  score int(11) NOT NULL,
  finish tinyint(4) NOT NULL,
  PRIMARY KEY ( id ),
  KEY game ( game ),
  KEY user ( user ),
  KEY i_gfu ( game , finish , user )
) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=3149047 ;
I have set an index on (game, finish, user) but this GROUP BY query still needs 0.4 - 0.6 seconds to run:
SELECT user AS player
     , COUNT( id ) AS times
FROM matches
WHERE finish = 1
  AND game = 19
GROUP BY user
ORDER BY times DESC
The EXPLAIN output:
| id | select_type | table   | type | possible_keys | key   | key_len |
| 1  | SIMPLE      | matches | ref  | game,i_gfu    | i_gfu | 5       |

| ref         | rows   | Extra                                        |
| const,const | 155855 | Using where; Using temporary; Using filesort |
Is there any way I can make it faster? The table has about 800K records.
EDIT: I changed COUNT(id) into COUNT(*) and the time dropped to 0.08 - 0.12 seconds. I think I've tried that before making the index and forgot to change it again after. In the EXPLAIN output, the Using index explains the speedup:
| rows   | Extra                                                     |
| 168029 | Using where; Using index; Using temporary; Using filesort |
(Side question: is this drop by a factor of 5 normal?)
There are about 2000 users, so the final sorting, even if it uses filesort, doesn't hurt performance. I tried without ORDER BY and it still takes almost the same time.
-
ypercubeᵀᴹ almost 13 years
Extracting, yes. Sorting, no: it doesn't spend time sorting.
-
Denis de Bernardy almost 13 years
That's not what your query plan is suggesting. Nor your query, for that matter. They both say at least one sort is needed. :-)
-
ypercubeᵀᴹ almost 13 years
I've tried that index and also (user, game, finish), and forcing the use of it, but it's even slower.
-
ypercubeᵀᴹ almost 13 years
I mean, the time it spends sorting is very short compared to the time spent on grouping.
-
Femi almost 13 years
Odd. I get the sense you're not going to be able to do better with the combination of GROUP BY and ORDER BY: you might want to create an explicit aggregate table if that query speed is too slow. The fact that Using filesort shows up indicates that the ORDER BY couldn't be done from any index: maybe try adding the id to the index?
-
Denis de Bernardy almost 13 years
I can't blame it for doing so, either... it's grouping many, many rows (half of your table?) into 150k rows according to your query plan. :-)
-
ypercubeᵀᴹ almost 13 years
Memory is 1GB. CPU is (I think) AMD Opteron Quad-core 3.5GHz.
-
Denis de Bernardy almost 13 years
In point of fact, I'm 99% sure you're wasting your time trying to optimize it: your current three-column index allows the query to go straight for the jugular, as in fetch the relevant rows and group them as is. They then need to be sorted, which also takes time. I very honestly see nothing else that you can do. If anything, I'm actually surprised that the planner decides to use an index at all, since you're retrieving 20% of your table.
-
ic3b3rg almost 13 years
I would guess your bottleneck is the RAM. I would suggest bumping that to 4GB.
-
ypercubeᵀᴹ almost 13 years
Thanks for the advice; both settings are at 64M.
-
matt almost 13 years
4GB to process a table with 900k rows of ~30 bytes each? ;) That's not even 30 MB ;)
-
ypercubeᵀᴹ almost 13 years
You mean a (game, finish, user, id) index?
-
ic3b3rg almost 13 years
@lucek Your math is correct, but OS overhead eats up a lot of RAM these days. Any other running applications will also be consuming RAM. 4GB is pretty much standard these days.
-
ypercubeᵀᴹ almost 13 years
@lucek and @ic3b3rg: For the record, the table has other fields too. Total size is about 80MB. But the machine is used as a MySQL server only.
-
Femi almost 13 years
Well, I'd have said try that on for size, but if using COUNT(*) helped then that probably won't do much good.
-
ic3b3rg almost 13 years
@ypercube There may be a software-based suggestion here that will speed things up for you. Your table, index and SQL structure seem fine to me, so I doubt any tweaks there will help. The suggestion about server variables by @Thomas Jones-Low might help. If nothing seems to help, a few extra GBs of RAM are pretty cheap.