MySQL SUM Query is extremely slow
Solution 1
(Yes, I am adding another answer. Justification: It addresses the underlying problem a different way.)
The underlying problem seems to be that there is an ever-growing "transaction" table from which is derived various statistics, such as SUM(amount)
. The performance of this will only get worse and worse as the table(s) grow.
The basis for this Answer will be to look at the data in two ways: "History" and "Current". Transactions
is the History. A new table would be the Current
totals for each User. But I see multiple ways to do that. Each involves some form of subtotal(s) so as to avoid adding 773K rows to get the answer.
- The traditional banking way... Each night tally up the day's
Transactions
and add them toCurrent
. - The Materialized View way... Each time a row is added to
Transactions
, incrementCurrent
. - Hybrid: Keep daily subtotals in a "Summary Table". Sum those subtotals to get the
SUM
through last night.
More discussion in my blog on Summary Tables.
Note that the up-to-the-second balance for the banking or hybrid way is a little tricky:
- Get last night's amount
- Add any Transactions that occurred during the day.
Any of the approaches will be a lot faster than scanning all 773K rows for the user, but it will be more complex code.
Solution 2
(Sorry to step on all the good comments. I hope I have added enough to justify claiming an "Answer".)
Are there 6M rows in the table? But 773K rows with that user_id
?
9GB buffer_pool? The table is about 4GB of data? So it fits in the buffer_pool if there is not much else to bump it out. (SHOW TABLE STATUS
and check "Data_length".)
The existing INDEX(user_id)
might be 20MB, easily cachable.
If the user_ids are sufficiently scattered around the table, the query probably needs to fetch virtually every 16KB block of the data. So, the original query with original index will go something like:
- scan the index for the given
user_id
. This will be a minor part of the total effort. - For each entry in the index, look up (randomly) the record. This happens 1.5M times. With a "cold" cache, this would easily take 38 seconds or more. Where the "slow" times soon after a restart? Or something else that blows out the cache? With a "warm" cache, it is all CPU (no I/O), hence 1 second is reasonable.
If you change to the optimal, "covering", INDEX(user_id, confirmed, amount)
, things change some...
- "Covering" means that the entire query would be performed in the index. (This composite index might be more like 40MB, but this is still very small compared to the data.)
- In a "cold" cache, only 40MB would need fetching -- expect much better than 38s.
- In a "warm" cache (this time only the 40MB), it might run in half a second.
If there were also a date range in the WHERE
clause, I would push for building and maintaining a "Summary table". This might speed up similar queries by a factor of 10.
If you do add a composite index starting with user_id
, you should (not must) DROP
the index on just user_id as being redundant. (If you don't drop it, it will mostly waste disk space.)
As for doing it in production...
- If you have a new enough version of MySQL,
ALTER TABLE ... ALGORITHM=INPLACE ...
, which is feasible for adding/dropping indexes with minimal impact. - For older versions, see
pt-online-schema-change
. It requires that there be no other Triggers, and does take a very short downtime. The Trigger takes care of the 200 writes/minute 'transparently'.
ALGORITHM=INPLACE
was added in MySQL 5.6 and MariaDB 10.0.
Solution 3
One thing you might try is adding an composite index to see if it speeds up the select part of the query:
ALTER TABLE `transactions` ADD INDEX `user_confirmed` (`user_id`, `confirmed`);
Also, as @wajeeh pointed out in a comment, the LIMIT
clause is unnecessary here since you're already calling an aggregate function.
It would be helpful if you can post the table schema in your question as well.
Comments
-
Hamed Kamrava almost 2 years
There is a table called
transactions
with ~6 million rows. Below query counts the current user balance. Here is the log after I enabledslow_query_log = 'ON'
:# Time: 170406 9:51:48 # User@Host: root[root] @ [xx.xx.xx.xx] # Thread_id: 13 Schema: main_db QC_hit: No # Query_time: 38.924823 Lock_time: 0.000034 Rows_sent: 1 Rows_examined: 773550 # Rows_affected: 0 SET timestamp=1491456108; SELECT SUM(`Transaction`.`amount`) as total FROM `main_db`.`transactions` AS `Transaction` WHERE `Transaction`.`user_id` = 1008 AND `Transaction`.`confirmed` = 1 LIMIT 1;
As you can see it took
~38 seconds
!Here is
transactions
table EXPLAIN :This query sometimes run fast (about ~1 second) and sometimes really slow!
Any help would be great appreciated.
P.S:
It's InnoDB and
transactions
table has frequent INSERT and SELECT operations.I tried running the query with
SQL_NO_CACHE
, but it is still sometimes fast, sometimes slow.transactions
Table Schema :CREATE TABLE `transactions` ( `id` int(10) unsigned NOT NULL AUTO_INCREMENT, `user_id` int(10) unsigned NOT NULL, `ref_id` varchar(40) COLLATE utf8_persian_ci NOT NULL, `payment_id` tinyint(3) unsigned NOT NULL, `amount` decimal(10,1) NOT NULL, `created` datetime NOT NULL, `private_note` varchar(6000) COLLATE utf8_persian_ci NOT NULL, `public_note` varchar(200) COLLATE utf8_persian_ci NOT NULL, `confirmed` tinyint(3) NOT NULL, PRIMARY KEY (`id`), KEY `user_id` (`user_id`) ) ENGINE=InnoDB AUTO_INCREMENT=13133663 DEFAULT CHARSET=utf8 COLLATE=utf8_persian_ci
MySQL is running on a VPS with 12GB RAM and 9 Logical CPU cores.
Here is a part of
my.cnf
:# * InnoDB # # InnoDB is enabled by default with a 10MB datafile in /var/lib/mysql/. # Read the manual for more InnoDB related options. There are many! default_storage_engine = InnoDB # you can't just change log file size, requires special procedure innodb_buffer_pool_size = 9G innodb_log_buffer_size = 8M innodb_file_per_table = 1 innodb_open_files = 400 innodb_io_capacity = 400 innodb_flush_method = O_DIRECT innodb_thread_concurrency = 0 innodb_read_io_threads = 64 innodb_write_io_threads = 64 # Instead of skip-networking the default is now to listen only on # localhost which is more compatible and is not less secure. #bind-address = 127.0.0.1 # # * Fine Tuning # max_connections = 500 connect_timeout = 5 wait_timeout = 600 max_allowed_packet = 16M thread_cache_size = 128 sort_buffer_size = 4M bulk_insert_buffer_size = 16M tmp_table_size = 32M max_heap_table_size = 32M
-
Hamed Kamrava about 7 yearsdear billynoah.(sorry for my bad English). This query sometimes run in ~1 seconds but sometimes run in ~30 seconds! I really don't know why! I think when query run for first time, it takes too long time but after that it run fast(probably because of query cache or something like that).
-
Eaten by a Grue about 7 yearstry it with
SELECT SQL_NO_CACHE
... to skip the cache and get a better idea how it's performing on it's own. Did you try adding that index? -
Hamed Kamrava about 7 yearsIt's my production server. Adding index takes long time. I have to announce users
-
Eaten by a Grue about 7 yearscopy the table somewhere and try it there.
-
Wajih about 7 yearsMaybe the problem because sometimes your query need to wait some other SQL that lock the table.
-
Hamed Kamrava about 7 years@wajeeh How can I prove that?
-
Eaten by a Grue about 7 yearsIs it a MyISAM table? Are there frequent write operations happening? If the query runs fast without a cache then it very well could be the case. If it's INNODB, then I'd think row level locking shouldn't interfere
-
Hamed Kamrava about 7 yearsIt's InnoDB. Yes table has frequent write operations. at least 200 new rows in a minute.
-
Wajih about 7 yearsThe best practice is to copy the table to a separate place and try @HamedKamrava
-
Thorsten Kettner about 7 yearsOne more option is to add
amount
at the end of the suggested index. Then you have a covering index containing all needed columns and the table doesn't have to be accessed anymore; all data can be read from the index. -
Hamed Kamrava about 7 years@ThorstenKettner I don't get that! Can you give me a reference?
-
Thorsten Kettner about 7 yearsWith
create index idx on transactions(user_id, confirmed)
you help the DBMS find the records that match user_id and confirmed; the index tells the DBMS where the records are located in the table and then it can read them to see the values. Withcreate index idx on transactions(user_id, confirmed, value)
on the other hand, the DBMS gets to the matching index entries quickly again, but doesn't have to use it to access the table anymore, because the value is already stored in the index itself. This is usually much faster. -
Hamed Kamrava about 7 years@ThorstenKettner Thank you. Already there is an index on
user_id
. So I need to delete this index and create two more indexes, firstly(user_id, confirmed)
and second(user_id, confirmed, value)
. Am I right? -
Thorsten Kettner about 7 yearsNo. You don't have to delete any index. An index is just an offer to the DBMS. For another query the DBMS might decide to use the index on
user_id
alone, because the index is smaller and can be read faster. But yes, you can drop them, and I'd recommend to drop the one on(user_id, confirmed)
, because it really seems superfluous to have both. -
Hamed Kamrava about 7 yearsWow great explanations. I have two big tables.
transactions (Data_length = 973996032)
,short_messages (Data_length = 3531227136)
. -
Hamed Kamrava about 7 yearsMySQL(MariaDB) version is
10.1.18-MariaDB-1
-
Rick James about 7 yearsYou should specify
ALGORITHM=INPLACE
. That way, if "the operation necessarily requires a copy, an error is issued". So, I think (without having tested a situation like yours) you can safely add the index without stopping production. (It would be wise to test somewhere first.) (The system may slow down, since it is a moderately intensive operation.) -
Hamed Kamrava about 7 years"Are there 6M rows in the table? But 773K rows with that user_id?" Yes most of our data are belongs to three users!
-
Hamed Kamrava about 7 yearsYou are my hero