Why is LEFT JOIN slower than INNER JOIN?
Solution 1
With INNER JOIN, MySQL generally will start with the table with the smallest number of rows. In this case, it starts with table finished
and does a look up for the corresponding record in saved
using the index on saved.email
.
For a LEFT JOIN, (excluding some optimizations) MySQL generally joins the records in order (starting with the left most table). In this case, MySQL starts with the table saved
, then attempts to find each corresponding record in finished
. Since there is no usable index on finished.email
, it must do a full scan for each look up.
Edit
Now that you posted your schema, I can see that MySQL is ignoring the index (finished.email
) when going from utf8
to latin1
character set. You've not posted the character sets and collations for each column, so I'm going by the default character set for the table. The collations must be compatible in order for MySQL to use the index.
MySQL can coerce (upgrade) a latin1
collation, which is very limited, up to a utf8
collation such as unicode_ci
(so the first query can use the index on saved.email
by upgrading latin1
collation to utf8
), but the opposite is not true (the second query can't use the index on finished.email
since it can't downgrade a utf8
collation down to latin1
).
The solution is to change both email columns to a compatible collation, perhaps most easily by making them identical character sets and collations.
Solution 2
The LEFT JOIN
query is slower than the INNER JOIN
query because it's doing more work.
From the EXPLAIN output, it looks like MySQL is doing nested loop join. (There's nothing wrong with nested loops; I think that's the only join operation that MySQL uses in version 5.5 and earlier.)
For the INNER JOIN
query, MySQL is using an efficient "ref"
(index lookup) operation to locate the matching rows.
But for the LEFT JOIN
query, it looks like MySQL is doing a full scan of the index to find the matching rows. So, with the nested loops join operation, MySQL is doing a full index scan scan for each row from the other table. So, that's on the order of tens of thousands of scans, and each of those scans is inspecting tens of thousands of rows.
Using the estimated row counts from the EXPLAIN output, that's going to require (40971*32168=) 1,317,955,128 string comparisons.
The INNER JOIN
query avoids a lot of that work, so it's a lot faster. (It's avoiding all those string comparisons by using an index operation.
-- LEFT JOIN
id select table type key key_len ref rows Extra
-- ------ -------- ----- ----- ------- ---- ----- ------------------------
1 SIMPLE saved index email 383 NULL 40971 Using index
1 SIMPLE finished index email 258 NULL 32168 Using index
-- INNER JOIN
id select table type key key_len ref rows Extra
-- ------ -------- ----- ----- ------- ---- ----- ------------------------
1 SIMPLE finished index email 258 NULL 32168 Using index
1 SIMPLE saved ref email 383 func 1 Using where; Using index
^^^^^ ^^^^ ^^^^^ ^^^^^^^^^^^^
NOTE: Markus Adams spied the difference in characterset in the email
columns CREATE TABLE
statements that were added to your question.
I believe that it's the difference in the characterset that's preventing MySQL from using an index for your query.
Q2: How do I make the LEFT JOIN query faster?
A: I don't believe it's going to be possible to get that specific query to run faster, without a schema change, such as changing the characterset of the two email columns to match.
The only affect that the "outer join" to the finished
table looks like it is to produce "duplicate" rows whenever more than one matching row is found. I'm not understanding why the outer join is needed. Why not just get rid of it altogether, and just do:
SELECT saved.email FROM saved
Solution 3
I'm afraid more info will probably be needed.
However, inner joins
eliminate any item that has a null foreign key (no match, if you will). This means that there are less rows to scan to associate.
For a left join
however, any non-match needs to be given a blank row, so all of the rows are scanned regardless -- nothing can be eliminated.
This makes the data set larger and requires more resources to process. Also, when you write your select, don't do select *
-- instead, explicitly state which columns you want.
Solution 4
The data types of saved.email
and finished.email
differ in two respects. First, they have different lengths. Second, finished.email
can be NULL. So, your LEFT JOIN
operation can't exploit the index on finished.email
.
Can you change the definition of finished.email
to this, so it matches the field you're joining it with?
`email` varchar(127) NOT NULL
If you do you'll probably get a speedup.
Comments
-
allenylzhou almost 2 years
I have two queries, the first one (inner join) is super fast, and the second one (left join) is super slow. How do I make the second query fast?
EXPLAIN SELECT saved.email FROM saved INNER JOIN finished ON finished.email = saved.email; id select_type table type possible_keys key key_len ref rows Extra 1 SIMPLE finished index NULL email 258 NULL 32168 Using index 1 SIMPLE saved ref email email 383 func 1 Using where; Using index EXPLAIN SELECT saved.email FROM saved LEFT JOIN finished ON finished.email = saved.email; id select_type table type possible_keys key key_len ref rows Extra 1 SIMPLE saved index NULL email 383 NULL 40971 Using index 1 SIMPLE finishedindex NULL email 258 NULL 32168 Using index
Edit: I have added table info for both tables down below.
CREATE TABLE `saved` ( `id` int(11) unsigned NOT NULL AUTO_INCREMENT, `slug` varchar(255) DEFAULT NULL, `email` varchar(127) NOT NULL, [omitted fields include varchar, text, longtext, int], PRIMARY KEY (`id`), KEY `slug` (`slug`), KEY `email` (`email`) ) ENGINE=MyISAM AUTO_INCREMENT=56329 DEFAULT CHARSET=utf8; CREATE TABLE `finished` ( `id` int(11) unsigned NOT NULL AUTO_INCREMENT, `slug` varchar(255) DEFAULT NULL, `submitted` int(11) DEFAULT NULL, `status` int(1) DEFAULT '0', `name` varchar(255) DEFAULT NULL, `email` varchar(255) DEFAULT NULL, [omitted fields include varchar, text, longtext, int], PRIMARY KEY (`id`), KEY `assigned_user_id` (`assigned_user_id`), KEY `event_id` (`event_id`), KEY `slug` (`slug`), KEY `email` (`email`), KEY `city_id` (`city_id`), KEY `status` (`status`), KEY `recommend` (`recommend`), KEY `pending_user_id` (`pending_user_id`), KEY `submitted` (`submitted`) ) ENGINE=MyISAM AUTO_INCREMENT=33063 DEFAULT CHARSET=latin1;
-
Michael Berkowski over 9 yearsPlease post the
SHOW CREATE TABLE
output for both tables. -
radar over 9 yearsthe output is same for both the queries as you are getting the column from saved table alone, what is the main intention?
-
crazy_in_love over 9 yearsI think this belongs to dba.stackexchange.com.
-
zerkms over 9 yearsHow big is
saved
table and what is the number of rows returned by both queries. -
zerkms over 9 years@Marcus Adams: according to the explain output - there is such
-
allenylzhou over 9 yearsAdded SHOW CREATE TABLE output for both tables
-
Deadooshka over 9 yearstry
SELECT STRAIGHT_JOIN ... FROM finished LEFT JOIN saved ...
-
Marcus Adams over 9 yearsSee my updated answer for the actual problem.
-
Marcus Adams over 9 years@zerkms, you're assuming things based on the name of an index.
-
zerkms over 9 years@Marcus Adams: uhm, not sure what you mean. 1. There are schemas available already 2. In the given query only
email
column is used for predicates andSELECT
which automatically means that if any index is used - thenemail
is its left most part. So, excuse me, but my "assumption" is based on facts.
-
-
Mike Brant over 9 yearsThere are no row scans happening here at all. You are correct that more rows are returned in the LEFT JOIN, but the join is able to be calculated using index in either case as shown in the explain.
-
Marcus Adams over 9 years@MikeBrant,
Using index
doesn't mean it used the index to satisfy theWHERE
.Using where; Using index
means it used the index to satisfy theWHERE
. -
allenylzhou over 9 yearsBut there is an index on finished.email.
-
Mike Brant over 9 years@MarcusAdams There is no WHERE clause in the example. I was talking about the indexes being used to satisfy the join condition.
-
zerkms over 9 years@Mike Brant: actually that's not true.
using index
withoutusing where
means the index was read just to fetch the data (to avoid reading the actual data pages). So in the secondEXPLAIN
output it's basically an index full scan, not lookup. Hence it's slow. And there isWHERE
in the first query (theON
node transformed intoWHERE
clause during optimization) -
Mike Brant over 9 years@zerks Yes agreed that the indexes are scanned and read into memory. I was more making note that the answer refers to all rows being scanned. Seemed to be hinting at full table scan, which is not the case here.
-
zerkms over 9 years@Mike Brant: it is still a full index scan, which causes exactly the same effect: cartesian product of 2 sets. So from my point it's fair to say so (but it needs to be carefully explained indeed to avoid further confusion). However I completely agree the explanation of the roots of the issue is kind of vague and in the current form it does not explain anything :-)
-
zerkms over 9 yearsAaaaand we have the winner :-) Nice explanation, +1
-
spencer7593 over 9 yearsGood catch on the difference in the characterset.
-
spencer7593 over 9 yearsI missed the difference in characterset (Markus spied it) that may probably explain why MySQL isn't using a "ref" join operation.
-
allenylzhou over 9 yearsI'm going to accept Adam's answer because the character set is the culprit but thank you for the detailed explanation of how LEFT JOINs differ from INNER JOINs.
-
Darwayne over 7 yearsCharacter set point was a really great one. Made a query of mine go down from 1 hour to 87ms
-
Sammy Larbi over 5 years"The collations must be compatible in order for MySQL to use the index." -- Thanks, I hadn't considered looking at the collation as part of my problem. I converted the tables to use the same collation, and it took my multiple-minute query down to a second or so. Thanks!
-
Christian Saiki over 4 yearsoh my god, you are life saver! Our query went from 8h to 1 minute because we were using different enccoding formats in our tables. Thanks a lot.