How can I make SQL string comparisons case sensitive in MySQL?
Solution 1
http://dev.mysql.com/doc/refman/5.0/en/case-sensitivity.html
The default character set and collation are latin1 and latin1_swedish_ci, so nonbinary string comparisons are case insensitive by default. This means that if you search with col_name LIKE 'a%', you get all column values that start with A or a. To make this search case sensitive, make sure that one of the operands has a case sensitive or binary collation. For example, if you are comparing a column and a string that both have the latin1 character set, you can use the COLLATE operator to cause either operand to have the latin1_general_cs or latin1_bin collation:
col_name COLLATE latin1_general_cs LIKE 'a%'
col_name LIKE 'a%' COLLATE latin1_general_cs
col_name COLLATE latin1_bin LIKE 'a%'
col_name LIKE 'a%' COLLATE latin1_bin
If you want a column always to be treated in case-sensitive fashion, declare it with a case sensitive or binary collation.
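Here is a rough Python analogy of the col_name LIKE 'a%' search above. It is not MySQL's actual collation algorithm.

* A case-insensitive collation behaves as if both operands were case-folded before comparing.
* A binary collation compares the raw bytes.

The helper names are hypothetical and exist only for illustration:

```python
def like_prefix_ci(value: str, prefix: str) -> bool:
    # Case-insensitive: fold case on both sides, then compare (like a _ci collation).
    return value.casefold().startswith(prefix.casefold())


def like_prefix_bin(value: str, prefix: str) -> bool:
    # Binary: compare the encoded bytes exactly (like a _bin collation or BINARY).
    return value.encode("utf-8").startswith(prefix.encode("utf-8"))


names = ["Alice", "adam", "Bob", "ANNA"]

# Roughly what col_name LIKE 'a%' returns under each kind of collation.
print([n for n in names if like_prefix_ci(n, "a")])   # ['Alice', 'adam', 'ANNA']
print([n for n in names if like_prefix_bin(n, "a")])  # ['adam']
```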
Solution 2
The good news is that if you need to make a case-sensitive query, it is very easy to do:
SELECT * FROM `table` WHERE BINARY `column` = 'value'
Solution 3
The answer posted by Craig White has a big performance penalty:
SELECT * FROM `table` WHERE BINARY `column` = 'value'
because applying BINARY to the column prevents MySQL from using an index on it. So either you need to change the table collation as mentioned here: https://dev.mysql.com/doc/refman/5.7/en/case-sensitivity.html
OR
The easiest fix is to apply BINARY to the value instead of the column:
SELECT * FROM `table` WHERE `column` = BINARY 'value'
E.g.
mysql> EXPLAIN SELECT * FROM temp1 WHERE BINARY col1 = "ABC" AND col2 = "DEF" ;
+----+-------------+--------+------+---------------+------+---------+------+--------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------+------+---------------+------+---------+------+--------+-------------+
| 1 | SIMPLE | temp1 | ALL | NULL | NULL | NULL | NULL | 190543 | Using where |
+----+-------------+--------+------+---------------+------+---------+------+--------+-------------+
VS
mysql> EXPLAIN SELECT * FROM temp1 WHERE col1 = BINARY "ABC" AND col2 = "DEF" ;
+----+-------------+-------+-------+---------------+---------------+---------+------+------+------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+---------------+---------+------+------+------------------------------------+
| 1 | SIMPLE | temp1 | range | col1_2e9e898e | col1_2e9e898e | 93 | NULL | 2 | Using index condition; Using where |
+----+-------------+-------+-------+---------------+---------------+---------+------+------+------------------------------------+
1 row in set (0.00 sec)
Solution 4
Instead of using the = operator, you may want to use LIKE or LIKE BINARY:
-- this returns 1 (true) with the default case-insensitive collation
select 'A' like 'a';
-- this returns 0 (false)
select 'A' like binary 'a';
select * from user where username like binary 'a';
This matches 'a' but not 'A'.
Solution 5
The most correct way to perform a case sensitive string comparison without changing the collation of the column being queried is to explicitly specify a character set and collation for the value that the column is being compared to.
select * from `table` where `column` = convert('value' using utf8mb4) collate utf8mb4_bin;
Why not use binary?
Using the binary operator is inadvisable because it compares the actual bytes of the encoded strings. If two strings that should be considered the same are encoded using different character sets, their bytes may differ, so they will not compare as equal. For example, if you have a column that uses the latin1 character set and your server/session character set is utf8mb4, then comparing the column with a string containing an accent, such as 'café', will not match rows containing that same string! This is because in latin1 é is encoded as the single byte 0xE9, but in utf8 it is two bytes: 0xC3A9.
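You can check the byte difference described above without MySQL. This Python snippet encodes the string 'café' twice: once as latin1 and once as UTF-8, the encoding utf8mb4 uses. The two byte sequences differ only in the é:

```python
s = "café"

latin1_bytes = s.encode("latin-1")  # é -> single byte 0xE9
utf8_bytes = s.encode("utf-8")      # é -> two bytes 0xC3 0xA9

print(latin1_bytes.hex())          # 636166e9
print(utf8_bytes.hex())            # 636166c3a9
print(latin1_bytes == utf8_bytes)  # False: a byte-wise (BINARY) comparison fails
```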
Why use convert as well as collate?
Collations must match the character set. So if your server or session is set to use the latin1 character set, you must use collate latin1_bin, but if your character set is utf8mb4, you must use collate utf8mb4_bin. Therefore the most robust solution is to always convert the value into the most flexible character set (utf8mb4) and use the binary collation for that character set.
Why apply the convert and collate to the value and not the column?
When you apply any transforming function to a column before making a comparison it prevents the query engine from using an index if one exists for the column, which could dramatically slow down your query. Therefore it is always better to transform the value instead where possible. When a comparison is performed between two string values and one of them has an explicitly specified collation, the query engine will use the explicit collation, regardless of which value it is applied to.
Accent Sensitivity
It is important to note that MySQL is not only case insensitive for columns using an _ci collation (which is typically the default), but, for most such collations, also accent insensitive. This means that 'é' = 'e'. Using a binary collation (or the binary operator) will make string comparisons accent sensitive as well as case sensitive.
What is utf8mb4?
The utf8 character set in MySQL is an alias for utf8mb3, which has been deprecated in recent versions because it does not support 4-byte characters (which is important for encoding strings like 🐈). If you wish to use the UTF-8 character encoding with MySQL, then you should be using the utf8mb4 charset.
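The 4-byte limitation is easy to check. utf8mb3 stores at most three bytes per character, so a character whose UTF-8 encoding needs four bytes (such as 🐈) only fits in utf8mb4. This small Python loop prints each character's UTF-8 length:

```python
for ch in ["a", "é", "€", "🐈"]:
    size = len(ch.encode("utf-8"))
    # utf8mb3 allows up to 3 bytes per character; utf8mb4 allows up to 4.
    verdict = "fits in utf8mb3" if size <= 3 else "needs utf8mb4"
    print(f"{ch!r}: {size} byte(s), {verdict}")
```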
StevenB
Updated on July 08, 2022
Comments
-
StevenB almost 2 years
I have a function that returns five characters with mixed case. If I do a query on this string it will return the value regardless of case.
How can I make MySQL string queries case sensitive?
-
StevenB about 13 years: Any hint on how to do this in phpMyAdmin?
-
drudge about 13 years: @StevenB: Click the column's Edit button, then set the Collation: i.imgur.com/7SoEw.png
-
adjwilli over 11 years: This is exactly what I was looking for. I would vote it up higher if I could. A question though: what effect does this have on performance? I'm using it on a limited reporting thing, so it's not important in my case, but I am curious.
-
piotrekkr about 11 years: @BT To make a utf8 column case sensitive you could use a bin collation like:
SELECT 'email' COLLATE utf8_bin = 'Email'
-
Art Geigel almost 11 years: Why is this not the answer? This is exactly what I needed too.
-
dshin over 10 years: @adjwilli If the column was a part of an index, you will suffer a performance hit on queries reliant on that index. To maintain performance, you need to actually alter the table.
-
adjwilli over 10 years: @David My table needs to be in UTF8 and this was part of an export script, so I think the trade-off in performance is worth it. But good to know not to use that in frequently run queries.
-
TMH almost 10 years: I'd be interested to see the performance difference between this method and the latin1_bin method above. I might have to run some benchmarks when I get a chance and post my results.
-
Stephane over 9 years: @drudge How would you declare a column with a case sensitive collation?
-
JScarry over 9 years: Here’s an example that works for me: SELECT name FROM ASA WHERE CAST(name AS BINARY) LIKE '%Net%'
-
Andrew T over 9 years: @StephaneEybert if you're looking for straight-up case sensitivity, I have had luck using varbinary instead of varchar for a field in a utf8 table. HTH
-
mvds almost 9 years: What will this do for UTF-8 strings containing the same character with a different representation, e.g. using a combining character to add an umlaut? These UTF-8 strings could be treated as equal: convert(char(0x65,0xcc,0x88) using utf8) (i.e. e with ¨ added) and convert(char(0xc3,0xab) using utf8) (i.e. ë), but adding BINARY will make them unequal.
-
mvds almost 9 years: Also, comparing floating point types gives strange results, where a BINARY comparison will say values are unequal although they seem equal to the naked eye.
-
Jean Vincent over 8 years: Tested, this does not work for non-ASCII characters, but it works with the COLLATE latin1_bin solution.
-
golimar almost 8 years: @JeanVincent, can you post an example? For me this works: SELECT BINARY 'Ñ'='Ñ', BINARY 'Ñ'='ñ' (returns 1 and 0)
-
Putnik over 7 years: Could you please show how it should look in an 'insert' statement?
-
Sylvain B about 7 years: Could the OP finally come here and accept this answer? We don't come on SO to find copy/pastes of incomprehensive documentation.
-
Danny over 6 years: It's worth commenting to say that the above will only help depending on your data - your case insensitive search could potentially return a rather large subset of data.
-
Robert Sinclair over 6 years: As Jean Vincent stated, this doesn't work with non-ASCII chars: if you have an entry like "à vendre" it will not find it when you search for BINARY col="à vendre"
-
Robbie about 6 years: This also doesn't work if doing REPLACE INTO based on unique indexes or similar operations. (As mentioned above, it will also kill a server if you have a lot of records.) This answer is appropriate in some circumstances but not all.
-
Theo almost 6 years: Need to add declare pSuccess BINARY; at start
-
Lluís Suñol about 5 years: As a performance example: my query went from 3.5 ms (negligible) to 1,570 ms (about a second and a half), querying a table with approx. 1.8M rows.
-
jw_ over 4 years: I changed the column collation to latin1_bin but it doesn't work, why? Only putting the collation in the query works.
-
jw_ over 4 years: @jw_ (myself) The reason is that MySQL Workbench 6.3 seems to have a bug: when you set the column to binary or set the column collation to latin1_bin and click Apply, nothing is changed. A solution is to rename the column and change the setting at the same time, then rename it back.
-
mikep over 4 years: I recommend @Nitesh's answer because of its performance advantage (use the BINARY keyword before the value, not before the column, to enable indexes)...
SELECT * FROM `table` WHERE `column` = BINARY 'value'
-
user10398534 about 4 years: This does not seem to be case-sensitive on 10.3.22-MariaDB (using libmysql - 5.6.43)
-
theking2 about 4 years: There are very few non-binary case-sensitive collations in MySQL. The relatively new utf8mb4 character set seems the most complete. The ones you are looking for all end in _cs. There are a few in the latin7 charset. I was surprised, as coming from a SQL Server environment I was used to having the same collation available in both _ci and _cs variants.
-
michaelf over 3 years: I used Craig White's solution for years, but after a few page-load complaints I took a deeper look, made the change Nitesh recommended, and the query went from 2.5 seconds to 0.15 seconds. It was not using the index when BINARY was before the column in the WHERE clause. After moving BINARY to the value, the index was used. Thank you!
-
FanoFN over 3 years: It does seem to be case-sensitive on 10.3.23-MariaDB though
-
Eaten by a Grue over 3 years: Excellent idea Nitesh! This should be the top voted answer