String compare exact in query MySQL

31,334

Solution 1

There is no other fix for it. Either you specify a single comparison as being binary or you set the whole database connection to binary. (doing SET NAMES binary, which may have other side effects!)

Basically, that 'lazy' comparison is a feature of MySQL which is hard coded. To disable it (on demand!), you can use a binary compare, what you apparently already do. This is not a 'workaround' but the real fix.

from the MySQL Manual:

All MySQL collations are of type PADSPACE. This means that all CHAR and VARCHAR values in MySQL are compared without regard to any trailing spaces

Of course there are plenty of other possiblities to achieve the same result from a user's perspective, i.e.:

  • WHERE field = 'abc ' AND CHAR_LENGTH(field) = CHAR_LENGTH('abc ')
  • WHERE field REGEXP 'abc[[:space:]]'

The problem with these is that they effectively disable fast index lookups, so your query always results in a full table scan. With huge datasets that makes a big difference.

Again: PADSPACE is default for MySQLs [VAR]CHAR comparison. You can (and should) disable it by using BINARY. This is the indended way of doing this.

Solution 2

You can try with a regular expression matching :

SELECT * FROM barcode WHERE `code` REGEXP 'abc[[:space:]]'

Solution 3

i was just working on case just like that when using LIKE with wildcard (%) resulting in an unexpected result. While searching i also found STRCMP(text1, text2) under string comparison feature of mysql which compares two string. however using BINARY with LIKE solved the problem for me.

SELECT * FROM barcode WHERE `code` LIKE BINARY 'abc ';
Share:
31,334
Tang Khai Phuong
Author by

Tang Khai Phuong

Code for food

Updated on August 11, 2022

Comments

  • Tang Khai Phuong
    Tang Khai Phuong over 1 year

    I created table like that in MySQL:

    DROP TABLE IF EXISTS `barcode`;
    CREATE TABLE `barcode` (
      `id` int(11) NOT NULL AUTO_INCREMENT,
      `code` varchar(40) COLLATE utf8_bin DEFAULT NULL,
      PRIMARY KEY (`id`)
    ) ENGINE=InnoDB AUTO_INCREMENT=3 DEFAULT CHARSET=utf8 COLLATE=utf8_bin;
    
    
    INSERT INTO `barcode` VALUES ('1', 'abc');
    
    INSERT INTO `barcode` VALUES ('2', 'abc ');
    

    Then I query data from table barcode:

    SELECT * FROM barcode WHERE `code` = 'abc ';
    

    The result is:

    +-----+-------+
    |  id | code  |
    +-----+-------+
    |  1  |  abc  |
    +-----+-------+
    |  2  |  abc  |
    +-----+-------+
    

    But I want the result set is only 1 record. I workaround with:

    SELECT * FROM barcode WHERE `code` = binary 'abc ';
    

    The result is 1 record. But I'm using NHibernate with MySQL for generating query from mapping table. So that how to resolve this case?

  • Romain
    Romain about 12 years
    This is not answering the OP - the problem seems to be that they want to have only exact string matches, including leading and trailing spaces. Using LIMIT does not achieve this.
  • eggyal
    eggyal about 12 years
    I don't think your proposed use of COLLATE utf8_general_ci will make any difference?
  • eggyal
    eggyal about 12 years
    Downvoter care to help the rest of us understand what you think is wrong with this answer?
  • eggyal
    eggyal about 12 years
    Downvoter care to help the rest of us understand what you think is wrong with this answer?
  • Kaii
    Kaii about 11 years
    nice catch. but when using LIKE one should also be aware of the other side effects of the LIKE operator. Examples: "A" LIKE "a" equals true, "a" LIKE "_" also equals true. As a workaround to the case-insensitivity of LIKE the manual offers a LIKE BINARY operation .. And that is where the circle is closed: BINARY is the real fix. If you don't believe me, see the manual for LIKE. The only difference between = BINARY and LIKE BINARY is that LIKE has more side effects. Sorry ;)
  • LeGEC
    LeGEC about 11 years
    "A" LIKE "a" depends on the collation, doesn't it ? just like =. I wasn't aware of the "_" special character, though. It sure does complicate the corner cases. Thanks for pointing that out.
  • Kaii
    Kaii about 11 years
    You're right, "A" LIKE "a" depends on collation. "A" COLLATE utf8_bin LIKE "a" COLLATE utf8_bin does return 0.
  • Oliver Matthews
    Oliver Matthews almost 10 years
    FYI: While your answer appears plausible, it has appeared in the 'low quality posts' queue after being flagged for deletion. I suspect this is because it is a code only answer without explanation. To be clear here, I didn't flag it so I don't know why it was flagged. However I do see a lot of code only answers appearing in that queue, so I have taken to adding comments to them so the responder is aware of this.
  • R T
    R T almost 10 years
    my bad. i should add some explanation. thanks for letting me know.
  • Kaii
    Kaii over 8 years
    I guess this answer was downvoted because WHERE F(x) = F(y) is causing a lot of computational overhead, because the function must be executed two times for each row in the result dataset. In this case it's not too bad, because there is at least one simple WHERE condition. If F(x) = F(y) was the only condition, it would result in a full table scan, which is something you should avoid.
  • Kaii
    Kaii over 8 years
    I guess this answer was downvoted because REGEXP disables index lookups and always results in a full table scan. If you can avoid regex in mysql, you shouldn't use it.