Difference between BINARY(16) and CHAR(32) when storing MD5 in database

10,588

Currently you are losing half of the checksum when using BINARY(16). When you store an MD5 checksum in BINARY(16) you should store it as binary data, not encoded in hexadecimal. That is:

INSERT INTO test (md5_binary) VALUES(UNHEX(md5("one!")));

You can use the HEX function to encode it into hex again if you want to eye-ball compare it with another checksum:

SELECT HEX(md5_binary) FROM test;

The benefit of using BINARY to store the checksum instead of hexadecimal text is that half the storage is needed.

Share:
10,588
user1032531
Author by

user1032531

Updated on June 19, 2022

Comments

  • user1032531
    user1032531 about 2 years

    Based on various recommendations such as What data type to use for hashed password field and what length?, I could store md5 as either CHAR(32) or BINARY(16). But when I do so using BINARY(16), the stored value is identical to the first 16 characters of the CHAR(32) stored results as well as the first 16 characters of the SELECT MD5() results. What are the significance of the later 16 characters, and is their lack of presence in the binary column resulting in lost data?

    CREATE  TABLE test (id INT NOT NULL AUTO_INCREMENT, value VARCHAR(6), md5_char CHAR(32) NOT NULL, md5_binary BINARY(16) NOT NULL, PRIMARY KEY (id)) ENGINE = InnoDB;
    INSERT INTO test(value,md5_char,md5_binary) VALUES("one!",md5("one!"),md5("one!"));
    INSERT INTO test(value,md5_char,md5_binary) VALUES("two%",md5("two%"),md5("two%"));
    INSERT INTO test(value,md5_char,md5_binary) VALUES("three~",md5("three~"),md5("three~"));
    SELECT value,md5(value),md5_char,md5_binary FROM test;
    DROP TABLE test;
    
    +--------+----------------------------------+----------------------------------+------------------+
    | value  | md5(value)                       | md5_char                         | md5_binary       |
    +--------+----------------------------------+----------------------------------+------------------+
    | one!   | 633c8403325f1cf963809e6eb224d77e | 633c8403325f1cf963809e6eb224d77e | 633c8403325f1cf9 |
    | two%   | 48bbec047b4451a2018e0f652807b7d0 | 48bbec047b4451a2018e0f652807b7d0 | 48bbec047b4451a2 |
    | three~ | fee453bb4eb68dcdfee07575e75c8cc5 | fee453bb4eb68dcdfee07575e75c8cc5 | fee453bb4eb68dcd |
    +--------+----------------------------------+----------------------------------+------------------+
    
  • user1032531
    user1032531 over 10 years
    Thanks Joni, Similar, Blowfish-based bcrypt requires CHAR(60) to store it. Could I use BINARY(30) if I use UNHEX()?
  • Andomar
    Andomar over 10 years
    The storage benefit is unrelated to the data type. You could store hexadecimal in binary, and un-hexed data in a char column.
  • user1032531
    user1032531 over 10 years
    But will binary always require exactly half the digits of char?
  • user1032531
    user1032531 over 10 years
    Just realized your post states "half the storage is needed". Thanks
  • Joni
    Joni over 10 years
    @Andomar, storing binary data in a CHAR column is not recommended: the database will then interpret the data as having a particular character encoding, and you may corrupt the data unless you know what you are doing (i.e. use a single-byte encoding and a binary collation)
  • Joni
    Joni over 10 years
    @user1032531, does bcrypt produce a hex-encoded string? If it does, you can store it in 50% the space by storing it as binary.
  • Brian Leishman
    Brian Leishman about 7 years
    It'd actually end up being an even bigger space saving than half, would it not? Since the char field would most likely use some sort of character encoding also, like utf8mb4 which would be 4 bytes per character. So you could be using only 12.5% of the space with binary over char in the best case.
  • Joni
    Joni about 7 years
    Yes, it depends on the encoding. MySQL has traditionally defaulted to latin1 which uses one byte per character, but you can always change it.
  • Flying Turtle
    Flying Turtle over 4 years
    I'm slightly confused, isn't hex 4 bits, and here binary(1) is 8 bits, why is it that mysql is representing binary(x) as the number of bytes and not bits?