Difference between BINARY(16) and CHAR(32) when storing MD5 in database
Currently you are losing half of the checksum when using BINARY(16)
. When you store an MD5 checksum in BINARY(16)
you should store it as binary data, not encoded in hexadecimal. That is:
INSERT INTO test (md5_binary) VALUES(UNHEX(md5("one!")));
You can use the HEX function to encode it into hex again if you want to eye-ball compare it with another checksum:
SELECT HEX(md5_binary) FROM test;
The benefit of using BINARY to store the checksum instead of hexadecimal text is that half the storage is needed.
user1032531
Updated on June 19, 2022Comments
-
user1032531 about 2 years
Based on various recommendations such as What data type to use for hashed password field and what length?, I could store md5 as either CHAR(32) or BINARY(16). But when I do so using BINARY(16), the stored value is identical to the first 16 characters of the CHAR(32) stored results as well as the first 16 characters of the
SELECT MD5()
results. What are the significance of the later 16 characters, and is their lack of presence in the binary column resulting in lost data?CREATE TABLE test (id INT NOT NULL AUTO_INCREMENT, value VARCHAR(6), md5_char CHAR(32) NOT NULL, md5_binary BINARY(16) NOT NULL, PRIMARY KEY (id)) ENGINE = InnoDB; INSERT INTO test(value,md5_char,md5_binary) VALUES("one!",md5("one!"),md5("one!")); INSERT INTO test(value,md5_char,md5_binary) VALUES("two%",md5("two%"),md5("two%")); INSERT INTO test(value,md5_char,md5_binary) VALUES("three~",md5("three~"),md5("three~")); SELECT value,md5(value),md5_char,md5_binary FROM test; DROP TABLE test; +--------+----------------------------------+----------------------------------+------------------+ | value | md5(value) | md5_char | md5_binary | +--------+----------------------------------+----------------------------------+------------------+ | one! | 633c8403325f1cf963809e6eb224d77e | 633c8403325f1cf963809e6eb224d77e | 633c8403325f1cf9 | | two% | 48bbec047b4451a2018e0f652807b7d0 | 48bbec047b4451a2018e0f652807b7d0 | 48bbec047b4451a2 | | three~ | fee453bb4eb68dcdfee07575e75c8cc5 | fee453bb4eb68dcdfee07575e75c8cc5 | fee453bb4eb68dcd | +--------+----------------------------------+----------------------------------+------------------+
-
user1032531 over 10 yearsThanks Joni, Similar, Blowfish-based bcrypt requires CHAR(60) to store it. Could I use BINARY(30) if I use
UNHEX()
? -
Andomar over 10 yearsThe storage benefit is unrelated to the data type. You could store hexadecimal in binary, and un-hexed data in a char column.
-
user1032531 over 10 yearsBut will binary always require exactly half the digits of char?
-
user1032531 over 10 yearsJust realized your post states "half the storage is needed". Thanks
-
Joni over 10 years@Andomar, storing binary data in a CHAR column is not recommended: the database will then interpret the data as having a particular character encoding, and you may corrupt the data unless you know what you are doing (i.e. use a single-byte encoding and a binary collation)
-
Joni over 10 years@user1032531, does bcrypt produce a hex-encoded string? If it does, you can store it in 50% the space by storing it as binary.
-
Brian Leishman about 7 yearsIt'd actually end up being an even bigger space saving than half, would it not? Since the
char
field would most likely use some sort of character encoding also, likeutf8mb4
which would be 4 bytes per character. So you could be using only 12.5% of the space withbinary
overchar
in the best case. -
Joni about 7 yearsYes, it depends on the encoding. MySQL has traditionally defaulted to latin1 which uses one byte per character, but you can always change it.
-
Flying Turtle over 4 yearsI'm slightly confused, isn't hex 4 bits, and here binary(1) is 8 bits, why is it that mysql is representing binary(x) as the number of bytes and not bits?