Why does crypt/blowfish generate the same hash with two different salts?

14,185

Solution 1

After some experimentation, I have come to the conclusion that this is due to the way the salt is treated. The salt is not considered to be literal text, but rather to be a base64 encoded string, such that 22 bytes of salt data would actually represent a 16 byte string (floor(22 * 24 / 32) == 16) of salt. The "Gotcha!" with this implementation, though, is that, like Unix crypt, it uses a "non-standard" base64 alphabet. To be exact, it uses this alphabet:

./ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789$

The 65th character, '$', is the padding character.

Now, the crypt() function appears to be capable of taking a salt of any length less than or equal to its maximum, and silently handling any inconsistencies in the base64 by discarding any data that doesn't make up another full byte. The crypt function will fail completely if you pass it characters in the salt that are not part of its base64 alphabet, which just confirms this theory of its operation.

Take an imaginary salt '1234'. This is perfectly base64 consistent in that it represents 24 bits of data, so 3 bytes, and does not carry any data that needs to be discarded. This is a salt whose Len Mod 4 is zero. Append any character to that salt, and it becomes a 5 character salt, and Len Mod 4 is now 1. However, this additional character represents only six bits of data, and therefore cannot be transformed into another full byte, so it is discarded.

Thus, for any two salts A and B, where

   Len A Mod 4 == 0 
&& Len B Mod 4 == 1  // these two lines mean the same thing
&& Len B = Len A + 1 // but are semantically important separately
&& A == substr B, 0, Len A

The actual salt used by crypt() to calculate the hash will, in fact, be identical. As proof, I'm including some example PHP code that can be used to show this. The salt constantly rotates in a seminon-random way (based on a randomish segment of the whirlpool hash of the current time to the microsecond), and the data to be hashed (herein called $seed) is simply the current Unix-Epoch time.

$salt = substr(hash('whirlpool',microtime()),rand(0,105),22);
$seed = time();
for ($i = 0, $j = strlen($salt); $i <= $j; ++$i) {
    printf('%02d = %s%s%c',
        $i,
        crypt($seed,'$2a$07$' . substr($salt, 0, $i)),
        $i%4 == 0 || $i % 4 == 1 ? ' <-' : '',
        0x0A
    );
}

And this produces output similar to the following

00 = $2a$07$$$$$$$$$$$$$$$$$$$$$$.rBxL4x0LvuUp8rhGfnEKSOevBKB5V2. <-
01 = $2a$07$e$$$$$$$$$$$$$$$$$$$$.rBxL4x0LvuUp8rhGfnEKSOevBKB5V2. <-
02 = $2a$07$e8$$$$$$$$$$$$$$$$$$$.WEimjvvOvQ.lGh/V6HFkts7Rq5rpXZG
03 = $2a$07$e89$$$$$$$$$$$$$$$$$$.Ww5p352lsfQCWarRIWWGGbKa074K4/.
04 = $2a$07$e895$$$$$$$$$$$$$$$$$.ZGSPawtL.pOeNI74nhhnHowYrJBrLuW <-
05 = $2a$07$e8955$$$$$$$$$$$$$$$$.ZGSPawtL.pOeNI74nhhnHowYrJBrLuW <-
06 = $2a$07$e8955b$$$$$$$$$$$$$$$.2UumGVfyc4SgAZBs5P6IKlUYma7sxqa
07 = $2a$07$e8955be$$$$$$$$$$$$$$.gb6deOAckxHP/WIZOGPZ6/P3oUSQkPm
08 = $2a$07$e8955be6$$$$$$$$$$$$$.5gox0YOqQMfF6FBU9weAz5RmcIKZoki <-
09 = $2a$07$e8955be61$$$$$$$$$$$$.5gox0YOqQMfF6FBU9weAz5RmcIKZoki <-
10 = $2a$07$e8955be616$$$$$$$$$$$.hWHhdkS9Z3m7/PMKn1Ko7Qf2S7H4ttK
11 = $2a$07$e8955be6162$$$$$$$$$$.meHPOa25CYG2G8JrbC8dPQuWf9yw0Iy
12 = $2a$07$e8955be61624$$$$$$$$$.vcp/UGtAwLJWvtKTndM7w1/30NuYdYa <-
13 = $2a$07$e8955be616246$$$$$$$$.vcp/UGtAwLJWvtKTndM7w1/30NuYdYa <-
14 = $2a$07$e8955be6162468$$$$$$$.OTzcPMwrtXxx6YHKtaX0mypWvqJK5Ye
15 = $2a$07$e8955be6162468d$$$$$$.pDcOFp68WnHqU8tZJxuf2V0nqUqwc0W
16 = $2a$07$e8955be6162468de$$$$$.YDv5tkOeXkOECJmjl1R8zXVRMlU0rJi <-
17 = $2a$07$e8955be6162468deb$$$$.YDv5tkOeXkOECJmjl1R8zXVRMlU0rJi <-
18 = $2a$07$e8955be6162468deb0$$$.aNZIHogUlCn8H7W3naR50pzEsQgnakq
19 = $2a$07$e8955be6162468deb0d$$.ytfAwRL.czZr/K3hGPmbgJlheoZUyL2
20 = $2a$07$e8955be6162468deb0da$.0xhS8VgxJOn4skeI02VNI6jI6324EPe <-
21 = $2a$07$e8955be6162468deb0da3.0xhS8VgxJOn4skeI02VNI6jI6324EPe <-
22 = $2a$07$e8955be6162468deb0da3ucYVpET7X/5YddEeJxVqqUIxs3COrdym

The conclusion? Twofold. First, it's working as intended, and second, know your own salt or don't roll your own salt.

Solution 2

Great answer, and clear explanation. But it seems to me there is either a bug in the implementation or some further explanation of the intent is needed {the comments to the post explain why there is not a bug}. The current php documentation states:

CRYPT_BLOWFISH - Blowfish hashing with a salt as follows: "$2a$", a two digit cost parameter, "$", and 22 base 64 digits from the alphabet "./0-9A-Za-z". Using characters outside of this range in the salt will cause crypt() to return a zero-length string. The two digit cost parameter is the base-2 logarithm of the iteration count for the underlying Blowfish-based hashing algorithmeter and must be in range 04-31, values outside this range will cause crypt() to fail.

This is consistent with what's been stated and demonstrated here. Unfortunately the documentation doesn't describe the return value very usefully:

Returns the hashed string or a string that is shorter than 13 characters and is guaranteed to differ from the salt on failure.

But as shown in the reply by Dereleased, if the input salt string is valid, the output consists of the input salt padded out to a fixed length with '$' characters, with the 32-character computed hash value appended to it. Unfortunately, the salt in the result is padded out to only 21 base64 digits, not 22! This is shown by the last three lines in that reply, where we see one '$' for 20 digits, no '$' for 21, and when there are 22 base64 digits in the salt, the first character of the hash result replaces the 22nd digit of the input salt. The function is still usable, because the complete value it computes is available to the caller as substr(crypt($pw,$salt), 28, 32), and the caller already knows the complete salt value because it passed that string as an argument. But it's very difficult to understand why the return value is designed so that it can only give you 126 bits of the 128-bit salt value. In fact, it's hard to understand why it includes the input salt at all; but omitting 2 bits of it is really unfathomable.

Here's a little snippet showing that the 22nd base64 digit contributes just two more bits to the salt actually used in the computation (there are only 4 distinct hashes produced):

$alphabet = './ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789';
$lim = strlen($alphabet);
$saltprefix = '$2a$04$123456789012345678901'; // 21 base64 digits


for ($i = 0; $i < $lim; ++$i ) {
  if ($i = 16 || $i == 32 || $i == 48) echo "\n";
  $salt = $saltprefix . substr($alphabet, $i, 1);
  $crypt = crypt($password, $salt);
  echo "salt ='$salt'\ncrypt='$crypt'\n";
}

salt ='$2a$04$123456789012345678901.'
crypt='$2a$04$123456789012345678901.YpaB4l25IJ3b3F3H8trjHXj5SC1UbUW'
salt ='$2a$04$123456789012345678901/'
crypt='$2a$04$123456789012345678901.YpaB4l25IJ3b3F3H8trjHXj5SC1UbUW'
salt ='$2a$04$123456789012345678901A'
crypt='$2a$04$123456789012345678901.YpaB4l25IJ3b3F3H8trjHXj5SC1UbUW'
salt ='$2a$04$123456789012345678901B'
crypt='$2a$04$123456789012345678901.YpaB4l25IJ3b3F3H8trjHXj5SC1UbUW'
salt ='$2a$04$123456789012345678901C'
crypt='$2a$04$123456789012345678901.YpaB4l25IJ3b3F3H8trjHXj5SC1UbUW'
salt ='$2a$04$123456789012345678901D'
crypt='$2a$04$123456789012345678901.YpaB4l25IJ3b3F3H8trjHXj5SC1UbUW'
salt ='$2a$04$123456789012345678901E'
crypt='$2a$04$123456789012345678901.YpaB4l25IJ3b3F3H8trjHXj5SC1UbUW'
salt ='$2a$04$123456789012345678901F'
crypt='$2a$04$123456789012345678901.YpaB4l25IJ3b3F3H8trjHXj5SC1UbUW'
salt ='$2a$04$123456789012345678901G'
crypt='$2a$04$123456789012345678901.YpaB4l25IJ3b3F3H8trjHXj5SC1UbUW'
salt ='$2a$04$123456789012345678901H'
crypt='$2a$04$123456789012345678901.YpaB4l25IJ3b3F3H8trjHXj5SC1UbUW'
salt ='$2a$04$123456789012345678901I'
crypt='$2a$04$123456789012345678901.YpaB4l25IJ3b3F3H8trjHXj5SC1UbUW'
salt ='$2a$04$123456789012345678901J'
crypt='$2a$04$123456789012345678901.YpaB4l25IJ3b3F3H8trjHXj5SC1UbUW'
salt ='$2a$04$123456789012345678901K'
crypt='$2a$04$123456789012345678901.YpaB4l25IJ3b3F3H8trjHXj5SC1UbUW'
salt ='$2a$04$123456789012345678901L'
crypt='$2a$04$123456789012345678901.YpaB4l25IJ3b3F3H8trjHXj5SC1UbUW'
salt ='$2a$04$123456789012345678901M'
crypt='$2a$04$123456789012345678901.YpaB4l25IJ3b3F3H8trjHXj5SC1UbUW'
salt ='$2a$04$123456789012345678901N'
crypt='$2a$04$123456789012345678901.YpaB4l25IJ3b3F3H8trjHXj5SC1UbUW'

salt ='$2a$04$123456789012345678901O'
crypt='$2a$04$123456789012345678901Ots44xXtSV0f6zMrHerQ2IANdsJ.2ioG'
salty='$2a$04$123456789012345678901P'
crypt='$2a$04$123456789012345678901Ots44xXtSV0f6zMrHerQ2IANdsJ.2ioG'
salty='$2a$04$123456789012345678901Q'
crypt='$2a$04$123456789012345678901Ots44xXtSV0f6zMrHerQ2IANdsJ.2ioG'
  ... 13 more pairs of output lines with same hash

salt ='$2a$04$123456789012345678901e'
crypt='$2a$04$123456789012345678901e.1cixwQ2qnBqwFeEcMfNfXApRK0ktqm'
  ... 15 more pairs of output lines with same hash

salt ='$2a$04$123456789012345678901u'
crypt='$2a$04$123456789012345678901u5yLyHIE2JetWU67zG7qvtusQ2KIZhAa'
  ... 15 more pairs of output lines with same hash

The grouping of the identical hash values also shows that the mapping of the alphabet actually used is most likely as written here, rather then in the order shown in the other reply.

Perhaps the interface was designed this way for some kind of compatibility, and perhaps because it has already shipped this way it can't be changed. {the first comment to the post explains why the interface is this way}. But certainly the documentation ought to explain what's going on. Just in case the bug might get fixed some day, perhaps it would be safest to obtain the hash value with:

substr(crypt($pw,$salt), -32)

As a final note, while the explanation of why the hash value repeats when the number of base64 digits specified mod 4 == 1 makes sense in terms of why code might behave that way, it doesn't explain why writing the code that way was a good idea. The code could and arguably should include the bits from a base64 digit that makes up a partial byte when computing the hash, instead of just discarding them. If the code had been written that way, then it seems likely the problem with losing the 22nd digit of the salt in the output would not have appeared, either. {As the comments to the post explain, even though the 22nd digit is overwritten, the digit of the hash that overwrites it will be only one of the four possible values [.Oeu], and these are the only significant values for the 22nd digit. If the 22nd digit is not one of those four values, it will be replaced by the one of those four that produces the same hash.}

In light of the comments, it seems clear there is no bug, just incredibly taciturn documentation :-) Since I'm not a cryptographer, I can't say this with any authority, but it seems to me that it's a weakness of the algorithm that a 21-digit salt apparently can produce all possible hash values, while a 22-digit salt limits the first digit of the hash to only one of four values.

Solution 3

Looks like the outputs are actually different. (da$, vs da2) for result of salt_20 and salt_21.

Share:
14,185
Dereleased
Author by

Dereleased

I wrote a code once. It was awful. If you're wondering whether the preceding applies to my experience doing it, or its quality once done: yes, it does.

Updated on July 30, 2022

Comments

  • Dereleased
    Dereleased almost 2 years

    This question has to do with PHP's implementation of crypt(). For this question, the first 7 characters of the salt are not counted, so a salt '$2a$07$a' would be said to have a length of 1, as it is only 1 character of salt and seven characters of meta-data.

    When using salt strings longer than 22 characters, there is no change in the hash generated (i.e., truncation), and when using strings shorter than 21 characters the salt will automatically be padded (with '$' characters, apparently); this is fairly straightforward. However, if given a salt 20 characters and a salt 21 characters, where the two are identical except for the final character of the 21-length salt, both hashed strings will be identical. A salt 22 characters long, which is identical to the 21 length salt except for the final character, the hash will be different again.

    Example In Code:

    $foo = 'bar';
    $salt_xx = '$2a$07$';
    $salt_19 = $salt_xx . 'b1b2ee48991281a439d';
    $salt_20 = $salt_19 . 'a';
    $salt_21 = $salt_20 . '2';
    $salt_22 = $salt_21 . 'b';
    
    var_dump(
        crypt($foo, $salt_19), 
        crypt($foo, $salt_20), 
        crypt($foo, $salt_21), 
        crypt($foo, $salt_22)
    );
    

    Will produce:

    string(60) "$2a$07$b1b2ee48991281a439d$$.dEUdhUoQXVqUieLTCp0cFVolhFcbuNi"
    string(60) "$2a$07$b1b2ee48991281a439da$.UxGYN739wLkV5PGoR1XA4EvNVPjwylG"
    string(60) "$2a$07$b1b2ee48991281a439da2.UxGYN739wLkV5PGoR1XA4EvNVPjwylG"
    string(60) "$2a$07$b1b2ee48991281a439da2O4AH0.y/AsOuzMpI.f4sBs8E2hQjPUQq"
    

    Why is this?

    EDIT:

    Some users are noting that there is a difference in the overall string, which is true. In salt_20, offset (28, 4) is da$., while in salt_21, offset (28, 4) is da2.; however, it is important to note that the string generated includes the hash, the salt, as well as instructions to generate the salt (i.e. $2a$07$); the part in which the difference occurs is, in fact, still the salt. The actual hash is unchanged as UxGYN739wLkV5PGoR1XA4EvNVPjwylG.

    Thus, this is in fact not a difference in the hash produced, but a difference in the salt used to store the hash, which is precisely the problem at hand: two salts are generating the same hash.

    Rembmer: the output will be in the following format:

    "$2a$##$saltsaltsaltsaltsaltsaHASHhashHASHhashHASHhashHASHhash"
    //                            ^ Hash Starts Here, offset 28,32
    

    where ## is the log-base-2 determining the number of iterations the algorithm runs for

    Edit 2:

    In the comments, it was requested that I post some additional info, as the user could not reproduce my output. Execution of the following code:

    var_dump(
        PHP_VERSION, 
        PHP_OS, 
        CRYPT_SALT_LENGTH, 
        CRYPT_STD_DES, 
        CRYPT_EXT_DES, 
        CRYPT_MD5, 
        CRYPT_BLOWFISH
    );
    

    Produces the following output:

    string(5) "5.3.0"
    string(5) "WINNT"
    int(60)
    int(1)
    int(1)
    int(1)
    int(1)
    

    Hope this helps.

  • Dereleased
    Dereleased over 14 years
    Actually, that part is actually still the salt of the hash, not the hash itself. The hash is only the last 32 characters. I have edited the question to reflect the fact that the salt is prepended to the string in order to make the question more clear.
  • H. Green
    H. Green over 14 years
    Maybe you can post some information about your php version, host OS, and the values of the following php constants: - CRYPT_SALT_LENGTH - CRYPT_STD_DES - CRYPT_EXT_DES - CRYPT_MD5 - CRYPT_BLOWFISH When I run the code you posted I get the following: string(13) "$2yymq4BO0Z3w" string(13) "$2yymq4BO0Z3w" string(13) "$2yymq4BO0Z3w" string(13) "$2yymq4BO0Z3w"
  • Dereleased
    Dereleased over 14 years
    You are getting encryption using STD_DES - notice how only the first two characters of the given salt are considered. Internal support for Blowfish (if not supported by your host OS) was not added until PHP 5.3, so if you have an older version it would revert to the default method, which on your setup appears to be STD_DES. See php.net/manual/en/function.crypt.php for more info.
  • H. Green
    H. Green over 14 years
    I would venture then that the crypt function has some internal mechanism for manipulating the salt once it is received, either some expansion or contraction function to make it the right size for the given encryption algorithm (perhaps treating it as base64 and when decoding if it does not have the correct number of bytes to read in it pads it regardless of the last several bytes). I've not taken the time to dig through the latest php source code to look at this, but if you are really curious that's where you should start!
  • Dereleased
    Dereleased over 14 years
    Indeed. Might have been more apparent if it used a standard base64 alphabet, but oh well, life goes on.
  • Dereleased
    Dereleased over 13 years
    I completely agree with you about including the partial byte value the last character represents. Furthermore, I thank you for this interesting follow up, basically a horizontal expansion which illustrates another idiosyncrasy in the same vein as my vertical expansion. However, the reason the salt is appended to the string is actually fairly simple: crypt(password, crypt(password, salt)) == crypt(password, salt). Moreover, there is supposed to be a way, for certain mechanisms, to trigger the automatic generation of the salt, so that would be the only way to get it. Stealing your alphabet.
  • sootsnoot
    sootsnoot over 13 years
    Ah, now the format of the return value makes sense, thank you! I was about to say it's too bad the equation is broken if you actually use a 22-digit salt, but I see that's not the case. While the first digit of the computed hash overwrites the 22nd digit of the input salt, what happens is that when the 22nd digit is one of [.Oeu], that digit is also the first digit of the hash! Those base64 digits have values 0, 16, 32, 48, respectively - the high-order two bits. The other four bits are ignored since the binary salt is 128 bits and 22 digits would be 132 bits.
  • sootsnoot
    sootsnoot over 13 years
    So when the 22nd digit is not one of [.Oeu], the hash replaces it by one of those four, and the equation still works! In going over this again I noticed that there are cut-and paste problems in the data of my post, so I'll fix that and also add something about this. Thanks again!