Is "double hashing" a password less secure than just hashing it once?

Solution 1

Hashing a password once is insecure

No, multiple hashes are not less secure; they are an essential part of secure password use.

Iterating the hash increases the time it takes for an attacker to try each password in their list of candidates. You can easily increase the time it takes to attack a password from hours to years.

Simple iteration is not enough

Merely chaining hash output to input isn't sufficient for security. The iteration should take place in the context of an algorithm that preserves the entropy of the password. Luckily, there are several published algorithms that have had enough scrutiny to give confidence in their design.

A good key derivation algorithm like PBKDF2 injects the password into each round of hashing, mitigating concerns about collisions in hash output. PBKDF2 can be used for password authentication as-is. Bcrypt follows the key derivation with an encryption step; that way, if a fast way to reverse the key derivation is discovered, an attacker still has to complete a known-plaintext attack.
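
For example, in PHP the built-in password API wraps bcrypt. A minimal sketch (the cost factor of 12 is just an illustration; tune it for your hardware, as discussed below):

// Minimal bcrypt sketch using PHP's built-in password API (PHP >= 5.5).
// password_hash() generates a random salt and embeds it, along with the
// cost factor, in the returned string.
$password = 'correct horse battery staple';
$hash = password_hash($password, PASSWORD_BCRYPT, ['cost' => 12]);

// At login time, password_verify() re-derives the hash and compares it
// in constant time.
if (password_verify($password, $hash)) {
    // authenticated
}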

How to break a password

Stored passwords need protection from an offline attack. If passwords aren't salted, they can be broken with a pre-computed dictionary attack (for example, using a Rainbow Table). Otherwise, the attacker must spend time to compute a hash for each password and see if it matches the stored hash.
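
To make the role of the salt concrete, here's a sketch of salting alone, using SHA-256 for illustration. Note this shows only the salting step; a single round of a fast hash is still too weak, as the rest of this answer explains:

// Salting sketch only -- do NOT stop here; combine with iteration
// (PBKDF2 or bcrypt) as described in this answer.
$password = 'hunter2';                  // example input
$salt     = bin2hex(random_bytes(16));  // unique random salt per user (PHP >= 7.0)
$stored   = $salt . ':' . hash('sha256', $salt . $password);
// An attacker now has to attack each user's hash separately; no
// pre-computed table covers every possible salt.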

All passwords are not equally likely. Attackers might exhaustively search all short passwords, but they know that their chances for brute-force success drop sharply with each additional character. Instead, they use an ordered list of the most likely passwords. They start with "password123" and progress to less frequently used passwords.

Let's say an attacker's list is long: 10 billion candidates. Suppose also that a desktop system can compute 1 million hashes per second. The attacker can test the whole list in less than three hours if only one iteration is used. But with just 2,000 iterations, that time extends to almost 8 months. To defeat a more sophisticated attacker, one who can download a program that taps the power of a GPU, for example, you need more iterations.
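
As a quick sanity check, the arithmetic behind those figures:

// Back-of-the-envelope numbers from the paragraph above.
$candidates   = 10e9;   // 10 billion candidate passwords
$hashesPerSec = 1e6;    // 1 million hashes/second on a desktop

$single   = $candidates / $hashesPerSec;   // seconds, one iteration
$iterated = $single * 2000;                // seconds, 2,000 iterations

printf("1 iteration:     %.1f hours\n",  $single / 3600);               // ~2.8 hours
printf("2000 iterations: %.1f months\n", $iterated / (3600 * 24 * 30)); // ~7.7 months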

How much is enough?

The number of iterations to use is a trade-off between security and user experience. Specialized hardware that attackers can use is cheap, yet it can perform hundreds of millions of iterations per second. The performance of the attacker's system determines how long it takes to break a password given a number of iterations. But your application is not likely to use this specialized hardware. How many iterations you can perform without aggravating users depends on your system.

You can probably let users wait an extra ¾ second or so during authentication. Profile your target platform, and use as many iterations as you can afford. Platforms I've tested (one user on a mobile device, or many users on a server platform) can comfortably support PBKDF2 with between 60,000 and 120,000 iterations, or bcrypt with a cost factor of 12 or 13.
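
One way to profile: time password_hash() at increasing cost factors and keep the highest one that fits your latency budget. A rough calibration sketch (the ¾-second budget comes from the paragraph above and is an assumption about your users):

// Find the largest bcrypt cost factor that stays under the latency
// budget on the hardware that will actually run it.
$budget = 0.75; // seconds
for ($cost = 10; $cost <= 15; $cost++) {
    $start = microtime(true);
    password_hash('benchmark-password', PASSWORD_BCRYPT, ['cost' => $cost]);
    $elapsed = microtime(true) - $start;
    printf("cost %2d: %.3fs\n", $cost, $elapsed);
    if ($elapsed > $budget) {
        break; // the previous cost was the highest affordable one
    }
}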

More background

Read PKCS #5 for authoritative information on the role of salt and iterations in hashing. Even though PBKDF2 was meant for generating encryption keys from passwords, it works well as a one-way hash for password authentication. Each iteration of bcrypt is more expensive than a SHA-2 hash, so you can use fewer iterations, but the idea is the same. Bcrypt also goes a step beyond most PBKDF2-based solutions by using the derived key to encrypt a well-known plaintext. The resulting ciphertext is stored as the "hash," along with some metadata. However, nothing stops you from doing the same thing with PBKDF2.


Solution 2

Those who say it's secure are correct, in general. "Double" hashing (or the logical expansion of that, iterating a hash function) is absolutely secure if done right, for a specific concern.

Those who say it's insecure are also correct, in this case. The code posted in the question is insecure. Let's talk about why:

$hashed_password1 = md5( md5( $plaintext_password ) );
$hashed_password2 = md5( $plaintext_password );

There are three fundamental properties of a hash function that we're concerned about:

  1. Pre-Image Resistance - Given a hash $h, it should be difficult to find a message $m such that $h === hash($m)

  2. Second-Pre-Image Resistance - Given a message $m1, it should be difficult to find a different message $m2 such that hash($m1) === hash($m2)

  3. Collision Resistance - It should be difficult to find a pair of messages ($m1, $m2) such that hash($m1) === hash($m2) (note that this is similar to Second-Pre-Image resistance, but different in that here the attacker has control over both messages)...

For the storage of passwords, all we really care about is Pre-Image Resistance. The other two would be moot, because $m1 is the user's password we're trying to keep safe. So if the attacker already has it, the hash has nothing to protect...

DISCLAIMER

Everything that follows is based on the premise that all we care about is Pre-Image Resistance. The other two fundamental properties of hash functions may not (and typically don't) hold up in the same way. So the conclusions in this post are only applicable when using hash functions for the storage of passwords. They are not applicable in general...

Let's Get Started

For the sake of this discussion, let's invent our own hash function:

function ourHash($input) {
    $result = 0;
    for ($i = 0; $i < strlen($input); $i++) {
        $result += ord($input[$i]);
    }
    return (string) ($result % 256);
}

Now it should be pretty obvious what this hash function does: it sums the ASCII values of each character of the input, then takes that sum modulo 256.

So let's test it out:

var_dump(
    ourHash('abc'), // string(2) "38"
    ourHash('def'), // string(2) "47"
    ourHash('hij'), // string(2) "59"
    ourHash('klm')  // string(2) "68"
);

Now, let's see what happens if we run it 100 times in a loop:

$tests = array(
    "abc",
    "def",
    "hij",
    "klm",
);

foreach ($tests as $test) {
    $hash = $test;
    for ($i = 0; $i < 100; $i++) {
        $hash = ourHash($hash);
    }
    echo "Hashing $test => $hash\n";
}

That outputs:

Hashing abc => 152
Hashing def => 152
Hashing hij => 155
Hashing klm => 155

Hrm, wow. We've generated collisions!!! Let's look at why:

Here's the output of hashing the string form of each and every possible hash output:

Hashing 0 => 48
Hashing 1 => 49
Hashing 2 => 50
Hashing 3 => 51
Hashing 4 => 52
Hashing 5 => 53
Hashing 6 => 54
Hashing 7 => 55
Hashing 8 => 56
Hashing 9 => 57
Hashing 10 => 97
Hashing 11 => 98
Hashing 12 => 99
Hashing 13 => 100
Hashing 14 => 101
Hashing 15 => 102
Hashing 16 => 103
Hashing 17 => 104
Hashing 18 => 105
Hashing 19 => 106
Hashing 20 => 98
Hashing 21 => 99
Hashing 22 => 100
Hashing 23 => 101
Hashing 24 => 102
Hashing 25 => 103
Hashing 26 => 104
Hashing 27 => 105
Hashing 28 => 106
Hashing 29 => 107
Hashing 30 => 99
Hashing 31 => 100
Hashing 32 => 101
Hashing 33 => 102
Hashing 34 => 103
Hashing 35 => 104
Hashing 36 => 105
Hashing 37 => 106
Hashing 38 => 107
Hashing 39 => 108
Hashing 40 => 100
Hashing 41 => 101
Hashing 42 => 102
Hashing 43 => 103
Hashing 44 => 104
Hashing 45 => 105
Hashing 46 => 106
Hashing 47 => 107
Hashing 48 => 108
Hashing 49 => 109
Hashing 50 => 101
Hashing 51 => 102
Hashing 52 => 103
Hashing 53 => 104
Hashing 54 => 105
Hashing 55 => 106
Hashing 56 => 107
Hashing 57 => 108
Hashing 58 => 109
Hashing 59 => 110
Hashing 60 => 102
Hashing 61 => 103
Hashing 62 => 104
Hashing 63 => 105
Hashing 64 => 106
Hashing 65 => 107
Hashing 66 => 108
Hashing 67 => 109
Hashing 68 => 110
Hashing 69 => 111
Hashing 70 => 103
Hashing 71 => 104
Hashing 72 => 105
Hashing 73 => 106
Hashing 74 => 107
Hashing 75 => 108
Hashing 76 => 109
Hashing 77 => 110
Hashing 78 => 111
Hashing 79 => 112
Hashing 80 => 104
Hashing 81 => 105
Hashing 82 => 106
Hashing 83 => 107
Hashing 84 => 108
Hashing 85 => 109
Hashing 86 => 110
Hashing 87 => 111
Hashing 88 => 112
Hashing 89 => 113
Hashing 90 => 105
Hashing 91 => 106
Hashing 92 => 107
Hashing 93 => 108
Hashing 94 => 109
Hashing 95 => 110
Hashing 96 => 111
Hashing 97 => 112
Hashing 98 => 113
Hashing 99 => 114
Hashing 100 => 145
Hashing 101 => 146
Hashing 102 => 147
Hashing 103 => 148
Hashing 104 => 149
Hashing 105 => 150
Hashing 106 => 151
Hashing 107 => 152
Hashing 108 => 153
Hashing 109 => 154
Hashing 110 => 146
Hashing 111 => 147
Hashing 112 => 148
Hashing 113 => 149
Hashing 114 => 150
Hashing 115 => 151
Hashing 116 => 152
Hashing 117 => 153
Hashing 118 => 154
Hashing 119 => 155
Hashing 120 => 147
Hashing 121 => 148
Hashing 122 => 149
Hashing 123 => 150
Hashing 124 => 151
Hashing 125 => 152
Hashing 126 => 153
Hashing 127 => 154
Hashing 128 => 155
Hashing 129 => 156
Hashing 130 => 148
Hashing 131 => 149
Hashing 132 => 150
Hashing 133 => 151
Hashing 134 => 152
Hashing 135 => 153
Hashing 136 => 154
Hashing 137 => 155
Hashing 138 => 156
Hashing 139 => 157
Hashing 140 => 149
Hashing 141 => 150
Hashing 142 => 151
Hashing 143 => 152
Hashing 144 => 153
Hashing 145 => 154
Hashing 146 => 155
Hashing 147 => 156
Hashing 148 => 157
Hashing 149 => 158
Hashing 150 => 150
Hashing 151 => 151
Hashing 152 => 152
Hashing 153 => 153
Hashing 154 => 154
Hashing 155 => 155
Hashing 156 => 156
Hashing 157 => 157
Hashing 158 => 158
Hashing 159 => 159
Hashing 160 => 151
Hashing 161 => 152
Hashing 162 => 153
Hashing 163 => 154
Hashing 164 => 155
Hashing 165 => 156
Hashing 166 => 157
Hashing 167 => 158
Hashing 168 => 159
Hashing 169 => 160
Hashing 170 => 152
Hashing 171 => 153
Hashing 172 => 154
Hashing 173 => 155
Hashing 174 => 156
Hashing 175 => 157
Hashing 176 => 158
Hashing 177 => 159
Hashing 178 => 160
Hashing 179 => 161
Hashing 180 => 153
Hashing 181 => 154
Hashing 182 => 155
Hashing 183 => 156
Hashing 184 => 157
Hashing 185 => 158
Hashing 186 => 159
Hashing 187 => 160
Hashing 188 => 161
Hashing 189 => 162
Hashing 190 => 154
Hashing 191 => 155
Hashing 192 => 156
Hashing 193 => 157
Hashing 194 => 158
Hashing 195 => 159
Hashing 196 => 160
Hashing 197 => 161
Hashing 198 => 162
Hashing 199 => 163
Hashing 200 => 146
Hashing 201 => 147
Hashing 202 => 148
Hashing 203 => 149
Hashing 204 => 150
Hashing 205 => 151
Hashing 206 => 152
Hashing 207 => 153
Hashing 208 => 154
Hashing 209 => 155
Hashing 210 => 147
Hashing 211 => 148
Hashing 212 => 149
Hashing 213 => 150
Hashing 214 => 151
Hashing 215 => 152
Hashing 216 => 153
Hashing 217 => 154
Hashing 218 => 155
Hashing 219 => 156
Hashing 220 => 148
Hashing 221 => 149
Hashing 222 => 150
Hashing 223 => 151
Hashing 224 => 152
Hashing 225 => 153
Hashing 226 => 154
Hashing 227 => 155
Hashing 228 => 156
Hashing 229 => 157
Hashing 230 => 149
Hashing 231 => 150
Hashing 232 => 151
Hashing 233 => 152
Hashing 234 => 153
Hashing 235 => 154
Hashing 236 => 155
Hashing 237 => 156
Hashing 238 => 157
Hashing 239 => 158
Hashing 240 => 150
Hashing 241 => 151
Hashing 242 => 152
Hashing 243 => 153
Hashing 244 => 154
Hashing 245 => 155
Hashing 246 => 156
Hashing 247 => 157
Hashing 248 => 158
Hashing 249 => 159
Hashing 250 => 151
Hashing 251 => 152
Hashing 252 => 153
Hashing 253 => 154
Hashing 254 => 155
Hashing 255 => 156

Notice the tendency towards higher numbers. That turns out to be our downfall. Running the hash 4 times ($hash = ourHash($hash);, for each element) winds up giving us:

Hashing 0 => 153
Hashing 1 => 154
Hashing 2 => 155
Hashing 3 => 156
Hashing 4 => 157
Hashing 5 => 158
Hashing 6 => 150
Hashing 7 => 151
Hashing 8 => 152
Hashing 9 => 153
Hashing 10 => 157
Hashing 11 => 158
Hashing 12 => 150
Hashing 13 => 154
Hashing 14 => 155
Hashing 15 => 156
Hashing 16 => 157
Hashing 17 => 158
Hashing 18 => 150
Hashing 19 => 151
Hashing 20 => 158
Hashing 21 => 150
Hashing 22 => 154
Hashing 23 => 155
Hashing 24 => 156
Hashing 25 => 157
Hashing 26 => 158
Hashing 27 => 150
Hashing 28 => 151
Hashing 29 => 152
Hashing 30 => 150
Hashing 31 => 154
Hashing 32 => 155
Hashing 33 => 156
Hashing 34 => 157
Hashing 35 => 158
Hashing 36 => 150
Hashing 37 => 151
Hashing 38 => 152
Hashing 39 => 153
Hashing 40 => 154
Hashing 41 => 155
Hashing 42 => 156
Hashing 43 => 157
Hashing 44 => 158
Hashing 45 => 150
Hashing 46 => 151
Hashing 47 => 152
Hashing 48 => 153
Hashing 49 => 154
Hashing 50 => 155
Hashing 51 => 156
Hashing 52 => 157
Hashing 53 => 158
Hashing 54 => 150
Hashing 55 => 151
Hashing 56 => 152
Hashing 57 => 153
Hashing 58 => 154
Hashing 59 => 155
Hashing 60 => 156
Hashing 61 => 157
Hashing 62 => 158
Hashing 63 => 150
Hashing 64 => 151
Hashing 65 => 152
Hashing 66 => 153
Hashing 67 => 154
Hashing 68 => 155
Hashing 69 => 156
Hashing 70 => 157
Hashing 71 => 158
Hashing 72 => 150
Hashing 73 => 151
Hashing 74 => 152
Hashing 75 => 153
Hashing 76 => 154
Hashing 77 => 155
Hashing 78 => 156
Hashing 79 => 157
Hashing 80 => 158
Hashing 81 => 150
Hashing 82 => 151
Hashing 83 => 152
Hashing 84 => 153
Hashing 85 => 154
Hashing 86 => 155
Hashing 87 => 156
Hashing 88 => 157
Hashing 89 => 158
Hashing 90 => 150
Hashing 91 => 151
Hashing 92 => 152
Hashing 93 => 153
Hashing 94 => 154
Hashing 95 => 155
Hashing 96 => 156
Hashing 97 => 157
Hashing 98 => 158
Hashing 99 => 150
Hashing 100 => 154
Hashing 101 => 155
Hashing 102 => 156
Hashing 103 => 157
Hashing 104 => 158
Hashing 105 => 150
Hashing 106 => 151
Hashing 107 => 152
Hashing 108 => 153
Hashing 109 => 154
Hashing 110 => 155
Hashing 111 => 156
Hashing 112 => 157
Hashing 113 => 158
Hashing 114 => 150
Hashing 115 => 151
Hashing 116 => 152
Hashing 117 => 153
Hashing 118 => 154
Hashing 119 => 155
Hashing 120 => 156
Hashing 121 => 157
Hashing 122 => 158
Hashing 123 => 150
Hashing 124 => 151
Hashing 125 => 152
Hashing 126 => 153
Hashing 127 => 154
Hashing 128 => 155
Hashing 129 => 156
Hashing 130 => 157
Hashing 131 => 158
Hashing 132 => 150
Hashing 133 => 151
Hashing 134 => 152
Hashing 135 => 153
Hashing 136 => 154
Hashing 137 => 155
Hashing 138 => 156
Hashing 139 => 157
Hashing 140 => 158
Hashing 141 => 150
Hashing 142 => 151
Hashing 143 => 152
Hashing 144 => 153
Hashing 145 => 154
Hashing 146 => 155
Hashing 147 => 156
Hashing 148 => 157
Hashing 149 => 158
Hashing 150 => 150
Hashing 151 => 151
Hashing 152 => 152
Hashing 153 => 153
Hashing 154 => 154
Hashing 155 => 155
Hashing 156 => 156
Hashing 157 => 157
Hashing 158 => 158
Hashing 159 => 159
Hashing 160 => 151
Hashing 161 => 152
Hashing 162 => 153
Hashing 163 => 154
Hashing 164 => 155
Hashing 165 => 156
Hashing 166 => 157
Hashing 167 => 158
Hashing 168 => 159
Hashing 169 => 151
Hashing 170 => 152
Hashing 171 => 153
Hashing 172 => 154
Hashing 173 => 155
Hashing 174 => 156
Hashing 175 => 157
Hashing 176 => 158
Hashing 177 => 159
Hashing 178 => 151
Hashing 179 => 152
Hashing 180 => 153
Hashing 181 => 154
Hashing 182 => 155
Hashing 183 => 156
Hashing 184 => 157
Hashing 185 => 158
Hashing 186 => 159
Hashing 187 => 151
Hashing 188 => 152
Hashing 189 => 153
Hashing 190 => 154
Hashing 191 => 155
Hashing 192 => 156
Hashing 193 => 157
Hashing 194 => 158
Hashing 195 => 159
Hashing 196 => 151
Hashing 197 => 152
Hashing 198 => 153
Hashing 199 => 154
Hashing 200 => 155
Hashing 201 => 156
Hashing 202 => 157
Hashing 203 => 158
Hashing 204 => 150
Hashing 205 => 151
Hashing 206 => 152
Hashing 207 => 153
Hashing 208 => 154
Hashing 209 => 155
Hashing 210 => 156
Hashing 211 => 157
Hashing 212 => 158
Hashing 213 => 150
Hashing 214 => 151
Hashing 215 => 152
Hashing 216 => 153
Hashing 217 => 154
Hashing 218 => 155
Hashing 219 => 156
Hashing 220 => 157
Hashing 221 => 158
Hashing 222 => 150
Hashing 223 => 151
Hashing 224 => 152
Hashing 225 => 153
Hashing 226 => 154
Hashing 227 => 155
Hashing 228 => 156
Hashing 229 => 157
Hashing 230 => 158
Hashing 231 => 150
Hashing 232 => 151
Hashing 233 => 152
Hashing 234 => 153
Hashing 235 => 154
Hashing 236 => 155
Hashing 237 => 156
Hashing 238 => 157
Hashing 239 => 158
Hashing 240 => 150
Hashing 241 => 151
Hashing 242 => 152
Hashing 243 => 153
Hashing 244 => 154
Hashing 245 => 155
Hashing 246 => 156
Hashing 247 => 157
Hashing 248 => 158
Hashing 249 => 159
Hashing 250 => 151
Hashing 251 => 152
Hashing 252 => 153
Hashing 253 => 154
Hashing 254 => 155
Hashing 255 => 156

We've narrowed ourselves down to 8 values... That's bad... Our original function mapped S(∞) onto S(256). That is, we've created a Surjective Function mapping $input to $output.

Since we have a Surjective function, we have no guarantee the mapping for any subset of the input won't have collisions (in fact, in practice they will).

That's what happened here! Our function was bad, but that's not why this worked (that's why it worked so quickly and so completely).

The same thing happens with MD5. It maps S(∞) onto S(2^128). And there's no guarantee that running MD5 over S(output) will be injective, meaning that it can still have collisions.

TL/DR Section

Therefore, since feeding the output back into md5 directly can generate collisions, every iteration increases the chance of collisions. The increase is gradual, however: while the result set of 2^128 is reduced, it isn't reduced fast enough to be a critical flaw.

So,

$output = md5($input); // 2^128 possibilities
$output = md5($output); // < 2^128 possibilities
$output = md5($output); // < 2^128 possibilities
$output = md5($output); // < 2^128 possibilities
$output = md5($output); // < 2^128 possibilities

The more times you iterate, the further the reduction goes.

The Fix

Fortunately for us, there's a trivial way to fix this: feed the original input back into each further iteration:

$output = md5($input); // 2^128 possibilities
$output = md5($input . $output); // 2^128 possibilities
$output = md5($input . $output); // 2^128 possibilities
$output = md5($input . $output); // 2^128 possibilities
$output = md5($input . $output); // 2^128 possibilities    

Note that the further iterations don't retain 2^128 possibilities for each individual value of $input. That means we may be able to generate $input values that still collide down the line (and hence will settle or resonate at far fewer than 2^128 possible outputs). But the general case for $input is still as strong as it was for a single round.

Wait, is it? Let's test this out with our ourHash() function, switching to $hash = ourHash($input . $hash); for 100 iterations.
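
Here's a sketch of the adjusted harness (how $hash is seeded before the loop is an assumption on my part; ourHash() is the toy function defined above):

// Feed the original input back in on every iteration, this time
// over every one-byte input 0..255.
for ($input = 0; $input <= 255; $input++) {
    $hash = (string) $input;
    for ($i = 0; $i < 100; $i++) {
        $hash = ourHash($input . $hash);
    }
    echo "Hashing $input => $hash\n";
}

That outputs: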

Hashing 0 => 201
Hashing 1 => 212
Hashing 2 => 199
Hashing 3 => 201
Hashing 4 => 203
Hashing 5 => 205
Hashing 6 => 207
Hashing 7 => 209
Hashing 8 => 211
Hashing 9 => 204
Hashing 10 => 251
Hashing 11 => 147
Hashing 12 => 251
Hashing 13 => 148
Hashing 14 => 253
Hashing 15 => 0
Hashing 16 => 1
Hashing 17 => 2
Hashing 18 => 161
Hashing 19 => 163
Hashing 20 => 147
Hashing 21 => 251
Hashing 22 => 148
Hashing 23 => 253
Hashing 24 => 0
Hashing 25 => 1
Hashing 26 => 2
Hashing 27 => 161
Hashing 28 => 163
Hashing 29 => 8
Hashing 30 => 251
Hashing 31 => 148
Hashing 32 => 253
Hashing 33 => 0
Hashing 34 => 1
Hashing 35 => 2
Hashing 36 => 161
Hashing 37 => 163
Hashing 38 => 8
Hashing 39 => 4
Hashing 40 => 148
Hashing 41 => 253
Hashing 42 => 0
Hashing 43 => 1
Hashing 44 => 2
Hashing 45 => 161
Hashing 46 => 163
Hashing 47 => 8
Hashing 48 => 4
Hashing 49 => 9
Hashing 50 => 253
Hashing 51 => 0
Hashing 52 => 1
Hashing 53 => 2
Hashing 54 => 161
Hashing 55 => 163
Hashing 56 => 8
Hashing 57 => 4
Hashing 58 => 9
Hashing 59 => 11
Hashing 60 => 0
Hashing 61 => 1
Hashing 62 => 2
Hashing 63 => 161
Hashing 64 => 163
Hashing 65 => 8
Hashing 66 => 4
Hashing 67 => 9
Hashing 68 => 11
Hashing 69 => 4
Hashing 70 => 1
Hashing 71 => 2
Hashing 72 => 161
Hashing 73 => 163
Hashing 74 => 8
Hashing 75 => 4
Hashing 76 => 9
Hashing 77 => 11
Hashing 78 => 4
Hashing 79 => 3
Hashing 80 => 2
Hashing 81 => 161
Hashing 82 => 163
Hashing 83 => 8
Hashing 84 => 4
Hashing 85 => 9
Hashing 86 => 11
Hashing 87 => 4
Hashing 88 => 3
Hashing 89 => 17
Hashing 90 => 161
Hashing 91 => 163
Hashing 92 => 8
Hashing 93 => 4
Hashing 94 => 9
Hashing 95 => 11
Hashing 96 => 4
Hashing 97 => 3
Hashing 98 => 17
Hashing 99 => 13
Hashing 100 => 246
Hashing 101 => 248
Hashing 102 => 49
Hashing 103 => 44
Hashing 104 => 255
Hashing 105 => 198
Hashing 106 => 43
Hashing 107 => 51
Hashing 108 => 202
Hashing 109 => 2
Hashing 110 => 248
Hashing 111 => 49
Hashing 112 => 44
Hashing 113 => 255
Hashing 114 => 198
Hashing 115 => 43
Hashing 116 => 51
Hashing 117 => 202
Hashing 118 => 2
Hashing 119 => 51
Hashing 120 => 49
Hashing 121 => 44
Hashing 122 => 255
Hashing 123 => 198
Hashing 124 => 43
Hashing 125 => 51
Hashing 126 => 202
Hashing 127 => 2
Hashing 128 => 51
Hashing 129 => 53
Hashing 130 => 44
Hashing 131 => 255
Hashing 132 => 198
Hashing 133 => 43
Hashing 134 => 51
Hashing 135 => 202
Hashing 136 => 2
Hashing 137 => 51
Hashing 138 => 53
Hashing 139 => 55
Hashing 140 => 255
Hashing 141 => 198
Hashing 142 => 43
Hashing 143 => 51
Hashing 144 => 202
Hashing 145 => 2
Hashing 146 => 51
Hashing 147 => 53
Hashing 148 => 55
Hashing 149 => 58
Hashing 150 => 198
Hashing 151 => 43
Hashing 152 => 51
Hashing 153 => 202
Hashing 154 => 2
Hashing 155 => 51
Hashing 156 => 53
Hashing 157 => 55
Hashing 158 => 58
Hashing 159 => 0
Hashing 160 => 43
Hashing 161 => 51
Hashing 162 => 202
Hashing 163 => 2
Hashing 164 => 51
Hashing 165 => 53
Hashing 166 => 55
Hashing 167 => 58
Hashing 168 => 0
Hashing 169 => 209
Hashing 170 => 51
Hashing 171 => 202
Hashing 172 => 2
Hashing 173 => 51
Hashing 174 => 53
Hashing 175 => 55
Hashing 176 => 58
Hashing 177 => 0
Hashing 178 => 209
Hashing 179 => 216
Hashing 180 => 202
Hashing 181 => 2
Hashing 182 => 51
Hashing 183 => 53
Hashing 184 => 55
Hashing 185 => 58
Hashing 186 => 0
Hashing 187 => 209
Hashing 188 => 216
Hashing 189 => 219
Hashing 190 => 2
Hashing 191 => 51
Hashing 192 => 53
Hashing 193 => 55
Hashing 194 => 58
Hashing 195 => 0
Hashing 196 => 209
Hashing 197 => 216
Hashing 198 => 219
Hashing 199 => 220
Hashing 200 => 248
Hashing 201 => 49
Hashing 202 => 44
Hashing 203 => 255
Hashing 204 => 198
Hashing 205 => 43
Hashing 206 => 51
Hashing 207 => 202
Hashing 208 => 2
Hashing 209 => 51
Hashing 210 => 49
Hashing 211 => 44
Hashing 212 => 255
Hashing 213 => 198
Hashing 214 => 43
Hashing 215 => 51
Hashing 216 => 202
Hashing 217 => 2
Hashing 218 => 51
Hashing 219 => 53
Hashing 220 => 44
Hashing 221 => 255
Hashing 222 => 198
Hashing 223 => 43
Hashing 224 => 51
Hashing 225 => 202
Hashing 226 => 2
Hashing 227 => 51
Hashing 228 => 53
Hashing 229 => 55
Hashing 230 => 255
Hashing 231 => 198
Hashing 232 => 43
Hashing 233 => 51
Hashing 234 => 202
Hashing 235 => 2
Hashing 236 => 51
Hashing 237 => 53
Hashing 238 => 55
Hashing 239 => 58
Hashing 240 => 198
Hashing 241 => 43
Hashing 242 => 51
Hashing 243 => 202
Hashing 244 => 2
Hashing 245 => 51
Hashing 246 => 53
Hashing 247 => 55
Hashing 248 => 58
Hashing 249 => 0
Hashing 250 => 43
Hashing 251 => 51
Hashing 252 => 202
Hashing 253 => 2
Hashing 254 => 51
Hashing 255 => 53

There's still a rough pattern there, but note that it's no more of a pattern than our underlying function (which was already quite weak).

Notice however that 0 and 3 became collisions, even though they weren't in the single run. That's an application of what I said before (that the collision resistance stays the same for the set of all inputs, but specific collision routes may open up due to flaws in the underlying algorithm).

TL/DR Section

By feeding back the input into each iteration, we effectively break any collisions that may have occurred in the prior iteration.

Therefore, md5($input . md5($input)); should be (theoretically at least) as strong as md5($input).

Is This Important?

Yes. This is one of the reasons that PBKDF2 replaced PBKDF1 in RFC 2898. Consider the inner loops of the two:

PBKDF1:

T_1 = Hash (P || S) ,
T_2 = Hash (T_1) ,
...
T_c = Hash (T_{c-1}) 

Where c is the iteration count, P is the password, and S is the salt.

PBKDF2:

U_1 = PRF (P, S || INT (i)) ,
U_2 = PRF (P, U_1) ,
...
U_c = PRF (P, U_{c-1})

Where PRF is really just an HMAC. But for our purposes here, let's just say that PRF(P, S) = Hash(P || S) (that is, the PRF of two inputs is roughly the same as a hash of the two concatenated together). It's very much not, but for our purposes it is.
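
To make the structural difference concrete, here's a sketch of both inner loops in PHP, simplified to a single output block (real PBKDF2 also XORs the U_i values together and concatenates blocks, which is omitted here; SHA-256 stands in for the unspecified Hash):

// PBKDF1-style: the password enters once, then the output is chained.
function pbkdf1_like($password, $salt, $iterations) {
    $t = hash('sha256', $password . $salt, true);        // T_1 = Hash(P || S)
    for ($i = 1; $i < $iterations; $i++) {
        $t = hash('sha256', $t, true);                   // T_c = Hash(T_{c-1})
    }
    return $t;
}

// PBKDF2-style: the password re-enters every round as the HMAC key.
function pbkdf2_like($password, $salt, $iterations) {
    $u = hash_hmac('sha256', $salt . pack('N', 1), $password, true); // U_1
    for ($i = 1; $i < $iterations; $i++) {
        $u = hash_hmac('sha256', $u, $password, true);   // U_c = PRF(P, U_{c-1})
    }
    return $u;
}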

So PBKDF2 maintains the collision resistance of the underlying Hash function, where PBKDF1 does not.

Tying It All Together

We know of secure ways of iterating a hash. In fact:

$hash = $input;
$i = 10000;
do {
    // PHP's hash() needs an algorithm name; feed the input back in each round
    $hash = hash('sha256', $input . $hash);
} while ($i-- > 0);

Is typically safe.

Now, to see why we would want to iterate at all, let's analyze the entropy movement.

A hash takes in the infinite set: S(∞) and produces a smaller, consistently sized set S(n). The next iteration (assuming the input is passed back in) maps S(∞) onto S(n) again:

S(∞) -> S(n)
S(∞) -> S(n)
S(∞) -> S(n)
S(∞) -> S(n)
S(∞) -> S(n)
S(∞) -> S(n)

Notice that the final output has exactly the same amount of entropy as the first one. Iterating will not "make it more obscured". The entropy is identical. There's no magic source of unpredictability (it's a Pseudo-Random-Function, not a Random Function).

There is however a gain to iterating. It makes the hashing process artificially slower. And that's why iterating can be a good idea. In fact, it's the basic principle of most modern password hashing algorithms (the fact that doing something over-and-over makes it slower).

Slow is good, because it's combating the primary security threat: brute forcing. The slower we make our hashing algorithm, the harder attackers have to work to attack password hashes stolen from us. And that's a good thing!!!

Solution 3

Yes, re-hashing reduces the search space, but no, it doesn't matter - the effective reduction is insignificant.

Re-hashing increases the time it takes to brute-force, but doing so only twice is also suboptimal.

What you really want is to hash the password with PBKDF2, a proven method of using a secure hash with salt and iterations. Check out this SO response.
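
A minimal sketch of that approach with PHP's built-in hash_pbkdf2() (PHP >= 5.5; the iteration count is illustrative, tune it for your hardware):

// Derive and verify with PBKDF2-HMAC-SHA256; store $salt, the
// iteration count, and $hash together.
$password   = 'correct horse battery staple';
$salt       = random_bytes(16);
$iterations = 100000;
$hash       = hash_pbkdf2('sha256', $password, $salt, $iterations, 32, true);

// At login time, re-derive and compare in constant time.
$attempt   = 'correct horse battery staple'; // the login attempt
$candidate = hash_pbkdf2('sha256', $attempt, $salt, $iterations, 32, true);
$ok        = hash_equals($hash, $candidate);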

EDIT: I almost forgot - DON'T USE MD5!!!! Use a modern cryptographic hash such as the SHA-2 family (SHA-256, SHA-384, and SHA-512).

Solution 4

Yes, it reduces the number of possible strings that match the hash.

As you have already mentioned, salted hashes are much better.

An article at http://websecurity.ro/blog/2007/11/02/md5md5-vs-md5/ attempts a proof of why it is equivalent, but I'm not sure about the logic. Part of their assumption is that there isn't software available to analyse md5(md5(text)), but obviously it's fairly trivial to produce the rainbow tables.

I'm still sticking with my answer that there is a smaller number of md5(md5(text))-type hashes than md5(text) hashes, increasing the chance of collision (even if it's still very unlikely) and reducing the search space.

Solution 5

Most answers are by people without a background in cryptography or security, and they are wrong. Use a salt, if possible unique per record. MD5/SHA/etc. are too fast, the opposite of what you want. PBKDF2 and bcrypt are slower (which is good) but can be defeated with ASICs/FPGAs/GPUs (very affordable nowadays). So a memory-hard algorithm is needed: enter scrypt.

Here's a layman's explanation of salts and speed (but not of memory-hard algorithms).
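
For what it's worth, PHP 7.3+ exposes a memory-hard option through the same built-in password API, Argon2id. A sketch (the parameters below are illustrative, not a recommendation):

// Memory-hard hashing via Argon2id (requires PHP built with Argon2 support).
$hash = password_hash('correct horse battery staple', PASSWORD_ARGON2ID, [
    'memory_cost' => 1 << 17, // KiB per hash (~128 MiB) -- hurts GPU/ASIC attackers
    'time_cost'   => 4,       // number of passes
    'threads'     => 2,       // degree of parallelism
]);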


Comments

  • Greg
    Greg over 15 years
    It is less secure mathematically, and you'd be much better off using a sleep() than intentionally making a slow algorithm
  • SquareCog
    SquareCog over 15 years
    All correct, but I just want to note that the effect of the strong collision resistance compromise on MD5 is blown a bit out of proportion -- most scenarios that use crypto hash functions do not rely on strong collision resistance, just weak resistance. They are not affected by this vulnerability.
  • Bill the Lizard
    Bill the Lizard over 15 years
    Actually, it should be a string randomly generated for each user, not a constant.
  • SquareCog
    SquareCog over 15 years
    Number of arbitrary strings: infinity. number of possible MD5 values: 2^128. Sleep works because they are using your server to authenticate in RoBorg's scenario. Now, number of likely passwords is much lower than 2^128, and so is number of derived Md5s, so that argument is questionable.
  • Admin
    Admin over 15 years
    Intentionally making a slow algorithm is an accepted practice when you're trying to prevent dictionary attacks against compromised authentication stores. The technique is called "key strengthening" or "key stretching". See en.wikipedia.org/wiki/Key_stretching
  • Admin
    Admin over 15 years
    I mentioned this in another comment too, but en.wikipedia.org/wiki/Key_stretching
  • orip
    orip over 15 years
    @RoBorg: it doesn't matter how slow your implementation is, but how slow an attacker's implementation will be: if the hash itself is thousands of times slower, it will take an attacker thousands of times as long to brute-force the password.
  • SquareCog
    SquareCog over 15 years
    A constant secret works (and is easier to work with), if you throw in the username as suggested. That essentially produces a random user-specific key.
  • orip
    orip over 15 years
    Re-hashing is precisely about slowing down the hash. This is a key security feature in password-based cryptography. See the links for PCKS5 and PBKDF2.
  • orip
    orip over 15 years
    @erickson - with PBKDF2 you increase the number of iterations to increase the processing time, so that isn't a special bcrypt property. Bcrypt, on the other hand, hasn't received nearly as much cryptological review as HMAC and the SHA-2 hashes have.
  • Bill the Lizard
    Bill the Lizard over 15 years
    A constant secret salt is security through obscurity. If the "secret" gets out that you're using "xxx" + username + password, then an attacker doesn't even need data from your tables to launch an attack against it.
  • Bill the Lizard
    Bill the Lizard over 15 years
    This doesn't answer the question.
  • FryGuy
    FryGuy over 15 years
    I don't think that it's security through obscurity. The reason for using a salt is that you can't compute a rainbow table against multiple md5 hashes simultaneously. Building one for "xxx"+password (same salt) happens once. Building a table for "xxx"+username+password is worse than brute forcing.
  • Bill the Lizard
    Bill the Lizard over 15 years
    @FryGuy: usernames aren't secret. If the "secret" salt xxx is used, the attack is reduced to building one dictionary to attack a specific username. If you use random salts then your database would have to be compromised before the same attack would be possible.
  • jmucchiello
    jmucchiello about 15 years
    Arguably you would want collisions within the 128-bit space 0 through 2^128-1. If the hash algorithm's 2^128 output space is perfect, then theoretically, you just have a substitution cipher with an alphabet of 2^128 glyphs.
  • brady
    brady about 15 years
    Substitution ciphers are weak because frequency distributions in the plaintext are propagated to the ciphertext. That wouldn't apply here, where every plaintext "character" is equally likely.
  • Sam Saffron
    Sam Saffron about 15 years
    The chances of you coming across an md5 collision by accident is the same as winning the intergalactic lottery, its just not going to happen. stackoverflow.com/questions/800685/…
  • MbaiMburu
    MbaiMburu almost 15 years
Wouldn't it be true that if MD5(K) = [X byte result] then MD5(MD5(K)) is LESS secure (more prone to collision) when your K is > [X bytes], and equally secure (i.e. redundant/pointless) when K =< [X Bytes]? Restricting the amount of times a user can retry a password is a CRITICAL security measure, but if you are increasing your CPU usage 1000-fold for every user, then your solution is ATROCIOUSLY unscalable.
  • brady
    brady almost 15 years
    @devin -- it's not "my solution", it's a widely accepted practice, built into password-based cryptography standards like PKCS #5, and recommended by experts like Robert Morris. It's extremely scalable, as the fraction of time spent authenticating users is small in a legitimate application. It only becomes difficult to scale when your application is cracking passwords—hence the recommendation. Certainly, the search space of a hash is smaller than that of possible passwords, but even a 128-bit space is too huge to brute-force search. The threat to defend against is an offline dictionary attack.
  • MbaiMburu
    MbaiMburu almost 15 years
    I was referring not to the inconvenience to the individual user, but rather the stress that would be put upon the server if you had a large user base, because you are relying on the CPU load to slow down the number of requests. It means that if you add more CPU power, you are reducing the restriction on those brute force attackers. -- However, you are completely correct about the scalability, and the widely accepted practice. I was wrong about almost all the things I said in my earlier comments. Sorry :)
  • Kornel
    Kornel almost 15 years
    @Bill the Lizard: "the attack is reduced to building one dictionary to attack a specific username" is just a brute-force attack (actually even worse, because in addition to computing all hashes you have to store them), so the salt works perfectly in this case.
  • user2633501
    user2633501 almost 15 years
    Wow, I'm starting to feel a wee bit naive/ignorant on this topic! What the heck is a rainbow table? lol I'm finding these comments a bit hard to follow, not sure what we're assuming the hacker has access to and how they're attacking. I guess I need to do some more research on this! :)
  • PP.
    PP. over 13 years
    @Ben Daniel, a rainbow table is a precomputed list of known hashes and what plaintext resulted in those hashes.
  • cHao
    cHao over 12 years
    For reference, salt is meant to defend against rainbow tables, not dictionary attacks. There's actually very little you can do to defend against dictionary attacks except (1) put some constraints on the password before letting a user set it (for example, "must not appear in my list of common words"), and/or (2) make the hashed password take a very long time to compute, which is the point of multiple rounds of hashing.
  • brady
    brady over 12 years
    @cHao: I meant pre-computed dictionary, in the sense of an indexed table. "Rainbow table" is just one specific implementation of this general class of attack.
  • Paŭlo Ebermann
    Paŭlo Ebermann over 12 years
Note that PBKDF2 re-inputs the password and salt in each step to avoid losing entropy through repeated hashes. MD5 is most likely not injective on the 128-bit input space (i.e. has collisions and some output values which are not reached) - hash functions model random functions, not random permutations.
  • DFTR
    DFTR about 12 years
    Ultimately much of this doesn't matter for the question, since he shouldn't be using MD5. Should be using some higher SHA or BCrypt
  • orip
    orip about 12 years
    @DFTR - agreed. bcrypt or scrypt are better options.
  • brady
    brady over 11 years
    @hobbs bcrypt and PBKDF2 were always in there; hopefully they are more prominent now. The point of this answer is that hashing once is stupid.
  • hobbs
    hobbs over 11 years
    @erickson and with that, I agree. :)
  • CodesInChaos
    CodesInChaos almost 11 years
    @hobbs The difference between an iterated hash and PBKDF2 is negligible. You lose entropy, but it's easy to show that for an ideal hash that loss is quite small. So your language against simple iterated hashes is too strong. It's not best-practice, but it isn't a practical concern either.
  • zerkms
    zerkms almost 11 years
    $output = md5($output); // < 2^128 possibilities --- is it really strict <, or <=?
  • ircmaxell
    ircmaxell almost 11 years
    @zerkms: It's not strictly anything. We'd need to know some very specific details of the underlying function (md5() in this case) to really know for sure. But in general it will be < and not <=... Remember, we're talking about the size of the set of $output for all possible $inputs. So if we have even one collision it will be <, therefore < is the better generalizer.
  • ircmaxell
    ircmaxell almost 11 years
    @porneL: the problem with that argument is that it assumes that brute-forcing is going to be slow... So no, you don't have to store those hashes. And considering current attack rates for md5() are around 180 billion per second, even your secure 9 character password will fall in approximately 20 days (different cases, numbers and symbols, to a 50% chance of hitting it). The reason not to use "usernames", is that another site may be as well, and then an attacker could amortize the attacking against both hashes at the same time (assuming you're using a different pass on each site)... Random FTW
  • Kornel
    Kornel almost 11 years
    @ircmaxell attacker could amortize search only if hashes from different sites have same common prefix, so if the "xxx" part is unique, it'll foil that. And yes, MD5 is fast, but question was about single vs double hashing and that alone doesn't change order of magnitude of computation needed, unlike e.g. proper key stretching.
  • zerkms
    zerkms almost 11 years
    "So if we have even one collision it will be <" --- there is no math proof for that :-)
  • ircmaxell
    ircmaxell almost 11 years
    @zerkms: yes there is. If there is even one collision it ceases to be a Injective Function. And you require a purely Injective function for = (as every input in S(2^128) would need to map to exactly one output in S(2^128)). But having one collision means it's no longer Injective, and hence the output space by mandate becomes smaller than the input space (by the very definition of the function itself)...
  • zerkms
    zerkms almost 11 years
    "If there is even one collision" --- so how would you know there is one?
  • Tim
    Tim over 10 years
    Thanks for this insight! However, I am wondering if this specific collision also applies for hash functions where the output size is the same size as the internal state, for example: is running SHA512 twice not the same as adding more rounds to the hash functions, and therefore should be ok because the surjective mapping does not occur?
  • ircmaxell
    ircmaxell over 10 years
    @Tim: the internal state doesn't matter to this analysis. The only states that matter are input (all less than 2^128-1) and the output (2^10). So yes, this still applies to SHA-512...
  • Tomáš Fejfar
    Tomáš Fejfar about 10 years
    zerkms: It's called Birthday problem. There is 2^128 different md5 hashes (that's fact - based on combinatorics). So if you pick 2^128 different inputs, all of them generating unique hash you would use all of the possible hashes. So if you picked 2^128+1 you would be left with the one and all hashes used => the one will have same hash as one of the 2^128. Therefore there is at least one collision. By mathematical induction you can prove, that there is actually infinite number of collisions.
  • ircmaxell
    ircmaxell about 10 years
    @TomášFejfar I think the question is not about collisions in general, but collisions in the strict output set (2^128 outputs, each exactly 128 bits wide). That could be Injective, but as far as I know a generic proof is not possible (only a proof-by-example of a collision for a specific algorithm). Consider the hash function that simply returns the input if it is 128 bits (and hashes otherwise). In general it would be surjective, but when fed its output it would always be injective... That's the point of contention...
  • Jakub Vrána
    Jakub Vrána about 10 years
    Can you elaborate what approximately would be the output size of 1000 iterations of $input = MD5($input)? For one iteration, it's 16^32 ~ 3e38. If MD5 behaves like a random function then for 1000 iterations, it should be around 7e35.
  • Sz.
    Sz. almost 10 years
    @CodesInChaos, "You lose entropy..." Just to make sure I understand correctly: do we lose entropy during the iterations because the hash functions are (known to be) not perfect? Thanks!
  • CodesInChaos
    CodesInChaos almost 10 years
    @lunakid A hash function is not a permutation. So even for ideal hashes there are collisions which cost entropy.
  • Sz.
    Sz. almost 10 years
    @CodesInChaos, I'm not sure what you mean by "ideal hash", but a "perfect hash" (en.wikipedia.org/wiki/Perfect_hash_function) maps an input set S onto the same S, and is a permutation (I think), with no collisions. What I was wondering about: can a cryptographic hash (which is not perfect, as it maps "infinity" onto S, to begin with) behave as "perfect", when its input set is narrowed to be the same as its output -- which is the case with repeated hashing --, or that is impossible (which needs some proof then, at least I'm not smart enough to just "see" that).
  • CodesInChaos
    CodesInChaos almost 10 years
    @lunakid If your function is bijective, it's a permutation and violates the properties we require out of a cryptographic hash. So we don't expect commonly used hashes to be bijective, see Is SHA-512 bijective when hashing a single 512-bit block? on crypto.SE for one variant of this.
  • theodore hogberg
    theodore hogberg over 9 years
    Don't use those either (SHA-2 family) they can now also be cracked easily, check crackstation.net for proof. If anything use scrypt or PBKDF2 which are key derivation function (KDFs) based cryptographic hash functions.
  • Mark
    Mark over 9 years
    +1 I was waiting to see an answer like this one, because I thought of the same scenario where you don't want to store plain-text password on the client, but also not send the final encrypted password over the wire to do a simple comparison to the DB.
  • Pacerier
    Pacerier over 9 years
@ircmaxell, +1 for the effort and elaborateness, but parts of this answer are very wrong / misleading. There is exactly zero proof that md5($input . md5($input)) maps to S(n) and it's not even theoretically correct. The theoretically correct equation is md5(∞ . md5($input)) = S(n) and since $input is far from infinity there is simply no way to prove that md5($input . md5($input)) has more entropy than md5(md5($input)). Both equations are equally bad and impossible to be proved to map to S(n). I would hope you revise the answer since it's so highly voted.
  • Dan Bechard
    Dan Bechard over 9 years
    $hash = hash($input . $hash); Wouldn't brute forcing a single iteration of the stored hash give you e.g. "MyPassword5f4dcc3b5aa765d61d8327deb882cf99"? Why bother going further when I can just read the original input from the prefix of that string? You need to use something other than the original input or this whole exercise is futile.
  • user999305
    user999305 over 9 years
    Don't use MD5 at all.
  • jeromej
    jeromej about 9 years
    For those who would like to save time by not having to go check how that discussion between Dan & ircmaxell finished, it finished well: Dan agreeing with ircmaxell.
  • silkfire
    silkfire over 7 years
    In 2016, Argon2 and scrypt are the ones everyone should strive to use
  • Daniel Howard
    Daniel Howard over 7 years
    Is it assumed in all of this that the attacker knows the hashing algorithm? Is it true that repeating a hashing function n times makes things more difficult for the attacker if they don't know the value of n?
  • brady
    brady over 7 years
    @DanielHoward Yes, assume the attacker knows everything except the password. Otherwise, you overestimate the security. Not knowing the value of n is not a significant obstacle for the attacker; they just keep iterating until they get a match. It's like a while loop instead of a for loop.
  • Clement Cherlin
    Clement Cherlin almost 5 years
    This answer is wrong in every respect. 1. Knowing the next-to-last hash provides no value to an attacker, because the input to an iterated hash is the password, which is then hashed many times (not once). 2. The input space is passwords, the output space is hashed passwords. The space of typical passwords is much smaller than the output space. 3. Rainbow tables for unsalted double-hashed passwords are no bigger than rainbow tables for unsalted single-hashed passwords. 4. Usernames are low-entropy, a good salt is random. 5. Salting does not replace iteration. You need both.
  • Clement Cherlin
    Clement Cherlin almost 5 years
    Doesn't help for web apps. if your server is compromised, the code your server is sending to the client is also compromised. The attacker will disable your client-side hash and capture raw passwords.