Encryption with AES-256 and the Initialization Vector

42,115

Solution 1

The IV should not be generated from the password alone.

The point of the IV is that even when the same key and plaintext are re-used, a different ciphertext is produced. If the IV is deterministically derived from the password alone, you'd get the same ciphertext every time. In the cited example, a salt is randomly chosen, so a new key is generated even with the same password.

Just use a secure random number generator to choose an IV. That's what the cipher does internally when you don't supply one.


I want to stress that you have to store either the IV (if you use the first method) or a salt (if you use the second method) together with the ciphertext. You won't have good security if everything is derived from the password; you need some randomness in every message.
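A minimal sketch of the first method (random IV, stored with the ciphertext), assuming AES in CBC mode with PKCS5 padding. The class name `RandomIvExample` and the "IV as a 16-byte prefix" layout are illustrative choices, not something taken from either linked implementation, and the sketch deliberately leaves out integrity protection.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.IvParameterSpec;

// Hypothetical helper class, for illustration only.
final class RandomIvExample {
    private static final int IV_LENGTH = 16; // AES block size in bytes
    private static final SecureRandom RANDOM = new SecureRandom();

    /** Encrypts with a fresh random IV and returns IV || ciphertext. */
    static byte[] encrypt(SecretKey key, byte[] plaintext) throws GeneralSecurityException {
        byte[] iv = new byte[IV_LENGTH];
        RANDOM.nextBytes(iv); // new IV for every message
        Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
        cipher.init(Cipher.ENCRYPT_MODE, key, new IvParameterSpec(iv));
        byte[] ciphertext = cipher.doFinal(plaintext);

        // The IV is not secret, so it is stored in the clear as a prefix.
        byte[] message = new byte[IV_LENGTH + ciphertext.length];
        System.arraycopy(iv, 0, message, 0, IV_LENGTH);
        System.arraycopy(ciphertext, 0, message, IV_LENGTH, ciphertext.length);
        return message;
    }

    /** Reads the IV back from the prefix and decrypts the rest. */
    static byte[] decrypt(SecretKey key, byte[] message) throws GeneralSecurityException {
        Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
        cipher.init(Cipher.DECRYPT_MODE, key, new IvParameterSpec(message, 0, IV_LENGTH));
        return cipher.doFinal(message, IV_LENGTH, message.length - IV_LENGTH);
    }

    public static void main(String[] args) throws GeneralSecurityException {
        KeyGenerator generator = KeyGenerator.getInstance("AES");
        generator.init(256);
        SecretKey key = generator.generateKey();

        byte[] plaintext = "same plaintext".getBytes(StandardCharsets.UTF_8);
        byte[] first = encrypt(key, plaintext);
        byte[] second = encrypt(key, plaintext);
        System.out.println("ciphertexts differ: " + !Arrays.equals(first, second));
        System.out.println("round trip ok: " + Arrays.equals(plaintext, decrypt(key, first)));
    }
}
```

Because the IV travels with the message, decryption needs nothing beyond the key and the stored bytes.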

Solution 2

My understanding is that the Initialization Vector is just random input to the encryption algorithm; otherwise you would always get the same result for the same input. The Initialization Vector is stored together with the cipher text; it's not secret in any way. Just use a secure random function to generate the initialization vector. PBKDF* algorithms are used to derive secret keys of the desired length for encryption algorithms from user-entered passwords.
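As a rough sketch of the key-derivation half of that description: the snippet below derives a 256-bit AES key from a password with `PBKDF2WithHmacSHA1`, the algorithm named in the question. The class name `PasswordKeyExample`, the 16-byte salt and the iteration count are assumptions chosen for illustration, and the algorithm may not be available on every platform (the asker reports it missing on theirs).

```java
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.SecretKey;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.PBEKeySpec;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical helper class, for illustration only.
final class PasswordKeyExample {
    private static final int SALT_LENGTH = 16;   // bytes; illustrative choice
    private static final int ITERATIONS = 65536; // illustrative choice
    private static final int KEY_BITS = 256;     // AES-256

    /** Returns a fresh random salt; store it next to the ciphertext. */
    static byte[] newSalt() {
        byte[] salt = new byte[SALT_LENGTH];
        new SecureRandom().nextBytes(salt);
        return salt;
    }

    /** Derives an AES key from the password and salt using PBKDF2. */
    static SecretKey deriveKey(char[] password, byte[] salt) throws GeneralSecurityException {
        PBEKeySpec spec = new PBEKeySpec(password, salt, ITERATIONS, KEY_BITS);
        try {
            SecretKeyFactory factory = SecretKeyFactory.getInstance("PBKDF2WithHmacSHA1");
            byte[] keyBytes = factory.generateSecret(spec).getEncoded();
            return new SecretKeySpec(keyBytes, "AES");
        } finally {
            spec.clearPassword(); // wipe the spec's internal copy of the password
        }
    }
}
```

The same password and salt always give the same key, which is why the salt has to be kept; the IV is still generated separately and randomly.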

The first implementation that you link to simply lets the Cipher object generate the Initialization Vector. Then it fetches this generated IV to store it together with the cipher text.
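That pattern might look roughly like the sketch below. It assumes a provider (such as the default one in standard Java) that picks a random IV when a CBC cipher is initialized for encryption without one; that behaviour is provider-specific, so it is worth checking on the target platform. The class `CipherGeneratedIvExample` and its nested `Encrypted` holder are hypothetical names.

```java
import java.security.GeneralSecurityException;
import javax.crypto.Cipher;
import javax.crypto.SecretKey;
import javax.crypto.spec.IvParameterSpec;

// Hypothetical helper class, for illustration only.
final class CipherGeneratedIvExample {
    /** Simple holder for the two values that must be stored together. */
    static final class Encrypted {
        final byte[] iv;
        final byte[] ciphertext;

        Encrypted(byte[] iv, byte[] ciphertext) {
            this.iv = iv;
            this.ciphertext = ciphertext;
        }
    }

    static Encrypted encrypt(SecretKey key, byte[] plaintext) throws GeneralSecurityException {
        Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
        // No IV is passed in; the provider is expected to generate one.
        cipher.init(Cipher.ENCRYPT_MODE, key);
        byte[] iv = cipher.getIV(); // fetch the generated IV so it can be stored
        return new Encrypted(iv, cipher.doFinal(plaintext));
    }

    static byte[] decrypt(SecretKey key, Encrypted stored) throws GeneralSecurityException {
        Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
        // Decryption must be given the same IV that encryption used.
        cipher.init(Cipher.DECRYPT_MODE, key, new IvParameterSpec(stored.iv));
        return cipher.doFinal(stored.ciphertext);
    }
}
```

The key is never used as a seed here: the IV is recovered on decryption only because it was saved alongside the ciphertext.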

The second one uses part of the hash bytes. Any approach that generates non-repeating IVs is good enough.

The most important property of an IV is that it doesn't repeat (very often).

Solution 3

Cryptographers should generate IVs using a secure pseudo-random number generator.

Application developers should use existing, off the shelf cryptography. I suggest that you use SSL with certificates to secure your network traffic and GPG to secure file data.

There are so many details that can make an implementation insecure, such as timing attacks. When an application developer is making decisions between AES 128 and AES 256 it is nearly always pointless since you've likely left a timing attack that renders the extra key bits useless.

Solution 4

The IV is just a consequence of the use of block chaining. I presume that this is more than a simple API design question. I assume that you know that the reasoning for using it is so that the same plaintext will not show up as the same ciphertext in multiple blocks.

Think about recursion from the last block where the Nth ciphertext block depends in some way on the (N-1)th block, etc. When you get to the first block, 0th block, you need some data to get started. It doesn't matter what that data is as long as you know it before you attempt to decrypt. Using non-secret random data as an initialization vector will cause identical messages encrypted under the same key to come out as completely different ciphertext.
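To make the chaining concrete, here is a hand-rolled sketch of CBC encryption built on the raw AES block operation: each ciphertext block is the encryption of the plaintext block XORed with the previous ciphertext block, and the IV stands in for the "previous block" of block 0. The class name `CbcChainingSketch` is hypothetical, the input is assumed to be already block-aligned (no padding), and this is for illustration only; real code should use the library's CBC mode.

```java
import java.security.GeneralSecurityException;
import javax.crypto.Cipher;
import javax.crypto.SecretKey;

// Hypothetical class showing CBC chaining by hand; not for production use.
final class CbcChainingSketch {
    private static final int BLOCK = 16; // AES block size in bytes

    /** Computes C[i] = E(P[i] XOR C[i-1]), where C[-1] is the IV. */
    static byte[] encrypt(SecretKey key, byte[] iv, byte[] plaintext) throws GeneralSecurityException {
        if (iv.length != BLOCK || plaintext.length % BLOCK != 0) {
            throw new IllegalArgumentException("IV must be 16 bytes and plaintext block-aligned");
        }
        // ECB with no padding exposes the bare block cipher E.
        Cipher blockCipher = Cipher.getInstance("AES/ECB/NoPadding");
        blockCipher.init(Cipher.ENCRYPT_MODE, key);

        byte[] out = new byte[plaintext.length];
        byte[] previous = iv; // the data needed "to get started" on block 0
        for (int offset = 0; offset < plaintext.length; offset += BLOCK) {
            byte[] mixed = new byte[BLOCK];
            for (int i = 0; i < BLOCK; i++) {
                mixed[i] = (byte) (plaintext[offset + i] ^ previous[i]);
            }
            previous = blockCipher.doFinal(mixed); // becomes the chain input for the next block
            System.arraycopy(previous, 0, out, offset, BLOCK);
        }
        return out;
    }
}
```

With the same key, IV and plaintext the output is identical every time; changing only the IV changes the first block and, through the chain, every block after it.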

It's similar in concept to salting a hash. And that source code looks a little fishy to me. An IV should simply be fresh-at-encryption-time random bits dependent upon nothing, like a nonce. The IV is basically part of the encrypted message. If you re-encrypt identical data with identical key, you should not be able to correlate the messages. (Hey, think about the consequences of correlating by ciphertext length as well.)

Solution 5

As with everyone else here, I've always known IVs to be just chosen randomly using the standard algorithms for doing so.

The second reference you provided, though, doesn't seem to be doing that. Looks like he salts a password and hashes it. Then takes that hash and splits it up into halves. One half is the encryption key, one is the IV. So the IV is derived from the password.

I don't have any strong breaks for such a method, but it's bad design. The IV should be independent and random all on its own. It might become exploitable if there's a weakness in the hashing algorithm or if you choose a weak password. You don't want the IV to be derivable from anything else, or pre-computation attacks become conceivable.

Author by

MysteryMoose

Updated on December 22, 2020

Comments

  • MysteryMoose
    MysteryMoose over 3 years

    I have a question relating to the use of an Initialization Vector in AES encryption. I am referencing the following articles / posts to build encryption into my program:

    [1] Java 256-bit AES Password-Based Encryption
    [2] http://gmailassistant.sourceforge.net/src/org/freeshell/zs/common/Encryptor.java.html

    I was originally following erickson's solution from the first link but, from what I can tell, PBKDF2WithHmacSHA1 is not supported on my implementation. So, I turned to the second link to get an idea for my own iterative SHA-256 hash creation.

    My question comes in how the IV is created. One implementation ([1]) uses methods from the Cipher class to derive the IV, whereas the other ([2]) uses the second 16 bytes of the hash as the IV. Quite simply, why the difference and which is better from a security standpoint? I am kinda confused as to the derivation and use of IVs as well (I understand what they are used for, just not the subtler differences), so any clarification is also very welcome.

    I noticed that the second link uses AES-128 rather than AES-256, which would suggest to me that I would have to go up to SHA-512 if I wanted to use this method. This seems like it would be an unfortunate requirement as the user's password would have to be 16 characters longer to ensure a remotely secure hash and this app is destined for a cell phone.

    Source is available on request, though it is still incomplete.

    Thank you in advance.

  • MysteryMoose
    MysteryMoose over 13 years
    I didn't mention this in the original post, but this is not for network traffic of any sort. I am going to be encrypting a file that will never leave the device unless the entire system is compromised.
  • MysteryMoose
    MysteryMoose over 13 years
    I am assuming that I will have to store the IV unless I derive it from the original key somehow? So the first implementation is actually more similar to the second than I thought in that they both (somehow) get an IV from a given key? In that case couldn't I just use the key as the IV?
  • MysteryMoose
    MysteryMoose over 13 years
    That bit makes sense. My confusion comes in generating an IV to make the best encryption possible (within practicality).
  • AltF4
    AltF4 over 13 years
    Well, the key has to remain secret. The IV is public. You don't want to use the key as your IV, or you'll be broadcasting your secret key to everyone! For the same reason, but to a lesser extent, you don't want to base your IV on the key using hashing (like [2] does): if a weakness is discovered in the hashing algorithm, an attacker can derive your key from the IVs.
  • MysteryMoose
    MysteryMoose over 13 years
    Ok, that makes sense. So [2] is really just bad practice.
  • MysteryMoose
    MysteryMoose over 13 years
    Ok, that makes more sense. One more question, then: If Cipher is using a random number generator internally, how do I decrypt? As I understand it, the same IV needs to be used for both encryption and decryption. Do they simply use the key as the seed for the generator to get the same IV per key or is there something more going on?
  • brady
    brady over 13 years
    @phobos51594 - You have to store the IV with the ciphertext. One way to think of the IV is as the first block of ciphertext, so you could store it as a prefix to the actual output of the cipher. Or whatever. It doesn't need to be kept secret; just store it with the rest of the ciphertext.
  • MysteryMoose
    MysteryMoose over 13 years
    Ignore the previous comment, I just saw your edit. That makes even more sense. So the pattern would go something along the lines of: Hash password, generate IV using Cipher or a secure random feature, encrypt with hash-IV-salt, store previous values, decrypt with stored values, discard IV, celebrate, repeat.
  • Rob
    Rob over 13 years
    Trying to get around the burden of transmitting the IV with the encrypted data defeats the purpose of having an IV, and so does using an IV that's derived from a secret. The point is that the IV is fresh and unpredictable, so that previously encrypted messages cannot be correlated.
  • Peter Štibraný
    Peter Štibraný over 13 years
    Sometimes a new IV is computed as (previous IV) + 1, so it can be predictable. I am not sure whether this introduces problems or not, though.
  • Peter Štibraný
    Peter Štibraný over 13 years
    @phobos51594: It is certainly a bit strange... [2] uses key + salt to generate the hash, and extracts the IV from the hash. Now the randomness depends on the salt. If someone passes the same salt each time, they will get the same IV, which is a problem. The problem with [2] is that now you have to remember the salt, as well as the IV :-)
  • Spike Gronim
    Spike Gronim over 13 years
    so you're putting an encryption key and an encrypted file on the same device? That's pointless.
  • AltF4
    AltF4 over 13 years
    In [2], since the IV is derived entirely from the salt+password you wouldn't need to store the IV. You could just store the salt and combine it with the password at runtime to generate the IV. But this design makes the strength of the IV tantamount to the strength of the salt. And I don't think I see in the code any length guarantees for the salt. It could be very small depending on how it's used.
  • Peter Štibraný
    Peter Štibraný over 13 years
    @AltF4: yes, you are right. If you remember the salt, you can derive IV.
  • Rob
    Rob over 13 years
    @Peter, I see what you mean. But you still need to transmit the IV as part of the message, implicitly or explicitly, even if this is only done on the first of a series of encryptions in a session. However, think for a moment about giving this advice in the absence of deep details on the particular cipher. With predictable IVs, the chances of being able to correlate messages would be certainly higher (or the same) than for random IVs.
  • Rob
    Rob over 13 years
    Puzzle: 2 instances of my cipher with the same key, using IVs that start at 0 for each instance and increment. If I always encoded the same small number of messages, there would be correlations. If there were only one instance of it, then incrementing maximizes the period until it's known to repeat. So I guess the determining factor would be the probability of generating a collision if you used randomly generated IVs.
  • bob
    bob almost 13 years
    As phobos51594 noted, link [1] didn't work on Android because the "PBKDF2WithHmacSHA1" algorithm causes a NoSuchAlgorithmException, but "PBEWITHSHA256AND128BITAES-CBC-BC" seems to work OK.
  • Ch'marr
    Ch'marr almost 11 years
    @Rob. Using an IV that simply increments per block is a well-known method that is currently considered secure. See the "Counter" encryption mode here: en.wikipedia.org/wiki/Block_cipher_mode_of_operation
  • Keynan
    Keynan over 10 years
    Are timing attacks really an issue in networked applications? I would assume that network speed variability would wipe out that information. If not, a simple sleep until X ms past the start of encryption would fix this.