Is a base64 encoded string unique?


Two years late, but here we go:

The short answer is yes, unique binary/hex values will always encode to a unique base64 encoded string.

BUT, multiple base64 encoded strings may represent a single binary/hex value.

This is because bytes are not aligned with base64 'digits'. A byte is 8 bits, while a single base64 digit represents 6 bits. Therefore, any value whose bit length is not a multiple of 6 can have multiple base64 representations (though a correctly implemented base64 encoder will always produce the same one).

An example of this misalignment is the hex value 0x433356c1. This value is 32 bits long and base64 encodes to 'QzNWwQ=='. 32 is not a multiple of 6, however. So what happens? The base64 encoder appends four zero bits to the end of the binary representation, making the sequence 36 bits long and therefore 6-bit aligned. (The trailing '==' characters only pad the output to a multiple of four characters; they carry no data bits.)
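To check the arithmetic above, here is a minimal Python sketch using the standard library's base64 module. The variable names (raw, encoded, pad_bits) are just illustrative choices, not anything from the original question.

```python
import base64

# The 32-bit value from the example above.
raw = bytes.fromhex("433356c1")
encoded = base64.b64encode(raw).decode("ascii")

# 32 bits do not split evenly into 6-bit groups, so zero bits are
# appended until the total length is a multiple of 6.
data_bits = len(raw) * 8
pad_bits = (-data_bits) % 6

print(encoded)              # QzNWwQ==
print(data_bits, pad_bits)  # 32 4
```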

When decoding, the base64 decoder has to produce an 8-bit aligned value. A decoder that does not validate the padding bits simply discards them and decodes the first 32 bits into a hex value. For example, 'QzNWwc==' and 'QzNWwQ==' are different base64 encoded strings, but decode to the same hex value, 0x433356c1. If we look carefully, we notice that the first 32 bits are the same for both of these encoded strings:

'QzNWwc==':
010000 110011 001101 010110 110000 011100

'QzNWwQ==':
010000 110011 001101 010110 110000 010000

The only difference is the last four bits, which are ignored. Keep in mind that no base64 encoder should ever generate 'QzNWwc==' or any other base64 value for 0x433356c1 other than 'QzNWwQ==', since the added padding bits should always be zeros.
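The same comparison can be reproduced in Python. This is a sketch that assumes the default, non-validating behaviour of base64.b64decode in CPython, which discards the leftover bits; stricter decoders in other libraries may reject 'QzNWwc==' instead. The helper name sextets is made up for this example.

```python
import base64

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

def sextets(b64_text):
    """Return the 6-bit groups of a base64 string, ignoring '=' padding."""
    chars = b64_text.rstrip("=")
    return " ".join(format(ALPHABET.index(ch), "06b") for ch in chars)

for text in ("QzNWwc==", "QzNWwQ=="):
    decoded = base64.b64decode(text)
    # Show the string, its 6-bit groups, and the decoded bytes as hex.
    print(text, sextets(text), decoded.hex())
```

Both lines end in the same hex value even though the last 6-bit group differs.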

In conclusion, it is safe to assume that a unique binary/hex value will always encode to a unique base64 representation when using a correctly implemented base64 encoder. A 'collision' will only occur during decoding if base64 strings are generated without zeroing the padding/alignment bits.
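If you receive base64 strings from elsewhere and want to rule out such non-canonical inputs, one possible approach (not something the base64 module does for you by default, as far as this sketch assumes) is to decode and re-encode, then compare with the input. The function name is_canonical is hypothetical.

```python
import base64

def is_canonical(b64_text: str) -> bool:
    """True if re-encoding the decoded bytes gives back the same string."""
    decoded = base64.b64decode(b64_text)
    return base64.b64encode(decoded).decode("ascii") == b64_text

print(is_canonical("QzNWwQ=="))  # True
print(is_canonical("QzNWwc=="))  # False
```

This only covers the standard alphabet with '=' padding; URL-safe or unpadded variants would need the matching encode/decode functions.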

Author by

user2924127

Updated on June 02, 2022

Comments

  • user2924127 almost 2 years

    I can't find an answer to this. If I encode a string with Base64, will the encoded output be unique to that string? I ask because I want to create a token which will contain user information, so I need to make sure the output will be unique depending on the information.

    For example, if I encode "UnqUserId:987654321 Timestamp:01/02/03", will this be unique, so that no matter what other user ID I put in there, there will never be a collision?