Base62 hash of a string

12,159

Base 64 is widely used to encode binary data because each character carries exactly 6 bits, and there are enough printable ASCII characters to represent every possible 6-bit pattern. In other words, the 64 available characters represent all 64 different bit patterns from decimal 0 up to decimal 63.
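To make the 6-bit mapping concrete, here is a minimal sketch that encodes exactly three bytes by hand and compares the result with Ruby's own Base64 module. The helper name `encode_three_bytes` and the constant `BASE64_ALPHABET` are made up for this illustration; padding for inputs that are not a multiple of three bytes is deliberately left out.

```ruby
require 'base64'

# The standard Base64 alphabet: index 0..63 maps to one character.
# (Constant and method names are hypothetical, for illustration only.)
BASE64_ALPHABET = [*'A'..'Z', *'a'..'z', *'0'..'9', '+', '/'].freeze

# Encode exactly three bytes (24 bits) as four 6-bit characters.
def encode_three_bytes(bytes)
  bits = bytes.unpack1('B24') # bit string, most significant bit first
  bits.scan(/.{6}/).map { |chunk| BASE64_ALPHABET[chunk.to_i(2)] }.join
end

encode_three_bytes('foo')
#=> "Zm9v"

Base64.strict_encode64('foo')
#=> "Zm9v"
```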

There are several problems with encoding binary data as base 62, all stemming from the fact that an alphabet of size 62 just isn't a good fit: 62 is not a power of 2, so no whole number of bits maps onto one character. You could split the binary data from the digest algorithm into 5-bit chunks (6-bit chunks would need 64 characters) and then assign each of these 5-bit chunks to a character. However, that means the characters above "v" will never be used, so you would essentially end up with a base 32 encoding.
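A rough sketch of that 5-bit approach, assuming the digit set `0-9a-v` that `Integer#to_s(32)` produces (the method name `five_bit_encode` is hypothetical). It shows that nothing beyond "v" ever appears in the output, so the extra 30 characters of a base 62 alphabet go unused:

```ruby
require 'digest'

# Split the input into 5-bit chunks and map each chunk to one of 32 symbols.
# (Hypothetical helper; the final chunk is zero-padded to 5 bits.)
def five_bit_encode(bytes)
  bits = bytes.unpack1('B*')
  bits += '0' * (-bits.length % 5) # pad to a multiple of 5 bits
  bits.scan(/.{5}/).map { |chunk| chunk.to_i(2).to_s(32) }.join
end

encoded = five_bit_encode(Digest::SHA256.digest('foo'))

encoded.length
#=> 52   (256 bits padded to 260, then 260 / 5)

encoded.match?(/\A[0-9a-v]+\z/)
#=> true
```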

In terms of efficiency, base 62 will never even come close to base 64. Base 64 encoding is dead simple: take 6 bits, map them onto a character, repeat until done. It is this simple because 64 is a power of 2. With base 62, however, you have to convert the whole input to an integer and carry over the remainder at each step, because the bit patterns do not fit evenly.
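For comparison, the divide-and-carry procedure can be sketched in plain Ruby without any gem. This is only an illustration: the alphabet order (digits, lowercase, uppercase) and the name `base62_from_bytes` are assumptions, the bytes are read big-endian, and leading zero bytes of the input are not preserved.

```ruby
require 'digest'

# Assumed alphabet order: 0-9, a-z, A-Z. Other libraries may order it differently.
BASE62_ALPHABET = [*'0'..'9', *'a'..'z', *'A'..'Z'].freeze

# Treat the bytes as one big-endian integer and repeatedly divide by 62,
# collecting the remainder of each step as the next digit.
def base62_from_bytes(bytes)
  number = bytes.unpack1('H*').to_i(16)
  return BASE62_ALPHABET[0] if number.zero?

  digits = []
  while number > 0
    number, remainder = number.divmod(62)
    digits << BASE62_ALPHABET[remainder]
  end
  digits.reverse.join # remainders come out least significant first
end

fingerprint = base62_from_bytes(Digest::SHA256.digest('foo'))
fingerprint.length
#=> 43
```

Every step divides the entire big integer, which is why this costs far more than slicing off 6 bits at a time. A 256-bit digest never needs more than 43 base 62 characters, since 62**43 is larger than 2**256.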

So my advice, which you may not like, is to use a different encoding.

--

If you need a URL-safe encoding, you can for example use one of these:

require 'digest'
require 'base64'

# sample string
str = 'foo'

# original base 64 method for comparison
Digest::SHA256.base64digest(str)
#=> "LCa0a2j/xo/5m0U8HTBBNBNCLXBkg7+g+YpeiGJm564="

# url safe variant (no slash or plus characters)
Base64.urlsafe_encode64(Digest::SHA256.digest(str))
#=> "LCa0a2j_xo_5m0U8HTBBNBNCLXBkg7-g-YpeiGJm564="

# hexadecimal (base 16)
Digest::SHA256.hexdigest(str)
#=> "2c26b46b68ffc68ff99b453c1d30413413422d706483bfa0f98a5e886266e7ae"

# or base 32
# gem install base32
require 'base32'
Base32.encode(Digest::SHA256.digest(str))
#=> "FQTLI23I77DI76M3IU6B2MCBGQJUELLQMSB37IHZRJPIQYTG46XA===="

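# or with direct percent-encoding of the raw digest
# not pretty, but url safe!
# (URI::encode was removed in Ruby 3.0 and left "," and "&" unescaped)
require 'uri'
URI.encode_www_form_component(Digest::SHA256.digest(str))
#=> "%2C%26%B4kh%FF%C6%8F%F9%9BE%3C%1D0A4%13B-pd%83%BF%A0%F9%8A%5E%88bf%E7%AE"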

# or url escaped base 64
# not pretty, but url safe!
require 'cgi'
CGI::escape(Digest::SHA256.base64digest(str))
#=> "LCa0a2j%2Fxo%2F5m0U8HTBBNBNCLXBkg7%2Bg%2BYpeiGJm564%3D"
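If the trailing "=" is the only thing in the way, one more option is to drop the padding. A small sketch, assuming Ruby 2.3 or later, where `Base64.urlsafe_encode64` accepts a `padding:` keyword and `Base64.urlsafe_decode64` tolerates missing padding:

```ruby
require 'digest'
require 'base64'

str = 'foo'
digest = Digest::SHA256.digest(str)

# url safe variant without the trailing "=" padding
fingerprint = Base64.urlsafe_encode64(digest, padding: false)
#=> "LCa0a2j_xo_5m0U8HTBBNBNCLXBkg7-g-YpeiGJm564"

# decoding still works without the padding
Base64.urlsafe_decode64(fingerprint) == digest
#=> true
```

The result still contains "-" and "_", so it is not base 62, but it uses only characters that need no escaping in a URL.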

--

Edit: and here's a very very very inefficient implementation of base62 ;-)

# gem install base62
require 'base62'
require 'digest'

def pack_int(str)
  str.unpack('C*').each_with_index.reduce(0){|r,(x,i)| r + (x << 8*i) }
end

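def unpack_int(int)
  # bit_length avoids Math.log2, which is off by one for exact powers of 256
  # and fails for 0; trailing zero bytes of the original string are still lost
  n = (int.bit_length + 7) / 8
  n.times.map{|i| (int >> 8*i) & 255 }.pack('C*')
end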

def base62_encode(str)
  Base62.encode(pack_int(str))
end

def base62_decode(encoded)
  unpack_int(Base62.decode(encoded))
end

str = "foo"

# encode
digest = Digest::SHA256.digest(str)
fingerprint = base62_encode(digest)
#=> "fTSIMrZT3fDTvW7XDBq1b7nhWa24Zl55EVpsaO3TBBE"

# decode
recovered_digest = base62_decode(fingerprint)
#=> ",&\xB4kh\xFF\xC6\x8F\xF9\x9BE<\x1D0A4\x13B-pd\x83\xBF\xA0\xF9\x8A^\x88bf\xE7\xAE"

digest == recovered_digest
#=> true
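
One caveat with the integer round trip: pack_int is little-endian, so zero bytes at the end of the input vanish from the integer and cannot be recovered from it alone. Since a SHA-256 digest is always 32 bytes, a possible workaround is to pass the expected length when unpacking. The helper `unpack_int_fixed` below is a hypothetical addition, sketched under that assumption:

```ruby
# Same little-endian packing as pack_int above.
def pack_int(str)
  str.unpack('C*').each_with_index.reduce(0) { |r, (x, i)| r + (x << 8 * i) }
end

# Hypothetical variant of unpack_int: always emits `length` bytes, so
# trailing zero bytes are restored. A length that is too small truncates.
def unpack_int_fixed(int, length)
  Array.new(length) { |i| (int >> 8 * i) & 255 }.pack('C*')
end

bytes = "\x01\x02\x00\x00".b # ends in two zero bytes
unpack_int_fixed(pack_int(bytes), bytes.bytesize) == bytes
#=> true
```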
Author by

mahemoff


Updated on June 13, 2022

Comments

  • mahemoff
    mahemoff almost 2 years

    I want to do something like fingerprint = Digest::SHA256.base64digest(str) but for base62 instead of base64. How can I efficiently build a unique base62-encoded string hash of any string?