Shortest hash? MD5 / SHA . First chars, git

10,889

Solution 1

Check out hashids, which is designed to generate unique YouTube-style hashes from your primary keys (or some other set of unique numbers). It's not really a hash in the sense that MD5 and SHA-1 are, as it's designed to be reversible.

As an example, if you want to "hash" your single integer primary key, you may get a relationship like

(PK: 1) <=> (hashid: 8dY0qQ)

This is seeded from a secret value that you control, so users are not able to determine the primary key they're really referencing. If your database is somewhat more involved, say with multiple shards and complex keys, you're still okay. hashids takes a list of integers as input:

(3, 171, 24) <=> (243j7Z)

As the developer, you are responsible for defining the hash's minimum length. As you generate more and more hashes, hashids may generate slightly longer hashes.

Hashes are guaranteed to be unique for a given input (initial seed, minimum hash length, and list of integers to hash):

There are no collisions. Your generated hashes should be unique.

There is support for

  • JavaScript
  • Ruby
  • Python
  • Java
  • PHP
  • Perl
  • CoffeeScript
  • Objective-C
  • Go
  • Lua
  • Node.js
  • .NET

Solution 2

By default git only shows 7 characters as the odds are it will be unique, and you can refer to commits/blobs using just enough characters to define it as unique.

However, under the hood it still using the full hash. If your git tree has two commits with the same first 7 numbers, then it will throw an error if you only use 7 characters to identify one of those commits.

If the user is entering the hash for data that the system already knows about, then allow the user to enter as many characters as he thinks he needs, and if that isn't enough to uniquely identify which hash he is talking about, then error and prompt for more.

7 hex characters gives ~ 2x10^7 possible hashes. Assuming you are using a good hash - i.e. it has an even spread across the values then by square approximation you have a 50% chance of a duplicate after ~19k* hashes. Whether this is acceptable to you depends on how many you are inserting

*The number of inserts to get a 50% chance of a hash collision for a hash of N hex characters is approximately 0.5+sqrt(0.25-(2xln(0.5)x16^N))

Share:
10,889
martin
Author by

martin

Updated on July 09, 2022

Comments

  • martin
    martin almost 2 years

    I need hash function. Users will be write these hash to computer so hash should be short. I will have about 50 000 000 records in database. Each must have own hash. I would like to have unique hashes. But if a little records have the same hash, I can accept. Unique is better.

    MD2 is good secure for me, but hash is long: "8350e5a3e24c153df2275c9f80692773" - 32 chars. If you must write 10 MD2 hash on keybord you do not happy...

    Git use SHA1 for each commit (40 chars). But in output show only first 7 chars:

    $ git log
    commit e2cfc89fae5b43594b2c649fd4c05bcc6d2d12ac
    ...
    commit 56a8b4c50d4269dc3f88727472933fd81231f63b
    ...
    commit ce2e9ddbe896b9592abbd5fcb6604b181809d523
    ...
    commit 498c49833516ea33b6a40697634ea6e3cfd62328
    ...
    commit b7d78aea415e64d8d441f9747fe6d5d48fe54ee5
    
    $ git log --oneline | head -n 5
    e2cfc89 commnit message...
    56a8b4c commnit message...
    ce2e9dd commnit message...
    498c498 commnit message...
    b7d78ae commnit message...
    

    How is it secure/unique? If I will use for example first 5 or 10 chars from MD5/SHA-1/SHA-256 is it enough secure?

    Thank you.