PHP short hash like URL-shortening websites

100,859

Solution 1

URL shortening services rather use a auto incremented integer value (like a supplementary database ID) and encode that with Base64 or other encodings to have more information per character (64 instead of just 10 like digits).

Solution 2

TinyURL doesn't hash anything, it uses Base 36 integers (or even base 62, using lower and uppercase letters) to indicate which record to visit.

Base 36 to Integer:

intval($str, 36);

Integer to Base 36:

base_convert($val, 10, 36);

So then, instead of redirecting to a route like /url/1234 it becomes /url/ax instead. This gives you a whole lot more use than a hash will, as there will be no collisions. With this you can easily check if a url exists and return the proper, existing, ID in base 36 without the user knowing that it was already in the database.

Don't hash, use other bases for this kind of thing. (It's faster and can be made collision-proof.)

Solution 3

I wrote a tiny lib to generate obfuscated hashes from integers.

http://web.archive.org/web/20130727034425/http://blog.kevburnsjr.com/php-unique-hash

$ids = range(1,10);
foreach($ids as $id) {
  echo PseudoCrypt::unhash($id) . "\n";
}
m8z2p
8hy5e
uqx83
gzwas
38vdh
phug6
bqtiv
xzslk
k8ro9
6hqqy

7/14/2015: Adding the actual code below, since it has become difficult to find:

<?php
/**
 * PseudoCrypt by KevBurns (http://blog.kevburnsjr.com/php-unique-hash)
 * Reference/source: http://stackoverflow.com/a/1464155/933782
 * 
 * I want a short alphanumeric hash that’s unique and who’s sequence is difficult to deduce. 
 * I could run it out to md5 and trim the first n chars but that’s not going to be very unique. 
 * Storing a truncated checksum in a unique field means that the frequency of collisions will increase 
 * geometrically as the number of unique keys for a base 62 encoded integer approaches 62^n. 
 * I’d rather do it right than code myself a timebomb. So I came up with this.
 * 
 * Sample Code:
 * 
 * echo "<pre>";
 * foreach(range(1, 10) as $n) {
 *     echo $n." - ";
 *     $hash = PseudoCrypt::hash($n, 6);
 *     echo $hash." - ";
 *     echo PseudoCrypt::unhash($hash)."<br/>";
 * }
 * 
 * Sample Results:
 * 1 - cJinsP - 1
 * 2 - EdRbko - 2
 * 3 - qxAPdD - 3
 * 4 - TGtDVc - 4
 * 5 - 5ac1O1 - 5
 * 6 - huKpGQ - 6
 * 7 - KE3d8p - 7
 * 8 - wXmR1E - 8
 * 9 - YrVEtd - 9
 * 10 - BBE2m2 - 10
 */

class PseudoCrypt {

    /* Key: Next prime greater than 62 ^ n / 1.618033988749894848 */
    /* Value: modular multiplicative inverse */
    private static $golden_primes = array(
        '1'                  => '1',
        '41'                 => '59',
        '2377'               => '1677',
        '147299'             => '187507',
        '9132313'            => '5952585',
        '566201239'          => '643566407',
        '35104476161'        => '22071637057',
        '2176477521929'      => '294289236153',
        '134941606358731'    => '88879354792675',
        '8366379594239857'   => '7275288500431249',
        '518715534842869223' => '280042546585394647'
    );

    /* Ascii :                    0  9,         A  Z,         a  z     */
    /* $chars = array_merge(range(48,57), range(65,90), range(97,122)) */
    private static $chars62 = array(
        0=>48,1=>49,2=>50,3=>51,4=>52,5=>53,6=>54,7=>55,8=>56,9=>57,10=>65,
        11=>66,12=>67,13=>68,14=>69,15=>70,16=>71,17=>72,18=>73,19=>74,20=>75,
        21=>76,22=>77,23=>78,24=>79,25=>80,26=>81,27=>82,28=>83,29=>84,30=>85,
        31=>86,32=>87,33=>88,34=>89,35=>90,36=>97,37=>98,38=>99,39=>100,40=>101,
        41=>102,42=>103,43=>104,44=>105,45=>106,46=>107,47=>108,48=>109,49=>110,
        50=>111,51=>112,52=>113,53=>114,54=>115,55=>116,56=>117,57=>118,58=>119,
        59=>120,60=>121,61=>122
    );

    public static function base62($int) {
        $key = "";
        while(bccomp($int, 0) > 0) {
            $mod = bcmod($int, 62);
            $key .= chr(self::$chars62[$mod]);
            $int = bcdiv($int, 62);
        }
        return strrev($key);
    }

    public static function hash($num, $len = 5) {
        $ceil = bcpow(62, $len);
        $primes = array_keys(self::$golden_primes);
        $prime = $primes[$len];
        $dec = bcmod(bcmul($num, $prime), $ceil);
        $hash = self::base62($dec);
        return str_pad($hash, $len, "0", STR_PAD_LEFT);
    }

    public static function unbase62($key) {
        $int = 0;
        foreach(str_split(strrev($key)) as $i => $char) {
            $dec = array_search(ord($char), self::$chars62);
            $int = bcadd(bcmul($dec, bcpow(62, $i)), $int);
        }
        return $int;
    }

    public static function unhash($hash) {
        $len = strlen($hash);
        $ceil = bcpow(62, $len);
        $mmiprimes = array_values(self::$golden_primes);
        $mmi = $mmiprimes[$len];
        $num = self::unbase62($hash);
        $dec = bcmod(bcmul($num, $mmi), $ceil);
        return $dec;
    }

}

Solution 4

Shortest hash is 32 character length, how ever you can use first 8 characters of md5 hash

echo substr(md5('http://www.google.com'), 0, 8);

Update: here is another class found here written by Travell Perkins which takes record number and create short hash for it. 14 digits number produce 8 digit string. By the date you reach this number you become more popular than tinyurl ;)

class BaseIntEncoder {

    //const $codeset = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
    //readable character set excluded (0,O,1,l)
    const codeset = "23456789abcdefghijkmnopqrstuvwxyzABCDEFGHIJKLMNPQRSTUVWXYZ";

    static function encode($n){
        $base = strlen(self::codeset);
        $converted = '';

        while ($n > 0) {
            $converted = substr(self::codeset, bcmod($n,$base), 1) . $converted;
            $n = self::bcFloor(bcdiv($n, $base));
        }

        return $converted ;
    }

    static function decode($code){
        $base = strlen(self::codeset);
        $c = '0';
        for ($i = strlen($code); $i; $i--) {
            $c = bcadd($c,bcmul(strpos(self::codeset, substr($code, (-1 * ( $i - strlen($code) )),1))
                    ,bcpow($base,$i-1)));
        }

        return bcmul($c, 1, 0);
    }

    static private function bcFloor($x)
    {
        return bcmul($x, '1', 0);
    }

    static private function bcCeil($x)
    {
        $floor = bcFloor($x);
        return bcadd($floor, ceil(bcsub($x, $floor)));
    }

    static private function bcRound($x)
    {
        $floor = bcFloor($x);
        return bcadd($floor, round(bcsub($x, $floor)));
    }
}

here is example how to use it:

BaseIntEncoder::encode('1122344523');//result:3IcjVE
BaseIntEncoder::decode('3IcjVE');//result:1122344523

Solution 5

For a short hash, url friendly, in the view of disallowing possible duplicate contents, we can use hash() and especially the CRC or Adler-32 type, since they are made exactly for that:

Cyclic Redundancy Check

A cyclic redundancy check (CRC) is an error-detecting code commonly used in digital networks and storage devices to detect accidental changes to raw data. Blocks of data entering these systems get a short check value attached, based on the remainder of a polynomial division of their contents. On retrieval, the calculation is repeated and, in the event the check values do not match, corrective action can be taken https://en.wikipedia.org/wiki/Cyclic_redundancy_check

Adler-32 is a checksum algorithm (...), compared to a Cyclic Redundancy Check of the same length, it trades reliability for speed (preferring the latter) https://en.wikipedia.org/wiki/Adler-32

echo hash("crc32", "Content of article...");
// Output fd3e7c6e
echo hash("adler32", "Content of article...");
// Output 55df075f

[Youtube] How do CRCs work?

Share:
100,859

Related videos on Youtube

clamp
Author by

clamp

hello

Updated on August 10, 2021

Comments

  • clamp
    clamp almost 3 years

    I am looking for a PHP function that creates a short hash out of a string or a file, similar to those URL-shortening websites like tinyurl.com

    The hash should not be longer than 8 characters.

    • Greg
      Greg over 9 years
      I know this is an old question but check out: hashids.org. Works with most programming languages
    • Anis
      Anis over 8 years
      Check out ShortCode library. It does exactly what you want. Based on base conversion.
    • caw
      caw over 4 years
      Other than using Adler-32 or CRC32, you can’t shorten modern (collision-resistant) hashes that much (i.e. down to 8 characters). Not with SHA-2, not with SHA-1, and not even with MD5. With Alphabet::convert($hash, Alphabet::HEX, Alphabet::ALPHANUMERIC), you can get MD5 down to 22 (from 32) characters. What you want is instead encode the files’ integer IDs (e.g. from your database) with (new Id())->encode($id).
  • Tom Haigh
    Tom Haigh about 15 years
    Using the first 8 charactors of md5 there is probably a reasonable chance of two URLs having the same hash
  • Nazariy
    Nazariy about 15 years
    Yes such collision might occur, but chance is very little for random string it is about one to 4 billions, how ever if you want to have 100% unique hash which you can use as reference to database record use included class.
  • sova
    sova almost 13 years
    this has a very intelligent design =D golden primes = world.rock()
  • tim peterson
    tim peterson almost 12 years
    hi @RobertK, how would the PHP look for converting 6 digit strings that include both numbers and letters?
  • Robert K
    Robert K almost 12 years
    @timpeterson, simply call intval and pass in the given base (see my first code block).
  • tim peterson
    tim peterson almost 12 years
    @RobertK, but intval() turns everything into a number. I guess maybe I'm confused on how intval() connects with the other steps required to do the redirect like the role of the database.
  • Robert K
    Robert K almost 12 years
    @timpeterson, that's because the string represents the ID of the database record. So you select the record based on the passed ID.
  • tim peterson
    tim peterson almost 12 years
    @RobertK, the one problem I'm seeing with intval() is if your $str has slashes(/) or dashes(-) in it. I realized that on/stuff, on-stuff, and on all returned the number 887. Do you have a solution in mind to work with URLs with slashes and dashes in them?
  • Robert K
    Robert K almost 12 years
    @timpeterson Try base64_decode and you should be good to go. Base 36 will only contain 0-9 and a-z; base 62 would contain 0-9 and a-z and A-Z.
  • DanielJay
    DanielJay almost 12 years
    I know I am commenting on an older post. I thought I would mention that KevBurnsJr code does work well. However, I recently just switched from a Windows 2003 32 bit server to a Windows 2008 R2 x64 server and I am finding that I am duplicating the unique hash's. I am now having to find an alternative method of creating confirmation codes.
  • KevBurnsJr
    KevBurnsJr over 11 years
    The post has been updated to use bcmath with the help of some commenters so it should be solid now. Someone also found a way to make it reversible which is totally dope.
  • Luis Siquot
    Luis Siquot almost 11 years
    want to mention that const codeset may be in any arbitrary order, just to obfuscate++
  • Harinder
    Harinder about 10 years
    web.archive.org/web/20130727034425/http://blog.kevburnsjr.co‌​m/… it seem website is down so here is the copy of that link ;)
  • KevBurnsJr
    KevBurnsJr about 10 years
    ya my webserver got, uh, unplugged.
  • inorganik
    inorganik almost 10 years
    You ought to get your site back up, or post the php version of this under github.com/KevBurnsJr/pseudocrypt - what a great little lib! Didn't want to use a giant "system" like YOURLS or PHURL, just a nice lib to create shortlinks and this is it. Thanks
  • Ravi Soni
    Ravi Soni over 9 years
    What this mean (more information per character) just curious!!
  • Gumbo
    Gumbo over 9 years
    @ravisoni If you use the decimal digits 09 to represent a number, you have 10 possible values per encoded character (ld(10) ≈ 3.32 bits/character). However, if you represent the same number with Base64 characters, you have 64 possible values per encoded character (ld(64) = 6 bits/character). So with Base64 there is more information stored in each encoded character, i. e., 6 bits of information instead of 3.32 bits.
  • ethanpil
    ethanpil almost 9 years
    Its missing again. After some detective work, I was able to resurrect it here: gist.github.com/ethanpil/94455f8de76671aa9024
  • Rajesh
    Rajesh almost 9 years
    Anyone know how the modular multiplicative inverse of 2377 is 1677? I get 3 as value.
  • Rajesh
    Rajesh almost 9 years
    Found the solution to calculate modular multiplicative inverse in this reddit link: reddit.com/r/PHPhelp/comments/1gztrp/…
  • divide_by_zero
    divide_by_zero almost 9 years
    Anyone understand why @KevBurnsJr uses the next prime > 62^n / golden ratio? Why not just choose the next prime > 62^n? I understand why choosing a small prime might be problematic but why not just any arbitrary large prime?
  • Akhil Balakrishnan
    Akhil Balakrishnan over 8 years
    Please note that the result of intval will be the maximum signed integer value if the input is very long.
  • Admin
    Admin about 8 years
    If you use base64, then nothing stops a script from saying for($i=0;$i<999999;$i++) { $pageContent=fread(fopen('yoururl.com/'.base64_encode($i)); } and now I have access to every single URL in your database.
  • Thomas
    Thomas over 6 years
    I'm loving this little micro library thanks @KevBurnsJr
  • j4k3
    j4k3 over 6 years
    Is there a way to salt generated hashes with a seed? Because in this state, the intended use of sequential codes that are "difficult to deduce" isn't really given.
  • user151496
    user151496 over 6 years
    very clever implementation!
  • user151496
    user151496 over 6 years
    @j4k3 if you want to randomly "salt" the results, then shuffle the $chars62 array from elements 1 to 61, and make your own unique number->hash bijection. BUT MAKE SURE YOU KEEP THE FIRST 0=>42 OF THE ARRAY INTACT. otherwise every hash that starts with 0 will give you bad results at unhashing
  • rabudde
    rabudde almost 6 years
    Is this older post still up-to-date? And is it really collision free for integers up to BIGINT_MAX?