PHP short hash like URL-shortening websites
Solution 1
URL shortening services rather use a auto incremented integer value (like a supplementary database ID) and encode that with Base64 or other encodings to have more information per character (64 instead of just 10 like digits).
Solution 2
TinyURL doesn't hash anything, it uses Base 36 integers (or even base 62, using lower and uppercase letters) to indicate which record to visit.
Base 36 to Integer:
intval($str, 36);
Integer to Base 36:
base_convert($val, 10, 36);
So then, instead of redirecting to a route like /url/1234
it becomes /url/ax
instead. This gives you a whole lot more use than a hash will, as there will be no collisions. With this you can easily check if a url exists and return the proper, existing, ID in base 36 without the user knowing that it was already in the database.
Don't hash, use other bases for this kind of thing. (It's faster and can be made collision-proof.)
Solution 3
I wrote a tiny lib to generate obfuscated hashes from integers.
http://web.archive.org/web/20130727034425/http://blog.kevburnsjr.com/php-unique-hash
$ids = range(1,10);
foreach($ids as $id) {
echo PseudoCrypt::unhash($id) . "\n";
}
m8z2p 8hy5e uqx83 gzwas 38vdh phug6 bqtiv xzslk k8ro9 6hqqy
7/14/2015: Adding the actual code below, since it has become difficult to find:
<?php
/**
* PseudoCrypt by KevBurns (http://blog.kevburnsjr.com/php-unique-hash)
* Reference/source: http://stackoverflow.com/a/1464155/933782
*
* I want a short alphanumeric hash that’s unique and who’s sequence is difficult to deduce.
* I could run it out to md5 and trim the first n chars but that’s not going to be very unique.
* Storing a truncated checksum in a unique field means that the frequency of collisions will increase
* geometrically as the number of unique keys for a base 62 encoded integer approaches 62^n.
* I’d rather do it right than code myself a timebomb. So I came up with this.
*
* Sample Code:
*
* echo "<pre>";
* foreach(range(1, 10) as $n) {
* echo $n." - ";
* $hash = PseudoCrypt::hash($n, 6);
* echo $hash." - ";
* echo PseudoCrypt::unhash($hash)."<br/>";
* }
*
* Sample Results:
* 1 - cJinsP - 1
* 2 - EdRbko - 2
* 3 - qxAPdD - 3
* 4 - TGtDVc - 4
* 5 - 5ac1O1 - 5
* 6 - huKpGQ - 6
* 7 - KE3d8p - 7
* 8 - wXmR1E - 8
* 9 - YrVEtd - 9
* 10 - BBE2m2 - 10
*/
class PseudoCrypt {
/* Key: Next prime greater than 62 ^ n / 1.618033988749894848 */
/* Value: modular multiplicative inverse */
private static $golden_primes = array(
'1' => '1',
'41' => '59',
'2377' => '1677',
'147299' => '187507',
'9132313' => '5952585',
'566201239' => '643566407',
'35104476161' => '22071637057',
'2176477521929' => '294289236153',
'134941606358731' => '88879354792675',
'8366379594239857' => '7275288500431249',
'518715534842869223' => '280042546585394647'
);
/* Ascii : 0 9, A Z, a z */
/* $chars = array_merge(range(48,57), range(65,90), range(97,122)) */
private static $chars62 = array(
0=>48,1=>49,2=>50,3=>51,4=>52,5=>53,6=>54,7=>55,8=>56,9=>57,10=>65,
11=>66,12=>67,13=>68,14=>69,15=>70,16=>71,17=>72,18=>73,19=>74,20=>75,
21=>76,22=>77,23=>78,24=>79,25=>80,26=>81,27=>82,28=>83,29=>84,30=>85,
31=>86,32=>87,33=>88,34=>89,35=>90,36=>97,37=>98,38=>99,39=>100,40=>101,
41=>102,42=>103,43=>104,44=>105,45=>106,46=>107,47=>108,48=>109,49=>110,
50=>111,51=>112,52=>113,53=>114,54=>115,55=>116,56=>117,57=>118,58=>119,
59=>120,60=>121,61=>122
);
public static function base62($int) {
$key = "";
while(bccomp($int, 0) > 0) {
$mod = bcmod($int, 62);
$key .= chr(self::$chars62[$mod]);
$int = bcdiv($int, 62);
}
return strrev($key);
}
public static function hash($num, $len = 5) {
$ceil = bcpow(62, $len);
$primes = array_keys(self::$golden_primes);
$prime = $primes[$len];
$dec = bcmod(bcmul($num, $prime), $ceil);
$hash = self::base62($dec);
return str_pad($hash, $len, "0", STR_PAD_LEFT);
}
public static function unbase62($key) {
$int = 0;
foreach(str_split(strrev($key)) as $i => $char) {
$dec = array_search(ord($char), self::$chars62);
$int = bcadd(bcmul($dec, bcpow(62, $i)), $int);
}
return $int;
}
public static function unhash($hash) {
$len = strlen($hash);
$ceil = bcpow(62, $len);
$mmiprimes = array_values(self::$golden_primes);
$mmi = $mmiprimes[$len];
$num = self::unbase62($hash);
$dec = bcmod(bcmul($num, $mmi), $ceil);
return $dec;
}
}
Solution 4
Shortest hash is 32 character length, how ever you can use first 8 characters of md5 hash
echo substr(md5('http://www.google.com'), 0, 8);
Update: here is another class found here written by Travell Perkins which takes record number and create short hash for it. 14 digits number produce 8 digit string. By the date you reach this number you become more popular than tinyurl ;)
class BaseIntEncoder {
//const $codeset = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
//readable character set excluded (0,O,1,l)
const codeset = "23456789abcdefghijkmnopqrstuvwxyzABCDEFGHIJKLMNPQRSTUVWXYZ";
static function encode($n){
$base = strlen(self::codeset);
$converted = '';
while ($n > 0) {
$converted = substr(self::codeset, bcmod($n,$base), 1) . $converted;
$n = self::bcFloor(bcdiv($n, $base));
}
return $converted ;
}
static function decode($code){
$base = strlen(self::codeset);
$c = '0';
for ($i = strlen($code); $i; $i--) {
$c = bcadd($c,bcmul(strpos(self::codeset, substr($code, (-1 * ( $i - strlen($code) )),1))
,bcpow($base,$i-1)));
}
return bcmul($c, 1, 0);
}
static private function bcFloor($x)
{
return bcmul($x, '1', 0);
}
static private function bcCeil($x)
{
$floor = bcFloor($x);
return bcadd($floor, ceil(bcsub($x, $floor)));
}
static private function bcRound($x)
{
$floor = bcFloor($x);
return bcadd($floor, round(bcsub($x, $floor)));
}
}
here is example how to use it:
BaseIntEncoder::encode('1122344523');//result:3IcjVE
BaseIntEncoder::decode('3IcjVE');//result:1122344523
Solution 5
For a short hash, url friendly, in the view of disallowing possible duplicate contents, we can use hash()
and especially the CRC or Adler-32 type, since they are made exactly for that:
Cyclic Redundancy Check
A cyclic redundancy check (CRC) is an error-detecting code commonly used in digital networks and storage devices to detect accidental changes to raw data. Blocks of data entering these systems get a short check value attached, based on the remainder of a polynomial division of their contents. On retrieval, the calculation is repeated and, in the event the check values do not match, corrective action can be taken https://en.wikipedia.org/wiki/Cyclic_redundancy_check
Adler-32 is a checksum algorithm (...), compared to a Cyclic Redundancy Check of the same length, it trades reliability for speed (preferring the latter) https://en.wikipedia.org/wiki/Adler-32
echo hash("crc32", "Content of article...");
// Output fd3e7c6e
echo hash("adler32", "Content of article...");
// Output 55df075f
Related videos on Youtube
Comments
-
clamp almost 3 years
I am looking for a PHP function that creates a short hash out of a string or a file, similar to those URL-shortening websites like tinyurl.com
The hash should not be longer than 8 characters.
-
Greg over 9 yearsI know this is an old question but check out: hashids.org. Works with most programming languages
-
Anis over 8 yearsCheck out ShortCode library. It does exactly what you want. Based on base conversion.
-
caw over 4 yearsOther than using Adler-32 or CRC32, you can’t shorten modern (collision-resistant) hashes that much (i.e. down to 8 characters). Not with SHA-2, not with SHA-1, and not even with MD5. With
Alphabet::convert($hash, Alphabet::HEX, Alphabet::ALPHANUMERIC)
, you can get MD5 down to 22 (from 32) characters. What you want is instead encode the files’ integer IDs (e.g. from your database) with(new Id())->encode($id)
.
-
-
Tom Haigh about 15 yearsUsing the first 8 charactors of md5 there is probably a reasonable chance of two URLs having the same hash
-
Nazariy about 15 yearsYes such collision might occur, but chance is very little for random string it is about one to 4 billions, how ever if you want to have 100% unique hash which you can use as reference to database record use included class.
-
sova almost 13 yearsthis has a very intelligent design =D golden primes = world.rock()
-
tim peterson almost 12 yearshi @RobertK, how would the PHP look for converting 6 digit strings that include both numbers and letters?
-
Robert K almost 12 years@timpeterson, simply call intval and pass in the given base (see my first code block).
-
tim peterson almost 12 years@RobertK, but
intval()
turns everything into a number. I guess maybe I'm confused on howintval()
connects with the other steps required to do the redirect like the role of the database. -
Robert K almost 12 years@timpeterson, that's because the string represents the ID of the database record. So you select the record based on the passed ID.
-
tim peterson almost 12 years@RobertK, the one problem I'm seeing with
intval()
is if your$str
has slashes(/) or dashes(-) in it. I realized thaton/stuff
,on-stuff
, andon
all returned the number887
. Do you have a solution in mind to work with URLs with slashes and dashes in them? -
Robert K almost 12 years@timpeterson Try base64_decode and you should be good to go. Base 36 will only contain 0-9 and a-z; base 62 would contain 0-9 and a-z and A-Z.
-
DanielJay almost 12 yearsI know I am commenting on an older post. I thought I would mention that KevBurnsJr code does work well. However, I recently just switched from a Windows 2003 32 bit server to a Windows 2008 R2 x64 server and I am finding that I am duplicating the unique hash's. I am now having to find an alternative method of creating confirmation codes.
-
KevBurnsJr over 11 yearsThe post has been updated to use bcmath with the help of some commenters so it should be solid now. Someone also found a way to make it reversible which is totally dope.
-
Luis Siquot almost 11 yearswant to mention that
const codeset
may be in any arbitrary order, just to obfuscate++ -
Harinder about 10 yearsweb.archive.org/web/20130727034425/http://blog.kevburnsjr.com/… it seem website is down so here is the copy of that link ;)
-
KevBurnsJr about 10 yearsya my webserver got, uh, unplugged.
-
inorganik almost 10 yearsYou ought to get your site back up, or post the php version of this under github.com/KevBurnsJr/pseudocrypt - what a great little lib! Didn't want to use a giant "system" like YOURLS or PHURL, just a nice lib to create shortlinks and this is it. Thanks
-
Ravi Soni over 9 yearsWhat this mean (more information per character) just curious!!
-
Gumbo over 9 years@ravisoni If you use the decimal digits
0
–9
to represent a number, you have 10 possible values per encoded character (ld(10) ≈ 3.32 bits/character). However, if you represent the same number with Base64 characters, you have 64 possible values per encoded character (ld(64) = 6 bits/character). So with Base64 there is more information stored in each encoded character, i. e., 6 bits of information instead of 3.32 bits. -
ethanpil almost 9 yearsIts missing again. After some detective work, I was able to resurrect it here: gist.github.com/ethanpil/94455f8de76671aa9024
-
Rajesh almost 9 yearsAnyone know how the modular multiplicative inverse of 2377 is 1677? I get 3 as value.
-
Rajesh almost 9 yearsFound the solution to calculate modular multiplicative inverse in this reddit link: reddit.com/r/PHPhelp/comments/1gztrp/…
-
divide_by_zero almost 9 yearsAnyone understand why @KevBurnsJr uses the next prime > 62^n / golden ratio? Why not just choose the next prime > 62^n? I understand why choosing a small prime might be problematic but why not just any arbitrary large prime?
-
Akhil Balakrishnan over 8 yearsPlease note that the result of
intval
will be the maximum signed integer value if the input is very long. -
Admin about 8 yearsIf you use base64, then nothing stops a script from saying for($i=0;$i<999999;$i++) { $pageContent=fread(fopen('yoururl.com/'.base64_encode($i)); } and now I have access to every single URL in your database.
-
Thomas over 6 yearsI'm loving this little micro library thanks @KevBurnsJr
-
j4k3 over 6 yearsIs there a way to salt generated hashes with a seed? Because in this state, the intended use of sequential codes that are "difficult to deduce" isn't really given.
-
user151496 over 6 yearsvery clever implementation!
-
user151496 over 6 years@j4k3 if you want to randomly "salt" the results, then shuffle the $chars62 array from elements 1 to 61, and make your own unique number->hash bijection. BUT MAKE SURE YOU KEEP THE FIRST 0=>42 OF THE ARRAY INTACT. otherwise every hash that starts with 0 will give you bad results at unhashing
-
rabudde almost 6 yearsIs this older post still up-to-date? And is it really collision free for integers up to BIGINT_MAX?