Shortest possible encoded string with a decode possibility (shorten URL) using only PHP


Solution 1

I suspect that you will need to think more about your method of encoding if you don't want it to be decodable by the user. The issue with Base64 is that a Base64 string looks like a Base64 string: someone savvy enough to be looking at your page source will probably recognise it.

Part one:

a method that encodes a string to the shortest possible length

If you're flexible on your URL vocabulary (the set of characters it may contain), this will be a good starting place. gzip makes most of its gains from back-references to repeated text, so there is little point in using it on a string this short.

Consider your example - you've only saved 2 bytes in the compression, which are lost again in Base64 padding:

Non-gzipped: string(52) "aW1nPS9kaXIvZGlyL2hpLXJlcy1pbWcuanBnJnc9NzAwJmg9NTAw"

Gzipped: string(52) "y8xNt9VPySwC44xM3aLUYt3M3HS9rIJ0tXJbcwMDtQxbUwMDAA=="
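To make the padding arithmetic concrete: Base64 always emits 4 characters per 3 input bytes, rounded up to a whole group. The small sketch below (base64_length() is just an illustrative helper name, not a PHP built-in) shows that 39 bytes and 37 bytes both come out as 52 characters:

```php
<?php
// Base64 turns every 3 input bytes into 4 output characters, rounding up to a
// whole 4-character group (the '=' padding fills the gap).
function base64_length(int $bytes): int
{
    return 4 * intdiv($bytes + 2, 3);
}

$raw = 'img=/dir/dir/hi-res-img.jpg&w=700&h=500'; // 39 bytes

echo base64_length(strlen($raw)), PHP_EOL; // 52: plain Base64
echo base64_length(37), PHP_EOL;           // 52: gzip saved 2 bytes, still 13 groups
echo base64_length(36), PHP_EOL;           // 48: compression would have to reach 36 bytes
```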

Reducing the vocabulary size naturally allows better compression. Let's start by removing some redundant information.

Take a look at the functions:

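function compress($input, $ascii_offset = 38){
    //Drop the letter case: upper case letters fall inside the 6-bit window, lower case ones don't
    $input = strtoupper($input);
    $output = '';
    //We can try for a 4:3 (8:6) compression (roughly), 24 bits for 4 characters
    foreach(str_split($input, 4) as $chunk) {
        //Pad the last chunk to 4 characters; decompress() trims the '=' off again
        $chunk = str_pad($chunk, 4, '=');

        $int_24 = 0;
        for($i=0; $i<4; $i++){
            //Shift the output to the left 6 bits
            $int_24 <<= 6;

            //Add the next 6 bits: subtract the offset so the character maps into
            //0..63, i.e. discard the leading ASCII chars below the window
            $int_24 |= (ord($chunk[$i]) - $ascii_offset) & 0b111111;
        }

        //Here we take the 4 sets of 6 apart in 3 sets of 8.
        //Prepending keeps each chunk's bytes in order but reverses the order of
        //the chunks themselves; decompress() prepends too, which undoes that.
        for($i=0; $i<3; $i++) {
            $output = pack('C', $int_24) . $output;
            $int_24 >>= 8;
        }
    }

    return $output;
}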

And

function decompress($input, $ascii_offset = 38) {

    $output = '';
    foreach(str_split($input, 3) as $chunk) {

        //Reassemble the 24 bit ints from 3 bytes
        $int_24 = 0;
        foreach(unpack('C*', $chunk) as $char) {
            $int_24 <<= 8;
            $int_24 |= $char & 0b11111111;
        }

        //Expand the 24 bits to 4 sets of 6, and take their character values.
        //Prepending reverses the chunk order again, restoring the original order.
        for($i = 0; $i < 4; $i++) {
            $output = chr($ascii_offset + ($int_24 & 0b111111)) . $output;
            $int_24 >>= 6;
        }
    }

    //Make lowercase again and trim off the padding
    //(this also strips a genuine trailing '=' and any original upper case)
    return strtolower(rtrim($output, '='));
}

It is basically a removal of redundant information (the letter case), followed by packing 4 bytes into 3. This works because every character is restricted to a 64-character (6-bit) window of the ASCII table. The window is shifted by the offset so that it starts at useful characters and covers all the characters you're currently using.

With the offset I've used, you can use anything from ASCII 38 ("&") to 101 ("e"). This gives you a resulting string of 30 bytes, a 9-byte (23%) saving! Unfortunately, you'll need to make it URL-safe (probably with Base64), which brings it back up to 40 bytes.
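For the "make it URL-safe" step, one option is the URL-safe Base64 variant from RFC 4648 (section 5), which swaps "+" and "/" for "-" and "_" and can drop the "=" padding. A minimal sketch follows; base64url_encode() and base64url_decode() are hypothetical helper names (PHP has no built-ins called that), and random bytes stand in for the 30 bytes that compress() returns:

```php
<?php
// Hypothetical helpers for URL-safe Base64 ("base64url", RFC 4648 section 5):
// '+' and '/' become '-' and '_', and the '=' padding is dropped.
function base64url_encode(string $bin): string
{
    return rtrim(strtr(base64_encode($bin), '+/', '-_'), '=');
}

function base64url_decode(string $text): string
{
    // Restore the standard alphabet and the padding, then decode strictly.
    $padded = str_pad(strtr($text, '-_', '+/'), intdiv(strlen($text) + 3, 4) * 4, '=');
    $bin = base64_decode($padded, true);
    if ($bin === false) {
        throw new InvalidArgumentException('Not valid base64url');
    }
    return $bin;
}

// Stand-in for the 30-byte output of compress() on the example query string.
$packed = random_bytes(30);
$safe = base64url_encode($packed);
echo strlen($safe), PHP_EOL; // 40
```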

I think at this point, you're pretty safe to assume that you've reached the "security through obscurity" level required to stop 99.9% of people. Let's continue, though, with the second part of your question:

so the user can't guess how to get the larger image

It's arguable that this is already solved by the above, but for real protection you need to encrypt the string with a secret kept on the server, preferably through PHP's OpenSSL functions. The following code shows the complete usage flow of the functions above plus the encryption:

$method = 'AES-256-CBC';
$secret = base64_decode('tvFD4Vl6Pu2CmqdKYOhIkEQ8ZO4XA4D8CLowBpLSCvA=');
//A fixed IV keeps the output deterministic (the same image always gets the same URL),
//at the cost of revealing when two inputs share the same leading 16-byte blocks
$iv = base64_decode('AVoIW0Zs2YY2zFm5fazLfg==');

$input = 'img=/dir/dir/hi-res-img.jpg&w=700&h=500';
var_dump($input);

$compressed = compress($input);
var_dump($compressed);

//Options 0: openssl_encrypt() returns Base64 and openssl_decrypt() expects Base64
$encrypted = openssl_encrypt($compressed, $method, $secret, 0, $iv);
var_dump($encrypted);

$decrypted = openssl_decrypt($encrypted, $method, $secret, 0, $iv);
var_dump($decrypted);

$decompressed = decompress($decrypted);
var_dump($decompressed);

The output of this script is the following:

string(39) "img=/dir/dir/hi-res-img.jpg&w=700&h=500"
string(30) "<30 bytes of binary data, not printable>"
string(44) "xozYGselci9i70cTdmpvWkrYvGN9AmA7djc5eOcFoAM="
string(30) "<30 bytes of binary data, not printable>"
string(39) "img=/dir/dir/hi-res-img.jpg&w=700&h=500"

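You'll see the whole cycle: compression → encryption (with Base64 encoding) → decryption (with Base64 decoding) → decompression. This is about as close to the shortest length as this approach gets. Note, though, that the 44-character encrypted string is still longer than the 39-character original, because AES pads the 30 bytes to 32 and Base64 then adds a third on top.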

All that aside, I feel obliged to conclude by pointing out that this is theoretical only, and it was a nice challenge to think about. There are definitely better ways to achieve your desired result - I'll be the first to admit that my solution is a little bit absurd!

Solution 2

Instead of encoding the URL, output a thumbnail copy of the original image. Here's what I'm thinking:

  1. Create a "map" for PHP by naming your pictures (the actual file names) using random characters. random_bytes() is a great place to start.

  2. Embed the desired resolution within the randomized URL string from #1.

  3. Use the imagecopyresampled() function to copy the original image at the resolution you would like to output before sending it to the client's device.

So for example:

  1. Filename example (from bin2hex(random_bytes(6))): a1492fdbdcf2.jpg

  2. Resolution desired: 800x600. My new link could look like http://myserver.com/?800a1492fdbdcf2600, or maybe http://myserver.com/?a1492800fdbdc600f2, or maybe even http://myserver.com/?800a1492fdbdcf2=600, depending on where I choose to embed the resolution within the link.

  3. PHP would know that the file name is a1492fdbdcf2.jpg, grab it, use imagecopyresampled() to copy it at the resolution you want, and output it.
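Steps 1 to 3 could look roughly like this in PHP. This is a sketch under assumptions: it uses a hypothetical fixed-width link format (4-digit width, the 12-character hex name, 4-digit height) so the numbers can be split off unambiguously even though the hex name may contain digits. make_image_token(), parse_image_token() and send_resized_jpeg() are illustrative names, and the image directory is assumed:

```php
<?php
// Hypothetical link format: 4-digit width + 12-character hex file name + 4-digit height.
// Fixed widths matter because the random hex name can itself contain digits.
function make_image_token(string $name, int $width, int $height): string
{
    return sprintf('%04d%s%04d', $width, $name, $height);
}

// Returns [name, width, height], or null if the token doesn't match the format.
function parse_image_token(string $token): ?array
{
    if (!preg_match('/^(\d{4})([0-9a-f]{12})(\d{4})$/', $token, $m)) {
        return null;
    }
    return [$m[2], (int) $m[1], (int) $m[3]];
}

// Needs the GD extension; $dir (where the originals live) is an assumption.
function send_resized_jpeg(string $dir, string $name, int $width, int $height): void
{
    $src = imagecreatefromjpeg($dir . '/' . $name . '.jpg');
    if ($src === false) {
        http_response_code(404);
        return;
    }
    $dst = imagecreatetruecolor($width, $height);
    // Scale the whole original into the requested width x height.
    imagecopyresampled($dst, $src, 0, 0, 0, 0, $width, $height, imagesx($src), imagesy($src));
    header('Content-Type: image/jpeg');
    imagejpeg($dst);
}

$name = bin2hex(random_bytes(6));           // e.g. "a1492fdbdcf2", stored as a1492fdbdcf2.jpg
$token = make_image_token($name, 800, 600); // e.g. "0800a1492fdbdcf20600"
```

The fixed widths cap sizes at 9999 pixels. Since anyone can edit the numbers in the link, you would probably also want to check the parsed size against a list of sizes you actually allow before resizing.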

Solution 3

Theory

In theory we need a small input character set and a large output character set. I will demonstrate it with the following example. We have the number 2468 written in base 10, with a character set of 10 characters (0-9). We can convert it to the same number in base 2 (the binary number system). Then we have a smaller character set (0 and 1), and the result is longer: 100110100100

But if we convert it to hexadecimal (base 16), with a character set of 16 (0-9 and A-F), we get a shorter result: 9A4
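PHP's built-in base_convert() is enough to check this small example. It only handles bases up to 36, which is why a custom function is needed for the real conversion later on:

```php
<?php
// base_convert() works on digit strings in bases 2..36 and outputs lower case.
$binary = base_convert('2468', 10, 2);  // "100110100100" (12 characters)
$hex    = base_convert('2468', 10, 16); // "9a4" (3 characters)
echo $binary, ' ', $hex, PHP_EOL;
```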

Practice

So in your case we have the following character set for the input:

$inputCharacterSet = "0123456789abcdefghijklmnopqrstuvwxyz=/-.&";

In total 41 characters: numbers, lower-case letters and the special chars = / - . &

The character set for the output is a bit trickier. We want to use URL-safe characters only. I've grabbed them from here: Characters allowed in GET parameter

So our output character set is (73 characters):

$outputCharacterSet = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz~-_.!*'(),$";

Numbers, lower- and upper-case letters, and some special characters.

We have more characters in our output set than in our input set, so theory says we can shorten our input string. Check!
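How much shorter? Each character of an alphabet of size b carries log2(b) bits, so an n-character input needs at most about n · ln 41 / ln 73 output characters. A small sketch of that estimate (estimated_length() is just an illustrative helper); it gives an upper bound, and it agrees with the 34- and 27-character results in this answer:

```php
<?php
// Upper-bound estimate for re-encoding an n-character string from an alphabet of
// size $from into an alphabet of size $to: each input character carries log($from)
// worth of information, each output character log($to).
function estimated_length(int $n, int $from, int $to): int
{
    return (int) ceil($n * log($from) / log($to));
}

echo estimated_length(39, 41, 73), PHP_EOL; // 34 (39 * 0.8655 = 33.76, rounded up)
echo estimated_length(31, 40, 73), PHP_EOL; // 27 (31 * 0.8598 = 26.65, rounded up)
```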

Coding

Now we need an encode function from base 41 to base 73. PHP's built-in base_convert() only supports bases 2 to 36 (and may lose precision on large numbers), so it can't do this. Luckily we can grab the function convBase from here: Convert an arbitrarily large number from any base to any base

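<?php
// Requires the bcmath extension (bcadd, bcmul, bcpow, bcmod, bcdiv).
// A leading "zero" symbol (the first character of $fromBaseInput) is not preserved,
// just as leading zeros disappear from an ordinary number.
function convBase($numberInput, $fromBaseInput, $toBaseInput)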
{
    if ($fromBaseInput == $toBaseInput) return $numberInput;
    $fromBase = str_split($fromBaseInput, 1);
    $toBase = str_split($toBaseInput, 1);
    $number = str_split($numberInput, 1);
    $fromLen = strlen($fromBaseInput);
    $toLen = strlen($toBaseInput);
    $numberLen = strlen($numberInput);
    $retval = '';
    if ($toBaseInput == '0123456789')
    {
        $retval = 0;
        for ($i = 1;$i <= $numberLen; $i++)
            $retval = bcadd($retval, bcmul(array_search($number[$i-1], $fromBase), bcpow($fromLen, $numberLen-$i)));
        return $retval;
    }
    if ($fromBaseInput != '0123456789')
        $base10 = convBase($numberInput, $fromBaseInput, '0123456789');
    else
        $base10 = $numberInput;
    if ($base10<strlen($toBaseInput))
        return $toBase[$base10];
    while($base10 != '0')
    {
        $retval = $toBase[bcmod($base10,$toLen)] . $retval;
        $base10 = bcdiv($base10, $toLen, 0);
    }
    return $retval;
}

Now we can shorten the URL. The final code is:

$input = 'img=/dir/dir/hi-res-img.jpg&w=700&h=500';
$inputCharacterSet = "0123456789abcdefghijklmnopqrstuvwxyz=/-.&";
$outputCharacterSet = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz~-_.!*'(),$";
$encoded = convBase($input, $inputCharacterSet, $outputCharacterSet);
var_dump($encoded); // string(34) "BhnuhSTc7LGZv.h((Y.tG_IXIh8AR.$!t*"
$decoded = convBase($encoded, $outputCharacterSet, $inputCharacterSet);
var_dump($decoded); // string(39) "img=/dir/dir/hi-res-img.jpg&w=700&h=500"

The encoded string has only 34 characters.

Optimizations

You can reduce the number of characters by

  • reducing the length of the input string. Do you really need the overhead of URL parameter syntax? Maybe you can format your string as follows:

$input = '/dir/dir/hi-res-img.jpg,700,500';

This reduces the input itself and the input character set. Your reduced input character set is then:

$inputCharacterSet = "0123456789abcdefghijklmnopqrstuvwxyz/-.,";

Final output:

string(27) "E$AO.Y_JVIWMQ9BB_Xb3!Th*-Ut"

string(31) "/dir/dir/hi-res-img.jpg,700,500"

  • reducing the input character set ;-). Maybe you can exclude some more characters? If you encode the numbers as letters first, your input character set shrinks by 10!

  • increasing your output character set. I found the set above with two minutes of googling, so there may be more URL-safe characters you can use.
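convBase() depends on the bcmath extension. If that isn't available, the same conversion can be done with schoolbook long division on an array of digits. Here is a sketch (convert_alphabet() is a hypothetical name, not a PHP function) applied to the reduced input from the first bullet; like convBase(), it does not preserve a leading "zero" symbol:

```php
<?php
// Sketch: convert a string between two alphabets (bases) without bcmath, using
// schoolbook long division on an array of digits. As with convBase(), a leading
// "zero" symbol (the first character of $fromAlphabet) is not preserved.
function convert_alphabet(string $input, string $fromAlphabet, string $toAlphabet): string
{
    $fromBase = strlen($fromAlphabet);
    $toBase = strlen($toAlphabet);

    // Turn each input character into its digit value in the source base.
    $digits = [];
    foreach (str_split($input) as $char) {
        $value = strpos($fromAlphabet, $char);
        if ($value === false) {
            throw new InvalidArgumentException("'$char' is not in the source alphabet");
        }
        $digits[] = $value;
    }

    $output = '';
    // Divide the whole number by $toBase until nothing is left;
    // each remainder is the next output digit, least significant first.
    while ($digits !== []) {
        $quotient = [];
        $remainder = 0;
        foreach ($digits as $digit) {
            $accumulator = $remainder * $fromBase + $digit;
            $q = intdiv($accumulator, $toBase);
            $remainder = $accumulator % $toBase;
            if ($quotient !== [] || $q !== 0) {
                $quotient[] = $q; // drop leading zeros of the quotient
            }
        }
        $output = $toAlphabet[$remainder] . $output;
        $digits = $quotient;
    }

    return $output;
}

$inputCharacterSet = "0123456789abcdefghijklmnopqrstuvwxyz/-.,";
$outputCharacterSet = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz~-_.!*'(),$";

$input = '/dir/dir/hi-res-img.jpg,700,500';
$encoded = convert_alphabet($input, $inputCharacterSet, $outputCharacterSet);
$decoded = convert_alphabet($encoded, $outputCharacterSet, $inputCharacterSet);
var_dump(strlen($encoded), $decoded);
```

The inner loop only ever multiplies numbers smaller than the two bases, so it works with plain integers; the cost is quadratic in the string length, which doesn't matter for URL-sized input.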

Security

Heads up: there is no cryptographic logic in the code. So if somebody guesses the character sets, he/she can decode the string easily. You can shuffle the character sets (once), which makes it a bit harder for an attacker, but still not really safe. Maybe it's enough for your use case anyway.

Solution 4

Judging from the previous answers and the comments below, you need a solution that hides the real path of your image parser and gives it a fixed image width.

Step 1: http://www.example.com/tn/full/animals/images/lion.jpg

You can achieve a basic "thumbnailer" by taking advantage of .htaccess:

RewriteEngine on
RewriteBase /
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^tn/(full|small)/(.*)$ index.php?size=$1&img=$2 [QSA,L]

Your PHP file:

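 $basedir = "/public/content/";
 ## realpath() resolves "../" and returns false if the file does not exist
 $filename = realpath($basedir . $_GET["img"]);

 ## Check that the file exists and is inside $basedir
 if ($filename === false
    || strncmp($filename, $basedir, strlen($basedir)) !== 0
    || !is_file($filename)) {
    die("Bad file path");
 }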

 switch ($_GET["size"]) {
    case "full":
        $width = 700;
        $height = 500;
        ## You can also use getimagesize() to test if the image is landscape or portrait
    break;
    default:
        $width = 350;
        $height = 250;
    break;
 }
 ## Here is your old code for resizing images.
 ## Note that the "tn" directory can exist and store the actual reduced images

This lets you use the URL www.example.com/tn/full/animals/images/lion.jpg to view the resized image.

For SEO, this has the advantage of preserving the original file name.

Step 2: http://www.example.com/tn/full/lion.jpg

If you want an even shorter URL and you don't have too many images, you can use just the basename of the file (e.g., "lion.jpg") and search for it recursively. When there is a collision, use an index to identify which one you want (e.g., "1--lion.jpg").

function matching_files($filename, $base) {
    $directory_iterator = new RecursiveDirectoryIterator($base);
    $iterator       = new RecursiveIteratorIterator($directory_iterator);
    ## Match the whole file name: escape regex characters and require a directory separator before it
    $regex_iterator = new RegexIterator($iterator, '#' . preg_quote(DIRECTORY_SEPARATOR . $filename, '#') . '$#');
    $regex_iterator->setFlags(RegexIterator::USE_KEY);
    ## create_function() was removed in PHP 8, so use an arrow function
    $files = array_map(fn($a) => $a->getPathname(), iterator_to_array($regex_iterator, false));
    ## Directory iteration order is not guaranteed; sort to keep the indexes stable
    sort($files);
    return $files;
}

function encode_name($filename) {
    $base = realpath('public/content');
    $files = matching_files(basename($filename), $base);
    $tot = count($files);
    if (!$tot)
        return NULL;
    if ($tot == 1)
        return basename($filename);
    $index = array_search(realpath($base . '/' . $filename), $files, true);
    return $index === false ? NULL : $index . "--" . basename($filename);
}

function decode_name($filename) {
    $i = 0;
    if (preg_match("#^([0-9]+)--(.*)#", $filename, $out)) {
        $i = (int) $out[1];
        $filename = $out[2];
    }

    $files = matching_files($filename, realpath('public/content'));

    return $files[$i] ?? NULL;
}

$name = encode_name("gallery/animals/images/lion.jpg");
echo $name . PHP_EOL;
 ## --> returns lion.jpg
 ## With the .htaccess rule above, you can then use the URL http://www.example.com/tn/full/lion.jpg

 echo decode_name(basename($name)) . PHP_EOL;
 ## -> returns the full path on disk to the image "lion.jpg"

Original post:

Basically, if you line up your example, you can see that your shortened URL is in fact longer:

img=/dir/dir/hi-res-img.jpg&w=700&h=500  // 39 characters

y8xNt9VPySwC44xM3aLUYt3M3HS9rIJ0tXJbcwMDtQxbUwMDAA // 50 characters

Using base64_encode will always result in longer strings (4 characters for every 3 bytes). And gzdeflate/gzcompress must store at least one occurrence of each distinct character, so they are not a good solution for short strings.

So doing nothing (or a simple str_rot13) is clearly the first option to consider if you want something shorter than your current result.

You can also use a simple character replacement method of your choice:

 $raw_query_string = 'img=/dir/dir/hi-res-img.jpg&w=700&h=500';
 $from = "0123456789abcdefghijklmnopqrstuvwxyz&=/ABCDEFGHIJKLMNOPQRSTUVWXYZ";
 // The following line is the result of str_shuffle($from)
 $to = "0IQFwAKU1JT8BM5npNEdi/DvZmXuflPVYChyrL4R7xc&SoG3Hq6ks=e9jW2abtOzg";
 echo strtr($raw_query_string, $from, $to) . "\n";

 // Result: EDpL4MEu4MEu4NE-u5f-EDp.dmprYLU00rNLA00 // 39 characters
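Since $to is just a permutation of $from, swapping the last two arguments of strtr() reverses the mapping. A short round-trip check with the same tables:

```php
<?php
$from = "0123456789abcdefghijklmnopqrstuvwxyz&=/ABCDEFGHIJKLMNOPQRSTUVWXYZ";
$to   = "0IQFwAKU1JT8BM5npNEdi/DvZmXuflPVYChyrL4R7xc&SoG3Hq6ks=e9jW2abtOzg";

$raw_query_string = 'img=/dir/dir/hi-res-img.jpg&w=700&h=500';

// Encode: each character of $from is replaced by the character at the same position in $to
// ('-' and '.' are not in $from, so they pass through unchanged).
$scrambled = strtr($raw_query_string, $from, $to);
// Decode: the same call with the two tables swapped undoes it.
$restored = strtr($scrambled, $to, $from);

echo $scrambled, PHP_EOL; // EDpL4MEu4MEu4NE-u5f-EDp.dmprYLU00rNLA00
echo $restored, PHP_EOL;  // img=/dir/dir/hi-res-img.jpg&w=700&h=500
```

Generate the shuffled table once and hard-code it; calling str_shuffle() on every request would break every link you have already handed out.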

Reading your comment, what you really want is to prevent anyone from getting a high-resolution image.

The best way to achieve that is to generate a checksum with a private key.

Encode:

$secret = "ujoo4Dae";
$raw_query_string = 'img=/dir/dir/hi-res-img.jpg&w=700&h=500';
$encoded_query_string = $raw_query_string . "&k=" . hash("crc32", $raw_query_string . $secret);

Result: img=/dir/dir/hi-res-img.jpg&w=700&h=500&k=2ae31804

Decode:

if (preg_match("#(.*)&k=([^=]*)$#", $encoded_query_string, $out)
    && hash_equals(hash("crc32", $out[1].$secret), $out[2])) {
    $decoded_query_string = $out[1];
}

This does not hide the original path, but the path has no reason to be public anyway. Your "index.php" can output your image from the local directory once the key has been checked.
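One caveat on the checksum: CRC32 is a linear checksum, not a cryptographic MAC, and checksums of that kind can be forged from a known valid one. If that matters, PHP's hash_hmac() is the usual tool. Below is a sketch of the same encode/decode with HMAC-SHA256; this swaps in a different technique from the crc32 code above, and the constant URL_SECRET, the helper names and the 16-hex-character tag length are assumptions:

```php
<?php
// Swapped-in variant: HMAC-SHA256 instead of crc32(string . secret).
// The secret below is a placeholder; keep the real one out of your repository.
const URL_SECRET = 'replace-with-a-long-random-secret';

function sign_query(string $query): string
{
    // 16 hex characters (64 bits) of the MAC keeps the URL reasonably short.
    $tag = substr(hash_hmac('sha256', $query, URL_SECRET), 0, 16);
    return $query . '&k=' . $tag;
}

function verify_query(string $signed): ?string
{
    if (!preg_match('#^(.*)&k=([0-9a-f]{16})$#', $signed, $m)) {
        return null;
    }
    $expected = substr(hash_hmac('sha256', $m[1], URL_SECRET), 0, 16);
    // hash_equals() compares in constant time and never treats the strings as numbers.
    return hash_equals($expected, $m[2]) ? $m[1] : null;
}

$signed = sign_query('img=/dir/dir/hi-res-img.jpg&w=700&h=500');
```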

If you really want to shorten your original URL, you have to restrict the set of characters allowed in it. Many compression methods rely on the fact that a full byte can then store more than one character.

Solution 5

There are many ways to shorten URLs. You can look up how other services, like TinyURL, shorten their URLs. Here is a good article on hashes and shortening URLs: URL Shortening: Hashes In Practice

You can use the PHP function mhash() to apply hashes to strings (in current PHP, hash() does the same job; the mhash() functions are deprecated as of PHP 8.1).

And if you scroll down to "Available Hashes" on the mhash website, you can see which hashes you can use with the function (although I would check which PHP versions support which algorithms): mhash - Hash Library
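For completeness, a sketch of what a hash-based short ID could look like with hash() (short_id() and the 8-character length are illustrative choices, not taken from the article). Note that a hash can't be decoded, so, as with TinyURL, you would still need to store which ID belongs to which URL:

```php
<?php
// A hash is one-way: this gives a short, stable ID, but you still need to store
// ID => path somewhere (file, session, database) to resolve it later.
function short_id(string $path, int $length = 8): string
{
    // Raw SHA-256 bytes, then URL-safe Base64 without padding, truncated.
    $b64 = rtrim(strtr(base64_encode(hash('sha256', $path, true)), '+/', '-_'), '=');
    return substr($b64, 0, $length);
}

$id = short_id('img=/dir/dir/hi-res-img.jpg&w=700&h=500');
```

Shorter IDs make collisions more likely, which is exactly the problem raised in the comments about maintaining state.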


Author: Artur Filipiak

Updated on July 09, 2022

Comments

  • Artur Filipiak
    Artur Filipiak almost 2 years

    I'm looking for a method that encodes a string to the shortest possible length and lets it be decodable (pure PHP, no SQL). I have a working script, but I'm unsatisfied with the length of the encoded string.

    Scenario

    Link to an image (it depends on the file resolution I want to show to the user):

    Encoded link (so the user can't guess how to get the larger image):

    So, basically I'd like to encode only the search query part of the URL:

    • img=/dir/dir/hi-res-img.jpg&w=700&h=500

    The method I use right now will encode the above query string to:

    • y8xNt9VPySwC44xM3aLUYt3M3HS9rIJ0tXJbcwMDtQxbUwMDAA

    The method I use is:

     $raw_query_string = 'img=/dir/dir/hi-res-img.jpg&w=700&h=500';
    
     $encoded_query_string = base64_encode(gzdeflate($raw_query_string));
     $decoded_query_string = gzinflate(base64_decode($encoded_query_string));
    

    How do I shorten the encoded result and still have the possibility to decode it using only PHP?

    • PeeHaa
      PeeHaa over 9 years
      I will bite: why do you want to do this?
    • Marcin Orlowski
      Marcin Orlowski over 9 years
      looks like a home-made "security by obscurity" thing. Do not go that way. It's pointless and it's also a dead end.
    • Artur Filipiak
      Artur Filipiak over 9 years
      PeeHaa, the whole idea (in this particular example) is to prevent anyone from getting a hi-res image (not to prevent it completely, just to minimize the possibility). I know it could be done better, but I just want this simple "plug and play". I'm pretty sure that a regular user would not try to decode it. On the other hand, I'm just curious how short I could make an encoded string (even for other purposes).
    • th3falc0n
      th3falc0n over 9 years
      why are you trying to prevent the user from getting a hi-res image?
    • Artur Filipiak
      Artur Filipiak over 9 years
      th3falc0n, because I'm a photographer. If a user would like to have the hi-res (7360x4912px) image, he could buy it.
    • Mark Baker
      Mark Baker over 9 years
      If you want your users to purchase high-res images, then don't display them in web pages.... display a lower resolution image and/or watermark the images that you display
    • Artur Filipiak
      Artur Filipiak over 9 years
      Mark Baker, this is not a solution if you want your web page to look good. These days I could simply show 800px images, but in 2 years they will be unwatchable because screen resolutions are getting higher and higher. Then, instead of re-editing all images and re-uploading them, I could simply raise the image resolution inside my image parser.
    • Mark Baker
      Mark Baker over 9 years
      The instant you display an image on your website, it's downloaded to the user's PC when they display that page.... if you're displaying the high-res image, then they now have that image on their PC.... and it doesn't matter how much you obfuscate the link
    • Artur Filipiak
      Artur Filipiak over 9 years
      Mark Baker, I'm fully aware of that. That's why the copy he gets is just a preview item: w=700&h=500, not the full size image. If I decide some day that the preview images are too small (for any reason), I could simply raise the size to e.g. w=1200&h=800, which still is not even half of the full resolution. Apart from that, with this approach I'm also not forced to keep several differently sized copies of a single image for other purposes.
    • astroanu
      astroanu over 9 years
      I've done something like this. What I did was keep a table in my DB with unique IDs of the shortened URL segment and the long version.
    • Aron
      Aron over 7 years
      I know you said no database, but can you cheat and write the hash and the URL it relates to to a file? Or to the user's session? The problem with creating shorter hashes is that unless you can maintain state, you can't verify that hashes are unique to an image, so you could end up with hash collisions.
    • Xenos
      Xenos over 7 years
      Is it just me, or is your "shorten URL" longer than the original parameters? Very short strings are where gzip and similar algorithms fail at compressing (the word declarations get longer than their uses).
    • Phil
      Phil over 7 years
      You can get shorter URLs if you limit your input alphabet. If your paths only consist of the 26 lower case a-z letters plus . - and /, and your resolution integers don't go over 16k, would that be an acceptable compromise? From my sums you can halve the size of the URL you are getting now, and it will actually end up shorter than the query string.
    • nodws
      nodws over 2 years
      how about just str_replace and use pipes to separate params? mysite.com/share/?/dir/dir/hi-res-img.jpg|700|500
  • Artur Filipiak
    Artur Filipiak over 9 years
    Thanks for the answer. It's very helpful, but it doesn't suit my question, as I need a non-DB solution (pure PHP).
  • Artur Filipiak
    Artur Filipiak over 7 years
    "this would be better done by not obscuring at all" - Yes, you're right. I had it done with SQL before (the whole app was based on a DB). However, now I need everything to be plug & play. It's painful to support users that can't handle a simple database configuration. Over 30% of the tickets I got were about SQL problems. I lost customers because they expected the app to work "right out of the box", even if they had no idea what their DB password was... No more relying on users' programming knowledge. But I have to give them some assurance that their images are safe. Somehow. I will look at your solution, thanks!
  • Artur Filipiak
    Artur Filipiak over 7 years
    The path isn't public on the site. I have already done it so that the URLs are nice and SEO friendly: www.mysite.com/gallery/animals/lion.jpg. While the real path is: /public/content/gallery/animals/images/lion.jpg. It's loaded dynamically in the back-end by: index.php?img=/public/content/gallery/animals/images/lion.jpg&w=700&h=500 - you can see this link only by opening dev tools or "share" image. The shortened URL is necessary in case of "share", I mean e.g.: "share this image on Facebook" and so on. So I don't really want it to be formatted as a query string. Thanks for your answer
  • Artur Filipiak
    Artur Filipiak over 7 years
    Thanks. Renaming files is not an option, unfortunately
  • Artur Filipiak
    Artur Filipiak over 7 years
    Yes, we're closer to the solution. I actually use files to store image data as JSON objects, like: name, title, description... I'll look at your solution, thanks
  • Artur Filipiak
    Artur Filipiak over 7 years
    It's not a bad idea actually. I could make a "trigger" in the admin panel, so that user can simply recache all images at any time
  • Michael Coxon
    Michael Coxon over 7 years
    @ArturFilipiak that's pretty much the gist. It will also save CPU time, as the image only has to be cached once. This is exactly how WordPress and other CMSs do it. You could also add some extra headers to allow the images to be cached client-side - especially if you take the rewrite route - as the path will look like a real static image.
  • Adam
    Adam over 7 years
    I edited my answer to add another way to do it: using .htaccess to get a shorter URL and then using a recursive search to get an even shorter URL.
  • Phil
    Phil over 7 years
    Oh dear. This answer is an extremely inefficient implementation of a hash table using a json file as storage (AKA a database). This would probably end up being slower than using a real database when you get a few thousand records in there. Think about all that parsing on every request. Think about I/O wait and concurrency. Not a good solution.
  • Artur Filipiak
    Artur Filipiak over 7 years
    @Phil_1984_, JSON file (on first load) along with a localStorage.
  • Artur Filipiak
    Artur Filipiak over 7 years
    How about the whole path, since you only consider the file name to be encrypted?
  • Aron
    Aron over 7 years
    @Phil_1984_, I would disagree with it being inefficient compared to a "real" database due to I/O since we only read the file once, then cache the hash table in memory. I would expect my solution to be used as a singleton and multiple look-ups to be done at once. However you are correct that parsing JSON is expensive in PHP, so a CSV would be a better option.
  • Phil
    Phil over 7 years
    Perhaps I misunderstand the usage scenario, but speaking purely about PHP now... Even if you use it as a singleton, 1 singleton would get constructed for every single request which uses the library (e.g. image request). If 10 different users request different images at the same time, each url needs to get decoded and since there is no shared memory (unless you use something like memcache) each will have to read and parse the file.
  • Artur Filipiak
    Artur Filipiak over 7 years
    Thank you for putting some light on the question. It makes me understand the whole thing much much better
  • Peter Mortensen
    Peter Mortensen about 2 years
    What is the gist of it? What is it doing for the hashing, and how? What kind of hashing is it using? Can you add it to your answer? Please respond by editing (changing) your answer, not here in comments (without "Edit:", "Update:", or similar - the answer should appear as if it was written today).
  • Peter Mortensen
    Peter Mortensen about 2 years
    Isn't a space missing after Content-Type:? An example.