How to get final URL after following HTTP redirections in pure PHP?

42,046

Solution 1

/**
 * get_redirect_url()
 * Gets the address that the provided URL redirects to,
 * or FALSE if there's no redirect. 
 *
 * @param string $url
 * @return string|false The redirect target, or false if there is no redirect
 */
function get_redirect_url($url){
    $redirect_url = null; 

    $url_parts = @parse_url($url);
    if (!$url_parts) return false;
    if (!isset($url_parts['host'])) return false; //can't process relative URLs
    if (!isset($url_parts['path'])) $url_parts['path'] = '/';

    $sock = fsockopen($url_parts['host'], (isset($url_parts['port']) ? (int)$url_parts['port'] : 80), $errno, $errstr, 30);
    if (!$sock) return false;

    $request = "HEAD " . $url_parts['path'] . (isset($url_parts['query']) ? '?'.$url_parts['query'] : '') . " HTTP/1.1\r\n"; 
    $request .= 'Host: ' . $url_parts['host'] . "\r\n"; 
    $request .= "Connection: Close\r\n\r\n"; 
    fwrite($sock, $request);
    $response = '';
    while(!feof($sock)) $response .= fread($sock, 8192);
    fclose($sock);

    if (preg_match('/^Location: (.+?)$/m', $response, $matches)){
        if ( substr($matches[1], 0, 1) == "/" )
            return $url_parts['scheme'] . "://" . $url_parts['host'] . trim($matches[1]);
        else
            return trim($matches[1]);

    } else {
        return false;
    }

}

/**
 * get_all_redirects()
 * Follows and collects all redirects, in order, for the given URL. 
 *
 * @param string $url
 * @return array
 */
function get_all_redirects($url){
    $redirects = array();
    while ($newurl = get_redirect_url($url)){
        if (in_array($newurl, $redirects)){
            break;
        }
        $redirects[] = $newurl;
        $url = $newurl;
    }
    return $redirects;
}

/**
 * get_final_url()
 * Gets the address that the URL ultimately leads to. 
 * Returns $url itself if it isn't a redirect.
 *
 * @param string $url
 * @return string
 */
function get_final_url($url){
    $redirects = get_all_redirects($url);
    if (count($redirects)>0){
        return array_pop($redirects);
    } else {
        return $url;
    }
}

And, as always, give credit:

http://w-shadow.com/blog/2008/07/05/how-to-get-redirect-url-in-php/
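
One thing get_redirect_url() only partly handles is a relative Location value: it prepends scheme and host when the value starts with "/", but drops any port and does not cover protocol-relative ("//host/path") or path-relative ("page.html") values. Below is a minimal sketch of a resolver for those cases. The name resolve_location() is hypothetical and not part of the original answer, and this is not a full RFC 3986 resolver (it ignores "../" segments and query-only references).

```php
<?php
/**
 * Resolve a Location header value against the URL that produced it.
 * Hypothetical helper for illustration; returns false if the base URL is unusable.
 */
function resolve_location($base_url, $location) {
    $location = trim($location);
    $base = parse_url($base_url);
    if ($location === '' || !$base || !isset($base['scheme'], $base['host'])) {
        return false;
    }

    // Already absolute (has a scheme such as "http://" or "https://").
    if (preg_match('#^[a-z][a-z0-9+.-]*://#i', $location)) {
        return $location;
    }

    // Protocol-relative: "//example.com/path" reuses the base scheme.
    if (substr($location, 0, 2) === '//') {
        return $base['scheme'] . ':' . $location;
    }

    // Scheme, host and (if present) port of the base URL.
    $origin = $base['scheme'] . '://' . $base['host']
            . (isset($base['port']) ? ':' . $base['port'] : '');

    // Root-relative: "/path" replaces the whole path.
    if ($location[0] === '/') {
        return $origin . $location;
    }

    // Path-relative: "page.html" replaces the last segment of the base path.
    $path = isset($base['path']) ? $base['path'] : '/';
    $dir  = substr($path, 0, strrpos($path, '/') + 1);
    return $origin . $dir . $location;
}
```

If you wanted to use it, the two return statements in the Location branch of get_redirect_url() could be replaced by a single call such as resolve_location($url, $matches[1]).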

Solution 2

function getRedirectUrl ($url) {
    stream_context_set_default(array(
        'http' => array(
            'method' => 'HEAD'
        )
    ));
    $headers = get_headers($url, 1);
    if ($headers !== false && isset($headers['Location'])) {
        return $headers['Location'];
    }
    return false;
}

Additionally...

As was mentioned in a comment, the final item in $headers['Location'] will be your final URL after all redirects. It's important to note, though, that it won't always be an array. When only one Location header is returned, it is a plain string. In that case, indexing it like an array to get the "last element" will most likely return a single character. Not ideal.

If you are only interested in the final URL, after all the redirects, I would suggest changing

return $headers['Location'];

to

return is_array($headers['Location']) ? array_pop($headers['Location']) : $headers['Location'];

... which is just shorthand for

if(is_array($headers['Location'])){
     return array_pop($headers['Location']);
}else{
     return $headers['Location'];
}

This fix will take care of either case (array, non-array), and remove the need to pick out the final URL after calling the function.

In the case where there are no redirects, the function will return false. Similarly, the function will also return false for invalid URLs (invalid for any reason). Therefore, it is important to check the URL for validity before running this function, or else incorporate the redirect check somewhere into your validation.
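
Two caveats raised in the comments (the header name's case, and the side effect of stream_context_set_default() on later file operations) can be worked around. Here is a minimal sketch, assuming PHP 7.1 or later so that get_headers() accepts a context as its third argument. The names last_location() and get_final_location() are hypothetical; the returned value is whatever the server sent, so it may still be a relative URL.

```php
<?php
/**
 * Return the last Location value from a flat list of header lines,
 * matching the header name case-insensitively, or false if there is none.
 * Hypothetical helper for illustration.
 */
function last_location(array $header_lines) {
    $last = false;
    foreach ($header_lines as $line) {
        // Anchored at the start so "Content-Location:" is not matched.
        if (preg_match('/^Location:\s*(.+)$/i', trim($line), $m)) {
            $last = $m[1];
        }
    }
    return $last;
}

/**
 * Variant of getRedirectUrl() that uses a per-call stream context,
 * leaving the default context untouched. Hypothetical name.
 */
function get_final_location($url) {
    $context = stream_context_create(array(
        'http' => array('method' => 'HEAD'),
    ));
    // Non-associative mode keeps the header lines in the order received.
    // The third argument requires PHP 7.1 or later.
    $headers = @get_headers($url, false, $context);
    return $headers === false ? false : last_location($headers);
}
```

Because the flat list preserves order, "Location" and "location" headers from different hops end up in one sequence, and the last match is the final redirect target.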

Solution 3

While the OP wanted to avoid cURL, it's best to use it when it's available. Here's a solution which has the following advantages:

  • uses curl for all the heavy lifting, so works with https
  • copes with servers which return a lower-cased location header name (both xaav's and webjay's answers do not handle this)
  • allows you to control how deep you want to go before giving up

Here's the function:

function findUltimateDestination($url, $maxRequests = 10)
{
    $ch = curl_init();

    curl_setopt($ch, CURLOPT_HEADER, true);
    curl_setopt($ch, CURLOPT_NOBODY, true);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_MAXREDIRS, $maxRequests);
    curl_setopt($ch, CURLOPT_TIMEOUT, 15);

    //customize user agent if you desire...
    curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Link Checker)');

    curl_setopt($ch, CURLOPT_URL, $url);
    curl_exec($ch);

    $url=curl_getinfo($ch, CURLINFO_EFFECTIVE_URL);

    curl_close ($ch);
    return $url;
}

Here's a more verbose version which allows you to inspect the redirection chain rather than let curl follow it.

function findUltimateDestination($url, $maxRequests = 10)
{
    $ch = curl_init();

    curl_setopt($ch, CURLOPT_HEADER, true);
    curl_setopt($ch, CURLOPT_NOBODY, true);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_TIMEOUT, 15);

    //customize user agent if you desire...
    curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Link Checker)');

    while ($maxRequests--) {

        //fetch
        curl_setopt($ch, CURLOPT_URL, $url);
        $response = curl_exec($ch);

        //try to determine redirection url
        $location = '';
        if (in_array(curl_getinfo($ch, CURLINFO_HTTP_CODE), [301, 302, 303, 307, 308])) {
            // anchor at line start so headers such as Content-Location are not matched
            if (preg_match('/^Location:(.*)$/mi', $response, $match)) {
                $location = trim($match[1]);
            }
        }

        if (empty($location)) {
            //we've reached the end of the chain...
            return $url;
        }

        //build next url
        if ($location[0] == '/') {
            $u = parse_url($url);
            $url = $u['scheme'] . '://' . $u['host'];
            if (isset($u['port'])) {
                $url .= ':' . $u['port'];
            }
            $url .= $location;
        } else {
            $url = $location;
        }
    }

    return null;
}

As an example of a redirection chain which this function handles, but the others do not, try this:

echo findUltimateDestination('http://dx.doi.org/10.1016/j.infsof.2016.05.005');

At the time of writing, this involves 4 requests, with a mixture of Location and location headers involved.
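
If you want to see every hop rather than only the final URL, the header dump itself can be parsed. This is a minimal sketch, assuming the input is the string returned by curl_exec() with CURLOPT_HEADER, CURLOPT_NOBODY and CURLOPT_FOLLOWLOCATION set (as in the first version above, which discards that return value); as far as I know, curl then returns the header block of each response in the chain, separated by blank lines. The name parse_redirect_chain() is hypothetical.

```php
<?php
/**
 * Split a raw header dump containing one or more HTTP responses into a list
 * of array('status' => int, 'location' => string|null), one entry per hop.
 * Hypothetical helper for illustration.
 */
function parse_redirect_chain($raw_headers) {
    $hops = array();
    // Response header blocks are separated by a blank line.
    $blocks = preg_split('/\r?\n\r?\n/', trim($raw_headers));
    foreach ($blocks as $block) {
        // Each block must start with a status line such as "HTTP/1.1 301 ...".
        if (!preg_match('#^HTTP/\S+\s+(\d{3})#', $block, $status)) {
            continue;
        }
        $location = null;
        // Case-insensitive, anchored at the start of a line.
        if (preg_match('/^Location:[ \t]*(.*?)\s*$/mi', $block, $m)) {
            $location = $m[1];
        }
        $hops[] = array('status' => (int) $status[1], 'location' => $location);
    }
    return $hops;
}
```

Each entry gives the status code of one response and the Location it pointed to (null for the final, non-redirect response). Relative Location values are returned as sent.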

Solution 4

xaav's answer is very good, except for the following two issues:

  • It does not support the HTTPS protocol. A solution was proposed in a comment on the original site: http://w-shadow.com/blog/2008/07/05/how-to-get-redirect-url-in-php/

  • Some sites will not work since they do not recognise the underlying user agent (client browser). This is fixed by adding a User-Agent header field. I added an Android user agent (you can find other user agent examples, according to your needs, at http://www.useragentstring.com/pages/useragentstring.php):

    $request .= "User-Agent: Mozilla/5.0 (Linux; U; Android 4.0.3; ko-kr; LG-L160L Build/IML74K) AppleWebkit/534.30 (KHTML, like Gecko) Version/4.0 Mobile Safari/534.30\r\n";

Here's the modified answer:

/**
 * get_redirect_url()
 * Gets the address that the provided URL redirects to,
 * or FALSE if there's no redirect. 
 *
 * @param string $url
 * @return string|false The redirect target, or false if there is no redirect
 */
function get_redirect_url($url){
    $redirect_url = null; 

    $url_parts = @parse_url($url);
    if (!$url_parts) return false;
    if (!isset($url_parts['host'])) return false; //can't process relative URLs
    if (!isset($url_parts['path'])) $url_parts['path'] = '/';

    $sock = fsockopen($url_parts['host'], (isset($url_parts['port']) ? (int)$url_parts['port'] : 80), $errno, $errstr, 30);
    if (!$sock) return false;

    $request = "HEAD " . $url_parts['path'] . (isset($url_parts['query']) ? '?'.$url_parts['query'] : '') . " HTTP/1.1\r\n"; 
    $request .= 'Host: ' . $url_parts['host'] . "\r\n"; 
    $request .= "User-Agent: Mozilla/5.0 (Linux; U; Android 4.0.3; ko-kr; LG-L160L Build/IML74K) AppleWebkit/534.30 (KHTML, like Gecko) Version/4.0 Mobile Safari/534.30\r\n";
    $request .= "Connection: Close\r\n\r\n"; 
    fwrite($sock, $request);
    $response = '';
    while(!feof($sock)) $response .= fread($sock, 8192);
    fclose($sock);

    if (preg_match('/^Location: (.+?)$/m', $response, $matches)){
        if ( substr($matches[1], 0, 1) == "/" )
            return $url_parts['scheme'] . "://" . $url_parts['host'] . trim($matches[1]);
        else
            return trim($matches[1]);

    } else {
        return false;
    }

}

/**
 * get_all_redirects()
 * Follows and collects all redirects, in order, for the given URL. 
 *
 * @param string $url
 * @return array
 */
function get_all_redirects($url){
    $redirects = array();
    while ($newurl = get_redirect_url($url)){
        if (in_array($newurl, $redirects)){
            break;
        }
        $redirects[] = $newurl;
        $url = $newurl;
    }
    return $redirects;
}

/**
 * get_final_url()
 * Gets the address that the URL ultimately leads to. 
 * Returns $url itself if it isn't a redirect.
 *
 * @param string $url
 * @return string
 */
function get_final_url($url){
    $redirects = get_all_redirects($url);
    if (count($redirects)>0){
        return array_pop($redirects);
    } else {
        return $url;
    }
}
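
The modified code above still connects to port 80 by default, so the HTTPS issue remains. One possible way to address it is to pick the transport and default port from the URL scheme before calling fsockopen(). This is a minimal sketch, assuming the OpenSSL extension is available so that the ssl:// transport works; the name socket_target() is hypothetical and this is not necessarily the fix proposed on the original site.

```php
<?php
/**
 * Work out the fsockopen() target for a parsed URL.
 * Returns array(hostname with optional transport prefix, port),
 * or false for a missing host or an unsupported scheme.
 * Hypothetical helper for illustration.
 */
function socket_target(array $url_parts) {
    if (!isset($url_parts['host'])) {
        return false;
    }
    $scheme = isset($url_parts['scheme']) ? strtolower($url_parts['scheme']) : 'http';
    if ($scheme === 'https') {
        $transport = 'ssl://';   // requires the OpenSSL extension
        $default_port = 443;
    } elseif ($scheme === 'http') {
        $transport = '';
        $default_port = 80;
    } else {
        return false;
    }
    // An explicit port in the URL wins over the scheme default.
    $port = isset($url_parts['port']) ? (int) $url_parts['port'] : $default_port;
    return array($transport . $url_parts['host'], $port);
}

// Inside get_redirect_url(), the fsockopen() call could then become:
// list($target, $port) = socket_target($url_parts);
// $sock = fsockopen($target, $port, $errno, $errstr, 30);
```

The Host header should still use the bare host name, without the ssl:// prefix.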
Author by

Weboide

Linux Geek and Web Dev

Updated on July 09, 2022

Comments

  • Weboide
    Weboide almost 2 years

    What I'd like to do is find out what is the last/final URL after following the redirections.

    I would prefer not to use cURL. I would like to stick with pure PHP (stream wrappers).

    Right now I have a URL (let's say http://domain.test), and I use get_headers() to get specific headers from that page. get_headers will also return multiple Location: headers (see Edit below). Is there a way to use those headers to build the final URL? or is there a PHP function that would automatically do this?

    Edit: get_headers() follows redirections and returns all the headers for each response/redirections, so I have all the Location: headers.

  • Weboide
    Weboide over 12 years
    Does this follow all the redirections and return the final URL?
  • Dave
    Dave over 12 years
    You sir, just saved me few hours of searching. Everything works as expected.
  • Sekhar Babu
    Sekhar Babu about 9 years
    Great yar.. Lot of time saved. +1
  • Vishnu Ks
    Vishnu Ks almost 9 years
    The marked answer to this question didn't follow all redirects. But this solution did.
  • cweiske
    cweiske over 8 years
    get_headers follows all redirects and thus returns an array in $headers['Location'], the last entry being the final URL
  • kliron
    kliron about 7 years
    A note of caution on this answer. get_headers thinks location and Location are different headers. If you are following a chain of redirects which use different cases, you can't tell which header is the 'last' one, as you'll have urls in both $headers['location'] and $headers['Location']
  • tomvo
    tomvo about 7 years
    Although it works fine, beware of using this at random in your code: because of the stream_context_set_default call, any following file operations like file_get_contents fail. A workaround would be to save the original context using stream_context_get_default and reset it when done.
  • Paolo Mioni
    Paolo Mioni almost 7 years
    I have to say that for my tests this solution with CURL is more reliable: stackoverflow.com/questions/17472329/…
  • Michael Rogers
    Michael Rogers almost 7 years
    Error 500 executing this script.
  • ctrlbrk
    ctrlbrk about 4 years
    I see fsockopen is set to port 80, does this not support a SSL/HTTPS 443 port redirect?
  • Admin
    Admin almost 3 years
    It's not returning the final (last) redirected URL
  • Brds
    Brds almost 3 years
    needs a } after the else condition on the 3rd to last line
  • Djongov
    Djongov almost 3 years
    Damn none of the answers show more than 1 redirect :( even the curl ones
  • Ahsan Khan
    Ahsan Khan over 2 years
    The 2nd function worked for me. thanks