cURL get url from redirect


Solution 1

In your particular case, the server is checking the User-Agent string.

When a server does this, it responds with a 302 redirect status code only if it sees a user-agent that it considers "valid". Requests from any other user-agent receive neither the 302 status code nor the Location: header.

Here, when the server receives a request from an "invalid" user-agent, it responds with a 200 OK status code and an empty response body.
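One way to check this behaviour is to request the same URL with and without a browser-like User-Agent and compare the answers. The following is only a sketch: the helper name `probe_redirect` is made up for this example, and it assumes the cURL extension is loaded (CURLINFO_REDIRECT_URL needs PHP 5.3.7 or later).

```php
<?php
// Hypothetical helper (not part of the original answer): send one request
// without following redirects and report how the server answered.
function probe_redirect(string $url, ?string $userAgent = null): array
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, false); // inspect the first response only
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // do not print the body
    if ($userAgent !== null) {
        curl_setopt($ch, CURLOPT_USERAGENT, $userAgent);
    }
    curl_exec($ch);
    $result = [
        'status'   => curl_getinfo($ch, CURLINFO_HTTP_CODE),
        // Redirect target as reported by libcurl; empty or false when
        // the response was not a redirect.
        'redirect' => curl_getinfo($ch, CURLINFO_REDIRECT_URL),
    ];
    curl_close($ch);
    return $result;
}

// Example calls (they need network access, so they are left commented out):
// $ua = 'Mozilla/5.0 (X11; Linux x86_64; rv:21.0) Gecko/20100101 Firefox/21.0';
// var_dump(probe_redirect('http://www.example.com/product/123/'));
// var_dump(probe_redirect('http://www.example.com/product/123/', $ua));
```

If the server behaves as described above, the first call should report status 200 and the second status 302 with a redirect target.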

(Note: in the code below, the actual URLs provided have been replaced with examples.)

Let's say that http://www.example.com's server checks the User-Agent string and that http://www.example.com/product/123/ redirects to http://www.example.org/abc.

In PHP your solution would be:

<?php

$url = 'http://www.example.com/product/123/';

$ch = curl_init($url);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, false);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_HEADER, true);
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (X11; Linux x86_64; rv:21.0) Gecko/20100101 Firefox/21.0"); // Necessary. The server checks for a valid User-Agent.
$response = curl_exec($ch); // Run the request once; the response contains the headers.
preg_match_all('/^Location:(.*)$/mi', $response, $matches);
curl_close($ch);

echo !empty($matches[1]) ? trim($matches[1][0]) : 'No redirect found';

The output of this script would be: http://www.example.org/abc
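The header-parsing step can also be pulled into a small function so that it can be checked without any network access. This is a hedged sketch, not part of the original answer; `extract_location` is a hypothetical name, and the input is assumed to be a raw header dump such as the one cURL returns when CURLOPT_HEADER is enabled.

```php
<?php
// Hypothetical helper: return the first Location header value found in a
// raw header string, or null when there is none.
function extract_location(string $rawHeaders): ?string
{
    // "m" makes ^ and $ match per line; "i" is needed because header
    // names are case-insensitive (HTTP/2 sends them in lowercase).
    if (preg_match('/^Location:[ \t]*(.*?)[ \t\r]*$/mi', $rawHeaders, $m)) {
        return $m[1];
    }
    return null;
}

$sample = "HTTP/1.1 302 Found\r\nLocation: http://www.example.org/abc\r\nContent-Length: 0\r\n\r\n";
echo extract_location($sample) ?? 'No redirect found';
```

Anchoring the pattern at the start of a line keeps it from matching other headers that merely end in "Location", such as Content-Location.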

Solution 2

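Alternatively, this function follows the redirect chain manually, up to a maximum number of hops, and returns the last URL it reaches: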

function curl_last_url(/*resource*/ $ch, /*int*/ &$maxredirect = null) {
    // Default to 5 hops when the caller does not pass a limit.
    $mr = $maxredirect === null ? 5 : intval($maxredirect);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, false);
    // Start from the URL currently set on the handle, so $newurl is always defined.
    $newurl = curl_getinfo($ch, CURLINFO_EFFECTIVE_URL);
    if ($mr > 0) {
        // Work on a copy so the caller's handle keeps its own options.
        $rch = curl_copy_handle($ch);
        curl_setopt($rch, CURLOPT_HEADER, true);
        curl_setopt($rch, CURLOPT_NOBODY, true);
        curl_setopt($rch, CURLOPT_FORBID_REUSE, false);
        curl_setopt($rch, CURLOPT_RETURNTRANSFER, true);
        do {
            curl_setopt($rch, CURLOPT_URL, $newurl);
            $header = curl_exec($rch);
            if (curl_errno($rch)) {
                $code = 0;
            } else {
                $code = curl_getinfo($rch, CURLINFO_HTTP_CODE);
                // Keep going only on a redirect that carries a Location header.
                // Header names are case-insensitive, hence the "i" flag.
                if (($code == 301 || $code == 302)
                    && preg_match('/^Location:(.*?)$/mi', $header, $matches)) {
                    $newurl = trim($matches[1]);
                } else {
                    $code = 0;
                }
            }
        } while ($code && --$mr);
        curl_close($rch);
        if (!$mr) {
            if ($maxredirect === null) {
                trigger_error('Too many redirects. When following redirects, libcurl hit the maximum amount.', E_USER_WARNING);
            } else {
                $maxredirect = 0;
            }
            return false;
        }
        curl_setopt($ch, CURLOPT_URL, $newurl);
    }
    return $newurl;
}
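If only the final URL matters and the intermediate hops do not, libcurl can follow the chain itself. The following is a minimal sketch of that alternative, not the method from the answer above; the function name `final_url` and the default limit of 5 redirects are assumptions made for illustration.

```php
<?php
// Hypothetical alternative: let libcurl follow the redirects and report
// the URL of the last request it made.
function final_url(string $url, string $userAgent, int $maxRedirects = 5)
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);     // follow Location headers
    curl_setopt($ch, CURLOPT_MAXREDIRS, $maxRedirects); // give up after this many hops
    curl_setopt($ch, CURLOPT_NOBODY, true);             // HEAD request; skip the body
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_USERAGENT, $userAgent);    // the server may require this
    $ok = curl_exec($ch) !== false;
    $finalUrl = $ok ? curl_getinfo($ch, CURLINFO_EFFECTIVE_URL) : false;
    curl_close($ch);
    return $finalUrl; // string on success, false on error or too many redirects
}

// Example call (needs network access, so it is left commented out):
// echo final_url('http://www.example.com/product/123/',
//     'Mozilla/5.0 (X11; Linux x86_64; rv:21.0) Gecko/20100101 Firefox/21.0');
```

Some servers answer HEAD requests differently from GET requests; if the result looks wrong, removing the CURLOPT_NOBODY line is worth trying.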

Author by

Admin

Updated on June 15, 2022

Comments

  • Admin
    Admin almost 2 years

    I'm using cURL to get the target URL of a redirect for a website scraper; the redirect URL is all I need from the site. I've searched Stack Overflow and other sites for the past couple of days without success. The code I'm currently using is from this website:

      $url = "http://www.someredirect.com";
      $ch = curl_init($url);
      curl_setopt($ch, CURLOPT_USERAGENT,'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.1) Gecko/20061204 Firefox/2.0.0.1');         
      curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
      curl_setopt($ch, CURLOPT_HEADER, true);
      curl_setopt($ch, CURLOPT_FOLLOWLOCATION, false);
      curl_setopt($ch, CURLOPT_NOBODY, true);
      $response = curl_exec($ch);
      preg_match_all('/^Location:(.*)$/mi', $response, $matches);
      curl_close($ch);
      echo !empty($matches[1]) ? trim($matches[1][0]) : 'No redirect found';
    

    Any help would be greatly appreciated!