Header-only retrieval in PHP via cURL
Solution 1
You are passing $header to curl_getinfo(). It should be $curl (the cURL handle). You can get just the filetime by passing CURLINFO_FILETIME as the second parameter to curl_getinfo(). (Often the filetime is unavailable, in which case it will be reported as -1.)
Your class seems to be wasteful, though, throwing away a lot of information that could be useful. Here's another way it might be done:
class URIInfo
{
    public $info;    // array returned by curl_getinfo()
    public $header;  // raw response headers as a string
    private $url;

    public function __construct($url)
    {
        $this->url = $url;
        $this->setData();
    }

    public function setData()
    {
        $curl = curl_init();
        curl_setopt($curl, CURLOPT_URL, $this->url);
        curl_setopt($curl, CURLOPT_FILETIME, true);       // ask for the remote file's modification time
        curl_setopt($curl, CURLOPT_NOBODY, true);         // HEAD request: headers only, no body
        curl_setopt($curl, CURLOPT_RETURNTRANSFER, true); // return the output instead of printing it
        curl_setopt($curl, CURLOPT_HEADER, true);         // include the headers in that output
        $this->header = curl_exec($curl);
        $this->info = curl_getinfo($curl);                // pass the handle, not the output
        curl_close($curl);
    }

    public function getFiletime()
    {
        return $this->info['filetime']; // -1 if the server did not report it
    }

    // Other functions can be added to retrieve other information.
}
$uri_info = new URIInfo('http://www.codinghorror.com/blog/');
$filetime = $uri_info->getFiletime();
if ($filetime != -1) {
    echo date('Y-m-d H:i:s', $filetime);
} else {
    echo 'filetime not available';
}
Yes, the load will be lighter on the server, since it returns only the HTTP headers (responding, after all, to a HEAD request). How much lighter will vary greatly.
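The point of fetching only the headers in the question is to decide whether the local copy needs to be downloaded again. A minimal sketch of that decision, assuming the remote time is a Unix timestamp such as getFiletime() returns (with -1 meaning "unknown"); the function name needs_refresh() and the path in the usage line are hypothetical:

```php
<?php
/**
 * Hypothetical helper: should the local copy be (re)downloaded?
 * $remoteTime is a Unix timestamp such as getFiletime() returns, where -1
 * means the server did not report a modification time.
 */
function needs_refresh(int $remoteTime, string $localPath): bool
{
    // Drop cached stat() results so is_file()/filemtime() see the current file.
    clearstatcache(true, $localPath);

    if (!is_file($localPath)) {
        return true; // no local copy yet
    }
    if ($remoteTime === -1) {
        return true; // remote time unknown, so freshness cannot be confirmed
    }
    // The local mtime is normally the time of the last download, so a later
    // remote modification time means the file changed since then.
    return $remoteTime > filemtime($localPath);
}

// Example (hypothetical path):
// if (needs_refresh($uri_info->getFiletime(), '/tmp/blog.html')) { /* download */ }
```

Treating an unknown remote time as "changed" is a conservative choice; if extra downloads are expensive you might prefer to return false in that case instead.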
Solution 2
Why use cURL for this? There is a PHP function for that:
$headers=get_headers("http://www.amazingjokes.com/img/2014/530c9613d29bd_CountvonCount.jpg");
print_r($headers);
returns the following:
Array
(
[0] => HTTP/1.1 200 OK
[1] => Date: Tue, 11 Mar 2014 22:44:38 GMT
[2] => Server: Apache
[3] => Last-Modified: Tue, 25 Feb 2014 14:08:40 GMT
[4] => ETag: "54e35e8-8873-4f33ba00673f4"
[5] => Accept-Ranges: bytes
[6] => Content-Length: 34931
[7] => Connection: close
[8] => Content-Type: image/jpeg
)
It should be easy to get the Content-Type from this.
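One way to do that, sketched under the assumption that you have the indexed array shown above: scan for the Content-Type line and keep the last match, because when get_headers() follows redirects the array holds the header lines of every response in order. The function name content_type_from_headers() is hypothetical:

```php
<?php
/**
 * Hypothetical helper: pull the Content-Type out of the indexed array that
 * get_headers($url) returns. If redirects were followed, the array holds the
 * headers of every response in order, so the last match is the final one.
 */
function content_type_from_headers(array $headers): ?string
{
    $type = null;
    foreach ($headers as $line) {
        // Header names are case-insensitive, so match the name with stripos().
        if (stripos($line, 'content-type:') === 0) {
            $type = trim(substr($line, strlen('content-type:')));
        }
    }
    return $type;
}

$headers = [
    'HTTP/1.1 200 OK',
    'Server: Apache',
    'Content-Length: 34931',
    'Content-Type: image/jpeg',
];
echo content_type_from_headers($headers); // image/jpeg
```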
You could also pass format=1 as the second argument to get_headers() to get an associative array:
$headers=get_headers("http://www.amazingjokes.com/img/2014/530c9613d29bd_CountvonCount.jpg",1);
print_r($headers);
This will return the following:
Array
(
[0] => HTTP/1.1 200 OK
[Date] => Tue, 11 Mar 2014 22:44:38 GMT
[Server] => Apache
[Last-Modified] => Tue, 25 Feb 2014 14:08:40 GMT
[ETag] => "54e35e8-8873-4f33ba00673f4"
[Accept-Ranges] => bytes
[Content-Length] => 34931
[Connection] => close
[Content-Type] => image/jpeg
)
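With format=1, a header that appears more than once (for example across a redirect chain) comes back as an array of values rather than a string, and the key keeps whatever capitalization the server sent. A small lookup sketch that handles both cases; the name header_value() is hypothetical:

```php
<?php
/**
 * Hypothetical helper: case-insensitive lookup in the associative array that
 * get_headers($url, 1) returns. If the matching key holds an array of values
 * (a repeated header), the last value, from the final response, is returned.
 */
function header_value(array $headers, string $name): ?string
{
    foreach ($headers as $key => $value) {
        // Numeric keys hold the status lines, so only compare string keys.
        if (is_string($key) && strcasecmp($key, $name) === 0) {
            return is_array($value) ? (string) end($value) : (string) $value;
        }
    }
    return null;
}

$headers = [
    0 => 'HTTP/1.1 200 OK',
    'Last-Modified' => 'Tue, 25 Feb 2014 14:08:40 GMT',
    'Content-Type' => 'image/jpeg',
];
echo header_value($headers, 'last-modified'); // Tue, 25 Feb 2014 14:08:40 GMT
```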
Solution 3
(1) Yes. A HEAD request (as you're issuing in this case) is far lighter on the server because it only returns the HTTP headers, as opposed to the headers and content like a standard GET request.
(2) You need to set the CURLOPT_RETURNTRANSFER option to true before you call curl_exec() to have the content returned, as opposed to printed:
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
That should also make your class work correctly.
Solution 4
You can set the default stream context:
stream_context_set_default(
    array(
        'http' => array(
            'method' => 'HEAD'
        )
    )
);
Then use:
$headers = get_headers($url,1);
get_headers() seems to be more efficient than cURL, since it skips steps such as triggering authentication routines (login prompts or cookies).
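Changing the default context affects every later stream call in the script. Since PHP 7.1, get_headers() also accepts a context as its third argument, so the HEAD method can be scoped to a single call. A sketch with hypothetical function names; it still requires allow_url_fopen to be enabled:

```php
<?php
/**
 * Sketch: send a HEAD request with get_headers() through a per-call stream
 * context instead of changing the default context for the whole script.
 * Requires allow_url_fopen; the $context argument needs PHP 7.1 or later.
 */
function head_context()
{
    // The http wrapper's 'method' option switches the request from GET to HEAD.
    return stream_context_create(['http' => ['method' => 'HEAD']]);
}

function get_headers_head(string $url)
{
    // Returns the associative header array, or false on failure.
    return get_headers($url, true, head_context());
}

// $headers = get_headers_head('http://www.amazingjokes.com/img/2014/530c9613d29bd_CountvonCount.jpg');
```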
Solution 5
Here is my implementation using CURLOPT_HEADER, then parsing the output string into a map:
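function http_headers($url) {
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_NOBODY, true);         // send a HEAD request
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the output instead of printing it
    curl_setopt($ch, CURLOPT_HEADER, true);         // include the headers in the output
    $headers = curl_exec($ch);
    curl_close($ch);
    if ($headers === false) {
        return [];                                  // request failed
    }
    $data = [];
    // HTTP header lines end in "\r\n", regardless of PHP_EOL on this system
    foreach (explode("\r\n", $headers) as $row) {
        // Split on the first colon only, so values such as dates keep their colons
        $parts = explode(':', $row, 2);
        if (count($parts) === 2) {
            $data[trim($parts[0])] = trim($parts[1]);
        }
    }
    return $data;
}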
Sample usage:
$headers = http_headers('https://i.ytimg.com/vi_webp/g-dKXOlsf98/hqdefault.webp');
print_r($headers);
Array
(
[Content-Type] => image/webp
[ETag] => 1453807629
[X-Content-Type-Options] => nosniff
[Server] => sffe
[Content-Length] => 32958
[X-XSS-Protection] => 1; mode=block
[Age] => 11
[Cache-Control] => public, max-age=7200
)
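If you enable CURLOPT_FOLLOWLOCATION, curl_exec() returns one header block per response in the redirect chain, each terminated by a blank line, so a flat parser mixes headers from different responses. A sketch that keeps only the final block and lowercases the names (HTTP field names are case-insensitive); parse_header_blocks() and the example.com redirect are hypothetical:

```php
<?php
/**
 * Hypothetical helper: turn the raw output of curl_exec() (with CURLOPT_HEADER,
 * CURLOPT_NOBODY and CURLOPT_RETURNTRANSFER set) into a name => value map.
 * With CURLOPT_FOLLOWLOCATION the output holds one header block per response,
 * separated by a blank line, so only the last block is parsed.
 */
function parse_header_blocks(string $raw): array
{
    // Each block ends with an empty line, i.e. "\r\n\r\n".
    $blocks = array_filter(explode("\r\n\r\n", trim($raw)), 'strlen');
    $last = end($blocks);
    $data = [];
    if ($last === false) {
        return $data; // empty input
    }
    foreach (explode("\r\n", $last) as $line) {
        // Split on the first colon only, so dates keep their colons; the
        // status line contains no colon and is skipped.
        $parts = explode(':', $line, 2);
        if (count($parts) === 2) {
            $data[strtolower(trim($parts[0]))] = trim($parts[1]);
        }
    }
    return $data;
}

$raw = "HTTP/1.1 301 Moved Permanently\r\nLocation: https://example.com/\r\nContent-Type: text/html\r\n\r\n"
     . "HTTP/1.1 200 OK\r\nLast-Modified: Mon, 01 Oct 2018 21:57:45 GMT\r\nContent-Type: image/webp\r\n\r\n";
$h = parse_header_blocks($raw);
echo $h['content-type']; // image/webp
```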
Thiago Falcão
Updated on July 08, 2022
Comments
-
Thiago Falcão almost 2 years
Actually I have two questions.
(1) Is there any reduction in processing power or bandwidth used on the remote server if I retrieve only the headers, as opposed to retrieving the full page, using PHP and cURL?
(2) Since I think (and I might be wrong) that the answer to the first question is yes, I am trying to get only the last-modified date or If-Modified-Since header of the remote file, in order to compare it with the date of the locally stored data so that, if it has changed, I can store it locally. However, my script seems unable to fetch that piece of info; I get NULL when I run this:
class last_change {
    public last_change;
    function set_last_change() {
        $curl = curl_init();
        curl_setopt($curl, CURLOPT_URL, "http://url/file.xml");
        curl_setopt($curl, CURLOPT_HEADER, true);
        curl_setopt($curl, CURLOPT_FILETIME, true);
        curl_setopt($curl, CURLOPT_NOBODY, true);
        // $header = curl_exec($curl);
        $this -> last_change = curl_getinfo($header);
        curl_close($curl);
    }
    function get_last_change() {
        return $this -> last_change['datetime']; // I have tested with Last-Modified & If-Modified-Since to no avail
    }
}
If $header = curl_exec($curl) is uncommented, the header data is displayed, even though I haven't requested it, as follows:
HTTP/1.1 200 OK
Date: Fri, 04 Sep 2009 12:15:51 GMT
Server: Apache/2.2.8 (Linux/SUSE)
Last-Modified: Thu, 03 Sep 2009 12:46:54 GMT
ETag: "198054-118c-472abc735ab80"
Accept-Ranges: bytes
Content-Length: 4492
Content-Type: text/xml
Based on that, 'Last-Modified' is returned.
So, what am I doing wrong?
-
Tim almost 10 years
Just note that, according to the PHP docs, this will do a GET request instead of a HEAD request, which seems inefficient. php.net/manual/en/function.get-headers.php#example-4203
-
patrick almost 10 years
@Tim, indeed, I didn't know that. Shall I edit this post to reflect the more efficient way suggested on PHP.net? I know I will adapt my programming to this!
-
Muaaz Khalid over 9 years
cURL is necessary if someone wants to get the headers while using cookies.
-
Lukas about 9 years
Note that the above code will not return any headers, just the info vars. To retrieve the headers too, you need to add curl_setopt($curl, CURLOPT_HEADER, true);. The headers come in plain text form, though, and need to be parsed afterwards.
-
Martin Burch over 5 years
I tried this, but it misses headers with a colon (:) in the values, like Last-Modified: Mon, 01 Oct 2018 21:57:45 GMT
-
Dylan Maxey almost 5 years
This method only works if allow_url_fopen is set to true (1) in php.ini.
-
bur almost 5 years
Can also be done without setting the default:
get_headers($url, 1, stream_context_create(array('http' => array('method' => 'HEAD'))));
See php.net/manual/en/function.stream-context-create.php
-
myriacl over 3 years
Headers are separated with \r\n, so using PHP_EOL is the wrong way to do it here:
$headers = explode(PHP_EOL, $headers);
You should do:
$headers = explode("\r\n", $headers);