Getting HEAD content with Python Requests
Solution 1
By definition, the responses to HEAD requests do not contain a message-body.
Send a GET request if you want to, well, get a response body. Send a HEAD request iff you are only interested in the response status code and headers.
HTTP transfers arbitrary content; the HTTP term header is completely unrelated to an HTML <head>. However, HTTP can be advised to download only a part of the document. If you know the length of the HTML <head> code (or an upper bound for it), you can include an HTTP Range header in your request that asks the remote server to return only a certain number of bytes. If the remote server supports HTTP ranges, it will then serve the reduced answer.
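A minimal sketch of the Range approach with Requests. The helper names range_headers and fetch_prefix and the 4096-byte default are illustrative assumptions, not part of the original answer; a server that ignores Range will usually reply with 200 and the full document instead of 206.

```python
import requests


def range_headers(max_bytes):
    """Build request headers asking for the first max_bytes bytes (hypothetical helper)."""
    return {
        # Byte ranges are inclusive, so bytes=0-1023 covers 1024 bytes.
        "Range": "bytes=0-{}".format(max_bytes - 1),
        # Ask for an uncompressed body so the range counts bytes of the HTML itself.
        "Accept-Encoding": "identity",
    }


def fetch_prefix(url, max_bytes=4096):
    """GET at most the first max_bytes bytes of url, if the server honors Range."""
    response = requests.get(url, headers=range_headers(max_bytes), timeout=10)
    response.raise_for_status()
    # 206 Partial Content means the range was honored;
    # 200 OK means the server ignored it and sent everything.
    honored = response.status_code == 206
    return response.text, honored
```

Calling fetch_prefix(url) returns the (possibly truncated) text plus a flag telling you whether the server actually served a partial response.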
Solution 2
A HEAD response doesn't have any content! Try response.headers - that's probably where the action is. An HTTP HEAD request doesn't get the <head> element of the HTML response you would get from a GET request. I think that's your mistake.
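To illustrate what a HEAD response does carry, here is a small sketch. The function names head_info and describe are hypothetical, and which headers are present depends entirely on the server. Note that requests.head does not follow redirects by default, hence the explicit allow_redirects=True.

```python
import requests


def describe(response):
    """Summarize the parts of a response that a HEAD request can provide."""
    return {
        "status": response.status_code,
        # Header lookup on response.headers is case-insensitive.
        "content_type": response.headers.get("Content-Type"),
        "content_length": response.headers.get("Content-Length"),
        # For a HEAD response this is 0: there is no body to read.
        "body_chars": len(response.text),
    }


def head_info(url):
    """Send a HEAD request and return a summary of status and headers."""
    response = requests.head(url, allow_redirects=True, timeout=10)
    return describe(response)
```

The same describe helper works on a response from requests.get, which makes it easy to compare the headers of both request types for one URL.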
Solution 3
HEAD responses have no body. They only return HTTP headers, the same ones you would get using a GET request.
Yarin
Updated on April 08, 2020
Comments
Yarin about 4 years
I'm trying to parse the result of a HEAD request done using the Python Requests library, but can't seem to access the response content.
According to the docs, I should be able to access the content from requests.Response.text. This works fine for me on GET requests, but returns None on HEAD requests.
GET request (works)
import requests
response = requests.get(url)
content = response.text
content =
<html>...</html>
HEAD request (no content)
import requests
response = requests.head(url)
content = response.text
content =
None
EDIT
OK, I've quickly realized from the answers that the HEAD request is not supposed to return content, only headers. But does that mean that, to access things found IN the <head> tag of a page, like <link> and <meta> tags, one must GET the whole document?
Yarin about 12 years
OK, my mistake, but then how does one capture things like <link> and <meta> tags from a HEAD request, or is that not possible?
phihag about 12 years
Umm, <link> and <meta> tags are only present in the HTML body. The only headers you can access are the HTTP ones. Why do you want to send a HEAD instead of a GET anyways?
Yarin about 12 years
phihag: ? <meta> tags are within the <head> section of a doc; view source on this page. I was hoping to get only the <head> to reduce time on link scraping.
phihag about 12 years
You're confusing similar terms in the context of different protocols. HTTP does not know anything about HTML code; it just transfers arbitrary content with headers (for example for the content type or its expiration date). If you know the length of the HTML <head>, you can include the Range header in your request, but I doubt that will speed things up unless the full HTML code is really huge.
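Since the <link> and <meta> tags arrive in the body of a GET response, one way to capture them is to parse the fetched HTML (whole, or a ranged prefix) and stop at </head>. The sketch below uses the standard library html.parser. The class and function names are illustrative assumptions, and it assumes the page contains an explicit <head> tag; pages that omit it would yield nothing.

```python
from html.parser import HTMLParser


class HeadTagCollector(HTMLParser):
    """Collect <link> and <meta> tags that appear inside <head>."""

    def __init__(self):
        super().__init__()
        self.tags = []        # list of (tag_name, attribute_dict)
        self.in_head = False
        self.done = False     # set once </head> has been seen

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        if tag == "head":
            self.in_head = True
        elif self.in_head and tag in ("link", "meta"):
            self.tags.append((tag, dict(attrs)))

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False
            self.done = True


def collect_head_tags(html_text):
    """Return the <link>/<meta> tags found in the <head> of html_text."""
    parser = HeadTagCollector()
    # feed() tolerates truncated input: an unfinished trailing tag is buffered.
    parser.feed(html_text)
    return parser.tags
```

Passing response.text from a GET request to collect_head_tags gives a list of tag names with their attributes; tags that appear after </head> are ignored.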