Python urllib2 Response header


Solution 1

Try making the request the way Firefox does. Firebug shows the request headers Firefox sends, so add the same headers to your request object:

import urllib2

request = urllib2.Request('http://your.tld/...')
request.add_header('User-Agent', 'some fake agent string')
request.add_header('Referer', 'fake referrer')
# ... add any other request headers Firebug shows
response = urllib2.urlopen(request)
# check content type:
print response.info().getheader('Content-Type')

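There's also HTTPCookieProcessor, which stores the cookies the server sets and sends them back on later requests. That can make your requests look more like the browser's, but I don't think you'll need it in most cases. Have a look at Python's documentation: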

http://docs.python.org/library/urllib2.html
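urllib2 exists only in Python 2. In Python 3, the same Request/urlopen machinery lives in urllib.request, and cookie handling lives in http.cookiejar. Below is a minimal Python 3 sketch of the same idea. It assumes you copy the real header values from Firebug. The values here, including the Accept string, are placeholders, and `BROWSER_HEADERS`, `OPENER`, `browser_like_request` and `fetch_content_type` are hypothetical names.

```python
# Python 3 sketch of the approach above (urllib2 became urllib.request).
from http.cookiejar import CookieJar
from urllib.request import HTTPCookieProcessor, Request, build_opener

# Placeholder values (hypothetical): copy the real ones from Firebug's request headers.
BROWSER_HEADERS = {
    "User-Agent": "some fake agent string",
    "Referer": "fake referrer",
    "Accept": "text/html,application/xhtml+xml,*/*;q=0.8",
}

# One opener shared by all requests, so cookies the server sets are stored
# in the jar and sent back on later requests made through this opener.
OPENER = build_opener(HTTPCookieProcessor(CookieJar()))


def browser_like_request(url, headers=BROWSER_HEADERS):
    """Build a Request carrying the same headers the browser sent."""
    return Request(url, headers=dict(headers))


def fetch_content_type(request):
    """Send the request (needs network access) and return its Content-Type."""
    with OPENER.open(request) as response:
        return response.headers.get("Content-Type")


req = browser_like_request("http://your.tld/...")
# urllib stores header names capitalized this way: "User-agent", "Referer", "Accept".
print(req.header_items())
```

The real check is calling `fetch_content_type(req)` against your actual URL and comparing the result with the Content-Type Firebug reports.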

Solution 2

Content-Type text/html

Really, like that, without the colon?

If so, that might explain it: it's an invalid header, so it gets ignored, and urllib guesses the content type from the file name instead. If the URL happens to end in ‘.flv’, it will guess video/x-flv.
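The sketch below, in Python 3, shows why the colon matters. It runs `http.client.parse_headers` (the parser http.client applies to HTTP response headers) on two hand-written header blocks. The `content_type_of` helper is hypothetical. The sketch covers only the first half of the argument above: a line without a colon is not recognized as a header at all. It does not demonstrate the file-name guessing.

```python
import io
from http.client import parse_headers


def content_type_of(raw_headers):
    """Parse a raw header block (bytes ending in a blank line) and return Content-Type."""
    message = parse_headers(io.BytesIO(raw_headers))
    return message.get("Content-Type")


print(content_type_of(b"Content-Type: text/html\r\n\r\n"))  # text/html
# Without the colon, the line does not match the header syntax, so the parser
# treats it as the start of the body and no Content-Type header exists.
print(content_type_of(b"Content-Type text/html\r\n\r\n"))   # None
```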

Solution 3

This peculiar discrepancy might be explained by the two requests sending different headers (perhaps ones of the Accept kind). Can you check that? Or, if JavaScript is running in Firefox (which I assume you're using, since you're running Firebug), keep in mind that it's definitely NOT running in the Python case, so "all bets are off", as they say ;-).
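You can check the Python side without any network traffic. The approach is to merge the opener's default headers with the request's own headers, which is what urllib does before sending. Below is a Python 3 sketch of that merge. `headers_python_would_send` is a hypothetical helper. At send time, http.client and urllib also add Host, `Accept-Encoding: identity` and `Connection: close`, which this list does not show.

```python
from urllib.request import Request, build_opener


def headers_python_would_send(url, extra_headers=None):
    """Approximate the header set urllib sends for `url` (hypothetical helper)."""
    opener = build_opener()
    request = Request(url, headers=extra_headers or {})
    # The opener's defaults (just User-agent: Python-urllib/x.y) apply only
    # when the request does not set the same header itself.
    sent = dict(opener.addheaders)
    sent.update(request.header_items())
    return sent


print(headers_python_would_send("http://your.tld/..."))
```

Comparing this dictionary with the request headers Firebug lists shows, for example, whether the Python request is missing an Accept header.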

Solution 4

Keep in mind that a web server can return different results for the same URL based on differences in the request. One example is content negotiation: the requester can list the content types it will accept (in the Accept header), and the server can return different results to accommodate different needs.
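Below is a toy illustration of content negotiation that runs locally, written in Python 3. It uses a hypothetical server that answers text/html when the Accept header mentions it and video/x-flv otherwise. That rule was invented for the demo; it is not known to be how any real site behaves. The server is queried twice: once with a browser-like Accept header and once with urllib's defaults, which send no Accept header.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import ProxyHandler, Request, build_opener


class NegotiatingHandler(BaseHTTPRequestHandler):
    """Hypothetical server: chooses the Content-Type from the Accept header."""

    def do_GET(self):
        accept = self.headers.get("Accept", "")
        if "text/html" in accept:
            body, ctype = b"<html><body>player page</body></html>", "text/html"
        else:
            body, ctype = b"placeholder video bytes", "video/x-flv"
        self.send_response(200)
        self.send_header("Content-Type", ctype)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo's output quiet


# ProxyHandler({}) ignores any proxy configured in the environment for this local test.
OPENER = build_opener(ProxyHandler({}))


def content_type_for(url, accept=None):
    headers = {"Accept": accept} if accept else {}
    with OPENER.open(Request(url, headers=headers)) as response:
        return response.headers.get("Content-Type")


server = HTTPServer(("127.0.0.1", 0), NegotiatingHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = "http://127.0.0.1:%d/video" % server.server_port

browser_like = content_type_for(url, accept="text/html,*/*;q=0.8")
python_default = content_type_for(url)
server.shutdown()
server.server_close()

print(browser_like)    # text/html
print(python_default)  # video/x-flv
```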

Also, you may be getting an error page for one of your requests, for example because the request is malformed, or because you don't have the cookies that authenticate you properly. Look at the response itself to see what you are getting.
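Below is a small Python 3 sketch of looking at the response itself. It returns the status code, the Content-Type and the first bytes of the body. For a 4xx/5xx status, urllib raises HTTPError, and the sketch still reads the error page from the exception object. `inspect_response` is a hypothetical helper. The usage line passes a `data:` URL purely so the sketch runs offline; `data:` URLs carry no status code, so the status comes back as None.

```python
from urllib.error import HTTPError
from urllib.request import urlopen


def inspect_response(url, max_bytes=200):
    """Return (status, Content-Type, first bytes of body) for `url`."""
    try:
        response = urlopen(url)
    except HTTPError as err:
        # 4xx/5xx: the error page is still readable from the exception object.
        return err.code, err.headers.get("Content-Type"), err.read(max_bytes)
    with response:
        return (response.getcode(),
                response.info().get("Content-Type"),
                response.read(max_bytes))


status, ctype, head = inspect_response("data:text/html,%3Cp%3Ehi%3C/p%3E")
print(status, ctype, head)  # None text/html b'<p>hi</p>'
```

With your real URL, an HTML Content-Type or a body that starts with `<html` where you expected video is a strong hint that you received an error or login page.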

Author: Admin

Updated on July 30, 2022

Comments

  • Admin, almost 2 years ago

    I'm trying to extract the response headers of a URL request. When I use Firebug to analyze the response to the request, it shows:

    Content-Type text/html
    

    However, when I use the Python code:

    urllib2.urlopen(URL).info()
    

    the output is:

    Content-Type: video/x-flv
    

    I am new to Python and to web programming in general; any insight is much appreciated. If more information is needed, please let me know.

    Thanks in advance for reading this post.