Get a header with Python and convert in JSON (requests - urllib2 - json)

17,004

Solution 1

There are more than a couple ways to encode headers as JSON, but my first thought would be to convert the headers attribute to an actual dictionary instead of accessing it as requests.structures.CaseInsensitiveDict

import requests, json
r = requests.get("https://www.python.org/")
rh = json.dumps(r.headers.__dict__['_store'])
print rh

{'content-length': ('content-length', '45474'), 'via': ('via', '1.1 varnish'), 'x-cache': ('x-cache', 'HIT'), 'accept-ranges': ('accept-ranges', 'bytes'), 'strict-transport-security': ('strict-transport-security', 'max-age=63072000; includeSubDomains'), 'vary': ('vary', 'Cookie'), 'server': ('server', 'nginx'), 'x-served-by': ('x-served-by', 'cache-iad2132-IAD'), 'x-cache-hits': ('x-cache-hits', '1'), 'date': ('date', 'Wed, 02 Jul 2014 14:13:37 GMT'), 'x-frame-options': ('x-frame-options', 'SAMEORIGIN'), 'content-type': ('content-type', 'text/html; charset=utf-8'), 'age': ('age', '1483')}

Depending on exactly what you want on the headers you can specifically access them after this, but this will give you all the information contained in the headers, if in a slightly different format.

If you prefer a different format, you can also convert your headers to a dictionary:

import requests, json
r = requests.get("https://www.python.org/")
print json.dumps(dict(r.headers))

{"content-length": "45682", "via": "1.1 varnish", "x-cache": "HIT", "accept-ranges": "bytes", "strict-transport-security": "max-age=63072000; includeSubDomains", "vary": "Cookie", "server": "nginx", "x-served-by": "cache-at50-ATL", "x-cache-hits": "5", "date": "Wed, 02 Jul 2014 14:08:15 GMT", "x-frame-options": "SAMEORIGIN", "content-type": "text/html; charset=utf-8", "age": "951"}

Solution 2

If you are only interested in the header, make a head request. convert the CaseInsensitiveDict in a dict object and then convert it to json.

import requests
import json
r = requests.head('https://www.python.org/')
rh = dict(r.headers)
json.dumps(rh)

Solution 3

import requests
import json

r = requests.get('https://www.python.org/')
rh = r.headers

print json.dumps( dict(rh) ) # use dict()

result:

{"content-length": "45682", "via": "1.1 varnish", "x-cache": "HIT", "accept-ranges": "bytes", "strict-transport-security": "max-age=63072000; includeSubDomains", "vary": "Cookie", "server": "nginx", "x-served-by": "cache-fra1224-FRA", "x-cache-hits": "5", "date": "Wed, 02 Jul 2014 14:08:04 GMT", "x-frame-options": "SAMEORIGIN", "content-type": "text/html; charset=utf-8", "age": "3329"}

Share:
17,004
The One Electronic
Author by

The One Electronic

Updated on June 03, 2022

Comments

  • The One Electronic
    The One Electronic almost 2 years

    I’m trying to get the header from a website, encode it in JSON to write it to a file. I’ve tried two different ways without success.

    FIRST with urllib2 and json

    import urllib2
    import json
    host = ("https://www.python.org/")
    header = urllib2.urlopen(host).info()
    json_header = json.dumps(header)
    print json_header
    

    in this way I get the error:

    TypeError: is not JSON serializable

    So I try to bypass this issue by converting the object to a string -> json_header = str(header) In this way I can json_header = json.dumps(header) but the output it’s weird:

    "Date: Wed, 02 Jul 2014 13:33:37 GMT\r\nServer: nginx\r\nContent-Type: text/html; charset=utf-8\r\nX-Frame-Options: SAMEORIGIN\r\nContent-Length: 45682\r\nAccept-Ranges: bytes\r\nVia: 1.1 varnish\r\nAge: 1263\r\nX-Served-By: cache-fra1220-FRA\r\nX-Cache: HIT\r\nX-Cache-Hits: 2\r\nVary: Cookie\r\nStrict-Transport-Security: max-age=63072000; includeSubDomains\r\nConnection: close\r\n"

    SECOND with requests

    import requests
    r = requests.get(“https://www.python.org/”)
    rh = r.headers
    print rh
    

    {'content-length': '45682', 'via': '1.1 varnish', 'x-cache': 'HIT', 'accept-ranges': 'bytes', 'strict-transport-security': 'max-age=63072000; includeSubDomains', 'vary': 'Cookie', 'server': 'nginx', 'x-served-by': 'cache-fra1226-FRA', 'x-cache-hits': '14', 'date': 'Wed, 02 Jul 2014 13:39:33 GMT', 'x-frame-options': 'SAMEORIGIN', 'content-type': 'text/html; charset=utf-8', 'age': '1619'}

    In this way the output is more JSON like but still not OK (see the ‘ ‘ instead of “ “ and other stuff like the = and ;). Evidently there’s something (or a lot) I’m not doing in the right way. I’ve tried to read the documentation of the modules but I can’t understand how to solve this problem. Thank you for your help.

  • The One Electronic
    The One Electronic almost 10 years
    Thank you very much @Slater Tyranus. Your second method it's exactly what I was looking for. Just one question out of curiosity. Reading the output of your first method I see that keys and values are inside ' '. Why json.dumps does that in that case? Should a valid JSON format have values inside " "?
  • The One Electronic
    The One Electronic almost 10 years
    Thank you for your help @furas