Parsing JSON with Python: blank fields

Solution 1

Use dict.get instead of [] on each entry:

entrie['extensions'].get('telephone', '')

Or, if None is an acceptable placeholder:

entrie['extensions'].get('telephone')

get returns its second argument (None if omitted) instead of raising a KeyError when the key is not found.
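
A minimal sketch of how this might look inside the loop from the question. The sample data is hypothetical: the second entry deliberately lacks a telephone key.

```python
# Hypothetical sample data: the second entry has no "telephone" key.
entries = [
    {"extensions": {"name": "first", "telephone": "123123"}},
    {"extensions": {"name": "second"}},
]

rows = []
for entry in entries:
    ext = entry["extensions"]
    name = ext["name"]               # required key: a KeyError here signals bad data
    tel = ext.get("telephone", "")   # optional key: fall back to an empty string
    rows.append((name, tel))

# With no second argument, get falls back to None rather than ''.
tel_or_none = entries[1]["extensions"].get("telephone")
```

Using [] for keys that must exist and get only for optional ones keeps the KeyError as a signal for genuinely malformed entries.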

Solution 2

If the data is missing in only one place, dict.get can be used to fill in the missing value:

tel = d['entries'][0]['extensions'].get('telephone', '')

If the problem is more widespread, you can have the JSON parser use a defaultdict or custom dictionary instead of a regular dictionary. For example, given the JSON string:

json_txt = '''{
    "entries": [
        {
            "extensions": {
                "telephone": "123123", 
                "url": "www.blablablah", 
                "name": "name", 
                "coordinates": "coords", 
                "address": "address"
            }, 
            "summary": "here is the summary"
        }
    ]
}'''

Parse it with:

>>> import json

>>> class BlankDict(dict):
...     def __missing__(self, key):
...         return ''   # d[key] on a missing key yields '' instead of raising KeyError

>>> d = json.loads(json_txt, object_hook=BlankDict)

>>> d['entries'][0]['summary']
u'here is the summary'

>>> d['entries'][0]['extensions']['color']
''
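
The defaultdict variant mentioned above could be sketched as follows. The shortened json_txt and the lambda passed as object_hook are illustrative assumptions, not part of the original answer.

```python
import json
from collections import defaultdict

# Shortened, hypothetical version of the JSON string used above.
json_txt = '{"entries": [{"extensions": {"name": "name"}, "summary": "here is the summary"}]}'

# object_hook receives every decoded JSON object as a plain dict;
# wrap each one in a defaultdict whose missing keys default to str(), i.e. ''.
d = json.loads(json_txt, object_hook=lambda obj: defaultdict(str, obj))

summary = d["entries"][0]["summary"]
color = d["entries"][0]["extensions"]["color"]   # missing key, so ''

# Side effect: defaultdict stores the default under the key on first lookup.
stored = "color" in d["entries"][0]["extensions"]
```

One difference worth knowing: a defaultdict inserts the missing key when it is looked up, whereas a __missing__ method that only returns '' leaves the dictionary unchanged.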

As a side note, if you want to clean up your datasets and enforce consistency, there is a fine tool called Kwalify that does schema validation on JSON (and on YAML).

Author by

Pablo Pardo

Updated on June 19, 2022

Comments

  • Pablo Pardo almost 2 years

    I'm having problems parsing JSON with Python, and now I'm stuck.
    The problem is that the entities of my JSON are not always the same. The JSON is something like:

    "entries":[
    {
    "summary": "here is the summary",
    "extensions": {
       "coordinates":"coords",
       "address":"address",
       "name":"name",
       "telephone":"123123",
       "url":"www.blablablah"
    }
    }
    ]
    

    I can move through the JSON, for example:

    for entrie in entries:
        name = entrie['extensions']['name']
        tel = entrie['extensions']['telephone']
    

    The problem is that the JSON does not always have all the "fields". For example, the telephone field is sometimes missing, so the script fails with a KeyError because the key telephone is absent from that entry.
    So, my question: how can I run this script, leaving a blank where telephone is missing? I've tried:

    if entrie['extensions']['telephone']:
        tel=entrie['extensions']['telephone']
    

    but I don't think this is right.

  • Derek Litz almost 11 years
    Nice, I like this better than defaultdict because inside the __missing__ method one can add some logic to catch a potential bug. With defaultdict I always cringe because I won't get a KeyError when I make a typo.
  • Marcin over 5 years
    entries['extensions'].get('telephone', {}).get('anothermissingkey', {}) is almost 3x as fast (on Debian 9's Python 3.5) as the object_hook=BlankDict approach, and it works for multiple levels.
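
The chained lookup from the last comment might be sketched like this. The entry data, the "contact" key, and the deep_get helper are hypothetical and only illustrate the idea.

```python
# Hypothetical entry with no nested "contact" object.
entry = {"extensions": {"name": "name"}}

# Each get falls back to an empty dict, so the next get is always safe to call.
phone = entry.get("extensions", {}).get("contact", {}).get("telephone", "")

def deep_get(mapping, *keys, default=""):
    """Hypothetical helper: walk nested dicts, returning default if any step fails."""
    for key in keys:
        if not isinstance(mapping, dict) or key not in mapping:
            return default
        mapping = mapping[key]
    return mapping

name = deep_get(entry, "extensions", "name")
missing = deep_get(entry, "extensions", "contact", "telephone")
```

Note that the plain chain still fails with an AttributeError if an intermediate key exists but holds a non-dict value such as None (JSON null); the helper guards against that case with the isinstance check.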