Issue parsing multiline JSON file using Python

Solution 1

You will go crazy if you try to parse a JSON file line by line. The json module has helper functions that read either file objects or strings: load and loads. load takes a file object (as shown below) for a file that contains JSON data, while loads takes a string that contains JSON data.

Option 1 (preferred):

import json
with open('test.json', 'r') as jf:
    weatherData = json.load(jf)
    print weatherData

Option 2:

import json
with open('test.json', 'r') as jf:
    weatherData = json.loads(jf.read())
    print weatherData
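
As a quick check that the two options give the same result, here is a minimal sketch that does not touch the disk. The sample text, and the use of io.StringIO as a stand-in for the opened file, are assumptions made for illustration; they are not part of the original question.

```python
import io
import json

# Hypothetical sample data standing in for the contents of test.json.
sample_text = u'{\n  "observations": {\n    "notice": [{"feedback_url": "http://www.bom.gov.au/other/feedback"}]\n  }\n}'

# json.load reads from a file-like object ...
from_file = json.load(io.StringIO(sample_text))

# ... while json.loads parses a string that is already in memory.
from_string = json.loads(sample_text)

print(from_file == from_string)  # prints True
```

Both calls see the whole document at once, so the line breaks inside it do not matter.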

If you need higher-performance JSON parsing, check out ujson.
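
ujson is a third-party package, so it may not be installed everywhere. One possible pattern, sketched here on the assumption that only loads is needed, is to fall back to the standard json module when the import fails. The alias json_impl is a name chosen for this example.

```python
# ujson is a third-party package; fall back to the standard library if it is missing.
try:
    import ujson as json_impl
except ImportError:
    import json as json_impl

# Both modules expose loads() for parsing a JSON string.
weatherData = json_impl.loads('{"observations": {"notice": []}}')
print(weatherData)
```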

Solution 2

In the first snippet, you try to parse the file line by line, but it should be parsed all at once. The easiest way is to use json.load(jsonFile). (The jf variable name is also misleading, since it holds a string, not a file.) So the correct way to parse it is:

import json

with open('test.json', 'r') as jsonFile:
    # json.load takes the file object itself; json.loads would need a string
    weatherData = json.load(jsonFile)

That said, storing the JSON on a single line is a good idea, as it's more concise.
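
To see why the line-by-line approach fails on a multiline file but works once everything sits on one line, consider this small sketch. The sample text is a shortened, hypothetical version of the file from the question. Each physical line (a lone `{`, for example) is not a complete JSON document, so json.loads rejects it; the stripped lines joined together are.

```python
import json

# Shortened, hypothetical stand-in for the multiline file in the question.
multiline_text = """{
"observations": {
    "notice": [
        {
            "feedback_url": "http://www.bom.gov.au/other/feedback"
        }
    ]
}
}"""

# Parsing each line separately fails: no line here is a complete JSON document.
failed_lines = []
for line in multiline_text.splitlines():
    try:
        json.loads(line.strip())
    except ValueError:  # json.JSONDecodeError in Python 3 is a subclass of ValueError
        failed_lines.append(line.strip())

# Joining the stripped lines yields one complete document, which parses fine.
one_line = "".join(line.strip() for line in multiline_text.splitlines())
weatherData = json.loads(one_line)

print(len(failed_lines))
print(weatherData["observations"]["notice"][0]["feedback_url"])
```

This is also why the rewritten single-line file can be read in a for loop: the loop runs once, and that one line happens to be the whole document.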

In the second snippet, nothing is actually wrong: you are printing the parsed Python object, and the u'string here' notation is Python-specific (it is how Python 2 displays unicode strings). Valid JSON uses double quotation marks.
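
The difference between the two notations shows up when the parsed object is printed directly versus serialized back with json.dumps. This is a small sketch with made-up sample data; under Python 2.7 the first print shows the u'...' prefixes, while Python 3 shows plain single quotes.

```python
import json

# Hypothetical sample data for illustration.
weatherData = json.loads('{"feedback_url": "http://www.bom.gov.au/other/feedback"}')

# Printing the dict shows Python's own representation, not JSON.
print(weatherData)

# json.dumps serializes it back to valid JSON with double quotes.
as_json = json.dumps(weatherData)
print(as_json)
```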

Solution 3

FYI, you can open both files in a single with statement:

with open('file_A') as in_, open('file_B', 'w+') as out_:
    # logic here
    ...
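
As a concrete illustration, the following sketch reads a pretty-printed JSON file and writes a compact one-line copy, with both files opened in one with statement. The file names and the temporary directory are assumptions made so the example is self-contained.

```python
import json
import os
import tempfile

# Hypothetical file names inside a temporary directory, for illustration only.
work_dir = tempfile.mkdtemp()
in_path = os.path.join(work_dir, 'test.json')
out_path = os.path.join(work_dir, 'out.json')

# Create a small multiline input file to work with.
with open(in_path, 'w') as f:
    f.write('{\n  "observations": {\n    "notice": []\n  }\n}\n')

# Open the input and the output in a single with statement.
with open(in_path) as in_, open(out_path, 'w') as out_:
    weatherData = json.load(in_)   # parse the whole file at once
    json.dump(weatherData, out_)   # write it back on a single line

with open(out_path) as f:
    compact = f.read()
print(compact)
```

Both files are closed when the block exits, even if parsing raises an error.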
Author by

hypersonics

Updated on June 08, 2022

Comments

  • hypersonics
    hypersonics almost 2 years

    I am trying to parse a multiline JSON file using the json library in Python 2.7. A simplified sample file is given below:

    {
    "observations": {
        "notice": [
            {
                "copyright": "Copyright Commonwealth of Australia 2015, Bureau of Meteorology. For more information see: http://www.bom.gov.au/other/copyright.shtml http://www.bom.gov.au/other/disclaimer.shtml",
                "copyright_url": "http://www.bom.gov.au/other/copyright.shtml",
                "disclaimer_url": "http://www.bom.gov.au/other/disclaimer.shtml",
                "feedback_url": "http://www.bom.gov.au/other/feedback"
            }
        ]
    }
    }
    

    My code is as follows:

    import json
    
    with open('test.json', 'r') as jsonFile:
        for jf in jsonFile:
            jf = jf.replace('\n', '')
            jf = jf.strip()
            weatherData = json.loads(jf)
            print weatherData
    

    Nevertheless, I get an error as shown below:

    Traceback (most recent call last):
      File "test.py", line 8, in <module>
        weatherData = json.loads(jf)
      File "/home/usr/anaconda2/lib/python2.7/json/__init__.py", line 339, in loads
        return _default_decoder.decode(s)
      File "/home/usr/anaconda2/lib/python2.7/json/decoder.py", line 364, in decode
        obj, end = self.raw_decode(s, idx=_w(s, 0).end())
      File "/home/usr/anaconda2/lib/python2.7/json/decoder.py", line 380, in raw_decode
        obj, end = self.scan_once(s, idx)
    ValueError: Expecting object: line 1 column 1 (char 0)
    

    Just to do some testing, I modified the code so that, after removing newlines and stripping the leading and trailing whitespace, it writes the contents to another file (with the json extension). Surprisingly, when I read the latter file back, I do not get any error and the parsing is successful. The modified code is as follows:

    import json
    
    filewrite = open('out.json', 'w+')
    
    with open('test.json', 'r') as jsonFile:
        for jf in jsonFile:
            jf = jf.replace('\n', '')
            jf = jf.strip()
            filewrite.write(jf)
    
    filewrite.close()
    
    with open('out.json', 'r') as newJsonFile:
        for line in newJsonFile:
            weatherData = json.loads(line)
            print weatherData
    

    The output is as follows:

    {u'observations': {u'notice': [{u'copyright_url': u'http://www.bom.gov.au/other/copyright.shtml', u'disclaimer_url': u'http://www.bom.gov.au/other/disclaimer.shtml', u'copyright': u'Copyright Commonwealth of Australia 2015, Bureau of Meteorology. For more information see: http://www.bom.gov.au/other/copyright.shtml http://www.bom.gov.au/other/disclaimer.shtml', u'feedback_url': u'http://www.bom.gov.au/other/feedback'}]}}
    

    Any idea what might be going on when newlines and whitespace are stripped before using the json library?

  • hypersonics
    hypersonics over 8 years
    I tried your approach of loading the entire file and parsing it all at once, but it fails. By the way, I have parsed JSON files line by line before without any issues, except that each dictionary never exceeded about 100 characters.
  • fodma1
    fodma1 over 8 years
    For me it worked fine with the JSON you provided. Try using the Sublime JSON plugin for converting the JSON back and forth: github.com/dzhibas/SublimePrettyJson It also validates the JSON, so you can investigate the issue further. For more detailed linting, try jsonlint.com
  • Hamid Pourjam
    Hamid Pourjam over 8 years
    While this code may answer the question, it is better to explain what it does and add some references to it.
  • President James K. Polk
    President James K. Polk over 8 years
    This does not provide an answer to the question. To critique or request clarification from an author, leave a comment below their post. - From Review
  • Vlad Nikiporoff
    Vlad Nikiporoff over 8 years
    @James: agreed. At the time of answering I was not able to comment on questions; thanks to the upvote, now I can :-)
  • hypersonics
    hypersonics over 8 years
    Thank you @OkezieE. Loading the entire file via load does the trick.