How to read a large JSON file using Python ijson?

ijson provides an iterator interface for large JSON files, so the file is read lazily instead of being loaded into memory all at once. You can process it in small chunks and save the results somewhere else.

Calling ijson.parse() yields tuples of three values: prefix, event, and value.

Some JSON:

{
    "europe": [
        {"name": "Paris", "type": "city"},
        {"name": "Rhein", "type": "river"}
    ]
}

Code:

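import ijson

# FILE_PATH is the path to the JSON file shown above.
# Open it in binary mode and let the with-block close it when done.
with open(FILE_PATH, 'rb') as f:
    # ijson.parse() lazily yields (prefix, event, value) tuples.
    for prefix, event, value in ijson.parse(f):
        if event == 'string':
            print(value)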

Output:

Paris
city
Rhein
river

Reference: https://pypi.python.org/pypi/ijson

Author: Paul

Updated on July 01, 2022

Comments

  • Paul
    Paul almost 2 years

    I am trying to parse a big json file (hundreds of gigs) to extract information from its keys. For simplicity, consider the following example:

    import json
    import random
    import string

    # Create a random key
    def random_string(length):
        return "".join(random.choice(string.ascii_lowercase) for i in range(length))

    # Create the dictionary
    dummy = {random_string(10): random.sample(range(1, 1000), 10) for times in range(15)}

    # Dump the dictionary into a JSON file
    with open("dummy.json", "w") as fp:
        json.dump(dummy, fp)
    

    Then, I use ijson in python 2.7 to parse the file:

    file_name = "dummy.json"
    
    with open(file_name, "r") as fp:
    
        for key in dummy.keys():
    
            print "key: ", key 
    
            parser = ijson.items(fp, str(key) + ".item")
    
            for number in parser:
                print number,
    

    I was expecting to retrieve all the numbers in the lists corresponding to the keys of the dictionary. However, I got

    IncompleteJSONError: Incomplete JSON data

    I am aware of this post: Using python ijson to read a large json file with multiple json objects, but in my case I have a single JSON file that is well formed and has a relatively simple schema. Any ideas on how I can parse it? Thank you.

  • Paul
    Paul about 6 years
    The above example produces a dictionary for which the parser produces the error I described. This is not the same.
  • Abdulrahman Bres
    Abdulrahman Bres about 6 years
    You cannot use ijson.items for a large file; it won't read the entire file and an error will be thrown
  • Ham
    Ham almost 3 years
    For a large file, you need to work carefully with the generator returned by ijson.items() or ijson.parse(); e.g., you should avoid fetching all values at once with set(your_generator) or list(your_generator)