Python json.loads shows ValueError: Extra data

441,448

Solution 1

As the following example shows, json.loads (and json.load) cannot decode more than one JSON object from a single input.

>>> json.loads('{}')
{}
>>> json.loads('{}{}') # == json.loads(json.dumps({}) + json.dumps({}))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python27\lib\json\__init__.py", line 338, in loads
    return _default_decoder.decode(s)
  File "C:\Python27\lib\json\decoder.py", line 368, in decode
    raise ValueError(errmsg("Extra data", s, end, len(s)))
ValueError: Extra data: line 1 column 3 - line 1 column 5 (char 2 - 4)

If you want to dump multiple dictionaries, wrap them in a list and dump the list once, instead of dumping each dictionary separately:

>>> dict1 = {}
>>> dict2 = {}
>>> json.dumps([dict1, dict2])
'[{}, {}]'
>>> json.loads(json.dumps([dict1, dict2]))
[{}, {}]
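If you cannot change how the file was written and it really does contain several JSON objects back to back, one possible workaround is json.JSONDecoder.raw_decode, which parses one value and reports where it stopped. The following is only a minimal sketch; the helper name iter_json_objects is made up for illustration and is not part of the json module.

```python
import json


def iter_json_objects(text):
    """Yield each top-level JSON value found in text, in order.

    iter_json_objects is a hypothetical helper, not a json module function.
    """
    decoder = json.JSONDecoder()
    pos = 0
    end = len(text)
    while pos < end:
        # raw_decode does not skip leading whitespace, so do it by hand.
        while pos < end and text[pos].isspace():
            pos += 1
        if pos >= end:
            break
        # raw_decode returns the parsed value and the index where it ended.
        obj, pos = decoder.raw_decode(text, pos)
        yield obj


print(list(iter_json_objects('{}{}')))
print(list(iter_json_objects('{"a": 1}\n{"b": [2, 3]}  ')))
```

Unlike a line-by-line loop, this does not care whether the objects are separated by newlines, spaces, or nothing at all, but it does need the whole text in memory.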

Solution 2

You can read the file line by line, parsing each line as JSON as you go:

import json

tweets = []
with open('tweets.json', 'r') as f:  # the file is closed when the block ends
    for line in f:
        tweets.append(json.loads(line))

This avoids storing intermediate Python objects. As long as you write one complete tweet per line, this should work.
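If some lines might be blank or broken, the same loop can collect the failures instead of stopping at the first one. This is a rough sketch under the assumption that every valid record sits on its own line; load_json_lines and the sample data are invented for illustration.

```python
import json


def load_json_lines(lines):
    """Parse an iterable of text lines holding one JSON document each.

    Returns (records, errors); errors holds (line_number, message) for
    every line that could not be parsed. Hypothetical helper.
    """
    records, errors = [], []
    for number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # ignore blank lines, e.g. a trailing newline
        try:
            records.append(json.loads(line))
        except ValueError as exc:  # json.JSONDecodeError subclasses ValueError
            errors.append((number, str(exc)))
    return records, errors


# Made-up sample: line 3 holds two objects and triggers "Extra data".
sample = ['{"id": 1}\n', '\n', '{"id": 2}{"id": 3}\n', '{"id": 4}\n']
records, errors = load_json_lines(sample)
print(records)
print(errors)
```

An open file object can be passed in place of the sample list, since iterating over a file also yields its lines.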

Solution 3

I came across this because I was trying to load a JSON file dumped from MongoDB. It gave me this error:

JSONDecodeError: Extra data: line 2 column 1

The MongoDB JSON dump has one object per line, so what worked for me is:

import json
with open('data.json', 'r') as f:
    data = [json.loads(line) for line in f]

Solution 4

This can also happen if your JSON file contains more than one JSON record. A JSON record looks like this:

[{"some data": value, "next key": "another value"}]

It opens and closes with brackets [ ], and inside the brackets are braces { }. There can be many pairs of braces, but the record always ends with a closing bracket ]. If your JSON file contains more than one of those:

[{"some data": value, "next key": "another value"}]
[{"2nd record data": value, "2nd record key": "another value"}]

then loads() will fail.

I verified this with my own file that was failing.

import json

with open("1_guests.json", 'r') as guestFile:
    guestData = guestFile.read()
gdfJson = json.loads(guestData)

This works because 1_guests.json has a single record [ ]. The original file I was using, all_guests.json, had 6 records separated by newlines. I deleted 5 of them (after checking that each was bookended by brackets) and saved the file under a new name. Then the loads statement worked.

The error was:

   raise ValueError(errmsg("Extra data", s, end, len(s)))
ValueError: Extra data: line 2 column 1 - line 10 column 1 (char 261900 - 6964758)

PS. I use the word "record", but that's not the official name (each one is a JSON array). Also, if your file separates records with newline characters like mine, you can loop through it and call loads() on one record at a time.
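The loop mentioned in the PS could look roughly like this. It is a sketch that assumes each bracketed record occupies exactly one line; load_records and the guest data are invented for illustration and do not come from the original files.

```python
import json


def load_records(text):
    """Combine several newline-separated JSON arrays into one list.

    Hypothetical helper; assumes every array sits on a single line.
    """
    combined = []
    for line in text.splitlines():
        if line.strip():  # skip empty lines between records
            combined.extend(json.loads(line))
    return combined


# Made-up stand-in for a file with two records.
text = '[{"guest": "a"}, {"guest": "b"}]\n[{"guest": "c"}]\n'
print(load_records(text))
```

Using extend rather than append flattens the records into one list of dictionaries; use append instead if the grouping per record matters.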

Solution 5

I got the same error when my JSON file looked like this:

{"id":"1101010","city_id":"1101","name":"TEUPAH SELATAN"}
{"id":"1101020","city_id":"1101","name":"SIMEULUE TIMUR"}

That is not a single valid JSON document, so I changed it to:

{
  "datas":[
    {"id":"1101010","city_id":"1101","name":"TEUPAH SELATAN"},
    {"id":"1101020","city_id":"1101","name":"SIMEULUE TIMUR"}
  ]
}
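Rather than editing the file by hand, the conversion can be scripted. The sketch below assumes the input has one object per line and writes the objects as a plain root array instead of wrapping them in a "datas" key; json_lines_to_array is a made-up name.

```python
import json


def json_lines_to_array(text):
    """Turn one-object-per-line text into a single JSON array string.

    Hypothetical helper; assumes each non-empty line is a complete object.
    """
    rows = [json.loads(line) for line in text.splitlines() if line.strip()]
    return json.dumps(rows)


source = (
    '{"id":"1101010","city_id":"1101","name":"TEUPAH SELATAN"}\n'
    '{"id":"1101020","city_id":"1101","name":"SIMEULUE TIMUR"}\n'
)
converted = json_lines_to_array(source)
rows = json.loads(converted)  # now a single valid JSON document
print(rows[1]["name"])
```

With a root array the entries are reached as rows[0], rows[1], and so on; with the "datas" wrapper shown above they would be obj["datas"][0] instead.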

Author by

Apoorv Ashutosh

inactive account. was a computer science student. no longer a developer

Updated on November 09, 2021

Comments

  • Apoorv Ashutosh
    Apoorv Ashutosh over 2 years

    I am getting some data from a JSON file "new.json", and I want to filter some data and store it into a new JSON file. Here is my code:

    import json
    with open('new.json') as infile:
        data = json.load(infile)
    for item in data:
        iden = item.get("id")
        a = item.get("a")
        b = item.get("b")
        c = item.get("c")
        if c == 'XYZ' or "XYZ" in item["text"]:
            filename = 'abc.json'
        try:
            outfile = open(filename,'ab')
        except:
            outfile = open(filename,'wb')
        obj_json={}
        obj_json["ID"] = iden
        obj_json["VAL_A"] = a
        obj_json["VAL_B"] = b
    

    And I am getting an error, the traceback is:

      File "rtfav.py", line 3, in <module>
        data = json.load(infile)
      File "/usr/lib64/python2.7/json/__init__.py", line 278, in load
        **kw)
      File "/usr/lib64/python2.7/json/__init__.py", line 326, in loads
        return _default_decoder.decode(s)
      File "/usr/lib64/python2.7/json/decoder.py", line 369, in decode
        raise ValueError(errmsg("Extra data", s, end, len(s)))
    ValueError: Extra data: line 88 column 2 - line 50607 column 2 (char 3077 - 1868399)
    

    Here is a sample of the data in new.json, there are about 1500 more such dictionaries in the file

    {
        "contributors": null, 
        "truncated": false, 
        "text": "@HomeShop18 #DreamJob to professional rafter", 
        "in_reply_to_status_id": null, 
        "id": 421584490452893696, 
        "favorite_count": 0, 
        "source": "<a href=\"https://mobile.twitter.com\" rel=\"nofollow\">Mobile Web (M2)</a>", 
        "retweeted": false, 
        "coordinates": null, 
        "entities": {
            "symbols": [], 
            "user_mentions": [
                {
                    "id": 183093247, 
                    "indices": [
                        0, 
                        11
                    ], 
                    "id_str": "183093247", 
                    "screen_name": "HomeShop18", 
                    "name": "HomeShop18"
                }
            ], 
            "hashtags": [
                {
                    "indices": [
                        12, 
                        21
                    ], 
                    "text": "DreamJob"
                }
            ], 
            "urls": []
        }, 
        "in_reply_to_screen_name": "HomeShop18", 
        "id_str": "421584490452893696", 
        "retweet_count": 0, 
        "in_reply_to_user_id": 183093247, 
        "favorited": false, 
        "user": {
            "follow_request_sent": null, 
            "profile_use_background_image": true, 
            "default_profile_image": false, 
            "id": 2254546045, 
            "verified": false, 
            "profile_image_url_https": "https://pbs.twimg.com/profile_images/413952088880594944/rcdr59OY_normal.jpeg", 
            "profile_sidebar_fill_color": "171106", 
            "profile_text_color": "8A7302", 
            "followers_count": 87, 
            "profile_sidebar_border_color": "BCB302", 
            "id_str": "2254546045", 
            "profile_background_color": "0F0A02", 
            "listed_count": 1, 
            "profile_background_image_url_https": "https://abs.twimg.com/images/themes/theme1/bg.png", 
            "utc_offset": null, 
            "statuses_count": 9793, 
            "description": "Rafter. Rafting is what I do. Me aur mera Tablet.  Technocrat of Future", 
            "friends_count": 231, 
            "location": "", 
            "profile_link_color": "473623", 
            "profile_image_url": "http://pbs.twimg.com/profile_images/413952088880594944/rcdr59OY_normal.jpeg", 
            "following": null, 
            "geo_enabled": false, 
            "profile_banner_url": "https://pbs.twimg.com/profile_banners/2254546045/1388065343", 
            "profile_background_image_url": "http://abs.twimg.com/images/themes/theme1/bg.png", 
            "name": "Jayy", 
            "lang": "en", 
            "profile_background_tile": false, 
            "favourites_count": 41, 
            "screen_name": "JzayyPsingh", 
            "notifications": null, 
            "url": null, 
            "created_at": "Fri Dec 20 05:46:00 +0000 2013", 
            "contributors_enabled": false, 
            "time_zone": null, 
            "protected": false, 
            "default_profile": false, 
            "is_translator": false
        }, 
        "geo": null, 
        "in_reply_to_user_id_str": "183093247", 
        "lang": "en", 
        "created_at": "Fri Jan 10 10:09:09 +0000 2014", 
        "filter_level": "medium", 
        "in_reply_to_status_id_str": null, 
        "place": null
    } 
    
    • smci
      smci over 4 years
      This is the error you get whenever the input JSON has more than one object per line. Many of the answers here assume there is only one object per line, or construct examples obeying that, but would break if that weren't the case.
    • aspiring1
      aspiring1 about 4 years
      @smci: Can you explain what "more than one object per line" means?
  • Apoorv Ashutosh
    Apoorv Ashutosh over 10 years
    Can you please explain again with reference to the code I gave above? I am a newbie, and at times take long to grasp such things.
  • falsetru
    falsetru over 10 years
    @ApoorvAshutosh, it seems that new.json contains one JSON document followed by additional data. json.load and json.loads can only decode a single JSON document. They raise a ValueError when they encounter additional data, as you can see.
  • Apoorv Ashutosh
    Apoorv Ashutosh over 10 years
    Have pasted a sample from new.json, and I am filtering out some data from it, so I don't get where I am getting extra data from
  • falsetru
    falsetru over 10 years
    @ApoorvAshutosh, You said 1500 more such dictionaries in the edited question. That's the additional data. If you're the one who made a new.json, just put a single json in a file.
  • falsetru
    falsetru over 10 years
    @ApoorvAshutosh, If you need to dump multiple dictionaries as json, wrap them in a list, and dump the list.
  • Apoorv Ashutosh
    Apoorv Ashutosh over 10 years
    the issue here is not about loading into a JSON file, that has already happened. Can you tell me how to retrieve data from there? I already have a file that has dictionaries in it. I now have to retrieve each of those dictionaries. stackoverflow.com/questions/21059466/python-json-parser
  • falsetru
    falsetru over 10 years
    @ApoorvAshutosh, BTW, trailing ',' is missing in the json (in the new question). (at the line "x": []) => invalid json.
  • Apoorv Ashutosh
    Apoorv Ashutosh over 10 years
    sure, asap. And could you just look into one more thing, as I said, about how to read from a file with multiple dictionaries
  • falsetru
    falsetru over 10 years
    @ApoorvAshutosh, I'm researching that issue. I will post an answer there when I'm done.
  • Apoorv Ashutosh
    Apoorv Ashutosh over 10 years
    That's just a sample; I mentioned it in a comment.
  • falsetru
    falsetru over 10 years
    @ApoorvAshutosh, Please post a valid sample!
  • falsetru
    falsetru over 10 years
    @ApoorvAshutosh, No, I mean the sample in the new question.
  • Apoorv Ashutosh
    Apoorv Ashutosh over 10 years
    It's for this very sample; the structure of the dictionaries is basically the same. However, I'll edit that question with this very sample.
  • falsetru
    falsetru over 10 years
    @ApoorvAshutosh, I posted an answer that works around the issue. Check it out.
  • Ben
    Ben over 8 years
    Is there a way to get json.loads to read newline-delimited json chunks? That is, to act like [json.loads(x) for x in text.split('\n')]? Related: Is there a guarantee that json.dumps will not include literal newlines in its output with default indenting?
  • jchook
    jchook over 7 years
    @Ben, by default json.dumps will change newlines in text content to "\n", keeping your json to a single line.
  • Aaron Liu
    Aaron Liu over 7 years
    Can I ask why it still works when I use json.dump instead of json.dumps? I am using Python 3.5.2.
  • falsetru
    falsetru over 7 years
    @ShuruiLiu, please post a separate question.
  • charlesreid1
    charlesreid1 about 7 years
    The accepted answer addresses how to fix the source of the problem if you control the process of exporting, but if you are using someone else's data and you just have to deal with it, this is a great low-overhead method.
  • Fallenreaper
    Fallenreaper almost 7 years
    I have this issue with JSON from a web scrape. I ran the data through a linter to see whether it is valid JSON. It seems that it is, so why would this error still occur?
  • Gabrer
    Gabrer over 6 years
    Many datasets (e.g. the Yelp dataset) nowadays are provided as a "set" of JSON objects, and your approach is a convenient way to load them.
  • Akbar Noto
    Akbar Noto almost 5 years
    loading just like yours, json.load(infile)
  • smci
    smci over 4 years
    This is not a general solution: it assumes the input has one JSON object per line, and breaks if it doesn't.
  • Sander Heinsalu
    Sander Heinsalu over 3 years
    I still get json.decoder.JSONDecodeError: Extra data: line 1 column 954 (char 953) with this answer's code. My data file must have a different problem.
  • Manuel Lazo
    Manuel Lazo about 3 years
    I was trying this option, but I found another useful way to get all the items: file.readlines(), which returns a list of lines.
  • Ashwin Balani
    Ashwin Balani almost 3 years
    This is perfect, in case of error we can modify the code with try...except as well!
  • Zoe stands with Ukraine
    Zoe stands with Ukraine over 2 years
    For the record, if this is the entire JSON file, an outer map is redundant. The root can be an array, which lets you simplify the second JSON to just be an array. No need for a useless key in a useless map if you're storing array data - just throw it in a root array
  • Akbar Noto
    Akbar Noto over 2 years
    @Zoe oh that's interesting, could you provide us some example?
  • Zoe stands with Ukraine
    Zoe stands with Ukraine over 2 years
    It's not exactly hard. Just wrap the two maps in an array: [{"id":"1101010","city_id":"1101","name":"TEUPAH SELATAN"}, {"id":"1101020","city_id":"1101","name":"SIMEULUE TIMUR"}]. Parsing is identical, access is obj[0], obj[1], ... (read: just like accessing a normal array), and the objects you get are identical. The one you have in your answer would require obj["datas"][0], so it's functionally identical