Python json.loads shows ValueError: Extra data

python json

441,448

Solution 1

As you can see in the following example, json.loads (and json.load) cannot decode more than one JSON object in a single input.

>>> json.loads('{}')
{}
>>> json.loads('{}{}') # == json.loads(json.dumps({}) + json.dumps({}))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python27\lib\json\__init__.py", line 338, in loads
    return _default_decoder.decode(s)
  File "C:\Python27\lib\json\decoder.py", line 368, in decode
    raise ValueError(errmsg("Extra data", s, end, len(s)))
ValueError: Extra data: line 1 column 3 - line 1 column 5 (char 2 - 4)

If you want to dump multiple dictionaries, wrap them in a list and dump the list, instead of dumping each dictionary separately:

>>> dict1 = {}
>>> dict2 = {}
>>> json.dumps([dict1, dict2])
'[{}, {}]'
>>> json.loads(json.dumps([dict1, dict2]))
[{}, {}]
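
The same idea carries over to files. The following is a minimal sketch, not part of the original answer: the file name records.json and the sample dictionaries are made up for illustration, and a temporary directory is used so that nothing is overwritten.

```python
import json
import os
import tempfile

dict1 = {"id": 1}
dict2 = {"id": 2}

# Hypothetical output path, used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), "records.json")

# Write ONE JSON document (a list) rather than several documents back to back.
with open(path, "w") as outfile:
    json.dump([dict1, dict2], outfile)

# The file holds a single top-level value, so json.load can decode it.
with open(path) as infile:
    records = json.load(infile)

print(records)
```

Calling json.dump once per dictionary on the same file handle would instead produce the concatenated form that triggers "Extra data" on load.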

Solution 2

You can read the file line by line, parsing each line as JSON as you go:

import json

tweets = []
with open('tweets.json', 'r') as f:
    for line in f:
        tweets.append(json.loads(line))

This avoids storing intermediate Python objects. As long as each line of the file holds exactly one complete tweet, this should work.
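
If some lines might be blank or might not hold exactly one JSON value, a more defensive variant can collect the failures instead of stopping at the first one. This is only a sketch: the helper name load_json_lines and the sample lines are assumptions made for illustration.

```python
import json

def load_json_lines(lines):
    """Parse an iterable of text lines that should hold one JSON value each.

    Returns (records, bad_lines), where bad_lines is a list of
    (line_number, error_message) tuples for lines that failed to parse.
    """
    records, bad_lines = [], []
    for lineno, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        try:
            records.append(json.loads(line))
        except ValueError as exc:  # json.JSONDecodeError subclasses ValueError
            bad_lines.append((lineno, str(exc)))
    return records, bad_lines

# Made-up sample: line 3 holds two objects, so it fails with "Extra data".
sample = ['{"id": 1}\n', '\n', '{"id": 2}{"id": 3}\n', '{"id": 4}\n']
records, bad_lines = load_json_lines(sample)
print(records)
print(bad_lines)
```

An open file object is also an iterable of lines, so it can be passed to the helper directly.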

Solution 3

I came across this because I was trying to load a JSON file dumped from MongoDB. It was giving me this error:

JSONDecodeError: Extra data: line 2 column 1

The MongoDB JSON dump has one object per line, so what worked for me is:

import json
with open('data.json', 'r') as f:
    data = [json.loads(line) for line in f]

Solution 4

This can also happen if your JSON file contains more than one JSON record. A JSON record looks like this:

[{"some data": value, "next key": "another value"}]

It opens and closes with brackets [ ], and within the brackets are braces { }. There can be many pairs of braces, but it all ends with a closing bracket ]. If your JSON file contains more than one of those:

[{"some data": value, "next key": "another value"}]
[{"2nd record data": value, "2nd record key": "another value"}]

then loads() will fail.

I verified this with my own file that was failing.

import json
with open("1_guests.json", 'r') as guestFile:
    guestData = guestFile.read()
gdfJson = json.loads(guestData)

This works because 1_guests.json has one record [ ]. The original file I was using, all_guests.json, had 6 records separated by newlines. I deleted 5 of the records (having already checked that each was bookended by brackets) and saved the file under a new name. Then the loads statement worked.

The error was:

   raise ValueError(errmsg("Extra data", s, end, len(s)))
ValueError: Extra data: line 2 column 1 - line 10 column 1 (char 261900 - 6964758)

P.S. I use the word "record", but that's not the official name. Also, if your file has one record per line like mine, you can loop through the lines and call loads() on one record at a time.
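
When records are not guaranteed to sit on exactly one line each, another option is to decode the values one after another from a single string. The sketch below is not from the original answer: it uses json.JSONDecoder.raw_decode, which returns the decoded value together with the index where it ended, and it assumes the start index can be passed as the second positional argument, as CPython's decoder accepts. The helper name iter_json_values and the sample text are made up for illustration.

```python
import json

def iter_json_values(text):
    """Yield each top-level JSON value found back to back in text."""
    decoder = json.JSONDecoder()
    pos = 0
    end = len(text)
    while pos < end:
        # raw_decode does not skip leading whitespace, so skip it by hand.
        while pos < end and text[pos].isspace():
            pos += 1
        if pos >= end:
            break
        value, pos = decoder.raw_decode(text, pos)
        yield value

# Made-up sample: two records, the second one spread over two lines.
text = '[{"a": 1}]\n[{"b": 2},\n {"c": 3}]\n'
values = list(iter_json_values(text))
print(values)
```

This reads the whole input into memory first, so it suits files of moderate size better than very large dumps.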

Solution 5

I just got the same error when my JSON file looked like this:

{"id":"1101010","city_id":"1101","name":"TEUPAH SELATAN"}
{"id":"1101020","city_id":"1101","name":"SIMEULUE TIMUR"}

I found it to be malformed as a single JSON document, so I changed it to:

{
  "datas":[
    {"id":"1101010","city_id":"1101","name":"TEUPAH SELATAN"},
    {"id":"1101020","city_id":"1101","name":"SIMEULUE TIMUR"}
  ]
}
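
The conversion does not have to be done by hand. As a rough sketch, assuming the input really has one object per line, the lines can be parsed individually and re-dumped as one document. The "datas" key is taken from the example above; the variable names are made up for illustration.

```python
import json

# The two-line input from the example above, held in a string for illustration.
ndjson_text = (
    '{"id":"1101010","city_id":"1101","name":"TEUPAH SELATAN"}\n'
    '{"id":"1101020","city_id":"1101","name":"SIMEULUE TIMUR"}\n'
)

# Parse each non-empty line on its own.
rows = [json.loads(line) for line in ndjson_text.splitlines() if line.strip()]

# Wrap the rows in a single top-level object and serialize once.
wrapped = json.dumps({"datas": rows})

# The result is one JSON document, so json.loads accepts it.
reloaded = json.loads(wrapped)
print(reloaded["datas"][1]["name"])
```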


Apoorv Ashutosh


Updated on November 09, 2021

Comments

Apoorv Ashutosh about 1 year

I am getting some data from a JSON file "new.json", and I want to filter some data and store it into a new JSON file. Here is my code:

import json
with open('new.json') as infile:
    data = json.load(infile)
for item in data:
    iden = item.get("id")
    a = item.get("a")
    b = item.get("b")
    c = item.get("c")
    if c == 'XYZ' or "XYZ" in item["text"]:
        filename = 'abc.json'
    try:
        outfile = open(filename,'ab')
    except:
        outfile = open(filename,'wb')
    obj_json={}
    obj_json["ID"] = iden
    obj_json["VAL_A"] = a
    obj_json["VAL_B"] = b

And I am getting an error, the traceback is:

  File "rtfav.py", line 3, in <module>
    data = json.load(infile)
  File "/usr/lib64/python2.7/json/__init__.py", line 278, in load
    **kw)
  File "/usr/lib64/python2.7/json/__init__.py", line 326, in loads
    return _default_decoder.decode(s)
  File "/usr/lib64/python2.7/json/decoder.py", line 369, in decode
    raise ValueError(errmsg("Extra data", s, end, len(s)))
ValueError: Extra data: line 88 column 2 - line 50607 column 2 (char 3077 - 1868399)

Here is a sample of the data in new.json; there are about 1500 more such dictionaries in the file:

{
    "contributors": null, 
    "truncated": false, 
    "text": "@HomeShop18 #DreamJob to professional rafter", 
    "in_reply_to_status_id": null, 
    "id": 421584490452893696, 
    "favorite_count": 0, 
    "source": "<a href=\"https://mobile.twitter.com\" rel=\"nofollow\">Mobile Web (M2)</a>", 
    "retweeted": false, 
    "coordinates": null, 
    "entities": {
        "symbols": [], 
        "user_mentions": [
            {
                "id": 183093247, 
                "indices": [
                    0, 
                    11
                ], 
                "id_str": "183093247", 
                "screen_name": "HomeShop18", 
                "name": "HomeShop18"
            }
        ], 
        "hashtags": [
            {
                "indices": [
                    12, 
                    21
                ], 
                "text": "DreamJob"
            }
        ], 
        "urls": []
    }, 
    "in_reply_to_screen_name": "HomeShop18", 
    "id_str": "421584490452893696", 
    "retweet_count": 0, 
    "in_reply_to_user_id": 183093247, 
    "favorited": false, 
    "user": {
        "follow_request_sent": null, 
        "profile_use_background_image": true, 
        "default_profile_image": false, 
        "id": 2254546045, 
        "verified": false, 
        "profile_image_url_https": "https://pbs.twimg.com/profile_images/413952088880594944/rcdr59OY_normal.jpeg", 
        "profile_sidebar_fill_color": "171106", 
        "profile_text_color": "8A7302", 
        "followers_count": 87, 
        "profile_sidebar_border_color": "BCB302", 
        "id_str": "2254546045", 
        "profile_background_color": "0F0A02", 
        "listed_count": 1, 
        "profile_background_image_url_https": "https://abs.twimg.com/images/themes/theme1/bg.png", 
        "utc_offset": null, 
        "statuses_count": 9793, 
        "description": "Rafter. Rafting is what I do. Me aur mera Tablet.  Technocrat of Future", 
        "friends_count": 231, 
        "location": "", 
        "profile_link_color": "473623", 
        "profile_image_url": "http://pbs.twimg.com/profile_images/413952088880594944/rcdr59OY_normal.jpeg", 
        "following": null, 
        "geo_enabled": false, 
        "profile_banner_url": "https://pbs.twimg.com/profile_banners/2254546045/1388065343", 
        "profile_background_image_url": "http://abs.twimg.com/images/themes/theme1/bg.png", 
        "name": "Jayy", 
        "lang": "en", 
        "profile_background_tile": false, 
        "favourites_count": 41, 
        "screen_name": "JzayyPsingh", 
        "notifications": null, 
        "url": null, 
        "created_at": "Fri Dec 20 05:46:00 +0000 2013", 
        "contributors_enabled": false, 
        "time_zone": null, 
        "protected": false, 
        "default_profile": false, 
        "is_translator": false
    }, 
    "geo": null, 
    "in_reply_to_user_id_str": "183093247", 
    "lang": "en", 
    "created_at": "Fri Jan 10 10:09:09 +0000 2014", 
    "filter_level": "medium", 
    "in_reply_to_status_id_str": null, 
    "place": null
}

smci almost 3 years

This is the error you get whenever the input JSON has more than one object per line. Many of the answers here assume there is only one object per line, or construct examples obeying that, but would break if that wasn't the case.
aspiring1 almost 3 years

@smci: Can you explain what you mean by "more than one object per line"?

Apoorv Ashutosh almost 9 years

Can you please explain again with reference to the code I gave above? I am a newbie, and at times take long to grasp such things.
falsetru almost 9 years

@ApoorvAshutosh, It seems that new.json contains one JSON document followed by additional data. json.load and json.loads can only decode a single JSON document. They raise a ValueError when they encounter additional data, as you can see.
Apoorv Ashutosh almost 9 years

I have pasted a sample from new.json, and I am filtering out some data from it, so I don't see where the extra data is coming from.
falsetru almost 9 years

@ApoorvAshutosh, You said in the edited question that there are 1500 more such dictionaries. That's the additional data. If you're the one who created new.json, just put a single JSON document in the file.
falsetru almost 9 years

@ApoorvAshutosh, If you need to dump multiple dictionaries as json, wrap them in a list, and dump the list.
Apoorv Ashutosh almost 9 years

The issue here is not about loading into a JSON file; that has already happened. Can you tell me how to retrieve data from there? I already have a file that has dictionaries in it. I now have to retrieve each of those dictionaries. stackoverflow.com/questions/21059466/python-json-parser
falsetru almost 9 years

@ApoorvAshutosh, BTW, trailing ',' is missing in the json (in the new question). (at the line "x": []) => invalid json.
Apoorv Ashutosh almost 9 years

sure, asap. And could you just look into one more thing, as I said, about how to read from a file with multiple dictionaries
falsetru almost 9 years

@ApoorvAshutosh, I'm researching that issue. I will post an answer there once the research is done.
Apoorv Ashutosh almost 9 years

That's just a sample; I mentioned it in a comment.
falsetru almost 9 years

@ApoorvAshutosh, Please post a valid sample!
falsetru almost 9 years

@ApoorvAshutosh, No, I mean the sample in the new question.
Apoorv Ashutosh almost 9 years

It's for this very sample; the structure of the dictionaries is basically the same. However, I'll edit that question with this very sample.
falsetru almost 9 years

@ApoorvAshutosh, I posted an answer that works around the issue. Check it out.
Ben almost 7 years

Is there a way to get json.loads to read newline-delimited json chunks? That is, to act like [json.loads(x) for x in text.split('\n')]? Related: Is there a guarantee that json.dumps will not include literal newlines in its output with default indenting?
jchook over 6 years

@Ben, by default json.dumps will change newlines in text content to "\n", keeping your json to a single line.
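
The behaviour described in this comment can be checked with a short sketch. The sample record is made up, and the check only covers json.dumps with its default arguments versus an explicit indent.

```python
import json

# Made-up record whose text value contains a literal newline.
record = {"text": "first line\nsecond line", "tags": ["a", "b"]}

compact = json.dumps(record)           # default arguments, indent=None
pretty = json.dumps(record, indent=2)  # pretty-printed output

# With the defaults, the newline inside the string is escaped as \n,
# so the serialized output stays on one physical line.
print("\n" in compact)
# With indent set, the output is spread over several lines.
print("\n" in pretty)
```

So the one-object-per-line approach is only safe when the writer did not pass indent.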
Aaron Liu over 6 years

May I ask why it still works when I use json.dump instead of json.dumps? I am using Python 3.5.2.
falsetru about 6 years

@ShuruiLiu, Please post a separated question.
charlesreid1 almost 6 years

The accepted answer addresses how to fix the source of the problem if you control the process of exporting, but if you are using someone else's data and you just have to deal with it, this is a great low-overhead method.
Fallenreaper over 5 years

I have this issue with JSON from a web scrape. I ran it through a linter to check whether it is valid JSON, and it seems that it is, so why would this error still occur?
Gabrer almost 5 years

Many datasets (e.g. the Yelp dataset) are nowadays provided as a "set" of JSON objects, and your approach is a convenient way to load them.
Akbar Noto over 3 years

loading just like yours, json.load(infile)
smci almost 3 years

This is not a general solution: it assumes the input has one JSON object per line, and breaks if it doesn't.
Sander Heinsalu almost 2 years

I still get json.decoder.JSONDecodeError: Extra data: line 1 column 954 (char 953) with this answer's code. My data file must have a different problem.
Manuel Lazo almost 2 years

I was trying this option, but I saw another useful way to get all the items: file.readlines(), which returns a list of lines.
Ashwin Balani over 1 year

This is perfect, in case of error we can modify the code with try...except as well!
Zoe stands with Ukraine over 1 year

For the record, if this is the entire JSON file, an outer map is redundant. The root can be an array, which lets you simplify the second JSON to just be an array. No need for a useless key in a useless map if you're storing array data - just throw it in a root array
Akbar Noto over 1 year

@Zoe Oh, that's interesting. Could you provide an example?
Zoe stands with Ukraine over 1 year

It's not exactly hard. Just wrap the two maps in an array: [{"id":"1101010","city_id":"1101","name":"TEUPAH SELATAN"}, {"id":"1101020","city_id":"1101","name":"SIMEULUE TIMUR"}]. Parsing is identical, access is obj[0], obj[1], ... (read: just like accessing a normal array), and the objects you get are identical. The one you have in your answer would require obj["datas"][0], so it's functionally identical