Python json.loads shows ValueError: Extra data
Solution 1
As the following example shows, json.loads (and json.load) does not decode multiple JSON objects from a single input.
>>> json.loads('{}')
{}
>>> json.loads('{}{}') # == json.loads(json.dumps({}) + json.dumps({}))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Python27\lib\json\__init__.py", line 338, in loads
return _default_decoder.decode(s)
File "C:\Python27\lib\json\decoder.py", line 368, in decode
raise ValueError(errmsg("Extra data", s, end, len(s)))
ValueError: Extra data: line 1 column 3 - line 1 column 5 (char 2 - 4)
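On Python 3.5 and later, the same mistake raises json.JSONDecodeError. This is a subclass of ValueError that also records where the extra data begins. Here is a small sketch using a made-up two-object string:

```python
import json

text = '{}\n{}'  # two top-level JSON values separated by a newline

try:
    json.loads(text)
except json.JSONDecodeError as exc:
    err = exc  # keep a reference; the "as" name is cleared after the except block

# JSONDecodeError subclasses ValueError, so older "except ValueError" code still catches it
print(isinstance(err, ValueError))     # True
print(err.msg)                         # Extra data
print(err.lineno, err.colno, err.pos)  # 2 1 3
```

The reported position (line 2, column 1) is where the second value starts. A file with one object per line produces exactly this report.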
If you want to dump multiple dictionaries, wrap them in a list and dump the list once, instead of dumping each dictionary separately:
>>> dict1 = {}
>>> dict2 = {}
>>> json.dumps([dict1, dict2])
'[{}, {}]'
>>> json.loads(json.dumps([dict1, dict2]))
[{}, {}]
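The same idea works with files. Below is a minimal sketch: json.dump writes the list as one document, and json.load reads it back in one call. The temporary path and the sample dictionaries are made up for illustration.

```python
import json
import os
import tempfile

dict1 = {"id": 1}
dict2 = {"id": 2}

# Write both dictionaries as ONE JSON document (a list), not two documents back to back
path = os.path.join(tempfile.mkdtemp(), "dicts.json")
with open(path, "w") as f:
    json.dump([dict1, dict2], f)

# Reading it back yields the list; iterate over it to get each dictionary
with open(path) as f:
    loaded = json.load(f)

for d in loaded:
    print(d["id"])
```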
Solution 2
You can read the file line by line, decoding each line as you go:
import json
tweets = []
with open('tweets.json', 'r') as f:
    for line in f:
        tweets.append(json.loads(line))
This avoids reading the whole file into one intermediate string first. It works as long as each tweet was written as one complete JSON object on its own line (one full tweet per append() call).
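Below is a slightly more defensive variant of the loop. It is written as a generator, so callers can stream tweets instead of building the whole list. It is a sketch that assumes a UTF-8 file with one object per line. The function name iter_json_lines and the demo file contents are hypothetical.

```python
import json
import os
import tempfile


def iter_json_lines(path):
    """Yield one decoded object per non-blank line (assumes UTF-8, one object per line)."""
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            if not line.strip():  # skip blank lines, e.g. an extra newline at EOF
                continue
            yield json.loads(line)


# Demo with a throwaway file; the contents are made up for illustration
path = os.path.join(tempfile.mkdtemp(), "tweets.json")
with open(path, "w", encoding="utf-8") as f:
    f.write('{"id": 1}\n\n{"id": 2}\n')

tweets = list(iter_json_lines(path))
print(tweets)  # [{'id': 1}, {'id': 2}]
```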
Solution 3
I came across this when I was trying to load a JSON file dumped from MongoDB. It gave me this error:
JSONDecodeError: Extra data: line 2 column 1
The MongoDB JSON dump has one object per line, so what worked for me is:
import json
data = [json.loads(line) for line in open('data.json', 'r')]
Solution 4
This can also happen if your JSON file contains more than one JSON record. A JSON record looks like this:
[{"some data": value, "next key": "another value"}]
It opens with a bracket [ and closes with a bracket ], and inside the brackets are braces { }. There can be many pairs of braces, but the record always ends with a closing bracket ]. If your JSON file contains more than one of these:
[{"some data": value, "next key": "another value"}]
[{"2nd record data": value, "2nd record key": "another value"}]
then loads() will fail.
I verified this with my own file that was failing.
import json
guestFile = open("1_guests.json",'r')
guestData = guestFile.read()
guestFile.close()
gdfJson = json.loads(guestData)
This works because 1_guests.json has only one record [ ]. The original file I was using, all_guests.json, had 6 records separated by newlines. I deleted 5 of those records (I had already checked that each one was bookended by brackets) and saved the file under a new name. After that, the loads() call worked.
The error was:
raise ValueError(errmsg("Extra data", s, end, len(s)))
ValueError: Extra data: line 2 column 1 - line 10 column 1 (char 261900 - 6964758)
P.S. I use the word "record", but that's not the official term. Also, if your file has one record per line like mine, you can loop through it and call loads() on one record at a time.
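Looping line by line only helps when each record sits on exactly one line. If records span several lines or share a line, one option is json.JSONDecoder.raw_decode. This is a swapped-in technique, not what this answer used: raw_decode decodes a single value and returns the index where it stopped. The sketch below assumes the whole text fits in memory, and the helper name iter_json_values is hypothetical.

```python
import json

_decoder = json.JSONDecoder()
_JSON_WS = " \t\n\r"  # the four whitespace characters JSON allows between values


def iter_json_values(text):
    """Yield each top-level JSON value in text, however the values are laid out."""
    idx, end = 0, len(text)
    while True:
        # raw_decode does not skip leading whitespace, so skip it here
        while idx < end and text[idx] in _JSON_WS:
            idx += 1
        if idx == end:
            return
        obj, idx = _decoder.raw_decode(text, idx)  # returns (value, index after it)
        yield obj


# Made-up input: bracketed records on separate lines, two objects on one line,
# and one pretty-printed object spanning several lines
text = '[{"a": 1}]\n[{"b": 2}]\n{"c": 3} {"d": 4}\n{\n  "e": 5\n}\n'
values = list(iter_json_values(text))
print(values)
```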
Solution 5
I got the same error when my JSON file looked like this:
{"id":"1101010","city_id":"1101","name":"TEUPAH SELATAN"}
{"id":"1101020","city_id":"1101","name":"SIMEULUE TIMUR"}
That is not a single JSON document (it is one object per line), so I changed it to:
{
"datas":[
{"id":"1101010","city_id":"1101","name":"TEUPAH SELATAN"},
{"id":"1101020","city_id":"1101","name":"SIMEULUE TIMUR"}
]
}
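If the one-object-per-line file is large, you can script the conversion instead of doing it by hand. The minimal sketch below assumes every non-blank line is one complete JSON object. wrap_json_lines is a hypothetical helper, and "datas" is simply the key used above.

```python
import json


def wrap_json_lines(text, key="datas"):
    """Turn one-object-per-line text into a single JSON document {key: [...]}."""
    records = [json.loads(line) for line in text.splitlines() if line.strip()]
    return json.dumps({key: records}, indent=2)


original = (
    '{"id":"1101010","city_id":"1101","name":"TEUPAH SELATAN"}\n'
    '{"id":"1101020","city_id":"1101","name":"SIMEULUE TIMUR"}\n'
)
fixed = wrap_json_lines(original)
data = json.loads(fixed)  # now a single document, so no "Extra data" error
print(data["datas"][1]["name"])  # SIMEULUE TIMUR
```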
Apoorv Ashutosh
Updated on November 09, 2021
Comments
-
Apoorv Ashutosh about 1 year
I am getting some data from a JSON file "new.json", and I want to filter some data and store it into a new JSON file. Here is my code:
import json
with open('new.json') as infile:
    data = json.load(infile)
for item in data:
    # dict.get is a method, so it is called with parentheses, not indexed with []
    iden = item.get("id")
    a = item.get("a")
    b = item.get("b")
    c = item.get("c")
    if c == 'XYZ' or "XYZ" in item["text"]:
        filename = 'abc.json'
        try:
            outfile = open(filename,'ab')
        except:
            outfile = open(filename,'wb')
        obj_json={}
        obj_json["ID"] = iden
        obj_json["VAL_A"] = a
        obj_json["VAL_B"] = b
And I am getting an error. The traceback is:
File "rtfav.py", line 3, in <module>
data = json.load(infile)
File "/usr/lib64/python2.7/json/__init__.py", line 278, in load
**kw)
File "/usr/lib64/python2.7/json/__init__.py", line 326, in loads
return _default_decoder.decode(s)
File "/usr/lib64/python2.7/json/decoder.py", line 369, in decode
raise ValueError(errmsg("Extra data", s, end, len(s)))
ValueError: Extra data: line 88 column 2 - line 50607 column 2 (char 3077 - 1868399)
Here is a sample of the data in new.json. There are about 1500 more such dictionaries in the file:
{ "contributors": null, "truncated": false, "text": "@HomeShop18 #DreamJob to professional rafter", "in_reply_to_status_id": null, "id": 421584490452893696, "favorite_count": 0, "source": "<a href=\"https://mobile.twitter.com\" rel=\"nofollow\">Mobile Web (M2)</a>", "retweeted": false, "coordinates": null, "entities": { "symbols": [], "user_mentions": [ { "id": 183093247, "indices": [ 0, 11 ], "id_str": "183093247", "screen_name": "HomeShop18", "name": "HomeShop18" } ], "hashtags": [ { "indices": [ 12, 21 ], "text": "DreamJob" } ], "urls": [] }, "in_reply_to_screen_name": "HomeShop18", "id_str": "421584490452893696", "retweet_count": 0, "in_reply_to_user_id": 183093247, "favorited": false, "user": { "follow_request_sent": null, "profile_use_background_image": true, "default_profile_image": false, "id": 2254546045, "verified": false, "profile_image_url_https": "https://pbs.twimg.com/profile_images/413952088880594944/rcdr59OY_normal.jpeg", "profile_sidebar_fill_color": "171106", "profile_text_color": "8A7302", "followers_count": 87, "profile_sidebar_border_color": "BCB302", "id_str": "2254546045", "profile_background_color": "0F0A02", "listed_count": 1, "profile_background_image_url_https": "https://abs.twimg.com/images/themes/theme1/bg.png", "utc_offset": null, "statuses_count": 9793, "description": "Rafter. Rafting is what I do. Me aur mera Tablet. 
Technocrat of Future", "friends_count": 231, "location": "", "profile_link_color": "473623", "profile_image_url": "http://pbs.twimg.com/profile_images/413952088880594944/rcdr59OY_normal.jpeg", "following": null, "geo_enabled": false, "profile_banner_url": "https://pbs.twimg.com/profile_banners/2254546045/1388065343", "profile_background_image_url": "http://abs.twimg.com/images/themes/theme1/bg.png", "name": "Jayy", "lang": "en", "profile_background_tile": false, "favourites_count": 41, "screen_name": "JzayyPsingh", "notifications": null, "url": null, "created_at": "Fri Dec 20 05:46:00 +0000 2013", "contributors_enabled": false, "time_zone": null, "protected": false, "default_profile": false, "is_translator": false }, "geo": null, "in_reply_to_user_id_str": "183093247", "lang": "en", "created_at": "Fri Jan 10 10:09:09 +0000 2014", "filter_level": "medium", "in_reply_to_status_id_str": null, "place": null }
-
smci almost 3 years
This is the error you get whenever the input JSON has more than one object per line. Many of the answers here assume there is only one object per line, or construct examples that obey that, but they would break if that weren't the case.
-
aspiring1 almost 3 years
@smci: Can you explain what you mean by "more than one object per line"?
-
Apoorv Ashutosh almost 9 years
Can you please explain again with reference to the code I gave above? I am a newbie, and it sometimes takes me a while to grasp such things.
-
falsetru almost 9 years
@ApoorvAshutosh, it seems like new.json contains one JSON document followed by more data. json.load and json.loads can only decode a single JSON document, so they raise a ValueError when they encounter additional data, as you see.
-
Apoorv Ashutosh almost 9 years
I have pasted a sample from new.json. I am filtering some data out of it, so I don't understand where the extra data is coming from.
-
falsetru almost 9 years
@ApoorvAshutosh, you mentioned about 1500 more such dictionaries in the edited question. That's the additional data. If you're the one who created new.json, put only a single JSON document in the file.
-
falsetru almost 9 years
@ApoorvAshutosh, if you need to dump multiple dictionaries as JSON, wrap them in a list and dump the list.
-
Apoorv Ashutosh almost 9 years
The issue here is not about writing into a JSON file; that has already happened. Can you tell me how to retrieve data from it? I already have a file with dictionaries in it, and I now have to retrieve each of those dictionaries. stackoverflow.com/questions/21059466/python-json-parser
-
falsetru almost 9 years
@ApoorvAshutosh, BTW, a trailing ',' is missing in the JSON in the new question (at the line "x": []), which makes it invalid JSON.
-
Apoorv Ashutosh almost 9 years
Sure, ASAP. And could you look into one more thing, as I said: how to read from a file with multiple dictionaries?
-
falsetru almost 9 years
@ApoorvAshutosh, I'm researching that issue. I will post an answer there once I'm done.
-
Apoorv Ashutosh almost 9 years
That's just a sample; I mentioned that in a comment.
-
falsetru almost 9 years
@ApoorvAshutosh, please post a valid sample!
-
falsetru almost 9 years
@ApoorvAshutosh, no, I mean the sample in the new question.
-
Apoorv Ashutosh almost 9 years
It's for this very sample; the structure of the dictionaries is basically the same. However, I'll edit that question with this very sample.
-
falsetru almost 9 years
@ApoorvAshutosh, I posted an answer that works around the issue. Check it out.
-
Ben almost 7 years
Is there a way to get json.loads to read newline-delimited JSON chunks? That is, to act like [json.loads(x) for x in text.split('\n')]? Related: is there a guarantee that json.dumps will not include literal newlines in its output with default indenting?
-
jchook over 6 years
@Ben, by default json.dumps changes newlines in text content to "\n", keeping your JSON on a single line.
-
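Here is a quick check of that behaviour. The stdlib json.loads has no newline-delimited mode, but one json.dumps call per record plus one json.loads call per line round-trips cleanly. Using splitlines() instead of split('\n') also avoids an empty final string when the text ends with a newline. Passing indent=... would put literal newlines back into the output, so this relies on the default indent=None. The sample records are made up.

```python
import json

record = {"text": "first line\nsecond line"}

line = json.dumps(record)   # default indent=None: compact, single-line output
print(line)                 # {"text": "first line\nsecond line"}
print("\n" in line)         # False: the newline is escaped as backslash-n

# One json.dumps call per line gives newline-delimited JSON that splits back cleanly
records = [record, {"text": "plain"}]
text = "".join(json.dumps(r) + "\n" for r in records)
back = [json.loads(x) for x in text.splitlines()]
print(back == records)      # True
```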
Aaron Liu over 6 years
Can I ask why it still works when I use json.dump instead of json.dumps? I am using Python 3.5.2.
-
falsetru about 6 years
@ShuruiLiu, please post a separate question.
-
charlesreid1 almost 6 years
The accepted answer addresses how to fix the source of the problem if you control the process of exporting, but if you are using someone else's data and you just have to deal with it, this is a great low-overhead method.
-
Fallenreaper over 5 years
I have an issue like this with JSON from a web scrape. I ran the data through a linter to check whether it is valid JSON, and it seems to be, so why would this error still be raised?
-
Gabrer almost 5 years
Many datasets nowadays (e.g., the Yelp dataset) are provided as a "set" of JSON objects, and your approach is convenient for loading them.
-
Akbar Noto over 3 years
I'm loading it just like yours, with json.load(infile).
-
smci almost 3 years
This is not a general solution: it assumes the input has one JSON object per line, and it breaks if it doesn't.
-
Sander Heinsalu almost 2 years
I still get json.decoder.JSONDecodeError: Extra data: line 1 column 954 (char 953) with this answer's code. My data file must have a different problem.
-
Manuel Lazo almost 2 years
I was trying this option, but I saw another useful way to get all items: file.readlines(), which returns a list of lines.
-
Ashwin Balani over 1 year
This is perfect. In case of errors, we can wrap the code in try...except as well!
-
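Here is a sketch of that try...except idea. Instead of stopping at the first bad line, it collects the line number and error message for each one. The function name load_json_lines and the sample lines are hypothetical. It accepts any iterable of strings, including an open file.

```python
import json


def load_json_lines(lines):
    """Decode each non-blank line; collect (line_number, message) for bad lines."""
    records, errors = [], []
    for lineno, line in enumerate(lines, start=1):
        if not line.strip():
            continue
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError as exc:  # a ValueError subclass (Python 3.5+)
            errors.append((lineno, exc.msg))
    return records, errors


sample = ['{"id": 1}\n', '{"id": 2} {"id": 3}\n', 'not json\n', '{"id": 4}\n']
records, errors = load_json_lines(sample)
print(records)  # [{'id': 1}, {'id': 4}]
print(errors)   # [(2, 'Extra data'), (3, 'Expecting value')]
```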
Zoe stands with Ukraine over 1 year
For the record, if this is the entire JSON file, an outer map is redundant. The root can be an array, which lets you simplify the second JSON to just an array. There's no need for a useless key in a useless map if you're storing array data; just throw it in a root array.
-
Akbar Noto over 1 year
@Zoe Oh, that's interesting. Could you provide an example?
-
Zoe stands with Ukraine over 1 year
It's not exactly hard. Just wrap the two maps in an array: [{"id":"1101010","city_id":"1101","name":"TEUPAH SELATAN"}, {"id":"1101020","city_id":"1101","name":"SIMEULUE TIMUR"}]. Parsing is identical, access is obj[0], obj[1], ... (read: just like accessing a normal array), and the objects you get are identical. The one you have in your answer would require obj["datas"][0], so it's functionally identical.
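This point can be checked in code: both layouts decode to the same list of dicts, and only the access path differs. The strings below reuse the two records from Solution 5.

```python
import json

as_array = (
    '[{"id":"1101010","city_id":"1101","name":"TEUPAH SELATAN"}, '
    '{"id":"1101020","city_id":"1101","name":"SIMEULUE TIMUR"}]'
)
as_wrapped = '{"datas": ' + as_array + '}'  # the layout used in Solution 5

obj = json.loads(as_array)        # root array: obj[0], obj[1], ...
wrapped = json.loads(as_wrapped)  # wrapped: wrapped["datas"][0], ...

print(obj[0]["name"])               # TEUPAH SELATAN
print(wrapped["datas"][0]["name"])  # TEUPAH SELATAN
print(obj == wrapped["datas"])      # True
```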