JSON decoding string - Unterminated string

Solution 1

I discovered that the good folks at Luminoso have written a library to sort out this kind of issue.

Sometimes you have to deal with text that comes out of other code, text that has often passed through several different pieces of software, each with its own quirks, probably with Microsoft Office somewhere in the chain.

This is where ftfy comes to the rescue.

from ftfy import fix_text
import json

# text = some text source with a potential Unicode problem
fixed_text = fix_text(text)  # repair mojibake and other Unicode damage
data = json.loads(fixed_text)
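
If the text lives in a file, the same idea looks roughly like this (a minimal sketch; the filename and the encoding handling are placeholders, not part of the original answer):

from ftfy import fix_text
import json

# hypothetical path; replace with the actual file
with open('data.json', encoding='utf-8', errors='replace') as f:
    raw_text = f.read()

fixed_text = fix_text(raw_text)  # repair mojibake and other Unicode problems
data = json.loads(fixed_text)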

Solution 2

I was having this problem with my data and tried many of the things recommended online, to no avail. Finally, I just read the JSON lines into a dictionary, line by line, skipping any lines that raised an exception. Then I loaded the dictionary into a DataFrame: no error.

In the code below, you can see that I actually read the lines into a dictionary of dictionaries (using enumerate to get a numeric key); this gives Pandas an index to use and avoids an error. I also had to transpose the DataFrame (.T) to get the data the way I wanted.

This is with a JSON Lines file, so the code below won't work for a regular JSON file, but the same principle can be applied.

I ended up losing about 20 lines in over 388K lines of data. This doesn't matter for me because my data is a sample anyway. If you actually need every line of your data, this isn't the ideal solution. But if you don't, it seems that the easiest way to deal with this problem is to just toss out the bad apples.

import pandas as pd
import json

filename = 'data.jl'  # JSON Lines file

with open(filename) as f:
    lines = f.read().splitlines()

my_dict = {}
for i, line in enumerate(lines):
    try:
        my_dict[i] = json.loads(line)
    except json.JSONDecodeError:
        # skip lines that fail to parse
        pass

df = pd.DataFrame.from_dict(my_dict).T
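
Since the file in the question is around 2 GB, a small variation of the above is to iterate over the file object directly instead of reading everything into memory first. This is just a sketch of the same skip-the-bad-lines idea, not something from the original answer:

import pandas as pd
import json

filename = 'data.jl'  # JSON Lines file

my_dict = {}
skipped = 0
with open(filename) as f:
    for i, line in enumerate(f):   # stream the file line by line
        try:
            my_dict[i] = json.loads(line)
        except json.JSONDecodeError:
            skipped += 1           # count and skip lines that fail to parse

df = pd.DataFrame.from_dict(my_dict).T
print(f'Skipped {skipped} unparseable lines')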

Comments

  • ocean800 (almost 2 years ago):

    I have a json file that is 2 GB, and when I try to load it I'm getting this error:

    json.decoder.JSONDecodeError: Unterminated string starting at: line 1 column 1093156512 (char 1093156511)

    So this probably means there is some escape sequence (or something like that) messing up the JSON, correct? The issue is that this file is huge, and just opening it in an editor is a huge pain; the editor crashes before I can even see where the problem is. However, I still need to fix this somehow, and I'm not sure what could be causing it... it could be many things.

    My data is essentially a list of objects, like so:

    data = [{"key1": 123, "key2": "this is the first string to concatenate"},
            {"key1": 131, "key2": "this is the second string to concatenate"},
            {"key1": 152, "key2": "this is the third string to concatenate"}]
    

    Except with more complicated key2 values. If the issue were a stray \, would getting rid of all the \ characters in the JSON file fix it? However, there is nothing to say that an odd escape character is actually my issue... Also, I have very little control over what my input JSON file is, so I don't think I would be able to change that anyway.

    Is there any way to fix this issue without changing the input JSON file?

    [EDIT] This is the whole error trace:

    File "halp.py", line 38, in data = json.load(json_file,strict=False)

    File "/usr/lib/python3.6/json/init.py", line 299, in load parse_constant=parse_constant, object_pairs_hook=object_pairs_hook, **kw)

    File "/usr/lib/python3.6/json/init.py", line 367, in loads return cls(**kw).decode(s)

    File "/usr/lib/python3.6/json/decoder.py", line 339, in decode obj, end = self.raw_decode(s, idx=_w(s, 0).end())

    File "/usr/lib/python3.6/json/decoder.py", line 355, in raw_decode obj, end = self.scan_once(s, idx) json.decoder.JSONDecodeError: Unterminated string starting at: line 1 column 1093156512 (char 1093156511)

    When I seek to that position in the file, I get:

    eers in the fridge!"}, {"city_name": "Portland", "comments": "A cute space to rest your head in Portland. We just stayed for one night, briefly met Adam who was lovely! Appreciated the beers and coffe
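
    To look at that spot without opening the whole file in an editor, one option is to seek near the reported offset and read a small window around it, roughly like this (the filename is a placeholder, and the (char ...) value counts characters, so the byte position can differ slightly if the data is not pure ASCII):

    offset = 1093156511  # the (char ...) value from the JSONDecodeError
    with open('data.json', 'rb') as f:  # placeholder filename
        f.seek(max(offset - 100, 0))    # jump to just before the reported position
        print(f.read(300).decode('utf-8', errors='replace'))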