How to read a json-dictionary type file with pandas?

Solution 1

The read_json method doesn't work because the JSON file is not in the format it expects. Since we can easily load JSON as a dict, let's try it this way:

import pandas as pd
import json
import os

os.chdir('/Users/nicolas/Downloads')

# Reading the json as a dict
with open('json_example.json') as json_data:
    data = json.load(json_data)

# Use the from_dict constructor. Note that the 'orient' parameter
# is not left at its default value (otherwise you get the same error as before).
# Transpose the resulting df and set the 'index' column as its index to get this result
pd.DataFrame.from_dict(data, orient='index').T.set_index('index')

output:

                                                                 data columns
index                                                                        
311210177061863424  [25-34\n, FEMALE, @bikewa absolutely the best....     age
310912785183813632  [25-34\n, FEMALE, Photo: I love the Burke-Gilm...  gender
311290293871849472  [25-34\n, FEMALE, Photo: Inhaled! #fitfoodie h...    text
309386414548717569  [25-34\n, FEMALE, Facebook Is Making The Most ...    None
312327801187495936  [25-34\n, FEMALE, Still upset about this >&...    None
312249421079400449  [25-34\n, FEMALE, @JoeM_PM_UK @JonAntoine I've...    None
308692673194246145  [25-34\n, FEMALE, @Social_Freedom_ actually, t...    None
308995226633129984  [25-34\n, FEMALE, @seattleweekly that's more t...    None
308660851219501056  [25-34\n, FEMALE, @adamholdenbache I noticed 1...    None
308658690528014337  [25-34\n, FEMALE, @CEM_Social I am waiting pat...    None
309719798001070080  [25-34\n, FEMALE, Going to be watching Faceboo...    None
312349448049152002  [25-34\n, FEMALE, @anikamarketer I applied for...    None
312325152698404864  [25-34\n, FEMALE, @_chrisrojas_ wow, that's so...    None
310546490844135425  [25-34\n, FEMALE, Photo: Feeling like a bit of...    None
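
To make the steps above easier to try without the original file, here is a minimal sketch on a small made-up dict. It assumes the JSON has the three top-level keys 'columns', 'index' and 'data' with different lengths, which is what the output above suggests; the sample values are invented for illustration. The last step (passing the three parts to the DataFrame constructor directly) is not part of the answer above, just one possible way to get one column per field.

```python
import pandas as pd

# Hypothetical stand-in for the loaded JSON: three keys with unequal lengths.
data = {
    "columns": ["age", "gender", "text"],
    "index": [101, 102, 103, 104],
    "data": [
        ["25-34", "FEMALE", "first tweet"],
        ["25-34", "FEMALE", "second tweet"],
        ["35-44", "MALE", "third tweet"],
        ["18-24", "MALE", "fourth tweet"],
    ],
}

# The plain constructor treats each key as a column and rejects unequal lengths.
try:
    pd.DataFrame(data)
    plain_constructor_failed = False
except ValueError:
    plain_constructor_failed = True

# orient='index' treats each key as a row, so short rows are padded with missing values.
wide = pd.DataFrame.from_dict(data, orient="index")

# Transpose so the keys become columns, then use the 'index' column as the index.
df = wide.T.set_index("index")

# Alternative sketch: hand the three parts to the constructor directly,
# which gives one column per field instead of a column of lists.
tidy = pd.DataFrame(data["data"], index=data["index"], columns=data["columns"])
print(tidy)
```

In this sketch df has the same layout as the output above (a 'data' column holding lists and a padded 'columns' column), while tidy has separate age, gender and text columns.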

Solution 2

The pandas module, not the json module, should be the answer: pandas itself can read JSON through read_json, and the root of the problem is most likely that the JSON was not read with the correct orientation. You must pass the same orient parameter that was used to create the JSON in the first place.

For example:

df_json = df.to_json(orient='split')

and then:

read_to_json = pd.read_json(df_json, orient='split')
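
A self-contained sketch of that round trip, assuming a small made-up DataFrame (the column names and values are only illustrative). The JSON string is wrapped in StringIO because newer pandas releases deprecate passing a raw JSON string straight to read_json; on older versions the plain string also works.

```python
from io import StringIO

import pandas as pd

# Hypothetical frame standing in for the original df.
df = pd.DataFrame(
    {"age": ["25-34", "35-44"], "gender": ["FEMALE", "MALE"]},
    index=[101, 102],
)

# Serialize with an explicit orientation; 'split' produces the keys
# 'columns', 'index' and 'data'.
df_json = df.to_json(orient="split")

# Read it back with the same orientation.
restored = pd.read_json(StringIO(df_json), orient="split")
print(restored)
```

If the file in the question really is in this 'split' layout, passing orient='split' to read_json with the file path may be enough on its own, but that is an assumption about the file rather than something confirmed here.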

Author: skwoi

Updated on July 09, 2022

Comments

  • skwoi
    skwoi almost 2 years

    I have a long json like this: http://pastebin.com/gzhHEYGy

    I would like to load it into a pandas DataFrame in order to work with it, so following the documentation I do this:

    df = pd.read_json('/user/file.json')
    print df
    

    I got this traceback:

      File "/Users/user/PycharmProjects/PAN-pruebas/json_2_dataframe.py", line 6, in <module>
        df = pd.read_json('/Users/user/Downloads/54db3923f033e1dd6a82222aa2604ab9.json')
      File "/usr/local/lib/python2.7/site-packages/pandas/io/json.py", line 198, in read_json
        date_unit).parse()
      File "/usr/local/lib/python2.7/site-packages/pandas/io/json.py", line 266, in parse
        self._parse_no_numpy()
      File "/usr/local/lib/python2.7/site-packages/pandas/io/json.py", line 483, in _parse_no_numpy
        loads(json, precise_float=self.precise_float), dtype=None)
      File "/usr/local/lib/python2.7/site-packages/pandas/core/frame.py", line 203, in __init__
        mgr = self._init_dict(data, index, columns, dtype=dtype)
      File "/usr/local/lib/python2.7/site-packages/pandas/core/frame.py", line 327, in _init_dict
        dtype=dtype)
      File "/usr/local/lib/python2.7/site-packages/pandas/core/frame.py", line 4620, in _arrays_to_mgr
        index = extract_index(arrays)
      File "/usr/local/lib/python2.7/site-packages/pandas/core/frame.py", line 4668, in extract_index
        raise ValueError('arrays must all be same length')
    ValueError: arrays must all be same length
    

    Then from a previous question I found that I need to do something like this:

    d = dict( A = np.array([1,2]), B = np.array([1,2,3,4]) )
    

    But I don't understand how I should obtain the contents as a numpy array. How can I preserve the length of the arrays in a big file like this? Thanks in advance.

    • skwoi
      skwoi over 9 years
      It seems to be like a dictionary.
  • skwoi
    skwoi over 9 years
    Thank you very much for the help @knightofni
  • laviex
    laviex over 4 years
    If anyone gets this error: "the JSON object must be str, bytes or bytearray, not TextIOWrapper", note that the answer uses json.load(), NOT json.loads(). With json.loads() you would need to pass the file contents instead: json.loads(json_data.read())
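
The difference mentioned in the last comment can be sketched as follows. The file name and its contents are made up for the example, and a temporary directory is used so nothing is left behind.

```python
import json
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical small JSON file used only for this demonstration.
    path = os.path.join(tmp, "json_example.json")
    with open(path, "w") as fh:
        fh.write('{"index": [1, 2], "columns": ["age"]}')

    # json.load takes a file object.
    with open(path) as json_data:
        from_file = json.load(json_data)

    # json.loads takes a string, so read the file contents first.
    with open(path) as json_data:
        from_text = json.loads(json_data.read())

    # Passing the file object to json.loads raises the TypeError from the comment.
    with open(path) as json_data:
        try:
            json.loads(json_data)
            error_message = None
        except TypeError as exc:
            error_message = str(exc)

print(from_file == from_text)
print(error_message)
```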