Pandas and JSON ValueError: arrays must all be same length

The top-level arrays in your JSON file have different lengths, so pd.read_json cannot build a single DataFrame from the whole file and your original code fails.

Try this:

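import json
import pandas as pd

# Load the raw JSON with the standard library instead of pd.read_json
with open('Lyrics_SteelyDan.json') as json_data:
    data = json.load(json_data)

# Build the DataFrame from the list of song records only
df = pd.DataFrame(data['songs'])
df['lyrics']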
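
To see why the original call fails and the version above works, here is a minimal sketch that needs no file on disk. The dictionary is a made-up, trimmed-down stand-in for what lyricsgenius writes; the `alternate_names` key and the song values are assumptions for illustration, and only `songs` and `lyrics` come from the question.

```python
import json
import pandas as pd

# Hypothetical stand-in for Lyrics_SteelyDan.json (keys other than
# "songs" and "lyrics" are assumptions, not the real file layout).
raw = json.dumps({
    "name": "Steely Dan",
    "alternate_names": ["The Dan"],  # top-level list of length 1
    "songs": [                       # top-level list of length 3
        {"title": "Song A", "lyrics": "first lyrics"},
        {"title": "Song B", "lyrics": "second lyrics"},
        {"title": "Song C", "lyrics": "third lyrics"},
    ],
})

data = json.loads(raw)

# Building a frame from the whole dict treats every top-level list as a
# column, and columns of different lengths raise ValueError.
error = None
try:
    pd.DataFrame(data)
except ValueError as exc:
    error = exc

# Building it from the list of song records gives one row per song.
songs = pd.DataFrame(data["songs"])
lyrics = songs["lyrics"]
```

The exact wording of the ValueError message can differ between pandas versions, so the sketch only relies on the exception type.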

See also: https://hackersandslackers.com/json-into-pandas-dataframes/

Author: RWalling21

Updated on June 04, 2022

Comments

  • RWalling21
    RWalling21 almost 2 years

    I'm trying to make a simple application that takes the lyrics of a song and saves them. I'm using lyricsgenius to create a JSON file with the lyrics of the songs I request, but I can't figure out how to parse the data from that file. I tried following this tutorial, but I get an error as soon as I start working with pandas.

    Code to create the JSON File

    import lyricsgenius as genius
    import os
    
    os.getcwd()
    
    geniusCreds = "qlDFcHWqCRpSfq0pVTctt1ZhDc4wHF6lpP5WGODh4iVQB7yTPn7Hw6SjWAFiCdxa"
    artist_name = "Steely Dan"
    
    api = genius.Genius(geniusCreds)
    artist = api.search_artist(artist_name, max_songs=3)
    
    artist.save_lyrics()
    

    Code to read the Data from the JSON File

    import pandas as pd
    import os
    
    
    Artist = pd.read_json("Lyrics_SteelyDan.json")
    
    df = pd.DataFrame.from_dict(Artist['songs'])
    
    df.head()
    

    Whenever I run the code above I get the error below. Any help on how to fix it, or a better way to parse the data, would be much appreciated. Thank you.

     "c:/Users/Admin/Desktop/Steely Dan/Data.py"
    Traceback (most recent call last):
      File "c:/Users/Admin/Desktop/Steely Dan/Data.py", line 5, in <module>
        Artist = pd.read_json("Lyrics_SteelyDan.json")
      File "C:\Users\Admin\AppData\Local\Programs\Python\Python37-32\lib\site-packages\pandas\io\json\_json.py", line 592, in read_json
        result = json_reader.read()
      File "C:\Users\Admin\AppData\Local\Programs\Python\Python37-32\lib\site-packages\pandas\io\json\_json.py", line 717, in read
        obj = self._get_object_parser(self.data)
      File "C:\Users\Admin\AppData\Local\Programs\Python\Python37-32\lib\site-packages\pandas\io\json\_json.py", line 739, in _get_object_parser
        obj = FrameParser(json, **kwargs).parse()
      File "C:\Users\Admin\AppData\Local\Programs\Python\Python37-32\lib\site-packages\pandas\io\json\_json.py", line 849, in parse
        self._parse_no_numpy()
      File "C:\Users\Admin\AppData\Local\Programs\Python\Python37-32\lib\site-packages\pandas\io\json\_json.py", line 1093, in _parse_no_numpy
        loads(json, precise_float=self.precise_float), dtype=None
      File "C:\Users\Admin\AppData\Local\Programs\Python\Python37-32\lib\site-packages\pandas\core\frame.py", line 411, in __init__
        mgr = init_dict(data, index, columns, dtype=dtype)
      File "C:\Users\Admin\AppData\Local\Programs\Python\Python37-32\lib\site-packages\pandas\core\internals\construction.py", line 257, in init_dict
        return arrays_to_mgr(arrays, data_names, index, columns, dtype=dtype)
      File "C:\Users\Admin\AppData\Local\Programs\Python\Python37-32\lib\site-packages\pandas\core\internals\construction.py", line 77, in arrays_to_mgr
        index = extract_index(arrays)
      File "C:\Users\Admin\AppData\Local\Programs\Python\Python37-32\lib\site-packages\pandas\core\internals\construction.py", line 368, in extract_index
        raise ValueError("arrays must all be same length")
    ValueError: arrays must all be same length
    
    • It_is_Chris
      It_is_Chris over 4 years
      please paste the full traceback.
    • It_is_Chris
      It_is_Chris over 4 years
      Can you also post the json?
    • It_is_Chris
      It_is_Chris over 4 years
      If you have a github can you post it there and link to it or provide a sample / portion of the json file.
    • It_is_Chris
      It_is_Chris over 4 years
      Sorry, that repo (JSON-Snip) is returning a 404
    • It_is_Chris
      It_is_Chris over 4 years
      Same 404; is it a public repo?
  • It_is_Chris
    It_is_Chris over 4 years
    Change df = json_normalize(data) to df = pd.DataFrame(data['songs']), then access the lyrics column with df['lyrics'].
  • brainstorm
    brainstorm over 2 years
    json_normalize() is deprecated...
  • Gary Carlyle Cook
    Gary Carlyle Cook almost 2 years
    Also they are spelling it with an S instead of a Z.
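
On the deprecation mentioned in the comments: what was deprecated (in pandas 1.0) is the import path `pandas.io.json.json_normalize`; the function itself is still available as `pd.json_normalize`. A minimal sketch, assuming pandas 1.0 or later and a made-up nested `album` field that is not taken from the real file:

```python
import pandas as pd

# Hypothetical song records with one nested field, for illustration only.
songs = [
    {"title": "Song A", "lyrics": "first lyrics", "album": {"name": "Album X"}},
    {"title": "Song B", "lyrics": "second lyrics", "album": {"name": "Album Y"}},
]

# Top-level function (pandas >= 1.0); nested dicts are flattened into
# dotted column names such as "album.name".
flat = pd.json_normalize(songs)
```

If the song records contain no nested objects, pd.DataFrame(data['songs']) is enough and json_normalize is not needed.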