Convert Pandas DataFrame to JSON as element of larger data structure

13,377

Solution 1

The default function (supplied to json.dumps) gets called for all objects that can't be serialized by default. It can return any object that the default encoder can serialize, such as a dict.

df.to_json() returns a string. json.loads(df.to_json) returns a dict with keys which are strings. So if we set default=lambda df: json.loads(df.to_json()) then the DataFrame will get serialized as though it were a dict.

import json
import numpy as np
import pandas as pd

z = np.zeros(10)
df = pd.DataFrame(z)
result = {"name": "Simple Example",
          "data": df, }

jstr = json.dumps(result,
                   default=lambda df: json.loads(df.to_json()))
newresult = json.loads(jstr)
print(newresult)
# {u'data': {u'0': {u'0': 0.0,
#    u'1': 0.0,
#    u'2': 0.0,
#    u'3': 0.0,
#    u'4': 0.0,
#    u'5': 0.0,
#    u'6': 0.0,
#    u'7': 0.0,
#    u'8': 0.0,
#    u'9': 0.0}},
#  u'name': u'Simple Example'}


print(pd.DataFrame(newresult['data']))

yields

   0
0  0
1  0
2  0
3  0
4  0
5  0
6  0
7  0
8  0
9  0

Solution 2

I think a bit more reading on the jsonext docs was warranted. It looks like I can create my own mixin that knows how to properly encode my DataFrame objects, then call jsonext.dumps(result). I was seduced by the existing to_dict() and to_json() methods of DataFrame objects, which don't really solve the problem.

Share:
13,377

Related videos on Youtube

smontanaro
Author by

smontanaro

A Python guy.

Updated on October 10, 2022

Comments

  • smontanaro
    smontanaro over 1 year

    I've been working with pandas DataFrame objects in my server, converting them to CSV for transmission to the browser, where the tabular values are plotted using d3. While CSV is file as far as it goes, I really need more than just a 2D table of data. If nothing else, I'd like to return some metadata about the data.

    So I started messing around with JSON thinking I would be able to construct a dictionary with some meta information and my DataFrame. For example, just as an absurdly simple example:

    >>> z = numpy.zeros(10)
    >>> df = pandas.DataFrame(z)
    >>> df
       0
    0  0
    1  0
    2  0
    3  0
    4  0
    5  0
    6  0
    7  0
    8  0
    9  0
    >>> result = {
    ...   "name": "Simple Example",
    ...   "data": df,
    ... }
    

    Not surprisingly, that can't be directly serialized using the json module. I found the jsonext module and tried it. It "works", but produces incomplete results:

    >>> jsonext.dumps(result)
    '{"data": ["0"], "name": "Simple Example"}'
    

    Looking at the methods DataFrame itself provides for this sort of thing, I found to_dict() and to_json(). The former produces dictionaries of dictionaries:

    >>> df.to_dict()
    {0: {0: 0.0, 1: 0.0, 2: 0.0, 3: 0.0, 4: 0.0, 5: 0.0, 6: 0.0, 7: 0.0, 8: 0.0, 9: 0.0}}
    

    but as you can see, they can't be serialized to JSON, since the keys are not strings.

    df.to_json() looked like it might work, though I would then wind up with a JSON string embedded in aanother JSON string. Something like this:

    json.dumps({"name": "Simple Example", "data": df.to_json()}) '{"data": "{\"0\":{\"0\":0.0,\"1\":0.0,\"2\":0.0,\"3\":0.0,\"4\":0.0,\"5\":0.0,\"6\":0.0,\"7\":0.0,\"8\":0.0,\"9\":0.0}}", "name": "Simple Example"}'

    In other words, a bit of a mess.

    Any suggestions about how to handle this sort of nested structure where some of the elements can't be directly serialized? I think I might be able to get jsonext to work, but its Dict mixin expects to find a proper (in its mind) to_dict() method. DataFrame.to_dict() doesn't seem to return the right thing. (Though I will continue to horse around with it.)

    I figured this must be a cat which has already been skinned. I just haven't found it. I'd be happy for now with nothing more hierarchical than something like my example (though with more key/value pairs), though I won't turn my nose up at a more general solution.

  • flying sheep
    flying sheep about 6 years
    you could also do just json.dumps(df, default=DataFrame.to_dict)
  • unutbu
    unutbu about 6 years
    @flyingsheep: Thank you for the suggestion. At first it looked like a great improvement, but, unfortunately, if df has, say, a DatetimeIndex, then the default=lambda df: json.loads(df.to_json() rigmarole works but default=pd.DataFrame.to_dict will raise TypeError: keys must be a string.