pandas.to_dict returns None mixed with nan

I think you can only replace the values beforehand; it is not possible to control this in to_dict itself:

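import numpy as np
import pandas as pd

# Sample records mixing NaN and None as missing values
L = [
    {"COL1": "VAL1", "COL2": np.nan, "COL3": "VAL3"},
    {"COL1": None, "COL2": "VAL2", "COL3": "VAL3"},
    {"COL1": "VAL1", "COL2": "VAL2", "COL3": np.nan},
]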

# Replace every NaN with None before converting
df = pd.DataFrame(L).replace({np.nan: None})
print(df)
   COL1  COL2  COL3
0  VAL1  None  VAL3
1  None  VAL2  VAL3
2  VAL1  VAL2  None

print(df.to_dict(orient='records'))
[{'COL3': 'VAL3', 'COL2': None, 'COL1': 'VAL1'}, 
 {'COL3': 'VAL3', 'COL2': 'VAL2', 'COL1': None}, 
 {'COL3': None, 'COL2': 'VAL2', 'COL1': 'VAL1'}]
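
If replace does not catch the NaN values (as reported in the comments), one fallback is to clean the records after to_dict instead. The following is only a minimal sketch and not part of the original answer: the helper name nan_to_none is hypothetical, and the hand-written records list stands in for the output of df.to_dict(orient='records'). It relies on numpy.float64 being a subclass of Python's float, so the isinstance check covers both.

```python
import math


def nan_to_none(records):
    """Return a copy of `records` with every float NaN value replaced by None."""
    return [
        {
            key: None if isinstance(value, float) and math.isnan(value) else value
            for key, value in row.items()
        }
        for row in records
    ]


# Stand-in for df.to_dict(orient='records'); float("nan") plays the NaN cells.
records = [
    {"COL1": "VAL1", "COL2": float("nan"), "COL3": "VAL3"},
    {"COL1": None, "COL2": "VAL2", "COL3": "VAL3"},
]

cleaned = nan_to_none(records)
print(cleaned)
```

Because the check tests for NaN explicitly rather than comparing against np.nan, it does not depend on which NaN object happens to be stored in the frame.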

Author: Piotr Kamoda

I'm currently working in Warsaw, mostly in various dbms and Informatica, trying to work my way around some ridiculous tasks.

Updated on June 04, 2022

Comments

  • Piotr Kamoda
    Piotr Kamoda almost 2 years

    I've stumbled upon a minor problem with pandas and its to_dict method. I have a table that I'm certain has the same columns in every row; let's say it looks like this:

    +----|----|----+
    |COL1|COL2|COL3|
    +----|----|----+
    |VAL1|    |VAL3|
    |    |VAL2|VAL3|
    |VAL1|VAL2|    |
    +----|----|----+
    

    When I do df.to_dict(orient='records') I get:

    [{
         "COL1":"VAL1"
         ,"COL2":nan
         ,"COL3":"VAL3"
     }
     ,{
         "COL1":None
         ,"COL2":"VAL2"
         ,"COL3":"VAL3"
     }
     ,{
         "COL1":"VAL1"
         ,"COL2":"VAL2"
         ,"COL3":nan
    }]
    

    Notice the nans in some columns and the Nones in others (always the same columns; nan and None never appear together in one column).

    And when I do json.loads(df.to_json(orient='records')) I get only None and no nans (which is the desired output).

    Like this:

    [{
         "COL1":"VAL1"
         ,"COL2":None
         ,"COL3":"VAL3"
     }
     ,{
         "COL1":None
         ,"COL2":"VAL2"
         ,"COL3":"VAL3"
     }
     ,{
         "COL1":"VAL1"
         ,"COL2":"VAL2"
         ,"COL3":None
    }]
    

    I would appreciate some explanation as to why it happens and if it can be controlled in some way.

    ==EDIT==

    According to the comments it would be better to first replace those nans with None, but those nans are not np.nan:

    >>> a = df.head().ix[0,60]
    >>> a
    nan
    >>> type(a)
    <class 'numpy.float64'>
    >>> a is np.nan
    False
    >>> a == np.nan
    False
    
  • Piotr Kamoda
    Piotr Kamoda about 7 years
    Unfortunately this does not work :P this np.NaN is not equal to what I have in the table (which is strange)
  • jezrael
    jezrael about 7 years
    Are the VALs numeric or strings?
  • Piotr Kamoda
    Piotr Kamoda about 7 years
    string - data is loaded from json
  • jezrael
    jezrael about 7 years
    Hmm, is the problem only in the real data or with the sample too? What are your versions of Python and pandas?
  • Piotr Kamoda
    Piotr Kamoda about 7 years
    pandas == '0.19.2' python == Python 3.6.0 (v3.6.0:41df79263a11, Dec 23 2016, 08:06:12) [MSC v.1900 64 bit (AMD64)] on win32
  • jezrael
    jezrael about 7 years
    It is weird, I use the same versions of Python and pandas and for me it works. :(
  • jezrael
    jezrael about 7 years
    I see your edited question; checking for NaN needs something else: print (pd.isnull(pd.Series(np.nan)))
  • Piotr Kamoda
    Piotr Kamoda about 7 years
    It outputs True
  • Piotr Kamoda
    Piotr Kamoda about 7 years
    Nope, unfortunately no. I even tried: >>> a = df.ix[0,60] >>> a nan >>> df.head().replace({a:None})
  • jezrael
    jezrael about 7 years
    And what about a = df.replace({np.nan:None}).ix[0,60]? What is a?
  • Piotr Kamoda
    Piotr Kamoda about 7 years
    >>> a = df.replace({np.nan:None}).ix[0,60] >>> a nan >>> type(a) <class 'numpy.float64'>
  • jezrael
    jezrael about 7 years
    Hmmm, and what about print (df.to_json(orient='records')) ? - to_json
  • Piotr Kamoda
    Piotr Kamoda about 7 years
    >>> print (df.head().ix[:,60].to_json(orient='records')) [null,null,null,null,null] ¯\(°_o)/¯
  • jezrael
    jezrael about 7 years
    Yes, you are right. I found this note: "NaN's, NaT's and None will be converted to null and datetime objects will be converted based on the date_format and date_unit parameters." So if replace doesn't work, just use your own solution :(
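
The identity and equality checks in the question's edit cannot detect a NaN. By IEEE 754 rules NaN compares unequal to everything, including itself, and the value read from the frame is a separate numpy.float64 object rather than np.nan itself, so both a is np.nan and a == np.nan return False. Here is a small sketch of that behaviour, using plain Python floats as a stand-in for the numpy.float64 values (an assumption made for brevity; numpy.float64 subclasses float and compares the same way):

```python
import math

a = float("nan")  # stand-in for the value read via df.ix[0, 60]

print(a == a)         # False: NaN never compares equal, even to itself
print(a != a)         # True
print(math.isnan(a))  # True: an explicit NaN test for a single float

# Equality-based checks miss NaN, so test for missing values explicitly.
values = ["VAL1", a, None]
missing = [v is None or (isinstance(v, float) and math.isnan(v)) for v in values]
print(missing)        # [False, True, True]
```

With pandas available, pd.isnull (suggested in the comments above) applies the same kind of test element-wise.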