pandas.to_dict returns None mixed with nan
I think you can only replace the values beforehand; it is not possible to control this in to_dict:
import numpy as np
import pandas as pd

L = [{"COL1": "VAL1", "COL2": np.nan, "COL3": "VAL3"},
     {"COL1": None, "COL2": "VAL2", "COL3": "VAL3"},
     {"COL1": "VAL1", "COL2": "VAL2", "COL3": np.nan}]

df = pd.DataFrame(L).replace({np.nan: None})
print (df)
COL1 COL2 COL3
0 VAL1 None VAL3
1 None VAL2 VAL3
2 VAL1 VAL2 None
print (df.to_dict(orient='records'))
[{'COL3': 'VAL3', 'COL2': None, 'COL1': 'VAL1'},
{'COL3': 'VAL3', 'COL2': 'VAL2', 'COL1': None},
{'COL3': None, 'COL2': 'VAL2', 'COL1': 'VAL1'}]
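As an aside (my own sketch, not part of the original answer): the same replacement can also be done with DataFrame.where and a notnull mask; casting to object first guards against None being coerced back to nan in numeric columns.

```python
import numpy as np
import pandas as pd

L = [{"COL1": "VAL1", "COL2": np.nan, "COL3": "VAL3"},
     {"COL1": None, "COL2": "VAL2", "COL3": "VAL3"},
     {"COL1": "VAL1", "COL2": "VAL2", "COL3": np.nan}]

df = pd.DataFrame(L)
# Keep cells where notnull() is True, fill the rest with None.
# astype(object) prevents float columns from coercing None back to nan.
out = df.astype(object).where(df.notnull(), None)
print(out.to_dict(orient='records'))
```

Which of the two you use is a style choice; both hand to_dict a frame that already contains None instead of nan.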
Updated on June 04, 2022

Comments
-
Piotr Kamoda almost 2 years
I've stumbled upon a minor problem with pandas and its method to_dict. I have a table that I'm certain has the same set of columns in each row; let's say it looks like this:
+----|----|----+
|COL1|COL2|COL3|
+----|----|----+
|VAL1|    |VAL3|
|    |VAL2|VAL3|
|VAL1|VAL2|    |
+----|----|----+
When I do
df.to_dict(orient='records')
I get:
[{
 "COL1":"VAL1"
,"COL2":nan
,"COL3":"VAL3"
}
,{
 "COL1":None
,"COL2":"VAL2"
,"COL3":"VAL3"
}
,{
 "COL1":"VAL1"
,"COL2":"VAL2"
,"COL3":nan
}]
Notice the nan's in some columns and the None's in others (it is consistent: a given column never mixes nan and None). And when I do
json.loads(df.to_json(orient='records'))
I get only None and no nan's (which is the desired output). Like this:
[{ "COL1":"VAL1" ,"COL2":None ,"COL3":"VAL3" } ,{ "COL1":None ,"COL2":"VAL2" ,"COL3":"VAL3" } ,{ "COL1":"VAL1" ,"COL2":"VAL2" ,"COL3":None }]
I would appreciate some explanation as to why it happens and if it can be controlled in some way.
==EDIT==
According to the comments it would be better to first replace those nan's with None's, but those nan's are not np.nan:
>>> a = df.head().ix[0,60]
>>> a
nan
>>> type(a)
<class 'numpy.float64'>
>>> a is np.nan
False
>>> a == np.nan
False
-
Piotr Kamoda about 7 years Unfortunately this does not work :P; this np.NaN is not equal to what I have in the table (which is strange)
-
jezrael about 7 years Are the VALs numeric or strings?
-
Piotr Kamoda about 7 years Strings; the data is loaded from json
-
jezrael about 7 years Hmm, is the problem in the real data, or with the sample too? What are your versions of python and pandas?
-
Piotr Kamoda about 7 years pandas == '0.19.2', python == Python 3.6.0 (v3.6.0:41df79263a11, Dec 23 2016, 08:06:12) [MSC v.1900 64 bit (AMD64)] on win32
-
jezrael about 7 years It is weird, I use the same versions of python and pandas and for me it works. :(
-
jezrael about 7 years I see your edited question; you need something else to check for NaN:
print (pd.isnull(pd.Series(np.nan)))
-
Piotr Kamoda about 7 years It outputs True
-
Piotr Kamoda about 7 years Nope, unfortunately no. I even tried:
>>> a = df.ix[0,60]
>>> a
nan
>>> df.head().replace({a:None})
-
jezrael about 7 years And this: a = df.replace({np.nan:None}).ix[0,60]? What is a?
-
Piotr Kamoda about 7 years
>>> a = df.replace({np.nan:None}).ix[0,60]
>>> a
nan
>>> type(a)
<class 'numpy.float64'>
-
jezrael about 7 years Hmmm, and what about print (df.to_json(orient='records'))? (to_json)
-
Piotr Kamoda about 7 years
>>> print (df.head().ix[:,60].to_json(orient='records'))
[null,null,null,null,null]
¯\(°_o)/¯
-
jezrael about 7 years Yes, you are right. I found this: "Note: NaN's, NaT's and None will be converted to null and datetime objects will be converted based on the date_format and date_unit parameters." So if replace doesn't work, use only your solution :(
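The comparison failures in the edit above are expected NaN behaviour, not a pandas quirk: under IEEE 754 a NaN never compares equal to anything, itself included, and np.nan is just one particular float object, so neither == nor is can detect it. math.isnan or pd.isnull is the reliable test. A small illustration (my own, not from the thread):

```python
import math

import numpy as np
import pandas as pd

a = np.float64("nan")   # the same kind of value df.ix[0, 60] returned above

print(a == np.nan)      # False: NaN never compares equal, even to NaN
print(a is np.nan)      # False: a is a different object than the np.nan float
print(math.isnan(a))    # True: reliable stdlib check
print(pd.isnull(a))     # True: the pandas check suggested in the comments
```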