NaN is not recognized in pandas after np.where clause. Why? Or is it a bug?
You can see why if you look at the result of the np.where call:
>>> np.where(a.isnull(), np.nan, "Hello")
array([u'Hello', u'nan'],
dtype='<U32')
Because your other value is a string, np.where converts your NaN to a string as well and gives you a string-dtyped result. (The exact dtype you get may differ depending on your platform and/or Python version.) So you don't actually have a NaN in your result at all; you just have the string "nan".
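To check this yourself, here is a minimal sketch that rebuilds the Series from the question and inspects what np.where returns. The variable name result is only for illustration, and the exact string width of the dtype may vary with your NumPy version.

```python
import numpy as np
import pandas as pd

a = pd.Series([1, np.nan])

# Mixing a float (np.nan) with a str ("Hello") forces NumPy to choose one
# common dtype for the whole result, and it chooses a string dtype.
result = np.where(a.isnull(), np.nan, "Hello")

print(result.dtype.kind)        # 'U' means a Unicode string dtype
print(result.tolist())          # the second element is the text "nan"
print(pd.isnull(result).any())  # pandas finds no missing values in it
```

Since every element is an ordinary string, isnull() has nothing to flag once the array is assigned to a DataFrame column.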
If you want to do this type of mapping (in particular, mapping that changes dtypes) in pandas, it's usually better to use pandas constructs like .map and avoid dropping into numpy, because as you saw, numpy tends to do unhelpful things when it has to resolve conflicting types. Here's an example of how to do it all in pandas:
>>> b["X"] = a.isnull().map({True: np.nan, False: "Hello"})
>>> b
   0      X
0  a  Hello
1  b    NaN
>>> b.X.isnull()
0    False
1     True
Name: X, dtype: bool
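Another pandas-only option, offered here as an alternative sketch rather than as part of the approach above, is to start from a column of "Hello" and blank out the rows where a is missing with Series.mask. The column name X follows the example above; the rest of the setup is copied from the question.

```python
import numpy as np
import pandas as pd

a = pd.Series([1, np.nan])
b = pd.DataFrame(["a", "b"])

# Build a Series that is "Hello" everywhere, then replace the entries where
# `a` is null. Series.mask replaces values where the condition is True and
# uses NaN as its default replacement value.
b["X"] = pd.Series("Hello", index=a.index).mask(a.isnull())

print(b["X"].isnull().tolist())
```

Because the replacement never passes through a NumPy string array, the missing entry stays a real missing value and isnull() reports it.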
keiv.fly
Updated on June 17, 2022
The last value in the output of this code should be "True":
In [1]: import pandas as pd
In [2]: import numpy as np
In [3]: a = pd.Series([1, np.nan])
In [4]: b = pd.DataFrame(["a", "b"])
In [5]: b["1"] = np.where(a.isnull(), np.nan, "Hello")
In [6]: b
Out[6]:
   0      1
0  a  Hello
1  b    nan
In [7]: b["1"].isnull()
Out[7]:
0    False
1    False
Name: 1, dtype: bool