NaN is not recognized in pandas after np.where clause. Why? Or is it a bug?


You can see why if you look at the result of the np.where call:

>>> np.where(a.isnull(), np.nan, "Hello")
array([u'Hello', u'nan'], 
      dtype='<U32')

Because your other value is a string, np.where converts the NaN to a string as well and gives you a string-dtyped result. (The exact dtype you get may differ depending on your platform and/or Python version.) So you don't actually have a NaN in your result at all; you just have the string "nan", which isnull() does not treat as missing.
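A minimal sketch of how you might confirm this yourself, reusing the Series `a` from the question. The variable name `result` is just for illustration, and the exact string width in the dtype can vary between environments:

```python
import numpy as np
import pandas as pd

a = pd.Series([1, np.nan])

# np.where has to return one array with one dtype, so the float NaN
# and the string "Hello" are both coerced to a common string dtype.
result = np.where(a.isnull(), np.nan, "Hello")

print(result.dtype.kind)     # "U" means a unicode string dtype
print(repr(result[1]))       # the second element is text, not a float NaN
print(pd.isnull(result))     # pandas sees no missing values in a string array
```

Checking `dtype.kind` rather than the full dtype avoids depending on the platform-specific string width.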

If you want to do this type of mapping (in particular, mapping that changes dtypes) in pandas, it's usually better to use pandas constructs like .map and avoid dropping into numpy, because as you saw, numpy coerces conflicting types to a single common dtype, which here silently turns NaN into a string. Here's an example of how to do it all in pandas:

>>> b["X"] = a.isnull().map({True: np.nan, False: "Hello"})
>>> b
   0      X
0  a  Hello
1  b    NaN
>>> b.X.isnull()
0    False
1     True
Name: X, dtype: bool
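Another way to stay inside pandas, offered as a sketch rather than the one right answer, is to start from a Series filled with the string and let Series.where blank out the rows you don't want. The column name "X" and the objects `a` and `b` follow the examples above:

```python
import numpy as np
import pandas as pd

a = pd.Series([1, np.nan])
b = pd.DataFrame(["a", "b"])

# Build a Series of "Hello" aligned with a, then keep each value only
# where a is not null; the other positions are filled with a real NaN.
hello = pd.Series("Hello", index=a.index)
b["X"] = hello.where(a.notnull())

print(b)
print(b["X"].isnull())
```

Because Series.where fills the masked positions with a genuine missing value instead of coercing it to text, isnull() reports True for the second row.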
Author by

keiv.fly

Analytics in Python and R. Programmed C#, PHP, VBA. Studied Economics. Obsessed with performance optimization.

Updated on June 17, 2022

Comments

  • keiv.fly almost 2 years

    NaN is not recognized in pandas after np.where clause. Why? Or is it a bug?

    The last line of this code should be "True"

    In [1]: import pandas as pd
    In [2]: import numpy as np   
    In [3]: a=pd.Series([1,np.nan])    
    In [4]: b=pd.DataFrame(["a","b"])  
    In [5]: b["1"]=np.where(
                a.isnull(),
                np.nan,
                "Hello"
            )   
    In [6]: b
    Out[6]:
       0      1
    0  a  Hello
    1  b    nan    
    In [7]: b["1"].isnull()
    Out[7]:
    0    False
    1    False
    Name: 1, dtype: bool