How to iterate through rows of a dataframe and check whether value in a column row is NaN


Solution 1

As you already understand, frame in

for item, frame in df['Column2'].iteritems():

is each individual value in the column, so its type is the type of the column's elements (most probably not a Series or DataFrame). Hence, calling frame.notnull() on it will not work.

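You should instead use the function pd.notnull, which accepts scalar values:

import pandas as pd

# items() is used here because iteritems() was removed in pandas 2.0
for item, frame in df['Column2'].items():
    # pd.notnull returns False for NaN and True for any real value
    if pd.notnull(frame):
        print(frame)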
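
For reference, here is a self-contained sketch of that loop on the sample data from the question. The DataFrame construction is an assumption reconstructed from the question's printout, and the `kept` list is only there to make the result easy to inspect. It uses `Series.items()`, since `iteritems()` is not available in pandas 2.0 and later.

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the question (an assumption).
df = pd.DataFrame({'Column1': ['a', 'b', 'c'],
                   'Column2': ['hey', np.nan, 'up']})

kept = []  # hypothetical helper list, not part of the original answer
for index, value in df['Column2'].items():
    # pd.notnull works on scalars, so it handles both strings and float NaN.
    if pd.notnull(value):
        kept.append((index, value))
        print(index, value)
```

With this data the loop should skip the row with index 1 and act only on 'hey' and 'up'.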

Solution 2

Try this:

df[df['Column2'].notnull()]

The above code returns only the rows in which Column2 is not null.
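
A minimal sketch of what that filter does, assuming the sample data from the question (the DataFrame construction and the names `mask` and `filtered` are illustrative):

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the question (an assumption).
df = pd.DataFrame({'Column1': ['a', 'b', 'c'],
                   'Column2': ['hey', np.nan, 'up']})

# notnull() on the whole Series gives one boolean per row.
mask = df['Column2'].notnull()

# Indexing with the boolean Series keeps only the rows where it is True.
# The original index labels are preserved.
filtered = df[mask]
print(filtered)
```

No explicit loop is needed here; any further work can be done on `filtered` directly.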

Solution 3

Using iteritems on a Series (which is what you get when you take a column from a DataFrame) iterates over pairs (index, value). So your item will take the values 0, 1, and 2 in the three iterations of the loop, and your frame will take the values 'hey', NaN, and 'up' (so "frame" is probably a bad name for it). The error comes from trying to use the method notnull on NaN (which is represented as a floating-point number).
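
To see this for yourself, here is a small sketch that inspects what the loop actually receives. The Series is reconstructed from the question's data (an assumption), and it uses `items()`, which yields the same (index, value) pairs as `iteritems()` and still exists in current pandas.

```python
import numpy as np
import pandas as pd

# Column2 from the question, rebuilt by hand (an assumption).
column = pd.Series(['hey', np.nan, 'up'], name='Column2')

# Each iteration yields an (index, value) pair.
pairs = list(column.items())

# The values are plain Python objects: str for text, float for NaN.
types = [type(value).__name__ for _, value in pairs]
print(types)

# Neither str nor float has a notnull method, which is why
# calling value.notnull() raises AttributeError.
has_method = [hasattr(value, 'notnull') for _, value in pairs]
print(has_method)
```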

You can use the function pd.notnull instead:

In [3]: pd.notnull(np.nan)
Out[3]: False

In [4]: pd.notnull('hey')
Out[4]: True

Another way would be to use notnull on the whole Series, and then iterate over those values (which are now boolean):

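# items() is used here because iteritems() was removed in pandas 2.0
for _, value in df['Column2'].notnull().items():
    # value is True where Column2 holds a real value, False where it is NaN
    if value:
        print('frame')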
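
Note that the loop above only sees the booleans. If you need the actual values, one option is to use the boolean Series as a mask before iterating. This is a sketch under the same assumed sample data; `present` is a hypothetical helper list, and `dropna()` is shown as a shortcut rather than something from the answer itself.

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the question (an assumption).
df = pd.DataFrame({'Column1': ['a', 'b', 'c'],
                   'Column2': ['hey', np.nan, 'up']})

mask = df['Column2'].notnull()

present = []
# Select only the non-null entries of Column2, then iterate over them.
for index, value in df.loc[mask, 'Column2'].items():
    present.append((index, value))
    print(index, value)

# Series.dropna() gives the same values without building the mask by hand.
same_values = list(df['Column2'].dropna())
```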
Author by

sequence_hard

Recently started working in Bioinformatics, therefore learning python2 :)

Updated on July 25, 2022

Comments

  • sequence_hard
    sequence_hard almost 2 years

    I have a beginner question. I have a dataframe I am iterating over and I want to check if a value in a column2 row is NaN or not, to perform an action on this value if it is not NaN. My DataFrame looks like this:

    df:
    
      Column1  Column2
    0    a        hey
    1    b        NaN
    2    c        up
    

    What I am trying right now is:

    for item, frame in df['Column2'].iteritems():
        if frame.notnull() == True:
            print 'frame'
    

    The thought behind that is that I iterate over the rows in column 2 and print frame for every row that has a value (which is a string). What I get however is this:

    AttributeError                            Traceback (most recent call last)
    <ipython-input-80-8b871a452417> in <module>()
          1 for item, frame in df['Column2'].iteritems():
    ----> 2     if frame.notnull() == True:
          3         print 'frame'
    
    AttributeError: 'float' object has no attribute 'notnull'
    

    When I only run the first line of my code, I get

    0
    hey
    1
    nan
    2
    up
    

    which suggests that the floats in the output of the first line are the cause of the error. Can anybody tell me how I can accomplish what I want?

  • sequence_hard
    sequence_hard over 8 years
    It works in the sense that only the frames (rows) are printed, but the nan values are still present. But why are the frame values floats when they should be strings?
  • Evan Wright
    Evan Wright over 8 years
    Pandas represents missing values in a column like this one as the floating-point number nan.
  • Anand S Kumar
    Anand S Kumar over 8 years
    You can use pd.notnull() to check that the value is not NaN. If you also want to filter out empty strings and None values, you can use: if frame and pd.notnull(frame):
  • sequence_hard
    sequence_hard over 8 years
    @AnandSKumar Okay, this works. I think I called .notnull() wrong when I tried that before. Thank you very much!
  • sequence_hard
    sequence_hard over 8 years
    Thanks for the explanation of the error, I didn't know that nan was represented as a floating-point number!