How to iterate through rows of a dataframe and check whether value in a column row is NaN
Solution 1
As you already understand, frame in

    for item, frame in df['Column2'].iteritems():

is each value in the column, so its type is the type of the column's elements (which most probably is not Series or DataFrame). Hence, calling frame.notnull() on it does not work.
You should instead try:

    for item, frame in df['Column2'].iteritems():
        if pd.notnull(frame):
            print(frame)
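As a minimal, self-contained sketch of the loop above, the snippet below rebuilds the DataFrame from the question and collects the non-missing values. It assumes a recent pandas, where Series.iteritems() has been removed (pandas 2.0) and Series.items() is used instead; the helper name non_null_values is made up for illustration.

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the question.
df = pd.DataFrame({
    "Column1": ["a", "b", "c"],
    "Column2": ["hey", np.nan, "up"],
})


def non_null_values(series):
    """Hypothetical helper: keep the values that pd.notnull reports as present."""
    kept = []
    # items() yields (index label, value) pairs; each value is a plain scalar.
    for index, value in series.items():
        if pd.notnull(value):
            kept.append(value)
    return kept


print(non_null_values(df["Column2"]))  # ['hey', 'up']
```

The check is done with the module-level function pd.notnull because the loop variable is a scalar, not a Series.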
Solution 2
Try this:

    df[df['Column2'].notnull()]

The above code returns only the rows in which Column2 is not null.
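To make the filtering approach concrete, here is a small sketch using the sample data from the question. The variable names mask and filtered are illustrative only. It also shows dropna(subset=...), which should give the same rows here.

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the question.
df = pd.DataFrame({
    "Column1": ["a", "b", "c"],
    "Column2": ["hey", np.nan, "up"],
})

# Boolean Series: True where Column2 holds a value, False where it is NaN.
mask = df["Column2"].notnull()

# Indexing with the mask keeps only the rows where it is True.
filtered = df[mask]
print(filtered)

# Alternative spelling: drop rows whose Column2 is missing.
same_rows = df.dropna(subset=["Column2"])
```

No explicit loop is needed with this approach; any further work can be done on filtered directly.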
Solution 3
Using iteritems on a Series (which is what you get when you take a column from a DataFrame) iterates over (index, value) pairs. So your item takes the values 0, 1, and 2 in the three iterations of the loop, and your frame takes the values 'hey', NaN, and 'up' (so "frame" is probably a bad name for it). The error comes from trying to call the method notnull on NaN, which is represented as a floating-point number and therefore has no such method.

You can use the function pd.notnull instead:
In [3]: pd.notnull(np.nan)
Out[3]: False
In [4]: pd.notnull('hey')
Out[4]: True
Another way is to call notnull on the whole Series and then iterate over the resulting values (which are now boolean):

    for _, value in df['Column2'].notnull().iteritems():
        if value:
            print('frame')
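The sketch below illustrates both points under a few assumptions: the column is built with dtype=object so the values stay plain Python objects, the names column and results are made up, and str.upper stands in for whatever action you want to perform. It first confirms that the missing entry is a float without a notnull method, then pairs each value with its boolean flag so the value itself is still available inside the loop.

```python
import numpy as np
import pandas as pd

# Column2 from the question, kept as plain Python objects.
column = pd.Series(["hey", np.nan, "up"], dtype=object)

# The missing entry is an ordinary float, which explains the AttributeError.
missing = column.iloc[1]
print(type(missing))                # <class 'float'>
print(hasattr(missing, "notnull"))  # False

# Pair each value with its not-null flag and act only on present values.
results = []
for value, present in zip(column, column.notnull()):
    if present:
        results.append(value.upper())  # placeholder action on the value

print(results)  # ['HEY', 'UP']
```

Iterating over the boolean Series alone only tells you whether a value exists; zipping it with the original column is one way to keep the value at hand.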
Updated on July 25, 2022

Comments
-
sequence_hard almost 2 years

I have a beginner question. I have a DataFrame I am iterating over, and I want to check whether a value in a Column2 row is NaN or not, so that I can perform an action on the value if it is not NaN. My DataFrame looks like this:

    df:
      Column1 Column2
    0       a     hey
    1       b     NaN
    2       c      up

What I am trying right now is:

    for item, frame in df['Column2'].iteritems():
        if frame.notnull() == True:
            print 'frame'

The thought behind that is that I iterate over the rows in column 2 and print frame for every row that has a value (which is a string). What I get, however, is this:

    AttributeError                            Traceback (most recent call last)
    <ipython-input-80-8b871a452417> in <module>()
          1 for item, frame in df['Column2'].iteritems():
    ----> 2     if frame.notnull() == True:
          3         print 'frame'

    AttributeError: 'float' object has no attribute 'notnull'

When I only run the first line of my code, I get

    0    hey
    1    nan
    2     up

which suggests that the floats in the output of the first line are the cause of the error. Can anybody tell me how I can accomplish what I want?
-
sequence_hard, over 8 years: It works in the sense that only the frames (rows) are printed, but the nan values are still present. But why are the frame values floats when they should be strings?
-
Evan Wright, over 8 years: Pandas represents all missing values as the floating-point number nan.
-
Anand S Kumar, over 8 years: You can use pd.notnull() to check whether the value is not NaN. If you also want to filter out empty strings and None values, you can do: if frame and pd.notnull(frame):
-
sequence_hard, over 8 years: @AnandSKumar Okay, this works. I think I called .notnull() wrong when I tried that before. Thank you very much!
-
sequence_hard, over 8 years: Thanks for the explanation of the error, I didn't know that nan was represented as a floating-point number!