Can't drop NaN with dropna in pandas
Solution 1
You need to read the documentation (emphasis added):
Return object with labels on given axis omitted
dropna returns a new DataFrame. If you want it to modify the existing DataFrame, all you have to do is read further in the documentation:
inplace : boolean, default False
If True, do operation inplace and return None.
So to modify it in place, do traindataset.dropna(how='any', inplace=True).
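To make the difference concrete, here is a minimal sketch using a small made-up DataFrame. The column names are borrowed from the question's dtypes output, but the values are invented for illustration and are not the asker's train.csv.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the asker's data (values are invented).
traindataset = pd.DataFrame({
    "Upc": [1.0, np.nan, 3.0],
    "ScanCount": [1, 2, 3],
})

# Without inplace=True, dropna returns a new DataFrame
# and leaves the original untouched.
cleaned = traindataset.dropna(how="any")
shape_before = traindataset.shape   # original still has every row
cleaned_shape = cleaned.shape       # the returned copy has the NaN row removed

# With inplace=True, the original is modified and the call returns None.
result = traindataset.dropna(how="any", inplace=True)
shape_after = traindataset.shape
```

Discarding the return value of the first call, as the code in the question does, is why the shape appears unchanged.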
Solution 2
pd.DataFrame.dropna uses inplace=False by default. This is the norm with most Pandas operations; exceptions do exist, e.g. update.
Therefore, you must either assign back to your variable, or state explicitly inplace=True:
df = df.dropna(how='any') # assign back
df.dropna(how='any', inplace=True) # set inplace parameter
Stylistically, the former is often preferred because it supports method chaining, and the latter often yields little or no performance benefit.
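As a rough illustration of the chaining point, the sketch below strings several steps onto the DataFrame that dropna returns. The data and the follow-up steps (sort_values, reset_index) are assumptions chosen for the example, not part of the original question.

```python
import numpy as np
import pandas as pd

# Invented example data with one missing value in each column.
df = pd.DataFrame({
    "country": ["US", None, "FR", "US"],
    "points": [90.0, 85.0, np.nan, 88.0],
})

# Because dropna returns a DataFrame, further methods can be chained onto it.
clean = (
    df.dropna(how="any")
      .sort_values("points", ascending=False)
      .reset_index(drop=True)
)
```

With inplace=True the call returns None, so a chain like this would have to be split into separate statements.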
Solution 3
Alternatively, you can use the notnull() method to select the rows which are not null.
For example, if you want to select non-null values from the columns country and variety of the dataframe reviews:
answer = reviews.loc[reviews.country.notnull() & reviews.variety.notnull()]
But here we are just selecting the relevant data; to remove null values you should use the dropna() method.
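For comparison, here is a small sketch showing the notnull() mask next to dropna restricted to the same two columns via its subset parameter. The reviews data below is invented for the example; only the column names country and variety come from the answer.

```python
import numpy as np
import pandas as pd

# Invented stand-in for the reviews dataframe.
reviews = pd.DataFrame({
    "country": ["Italy", None, "US", "France"],
    "variety": ["Nebbiolo", "Merlot", None, "Malbec"],
    "price": [np.nan, 20.0, 15.0, 30.0],
})

# Boolean mask: keep rows where both columns are non-null.
mask_result = reviews.loc[reviews.country.notnull() & reviews.variety.notnull()]

# dropna limited to the same columns; NaN in other columns (price) is ignored.
subset_result = reviews.dropna(subset=["country", "variety"])
```

Both approaches return a new DataFrame and leave reviews unchanged, so the result still has to be assigned to a name.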
Solution 4
This is my first post. I just spent a few hours debugging this exact issue and I would like to share how I fixed this issue.
I was converting my entire dataframe to a string and then placing that value back into the dataframe using similar code to what is displayed below: (please note, the code below will only convert the value to a string)
row_counter = 0
for ind, row in dataf.iterrows():
    # str() turns a missing value (NaN) into the text 'nan'
    cell_value = str(row['column_header'])
    dataf.loc[row_counter, 'column_header'] = cell_value
    row_counter += 1
After converting the entire dataframe to a string, I then used the dropna() function. The values that were previously NaN (considered a null value by pandas) had been converted to the string 'nan', which dropna() does not treat as missing.
In conclusion, drop blank values FIRST, before you start manipulating data in the CSV and converting its data type.
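The effect can be reproduced with a short sketch. The data is invented, and the vectorised map(str) call stands in for the row-by-row loop above. The final replace step is one possible way to recover, not something the original post describes.

```python
import numpy as np
import pandas as pd

# Invented data with one missing value.
dataf = pd.DataFrame({"column_header": [1.5, np.nan, 3.0]})

# Dropping first works: the NaN row is removed.
dropped_first = dataf.dropna()

# Converting every cell with str() turns NaN into the literal text 'nan'.
as_text = dataf.copy()
as_text["column_header"] = as_text["column_header"].map(str)

# dropna no longer sees anything missing, so no rows are removed.
dropped_after = as_text.dropna()

# One possible recovery: turn the text 'nan' back into a real NaN, then drop.
recovered = as_text.replace("nan", np.nan).dropna()
```

Note that the recovery step would also discard any genuine text value that happens to be exactly 'nan', which is another reason to drop missing values before converting types.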
fangh
Updated on July 13, 2022
Comments
-
fangh almost 2 years
I import pandas as pd and run the code below and get the following result
Code:
import pandas as pd

traindataset = pd.read_csv('/Users/train.csv')
print(traindataset.dtypes)
print(traindataset.shape)
print(traindataset.iloc[25, 3])
traindataset.dropna(how='any')  # return value is discarded here
print(traindataset.iloc[25, 3])
print(traindataset.shape)
Output
TripType                   int64
VisitNumber                int64
Weekday                   object
Upc                      float64
ScanCount                  int64
DepartmentDescription     object
FinelineNumber           float64
dtype: object
(647054, 7)
nan
nan
(647054, 7)
[Finished in 2.2s]
From the result, the dropna line doesn't seem to work: the number of rows doesn't change and there is still a NaN in the dataframe. How come? I am going crazy right now.
-
fangh over 8 years
Thanks. I got confused by the example of "dropna" from the tutorial. pandas.pydata.org/pandas-docs/stable/10min.html
-
Valentin H about 2 years
All the tutorials I found show a df before and after dropna on the same object. Anyway +1, thank you!