How to replace NaNs by preceding or next values in pandas DataFrame?

Solution 1

You could use the fillna method on the DataFrame and specify the method as ffill (forward fill):

>>> df = pd.DataFrame([[1, 2, 3], [4, None, None], [None, None, 9]])
>>> df.fillna(method='ffill')
   0  1  2
0  1  2  3
1  4  2  3
2  4  2  9

This method "propagate[s] last valid observation forward to next valid".

To go the opposite way, there's also a bfill method.

This method doesn't modify the DataFrame in place. You'll need to rebind the returned DataFrame to a variable or else specify inplace=True:

df.fillna(method='ffill', inplace=True)
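
As a minimal sketch of both directions, assuming a pandas version that provides DataFrame.ffill and DataFrame.bfill (newer releases deprecate the method= argument of fillna in favor of these), the same example could be written like this. The variable names forward and backward are only illustrative.

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [4, None, None], [None, None, 9]])

# Forward fill: each NaN takes the last valid value above it in the same column.
forward = df.ffill()

# Backward fill: each NaN takes the next valid value below it instead.
# Columns 0 and 1 end in NaN with nothing valid below, so those NaNs remain.
backward = df.bfill()

# Neither call modifies df; rebind the name to keep a result.
df = forward
```

Note that bfill cannot fill a NaN that has no valid value after it, just as ffill cannot fill one that has no valid value before it.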

Solution 2

The accepted answer is perfect. I had a related but slightly different situation where I had to fill in forward but only within groups. In case someone has the same need, know that fillna works on a DataFrameGroupBy object.

>>> from numpy import nan
>>> example = pd.DataFrame({'name':list('aaabbbcccc'),'number':[0,1,2,nan,4,nan,6,7,8,9]})
>>> example
  name  number
0    a     0.0
1    a     1.0
2    a     2.0
3    b     NaN
4    b     4.0
5    b     NaN
6    c     6.0
7    c     7.0
8    c     8.0
9    c     9.0
>>> example.groupby('name')['number'].fillna(method='ffill') # fill in row 5 but not row 3
0    0.0
1    1.0
2    2.0
3    NaN
4    4.0
5    4.0
6    6.0
7    7.0
8    8.0
9    9.0
Name: number, dtype: float64
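
The same grouped fill can also be sketched with the ffill method of the groupby object, assuming your pandas version provides it. The column name number_filled is made up for this example.

```python
import numpy as np
import pandas as pd

example = pd.DataFrame({
    'name': list('aaabbbcccc'),
    'number': [0, 1, 2, np.nan, 4, np.nan, 6, 7, 8, 9],
})

# Forward fill within each 'name' group; values never cross a group boundary.
filled = example.groupby('name')['number'].ffill()

# Row 3 is the first row of group 'b', so it has nothing to copy and stays NaN.
# Row 5 follows 4.0 inside group 'b', so it becomes 4.0.
example['number_filled'] = filled
```

The result keeps the original index, so it can be assigned straight back as a new column.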

Solution 3

You can use pandas.DataFrame.fillna with the method='ffill' option. 'ffill' stands for 'forward fill' and propagates the last valid observation forward. The alternative is 'bfill', which works the same way but backwards.

import pandas as pd

df = pd.DataFrame([[1, 2, 3], [4, None, None], [None, None, 9]])
df = df.fillna(method='ffill')

print(df)
#   0  1  2
#0  1  2  3
#1  4  2  3
#2  4  2  9

There is also a direct synonym function for this, pandas.DataFrame.ffill, to make things simpler.

Solution 4

One thing that I noticed when trying this solution is that if you have NaN at the start or the end of the array, neither ffill nor bfill alone fills everything: ffill leaves a leading NaN untouched, and bfill leaves a trailing one. You need both.

In [224]: df = pd.DataFrame([None, 1, 2, 3, None, 4, 5, 6, None])

In [225]: df.ffill()
Out[225]:
     0
0  NaN
1  1.0
...
7  6.0
8  6.0

In [226]: df.bfill()
Out[226]:
     0
0  1.0
1  1.0
...
7  6.0
8  NaN

In [227]: df.bfill().ffill()
Out[227]:
     0
0  1.0
1  1.0
...
7  6.0
8  6.0
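
One detail worth checking when chaining the two calls is their order, because it decides which neighbor fills an interior gap. A small sketch on the same data (the names next_first and prev_first are only illustrative):

```python
import pandas as pd

df = pd.DataFrame([None, 1, 2, 3, None, 4, 5, 6, None])

# Backward fill first: the interior gap at row 4 takes the NEXT value (4.0).
next_first = df.bfill().ffill()

# Forward fill first: the same gap takes the PREVIOUS value (3.0).
prev_first = df.ffill().bfill()
```

Both orders fill the leading and trailing NaN the same way; they differ only for gaps in the middle.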

Solution 5

Single-column version:

  • Fill NaN with the last valid value
df[column_name].fillna(method='ffill', inplace=True)
  • Fill NaN with the next valid value
df[column_name].fillna(method='backfill', inplace=True)
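
Depending on the pandas version and its copy-on-write settings, an in-place call on df[column_name] may act on a temporary object and leave df unchanged. A more defensive sketch assigns the filled result back; the frame and the column names 'a', 'b', 'c' below are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical frame; the column names 'a', 'b', 'c' are made up for this sketch.
df = pd.DataFrame({
    'a': [1.0, np.nan, 3.0],
    'b': [np.nan, 5.0, np.nan],
    'c': [7.0, np.nan, np.nan],
})

# One column: fill with the last valid value and assign the result back.
df['a'] = df['a'].ffill()

# A chosen subset of columns: fill with the next valid value.
cols = ['b']
df[cols] = df[cols].bfill()
```

Column 'c' is not selected, so it is left untouched. The list cols can hold any subset of column names.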

Author: zegkljan

I'm a PhD student at the Czech Technical University, Faculty of Electrical Engineering. I'm interested in artificial intelligence. I'm a technology enthusiast and science lover.

Updated on April 22, 2021

Comments

  • zegkljan
    zegkljan about 3 years

    Suppose I have a DataFrame with some NaNs:

    >>> import pandas as pd
    >>> df = pd.DataFrame([[1, 2, 3], [4, None, None], [None, None, 9]])
    >>> df
        0   1   2
    0   1   2   3
    1   4 NaN NaN
    2 NaN NaN   9
    

    What I need to do is replace every NaN with the first non-NaN value in the same column above it. It is assumed that the first row will never contain a NaN. So for the previous example the result would be

       0  1  2
    0  1  2  3
    1  4  2  3
    2  4  2  9
    

    I can just loop through the whole DataFrame column-by-column, element-by-element and set the values directly, but is there an easy (optimally a loop-free) way of achieving this?

  • Tony
    Tony over 6 years
    exactly what I was looking for, ty
  • Prometheus
    Prometheus about 6 years
    Brilliant. I needed exactly this for my problem. Filling both before and after. Thanks a lot.
  • some_programmer
    some_programmer about 5 years
    Great. I need this solution. Thanks
  • BGG16
    BGG16 almost 4 years
    What if the blank cell was in the column names index (i.e., a couple of the columns didn't have names but did have data)? Is there a way to use bfill or ffill to fill the blank column index cell with the cell in the row immediately below it? For instance: df = pd.DataFrame({'col1': [2, 4, 8], 'col2': [2, 0, 0], '': [10, 2, 1]}, index=['falcon', 'dog', 'spider']) How could I use bfill or ffill to change the name of the third column to 10 (which is the value of the row immediately below the blank third column name)? Thanks!
  • Alex
    Alex about 2 years
    How do I fill values of multiple columns but not all?