Filtering Pandas DataFrames on dates
Solution 1
If the date column is the index, use .loc for label-based indexing or .iloc for positional indexing.
For example:
df.loc['2014-01-01':'2014-02-01']
See details here http://pandas.pydata.org/pandas-docs/stable/dsintro.html#indexing-selection
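A minimal runnable sketch of the index case, assuming a made-up frame of daily values indexed by a DatetimeIndex (the data and the names by_label and by_position are illustrative only). Label slicing with .loc includes both endpoints, while .iloc slicing excludes the end position:

```python
import pandas as pd

# Hypothetical daily data whose index is a DatetimeIndex.
df = pd.DataFrame(
    {"value": range(60)},
    index=pd.date_range("2014-01-01", periods=60, freq="D"),
)

# Label-based: date strings as bounds; both endpoints are included.
by_label = df.loc["2014-01-01":"2014-02-01"]

# Positional: the first 10 rows; the end position (10) is excluded.
by_position = df.iloc[0:10]

print(len(by_label), len(by_position))
```

Here by_label holds 32 rows (all of January plus 2014-02-01).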
If the column is not the index you have two choices:
- Make it the index (either temporarily or permanently if it's time-series data)
- Filter with a boolean mask on the column (the strict > and < exclude both endpoint dates):
  df[(df['date'] > '2013-01-01') & (df['date'] < '2013-02-01')]
See here for the general explanation
Note: .ix is deprecated (it was removed in pandas 1.0); use .loc or .iloc instead.
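Both choices for a non-index date column can be sketched side by side. This assumes a small hypothetical frame whose 'date' column has already been parsed to datetime64; it also shows why the two give different row counts, since .loc label slicing is inclusive while the strict comparisons drop the endpoints:

```python
import pandas as pd

# Hypothetical frame where 'date' is an ordinary column, not the index.
df = pd.DataFrame({
    "date": pd.to_datetime(["2012-12-31", "2013-01-01", "2013-01-15", "2013-02-01"]),
    "value": [1, 2, 3, 4],
})

# Choice 1: make 'date' the index temporarily, then slice by label (inclusive).
by_index = df.set_index("date").loc["2013-01-01":"2013-02-01"]

# Choice 2: boolean mask on the column; > and < exclude both endpoints.
by_mask = df[(df["date"] > "2013-01-01") & (df["date"] < "2013-02-01")]

print(len(by_index), len(by_mask))
```

Switch to >= and <= in the mask if the endpoint dates should be kept.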
Solution 2
The previous answer did not work in my experience: you can't pass it a plain string; it needs to be a datetime object. So:
import datetime
df.loc[datetime.date(year=2014,month=1,day=1):datetime.date(year=2014,month=2,day=1)]
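A sketch for comparing object bounds with string bounds on a DatetimeIndex. It swaps in datetime.datetime for the answer's datetime.date, and the frame is made up. On this data the two slices select the same rows, which bears on the disagreement in the comments about whether plain strings work:

```python
import datetime

import pandas as pd

# Hypothetical daily data with a DatetimeIndex.
df = pd.DataFrame(
    {"value": range(60)},
    index=pd.date_range("2014-01-01", periods=60, freq="D"),
)

# Slice bounds given as datetime objects (midnight on each day).
start = datetime.datetime(2014, 1, 1)
end = datetime.datetime(2014, 2, 1)
with_objects = df.loc[start:end]

# The same slice with plain date strings, for comparison.
with_strings = df.loc["2014-01-01":"2014-02-01"]

print(with_objects.equals(with_strings))
```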
Solution 3
And if your dates have been converted to datetime.date objects with the datetime package, you can simply use:
df[(df['date']>datetime.date(2016,1,1)) & (df['date']<datetime.date(2016,3,1))]
To convert your date strings with the datetime package, you can use this function:
import datetime
datetime.datetime.strptime
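Putting the two halves of this answer together, a minimal sketch might look like the following. It assumes the strings use the "%Y-%m-%d" format (an assumption; adjust the format to your data). Each string is parsed with strptime and reduced to a datetime.date, so the column holds plain date objects (object dtype) that compare directly against datetime.date bounds:

```python
import datetime

import pandas as pd

# Hypothetical date strings in a known format.
df = pd.DataFrame({"date": ["2015-12-31", "2016-01-15", "2016-02-29", "2016-03-01"]})

# Parse each string and keep only the date part (datetime.date objects).
df["date"] = df["date"].apply(
    lambda s: datetime.datetime.strptime(s, "%Y-%m-%d").date()
)

# Element-wise comparison against datetime.date bounds.
mask = (df["date"] > datetime.date(2016, 1, 1)) & (df["date"] < datetime.date(2016, 3, 1))
result = df[mask]
print(result)
```

This only works because the column holds datetime.date objects. If the column is a datetime64 column instead, comparing it with datetime.date can raise a TypeError in recent pandas, so pd.Timestamp bounds are the safer choice there.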
Solution 4
If you have already converted the strings to datetimes using pd.to_datetime, you can just use:
df = df[(df['Date'] > "2018-01-01") & (df['Date'] < "2019-07-01")]
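A small sketch of this approach on made-up data. The result is stored in new variables (exclusive, inclusive) rather than overwriting df, and Series.between is shown as an alternative that keeps both endpoints by default:

```python
import pandas as pd

# Hypothetical data with dates stored as strings.
df = pd.DataFrame({
    "Date": ["2017-12-31", "2018-06-15", "2019-06-30", "2019-07-01"],
    "value": [1, 2, 3, 4],
})
df["Date"] = pd.to_datetime(df["Date"])

# Strict bounds: 2019-07-01 itself is excluded.
exclusive = df[(df["Date"] > "2018-01-01") & (df["Date"] < "2019-07-01")]

# Series.between includes both bounds by default.
inclusive = df[df["Date"].between("2018-01-01", "2019-07-01")]

print(len(exclusive), len(inclusive))
```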
Solution 5
If your datetime column has the pandas datetime type (e.g. datetime64[ns]), for proper filtering you need a pd.Timestamp object, for example:
from datetime import date
import pandas as pd
value_to_check = pd.Timestamp(date.today().year, 1, 1)
filter_mask = df['date_column'] < value_to_check
filtered_df = df[filter_mask]
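The same pd.Timestamp approach can express the question's "within the next two months" window. This is a sketch, not a definitive answer: the reference date is fixed to a hypothetical value so the result is reproducible (in practice you might use pd.Timestamp.today().normalize()), and the window is assumed to be today inclusive up to, but not including, the same day two months later:

```python
import pandas as pd

# Hypothetical fixed "today" so the example is reproducible.
today = pd.Timestamp(2024, 1, 15)
two_months_later = today + pd.DateOffset(months=2)

df = pd.DataFrame({
    "date_column": pd.to_datetime(
        ["2024-01-01", "2024-01-20", "2024-03-14", "2024-03-15", "2024-04-01"]
    )
})

# Keep rows from today (inclusive) up to two months later (exclusive).
filter_mask = (df["date_column"] >= today) & (df["date_column"] < two_months_later)
within = df[filter_mask]
print(within)
```

pd.DateOffset(months=2) steps by calendar months, so 2024-01-15 becomes 2024-03-15. This differs from a fixed pd.Timedelta(days=60).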
Comments
-
user1121201 almost 2 years
I have a Pandas DataFrame with a 'date' column. Now I need to filter out all rows in the DataFrame that have dates outside of the next two months. Essentially, I only need to retain the rows that are within the next two months.
What is the best way to achieve this?
-
user1121201 about 10 years
Thank you, will read. The date is a separate column and not the index in my case. I should probably have given that information in the first place. My question was not very informative.
-
Retozi about 10 years
Updated my answer to account for date column filtering.
-
Phillip Cloud about 10 years
You can use query here as well: df.query('20130101 < date < 20130201').
-
Union find over 9 years
Same as stackoverflow.com/questions/16341367/…, which is also useful.
-
Rafael Barbosa almost 8 years
You should mention that the filters for the index (via .loc and .ix) and for columns in your examples are not equivalent. df.ix['2014-01-01':'2014-02-01'] includes 2014-02-01, while df[(df['date'] > '2013-01-01') & (df['date'] < '2013-02-01')] does not include 2013-02-01; it will only match rows up to 2013-01-31.
-
Ninjakannon over 7 years
I can absolutely pass a string with no issues.
-
Nick almost 7 years
The ix indexer is deprecated; use loc instead: pandas.pydata.org/pandas-docs/stable/…
-
Mohamed Taher Alrefaie over 6 years
This call is deprecated now!
-
janscas about 6 years
pandas will convert any "datetime" string into a datetime object, so it's correct.
-
Salem almost 6 years
What if one doesn't want to filter on a date range, but on multiple datetimes?
-
Addem over 5 years
Will this correctly compare a datetime object against a string?
-
Michael Norman over 5 years
I receive the following error using this: TypeError: '<' not supported between instances of 'int' and 'datetime.date'
-
user305883 about 5 years
@Phillip Cloud I would like to use your solution, but df.query('20130101 < time < 20130201') and df.query('2013-01-01 < time < 2013-02-01') throw TypeError: Cannot compare type 'Period' with type 'int' (or string). I tried df.query(pd.to_datetime('2013-01-01') < 'time' < pd.to_datetime('2013-02-01')) but got TypeError: Cannot compare type 'Timestamp' with type 'str'. My df['time'] is a to_datetime object, formatted as '%Y%m%d'. Any suggestion?
-
user305883 about 5 years
Figured it out: the error was because I had stripped the datetime down to the date only, without the time, so it was no longer a to_datetime object. That is, my df['time'] was constructed as pd.to_datetime(..).dt.date instead of simply pd.to_datetime(..).
-
So S over 4 years
It is recommended to use df[(df['date']>pd.Timestamp(2016,1,1)) & (df['date']<pd.Timestamp(2016,3,1))].
-
ANUBIS about 4 years
index_col has to be a string, not a list: mydata = pd.read_csv('mydata.csv', index_col='date')
-
Glen Moutrie almost 4 years
Could you pass a link to the documentation for @ts functions?
-
MarMat almost 4 years
I get this: TypeError: '<' not supported between instances of 'datetime.date' and 'str'
-
Alberto over 3 years
I think there is a small typo; it should be df.loc[df.index.month == 3]
-
mah65 about 3 years
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
-
ChaimG over 2 years
You may not need pd.Timestamp here. df.query('date > 20190515071320') seems to work fine.
-
Baobab about 2 years
How can I make the filter more precise, such as from '2013-01-01 16:53:22' onwards?
-
Crispy Holiday almost 2 years
Please be aware that using loc means you are expecting the exact min and max dates to exist as values. This solution won't work if you're filtering by dates that might not exist exactly in the df.
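Several comments above ask about query, the "@" syntax, and filtering from a precise time of day. Here is a minimal sketch on made-up data. It assumes the column is a real datetime64 column, not .dt.date values, which was the cause of one of the errors reported above. Inside DataFrame.query, "@name" refers to a local Python variable. Putting pd.Timestamp values there avoids comparing the column with bare integers:

```python
import pandas as pd

# Hypothetical timestamps, including a time-of-day component.
df = pd.DataFrame({
    "date": pd.to_datetime([
        "2013-01-01 09:00:00",
        "2013-01-01 16:53:22",
        "2013-01-20 12:00:00",
        "2013-02-01 00:00:00",
    ]),
    "value": [1, 2, 3, 4],
})

# Bounds with second-level precision; @start and @end refer to these variables.
start = pd.Timestamp("2013-01-01 16:53:22")
end = pd.Timestamp("2013-02-01")

# Chained comparison: from start (inclusive) up to end (exclusive).
in_range = df.query("@start <= date < @end")
print(in_range)
```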