Filtering Pandas DataFrames on dates

613,385

Solution 1

If the date column is the index, use .loc for label-based indexing or .iloc for positional indexing.

For example:

df.loc['2014-01-01':'2014-02-01']

See details here http://pandas.pydata.org/pandas-docs/stable/dsintro.html#indexing-selection
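
As a minimal sketch of the label-based slice above (the frame, its daily index, and the "value" column are made up for illustration):

```python
import pandas as pd

# Hypothetical daily data; the column name "value" is an assumption.
idx = pd.date_range("2013-12-30", "2014-02-03", freq="D")
df = pd.DataFrame({"value": range(len(idx))}, index=idx)

# Label-based slice on a DatetimeIndex: both endpoints are included.
subset = df.loc["2014-01-01":"2014-02-01"]

print(subset.index.min(), subset.index.max(), len(subset))
```

Unlike ordinary Python slices, a .loc label slice keeps the end label, so the row for 2014-02-01 is part of the result.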

If the column is not the index you have two choices:

  1. Make it the index (either temporarily or permanently if it's time-series data)
  2. df[(df['date'] > '2013-01-01') & (df['date'] < '2013-02-01')]

See here for the general explanation
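
The two choices can be sketched side by side as follows. The sample frame is hypothetical, and Series.between is shown only as one possible way to get inclusive bounds on a column:

```python
import pandas as pd

# Hypothetical frame with dates in an ordinary column named "date".
df = pd.DataFrame({
    "date": pd.to_datetime(["2012-12-31", "2013-01-01", "2013-01-15",
                            "2013-02-01", "2013-02-02"]),
    "value": [1, 2, 3, 4, 5],
})

# Choice 1: make the column the index temporarily, slice, then restore it.
# The index is sorted first so that label slicing is well defined.
by_index = (df.set_index("date").sort_index()
              .loc["2013-01-01":"2013-02-01"]
              .reset_index())

# Choice 2: boolean mask with strict inequalities (endpoints excluded).
by_mask = df[(df["date"] > "2013-01-01") & (df["date"] < "2013-02-01")]

# Series.between includes both endpoints by default, matching the .loc slice.
by_between = df[df["date"].between("2013-01-01", "2013-02-01")]
```

Note that the two forms are not equivalent as written: the .loc slice keeps both endpoints, while the strict < and > comparisons drop them. Use >= and <= (or between) if the boundary dates should be kept.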

Note: .ix is deprecated (it was removed entirely in pandas 1.0); use .loc or .iloc instead.

Solution 2

In my experience the previous answer is not correct: you can't pass a plain string, it needs to be a datetime object. So:

import datetime 
df.loc[datetime.date(year=2014,month=1,day=1):datetime.date(year=2014,month=2,day=1)]
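
A small sketch for checking this on your own setup, assuming a made-up frame with a daily DatetimeIndex. On recent pandas versions both forms are expected to select the same rows for data like this; with intraday timestamps they can differ, because a date object is treated as midnight:

```python
import datetime

import pandas as pd

# Hypothetical daily data; the column name "value" is an assumption.
idx = pd.date_range("2013-12-30", "2014-02-03", freq="D")
df = pd.DataFrame({"value": range(len(idx))}, index=idx)

# Slice with datetime.date objects, as in the line above.
start = datetime.date(year=2014, month=1, day=1)
end = datetime.date(year=2014, month=2, day=1)
by_date = df.loc[start:end]

# Slice with plain strings, as in Solution 1.
by_string = df.loc["2014-01-01":"2014-02-01"]

# True when both slices picked exactly the same rows.
same = by_date.equals(by_string)
```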

Solution 3

If your dates are stored as datetime.date objects (created with the datetime package), you can simply use:

df[(df['date']>datetime.date(2016,1,1)) & (df['date']<datetime.date(2016,3,1))]  

To standardize your date strings using the datetime package, you can use this function:

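import datetime
# Example call; the format string must match how your dates are written.
datetime.datetime.strptime('2016-01-01', '%Y-%m-%d').date()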
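
Putting the two steps together, a rough sketch might look like this. The sample data, the "%Y-%m-%d" format, and the helper name to_date are all assumptions for illustration:

```python
import datetime

import pandas as pd

# Hypothetical raw data: dates stored as strings in YYYY-MM-DD form.
df = pd.DataFrame({
    "date": ["2015-12-31", "2016-01-15", "2016-02-29", "2016-03-01"],
    "value": [1, 2, 3, 4],
})

def to_date(text, fmt="%Y-%m-%d"):
    """Parse one date string into a datetime.date (illustrative helper)."""
    return datetime.datetime.strptime(text, fmt).date()

# Convert every string to a datetime.date object.
df["date"] = df["date"].map(to_date)

# Compare date objects with date objects, endpoints excluded.
filtered = df[(df["date"] > datetime.date(2016, 1, 1))
              & (df["date"] < datetime.date(2016, 3, 1))]
```

The column now holds plain Python date objects (object dtype), so it should be compared with datetime.date values rather than with strings.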

Solution 4

If you have already converted the string to a date format using pd.to_datetime you can just use:

df = df[(df['Date'] > "2018-01-01") & (df['Date'] < "2019-07-01")]
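
For completeness, a minimal sketch including the conversion step (the sample rows are made up; only the 'Date' column name comes from the line above):

```python
import pandas as pd

# Hypothetical data with dates still stored as strings.
df = pd.DataFrame({
    "Date": ["2017-12-31", "2018-06-30", "2019-06-30", "2019-07-01"],
    "value": [1, 2, 3, 4],
})

# Convert once; the column becomes a pandas datetime column.
df["Date"] = pd.to_datetime(df["Date"])

# pandas parses the string bounds when comparing against a datetime column.
df = df[(df["Date"] > "2018-01-01") & (df["Date"] < "2019-07-01")]
```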

Solution 5

If your datetime column has the Pandas datetime type (e.g. datetime64[ns]), you need a pd.Timestamp object for proper filtering, for example:

from datetime import date

import pandas as pd

value_to_check = pd.Timestamp(date.today().year, 1, 1)
filter_mask = df['date_column'] < value_to_check
filtered_df = df[filter_mask]
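
The same idea extends to a rolling window such as "the next two months". The sketch below is one possible approach, not the only one: the helper within_next_months, its parameters, and the sample data are hypothetical, and the bounds are inclusive by choice.

```python
import pandas as pd

def within_next_months(df, column, months=2, now=None):
    """Keep rows whose `column` lies between now and now + months (inclusive).

    Hypothetical helper; `now` can be fixed to make results reproducible.
    """
    start = pd.Timestamp.today().normalize() if now is None else pd.Timestamp(now)
    end = start + pd.DateOffset(months=months)
    mask = (df[column] >= start) & (df[column] <= end)
    return df[mask]

# Made-up sample data using the column name from the example above.
df = pd.DataFrame({
    "date_column": pd.to_datetime(
        ["2022-04-30", "2022-05-03", "2022-06-15", "2022-07-03", "2022-07-04"]),
    "value": [1, 2, 3, 4, 5],
})

# Fixed reference date so the example does not depend on the current day.
result = within_next_months(df, "date_column", months=2, now="2022-05-03")
```

pd.DateOffset(months=2) moves by calendar months, so the window length follows the actual month lengths instead of a fixed number of days.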

Author by

user1121201

Programmer

Updated on May 03, 2022

Comments

  • user1121201
    user1121201 almost 2 years

    I have a Pandas DataFrame with a 'date' column. Now I need to filter out all rows in the DataFrame that have dates outside of the next two months. Essentially, I only need to retain the rows that are within the next two months.

    What is the best way to achieve this?

  • user1121201
    user1121201 about 10 years
    Thank you, will read. The date is a separate column and not the index in my case. I should probably have given that information in the first place. My question was not very informative.
  • Retozi
    Retozi about 10 years
    updated my answer to account for date column filtering
  • Phillip Cloud
    Phillip Cloud about 10 years
    You can use query here as well. df.query('20130101 < date < 20130201').
  • Union find
    Union find over 9 years
    Same as: stackoverflow.com/questions/16341367/… Which is also useful.
  • Rafael Barbosa
    Rafael Barbosa almost 8 years
    You should mention that the filters for index (via .loc and .ix) and columns in your examples are not equivalent. df.ix['2014-01-01':'2014-02-01'] includes 2014-02-01 while df[(df['date'] > '2013-01-01') & (df['date'] < '2013-02-01')] does not include 2013-02-01, it will only match rows up to 2013-01-31.
  • Ninjakannon
    Ninjakannon over 7 years
    I can absolutely pass a string with no issues.
  • Nick
    Nick almost 7 years
    ix indexer is deprecated, use loc - pandas.pydata.org/pandas-docs/stable/…
  • Mohamed Taher Alrefaie
    Mohamed Taher Alrefaie over 6 years
    This call is deprecated now!
  • janscas
    janscas about 6 years
    pandas will convert any "datetime" string into a datetime object.. so it's correct
  • Salem
    Salem almost 6 years
    What if one doesn't want to filter on a date range, but on multiple datetimes ?
  • Addem
    Addem over 5 years
    Will this correctly compare a datetime object against a string?
  • Michael Norman
    Michael Norman over 5 years
    I receive the following error using this: TypeError: '<' not supported between instances of 'int' and 'datetime.date'
  • user305883
    user305883 about 5 years
    @Phillip Cloud I would like to use your solution, but df.query('20130101 < time < 20130201') and df.query('2013-01-01 < time < 2013-02-01') throw TypeError: Cannot compare type 'Period' with type 'int' (or string). I tried df.query(pd.to_datetime('2013-01-01') < 'time' < pd.to_datetime('2013-02-01')) but got TypeError: Cannot compare type 'Timestamp' with type 'str'. My df['time'] is a to_datetime object, formatted as '%Y%m%d'. Any suggestions?
  • user305883
    user305883 about 5 years
    Figured it out: my error was because I stripped the date object to keep only the date, without the time, so it was no longer a to_datetime object. That is, my df[time] was constructed as pd.to_datetime(..).dt.date instead of simply pd.to_datetime(..)
  • So S
    So S over 4 years
    It is recommended to use df[(df['date']>pd.Timestamp(2016,1,1)) & (df['date']<pd.Timestamp(2016,3,1))].
  • ANUBIS
    ANUBIS about 4 years
    index_col has to be a string not a list. mydata = pd.read_csv('mydata.csv',index_col='date')
  • Glen Moutrie
    Glen Moutrie almost 4 years
    Could you pass a link for documentation for @ts functions?
  • MarMat
    MarMat almost 4 years
    I get this: TypeError: '<' not supported between instances of 'datetime.date' and 'str'
  • Alberto
    Alberto over 3 years
    I think there is a small typo, it should be df.loc[df.index.month == 3]
  • mah65
    mah65 about 3 years
    ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
  • ChaimG
    ChaimG over 2 years
    You may not need pd.Timestamp here. df.query('date > 20190515071320') seems to work fine.
  • Baobab
    Baobab about 2 years
    How can I set the filter more precisely, such as from '2013-01-01 16:53:22' onwards?
  • Crispy Holiday
    Crispy Holiday almost 2 years
    Please be aware that using loc means you are expecting the exact min and max dates to exist as values. This solution won't work if you're filtering by dates that might not exist exactly in the df.