python pandas extract year from datetime: df['year'] = df['date'].year is not working

176,628

Solution 1

If you're running a recent-ish version of pandas then you can use the datetime attribute dt to access the datetime components:

In [6]:

df['date'] = pd.to_datetime(df['date'])
df['year'], df['month'] = df['date'].dt.year, df['date'].dt.month
df
Out[6]:
        date  Count  year  month
0 2010-06-30    525  2010      6
1 2010-07-30    136  2010      7
2 2010-08-31    125  2010      8
3 2010-09-30     84  2010      9
4 2010-10-29   4469  2010     10

EDIT

It looks like you're running an older version of pandas in which case the following would work:

In [18]:

df['date'] = pd.to_datetime(df['date'])
df['year'], df['month'] = df['date'].apply(lambda x: x.year), df['date'].apply(lambda x: x.month)
df
Out[18]:
        date  Count  year  month
0 2010-06-30    525  2010      6
1 2010-07-30    136  2010      7
2 2010-08-31    125  2010      8
3 2010-09-30     84  2010      9
4 2010-10-29   4469  2010     10

Regarding why it didn't parse this into a datetime in read_csv you need to pass the ordinal position of your column ([0]) because when True it tries to parse columns [1,2,3] see the docs

In [20]:

t="""date   Count
6/30/2010   525
7/30/2010   136
8/31/2010   125
9/30/2010   84
10/29/2010  4469"""
df = pd.read_csv(io.StringIO(t), sep='\s+', parse_dates=[0])
df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 5 entries, 0 to 4
Data columns (total 2 columns):
date     5 non-null datetime64[ns]
Count    5 non-null int64
dtypes: datetime64[ns](1), int64(1)
memory usage: 120.0 bytes

So if you pass param parse_dates=[0] to read_csv there shouldn't be any need to call to_datetime on the 'date' column after loading.

Solution 2

This works:

df['date'].dt.year

Now:

df['year'] = df['date'].dt.year
df['month'] = df['date'].dt.month

gives this data frame:

        date  Count  year  month
0 2010-06-30    525  2010      6
1 2010-07-30    136  2010      7
2 2010-08-31    125  2010      8
3 2010-09-30     84  2010      9
4 2010-10-29   4469  2010     10

Solution 3

When to use dt accessor

A common source of confusion revolves around when to use .year and when to use .dt.year.

The former is an attribute for pd.DatetimeIndex objects; the latter for pd.Series objects. Consider this dataframe:

df = pd.DataFrame({'Dates': pd.to_datetime(['2018-01-01', '2018-10-20', '2018-12-25'])},
                  index=pd.to_datetime(['2000-01-01', '2000-01-02', '2000-01-03']))

The definition of the series and index look similar, but the pd.DataFrame constructor converts them to different types:

type(df.index)     # pandas.tseries.index.DatetimeIndex
type(df['Dates'])  # pandas.core.series.Series

The DatetimeIndex object has a direct year attribute, while the Series object must use the dt accessor. Similarly for month:

df.index.month               # array([1, 1, 1])
df['Dates'].dt.month.values  # array([ 1, 10, 12], dtype=int64)

A subtle but important difference worth noting is that df.index.month gives a NumPy array, while df['Dates'].dt.month gives a Pandas series. Above, we use pd.Series.values to extract the NumPy array representation.

Solution 4

Probably already too late to answer but since you have already parse the dates while loading the data, you can just do this to get the day

df['date'] = pd.DatetimeIndex(df['date']).year

Solution 5

What worked for me was upgrading pandas to latest version:

From Command Line do:

conda update pandas
Share:
176,628

Related videos on Youtube

MJS
Author by

MJS

Updated on January 20, 2021

Comments

  • MJS
    MJS over 3 years

    I import a dataframe via read_csv, but for some reason can't extract the year or month from the series df['date'], trying that gives AttributeError: 'Series' object has no attribute 'year':

    date    Count
    6/30/2010   525
    7/30/2010   136
    8/31/2010   125
    9/30/2010   84
    10/29/2010  4469
    
    df = pd.read_csv('sample_data.csv', parse_dates=True)
    
    df['date'] = pd.to_datetime(df['date'])
    
    df['year'] = df['date'].year
    df['month'] = df['date'].month
    

    UPDATE: and when I try solutions with df['date'].dt on my pandas version 0.14.1, I get "AttributeError: 'Series' object has no attribute 'dt' ":

    df = pd.read_csv('sample_data.csv',parse_dates=True)
    
    df['date'] = pd.to_datetime(df['date'])
    
    df['year'] = df['date'].dt.year
    df['month'] = df['date'].dt.month
    

    Sorry for this question that seems repetitive - I expect the answer will make me feel like a bonehead... but I have not had any luck using answers to the similar questions on SO.


    FOLLOWUP: I can't seem to update my pandas 0.14.1 to a newer release in my Anaconda environment, each of the attempts below generates an invalid syntax error. I'm using Python 3.4.1 64bit.

    conda update pandas
    
    conda install pandas==0.15.2
    
    conda install -f pandas
    

    Any ideas?

    • Padraic Cunningham
      Padraic Cunningham almost 9 years
      where is year coming from? Are you trying to access a row (df["date"][0].year?
    • MJS
      MJS almost 9 years
      I have a csv file with dates and other columns which all came out of SQL as string data. Trying to create new year and month columns to use for grouping. the csv has ~5000 rows.
    • Padraic Cunningham
      Padraic Cunningham almost 9 years
      Yes but the df["date"] is a 'pandas.core.series.Series' object. what should df['date'].year be?
    • MJS
      MJS almost 9 years
      I just want to create 2 more columns... one for year and one for month as integers.
    • MJS
      MJS almost 9 years
      tried df['date'].dt.year as well, EdChum.
    • EdChum
      EdChum almost 9 years
      see my update, it should work in your version of pandas, can you post which version you're using
    • smci
      smci over 3 years
      The .dt accessor for datetime functionality was new in pandas 0.15.0 (Oct 2014). So it ain't gonna work in 0.14.1, you must update. The footnote about Anaconda update is a separate question, recommend deleting it, conda is a different beast and well covered by other Q&A.