python pandas extract year from datetime: df['year'] = df['date'].year is not working

python datetime pandas extract dataframe

176,628

Solution 1

If you're running a recent-ish version of pandas then you can use the datetime attribute dt to access the datetime components:

In [6]:

df['date'] = pd.to_datetime(df['date'])
df['year'], df['month'] = df['date'].dt.year, df['date'].dt.month
df
Out[6]:
        date  Count  year  month
0 2010-06-30    525  2010      6
1 2010-07-30    136  2010      7
2 2010-08-31    125  2010      8
3 2010-09-30     84  2010      9
4 2010-10-29   4469  2010     10

EDIT

It looks like you're running an older version of pandas in which case the following would work:

In [18]:

df['date'] = pd.to_datetime(df['date'])
df['year'], df['month'] = df['date'].apply(lambda x: x.year), df['date'].apply(lambda x: x.month)
df
Out[18]:
        date  Count  year  month
0 2010-06-30    525  2010      6
1 2010-07-30    136  2010      7
2 2010-08-31    125  2010      8
3 2010-09-30     84  2010      9
4 2010-10-29   4469  2010     10

Regarding why it didn't parse this into a datetime in read_csv you need to pass the ordinal position of your column ([0]) because when True it tries to parse columns [1,2,3] see the docs

In [20]:

t="""date   Count
6/30/2010   525
7/30/2010   136
8/31/2010   125
9/30/2010   84
10/29/2010  4469"""
df = pd.read_csv(io.StringIO(t), sep='\s+', parse_dates=[0])
df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 5 entries, 0 to 4
Data columns (total 2 columns):
date     5 non-null datetime64[ns]
Count    5 non-null int64
dtypes: datetime64[ns](1), int64(1)
memory usage: 120.0 bytes

So if you pass param parse_dates=[0] to read_csv there shouldn't be any need to call to_datetime on the 'date' column after loading.

Solution 2

This works:

df['date'].dt.year

Now:

df['year'] = df['date'].dt.year
df['month'] = df['date'].dt.month

gives this data frame:

        date  Count  year  month
0 2010-06-30    525  2010      6
1 2010-07-30    136  2010      7
2 2010-08-31    125  2010      8
3 2010-09-30     84  2010      9
4 2010-10-29   4469  2010     10

Solution 3

When to use `dt` accessor

A common source of confusion revolves around when to use .year and when to use .dt.year.

The former is an attribute for pd.DatetimeIndex objects; the latter for pd.Series objects. Consider this dataframe:

df = pd.DataFrame({'Dates': pd.to_datetime(['2018-01-01', '2018-10-20', '2018-12-25'])},
                  index=pd.to_datetime(['2000-01-01', '2000-01-02', '2000-01-03']))

The definition of the series and index look similar, but the pd.DataFrame constructor converts them to different types:

type(df.index)     # pandas.tseries.index.DatetimeIndex
type(df['Dates'])  # pandas.core.series.Series

The DatetimeIndex object has a direct year attribute, while the Series object must use the dt accessor. Similarly for month:

df.index.month               # array([1, 1, 1])
df['Dates'].dt.month.values  # array([ 1, 10, 12], dtype=int64)

A subtle but important difference worth noting is that df.index.month gives a NumPy array, while df['Dates'].dt.month gives a Pandas series. Above, we use pd.Series.values to extract the NumPy array representation.

Solution 4

Probably already too late to answer but since you have already parse the dates while loading the data, you can just do this to get the day

df['date'] = pd.DatetimeIndex(df['date']).year

Solution 5

What worked for me was upgrading pandas to latest version:

From Command Line do:

conda update pandas

View more solutions

176,628

MJS

Updated on January 20, 2021

Comments

MJS over 3 years
I import a dataframe via read_csv, but for some reason can't extract the year or month from the series df['date'], trying that gives AttributeError: 'Series' object has no attribute 'year':
```
date    Count
6/30/2010   525
7/30/2010   136
8/31/2010   125
9/30/2010   84
10/29/2010  4469

df = pd.read_csv('sample_data.csv', parse_dates=True)

df['date'] = pd.to_datetime(df['date'])

df['year'] = df['date'].year
df['month'] = df['date'].month
```
UPDATE: and when I try solutions with df['date'].dt on my pandas version 0.14.1, I get "AttributeError: 'Series' object has no attribute 'dt' ":
```
df = pd.read_csv('sample_data.csv',parse_dates=True)

df['date'] = pd.to_datetime(df['date'])

df['year'] = df['date'].dt.year
df['month'] = df['date'].dt.month
```
Sorry for this question that seems repetitive - I expect the answer will make me feel like a bonehead... but I have not had any luck using answers to the similar questions on SO.

FOLLOWUP: I can't seem to update my pandas 0.14.1 to a newer release in my Anaconda environment, each of the attempts below generates an invalid syntax error. I'm using Python 3.4.1 64bit.
```
conda update pandas

conda install pandas==0.15.2

conda install -f pandas
```
Any ideas?
- Padraic Cunningham almost 9 years
  
  where is year coming from? Are you trying to access a row (df["date"][0].year?
- MJS almost 9 years
  
  I have a csv file with dates and other columns which all came out of SQL as string data. Trying to create new year and month columns to use for grouping. the csv has ~5000 rows.
- Padraic Cunningham almost 9 years
  
  Yes but the df["date"] is a 'pandas.core.series.Series' object. what should df['date'].year be?
- MJS almost 9 years
  
  I just want to create 2 more columns... one for year and one for month as integers.
- MJS almost 9 years
  
  tried df['date'].dt.year as well, EdChum.
- EdChum almost 9 years
  
  see my update, it should work in your version of pandas, can you post which version you're using
- smci over 3 years
  
  The .dt accessor for datetime functionality was new in pandas 0.15.0 (Oct 2014). So it ain't gonna work in 0.14.1, you must update. The footnote about Anaconda update is a separate question, recommend deleting it, conda is a different beast and well covered by other Q&A.