python pandas extract year from datetime: df['year'] = df['date'].year is not working
Solution 1
If you're running a recent-ish version of pandas then you can use the datetime attribute dt
to access the datetime components:
In [6]:
df['date'] = pd.to_datetime(df['date'])
df['year'], df['month'] = df['date'].dt.year, df['date'].dt.month
df
Out[6]:
date Count year month
0 2010-06-30 525 2010 6
1 2010-07-30 136 2010 7
2 2010-08-31 125 2010 8
3 2010-09-30 84 2010 9
4 2010-10-29 4469 2010 10
EDIT
It looks like you're running an older version of pandas in which case the following would work:
In [18]:
df['date'] = pd.to_datetime(df['date'])
df['year'], df['month'] = df['date'].apply(lambda x: x.year), df['date'].apply(lambda x: x.month)
df
Out[18]:
date Count year month
0 2010-06-30 525 2010 6
1 2010-07-30 136 2010 7
2 2010-08-31 125 2010 8
3 2010-09-30 84 2010 9
4 2010-10-29 4469 2010 10
Regarding why it didn't parse this into a datetime in read_csv
you need to pass the ordinal position of your column ([0]
) because when True
it tries to parse columns [1,2,3]
see the docs
In [20]:
t="""date Count
6/30/2010 525
7/30/2010 136
8/31/2010 125
9/30/2010 84
10/29/2010 4469"""
df = pd.read_csv(io.StringIO(t), sep='\s+', parse_dates=[0])
df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 5 entries, 0 to 4
Data columns (total 2 columns):
date 5 non-null datetime64[ns]
Count 5 non-null int64
dtypes: datetime64[ns](1), int64(1)
memory usage: 120.0 bytes
So if you pass param parse_dates=[0]
to read_csv
there shouldn't be any need to call to_datetime
on the 'date' column after loading.
Solution 2
This works:
df['date'].dt.year
Now:
df['year'] = df['date'].dt.year
df['month'] = df['date'].dt.month
gives this data frame:
date Count year month
0 2010-06-30 525 2010 6
1 2010-07-30 136 2010 7
2 2010-08-31 125 2010 8
3 2010-09-30 84 2010 9
4 2010-10-29 4469 2010 10
Solution 3
When to use dt
accessor
A common source of confusion revolves around when to use .year
and when to use .dt.year
.
The former is an attribute for pd.DatetimeIndex
objects; the latter for pd.Series
objects. Consider this dataframe:
df = pd.DataFrame({'Dates': pd.to_datetime(['2018-01-01', '2018-10-20', '2018-12-25'])},
index=pd.to_datetime(['2000-01-01', '2000-01-02', '2000-01-03']))
The definition of the series and index look similar, but the pd.DataFrame
constructor converts them to different types:
type(df.index) # pandas.tseries.index.DatetimeIndex
type(df['Dates']) # pandas.core.series.Series
The DatetimeIndex
object has a direct year
attribute, while the Series
object must use the dt
accessor. Similarly for month
:
df.index.month # array([1, 1, 1])
df['Dates'].dt.month.values # array([ 1, 10, 12], dtype=int64)
A subtle but important difference worth noting is that df.index.month
gives a NumPy array, while df['Dates'].dt.month
gives a Pandas series. Above, we use pd.Series.values
to extract the NumPy array representation.
Solution 4
Probably already too late to answer but since you have already parse the dates while loading the data, you can just do this to get the day
df['date'] = pd.DatetimeIndex(df['date']).year
Solution 5
What worked for me was upgrading pandas to latest version:
From Command Line do:
conda update pandas
Related videos on Youtube
MJS
Updated on January 20, 2021Comments
-
MJS over 3 years
I import a dataframe via
read_csv
, but for some reason can't extract the year or month from the seriesdf['date']
, trying that givesAttributeError: 'Series' object has no attribute 'year'
:date Count 6/30/2010 525 7/30/2010 136 8/31/2010 125 9/30/2010 84 10/29/2010 4469 df = pd.read_csv('sample_data.csv', parse_dates=True) df['date'] = pd.to_datetime(df['date']) df['year'] = df['date'].year df['month'] = df['date'].month
UPDATE: and when I try solutions with
df['date'].dt
on my pandas version 0.14.1, I get "AttributeError: 'Series' object has no attribute 'dt' ":df = pd.read_csv('sample_data.csv',parse_dates=True) df['date'] = pd.to_datetime(df['date']) df['year'] = df['date'].dt.year df['month'] = df['date'].dt.month
Sorry for this question that seems repetitive - I expect the answer will make me feel like a bonehead... but I have not had any luck using answers to the similar questions on SO.
FOLLOWUP: I can't seem to update my pandas 0.14.1 to a newer release in my Anaconda environment, each of the attempts below generates an invalid syntax error. I'm using Python 3.4.1 64bit.
conda update pandas conda install pandas==0.15.2 conda install -f pandas
Any ideas?
-
Padraic Cunningham almost 9 yearswhere is year coming from? Are you trying to access a row
(df["date"][0].year
? -
MJS almost 9 yearsI have a csv file with dates and other columns which all came out of SQL as string data. Trying to create new year and month columns to use for grouping. the csv has ~5000 rows.
-
Padraic Cunningham almost 9 yearsYes but the
df["date"]
is a'pandas.core.series.Series'
object. what shoulddf['date'].year
be? -
MJS almost 9 yearsI just want to create 2 more columns... one for year and one for month as integers.
-
MJS almost 9 yearstried df['date'].dt.year as well, EdChum.
-
EdChum almost 9 yearssee my update, it should work in your version of pandas, can you post which version you're using
-
smci over 3 yearsThe
.dt
accessor for datetime functionality was new in pandas 0.15.0 (Oct 2014). So it ain't gonna work in 0.14.1, you must update. The footnote about Anaconda update is a separate question, recommend deleting it, conda is a different beast and well covered by other Q&A.
-