Cleanly combine year and month columns to single date column with pandas

14,961

Option 1
Pass a dataframe slice with 3 columns - YEAR, MONTH, and DAY, to pd.to_datetime.

df['DATE'] = pd.to_datetime(df[['YEAR', 'MONTH']].assign(DAY=1))
df

  ID  MONTH  YEAR       DATE
0  A      1  2017 2017-01-01
1  B      2  2017 2017-02-01
2  C      3  2017 2017-03-01
3  D      4  2017 2017-04-01
4  E      5  2017 2017-05-01
5  F      6  2017 2017-06-01

Option 2
String concatenation, with pd.to_datetime.

pd.to_datetime(df.YEAR.astype(str) + '/' + df.MONTH.astype(str) + '/01')

0   2017-01-01
1   2017-02-01
2   2017-03-01
3   2017-04-01
4   2017-05-01
5   2017-06-01
dtype: datetime64[ns]
Share:
14,961
Sam
Author by

Sam

Updated on July 22, 2022

Comments

  • Sam
    Sam almost 2 years

    I have data that looks like this:

    +----+------+-------+
    | ID | YEAR | MONTH |
    +----+------+-------+
    | A  | 2017 |     1 |
    | B  | 2017 |     2 |
    | C  | 2017 |     3 |
    | D  | 2017 |     4 |
    | E  | 2017 |     5 |
    | F  | 2017 |     6 |
    +----+------+-------+
    

    I want to add a new column called DATE which store the a new column made up of a date object of the YEAR and MONTH columns. Something like this:

    +----+------+-------+------------+
    | ID | YEAR | MONTH |    DATE    |
    +----+------+-------+------------+
    | A  | 2017 |     1 | 2017-01-01 |
    | B  | 2017 |     2 | 2017-02-01 |
    | C  | 2017 |     3 | 2017-03-01 |
    | D  | 2017 |     4 | 2017-04-01 |
    | E  | 2017 |     5 | 2017-05-01 |
    | F  | 2017 |     6 | 2017-06-01 |
    +----+------+-------+------------+
    

    I used the following code to create the column, but was wondering if there's a cleaner 'Pythonic' one-liner. Something along the lines of df['DATE']=date(df.year, df.month, 1).

    import pandas as pd
    from datetime import date
    
    
    ID  = ['A', 'B', 'C', 'D', 'E', 'F']
    YEAR = [2017, 2017, 2017, 2017, 2017, 2017]
    MONTH = [1, 2, 3, 4, 5, 6]
    
    
    df = pd.DataFrame({'ID': ID, 'YEAR': YEAR, 'MONTH': MONTH})
    
    
    DATE = []
    for y, m in zip(df.YEAR, df.MONTH):
        DATE.append(date(y, m, 1))
    
    
    df['DATE'] = DATE