pandas merge on date column issue

27,740

I think you need first convert both columns to_datetime because need same dtypes:

df1.month = pd.to_datetime(df1.month)
df2.month = pd.to_datetime(df2.month)

print (pd.merge(df2, df1, how='outer', on='month'))
     factor      month           amt
0  1.000000 2015-02-01  1.549368e+06
1  1.000000 2015-03-01  2.175802e+06
2  1.000000 2015-04-01  1.915613e+06
3  1.000000 2015-05-01  1.703064e+06
4  0.998897 2015-06-01  1.770041e+06

#convert to str date column
df2.month = df2.month.astype(str)

print (pd.merge(df2, df1, how='outer', on='month'))
     factor       month           amt
0  1.000000  2015-02-01  1.549368e+06
1  1.000000  2015-03-01  2.175802e+06
2  1.000000  2015-04-01  1.915613e+06
3  1.000000  2015-05-01  1.703064e+06
4  0.998897  2015-06-01  1.770041e+06
Share:
27,740
muon
Author by

muon

−1 e

Updated on March 16, 2020

Comments

  • muon
    muon about 4 years

    I am trying to merge two dataframes on date column (tried both as type object or datetime.date, but fails to give desired merge output:

    import pandas as pd
    df1 =  pd.DataFrame({'amt': {0: 1549367.9496070854,
          1: 2175801.78219801,
          2: 1915613.1629125737,
          3: 1703063.8323954903,
          4: 1770040.7987461537},
         'month': {0: '2015-02-01',
          1: '2015-03-01',
          2: '2015-04-01',
          3: '2015-05-01',
          4: '2015-06-01'}})
    print(df1)
    
    
            amt             month
        0   1.549368e+06    2015-02-01
        1   2.175802e+06    2015-03-01
        2   1.915613e+06    2015-04-01
        3   1.703064e+06    2015-05-01
        4   1.770041e+06    2015-06-01
    
    
    
    df2 =  {'factor': {datetime.date(2015, 2, 1): 1.0,
          datetime.date(2015, 3, 1): 1.0,
          datetime.date(2015, 4, 1): 1.0,
          datetime.date(2015, 5, 1): 1.0,
          datetime.date(2015, 6, 1): 0.99889679025914435},
         'month': {datetime.date(2015, 2, 1): datetime.date(2015, 2, 1),
          datetime.date(2015, 3, 1): datetime.date(2015, 3, 1),
          datetime.date(2015, 4, 1): datetime.date(2015, 4, 1),
          datetime.date(2015, 5, 1): datetime.date(2015, 5, 1),
          datetime.date(2015, 6, 1): datetime.date(2015, 6, 1)}}
    df2 = pd.DataFrame(df2)
    print(df2)
    
                    factor      month
        2015-02-01  1.000000    2015-02-01
        2015-03-01  1.000000    2015-03-01
        2015-04-01  1.000000    2015-04-01
        2015-05-01  1.000000    2015-05-01
        2015-06-01  0.998897    2015-06-01
    
    
    pd.merge(df2, df1, how='outer', on='month')
    
            factor       month            amt
        0   1.000000     2015-02-01      NaN
        1   1.000000     2015-03-01      NaN
        2   1.000000     2015-04-01      NaN
        3   1.000000     2015-05-01      NaN
        4   0.998897     2015-06-01      NaN
        5   NaN           2015-02-01    1.549368e+06
        6   NaN           2015-03-01    2.175802e+06
        7   NaN           2015-04-01    1.915613e+06
        8   NaN           2015-05-01    1.703064e+06
        9   NaN           2015-06-01    1.770041e+06
    
  • jezrael
    jezrael about 7 years
    There is problem you cannot match string column with date column - need same dtypes.
  • muon
    muon about 7 years
    but I get df1.month.dtype = dtype('O') and df2.month.dtype = dtype('O'), so if both are string types then why would it matter
  • jezrael
    jezrael about 7 years
    yes, you are right, because date, list, set, str all are object - see here
  • muon
    muon about 7 years
    conversion to str did not help, conversion to pd.to_datetime() did
  • jezrael
    jezrael about 7 years
    and it is a bit confuse - dtype is object, but type not
  • Chris8447
    Chris8447 about 4 years
    I had the same issue...converting it both to a datetime object did it