Merge DataFrames with Matching Values From Two Different Columns - Pandas

35,352

Use how='inner' in pd.merge:

merged_df = DF2.merge(DF1, how = 'inner', on = ['date', 'hours'])

This will perform and "inner-join" thereby omitting rows in each dataframe that do not match. Hence, no NaN in either the right or left part of merged dataframe.

Share:
35,352
i.n.n.m
Author by

i.n.n.m

I am appreciative and grateful for open source contributors, you ROCK!! Keep up the good work :)

Updated on July 23, 2022

Comments

  • i.n.n.m
    i.n.n.m almost 2 years

    I have two different DataFrames that I want to merge with date and hours columns. I saw some threads that are there, but I could not find the solution for my issue. I also read this document and tried different combinations, however, did not work well.

    Example of my two different DataFrames,

    DF1

            date    hours        var1            var2 
    0   2013-07-10  00:00:00    150.322617  52.225920   
    1   2013-07-10  01:00:00    155.250917  53.365296   
    2   2013-07-10  02:00:00    124.918667  51.158249   
    3   2013-07-10  03:00:00    143.839217  53.138251
     .....  
    9   2013-09-10  09:00:00    148.135818  86.676341
    10  2013-09-10  10:00:00    147.833517  53.658016   
    11  2013-09-10  12:00:00    149.580233  69.745368   
    12  2013-09-10  13:00:00    163.715317  14.524894   
    13  2013-09-10  14:00:00    168.856650  10.762779   
    

    DF2

           date      hours      myvar1        myvar2 
    0   2013-07-10  09:00:00    1.617         98.56 
    1   2013-07-10  10:00:00    2.917         23.60 
    2   2013-07-10  12:00:00    19.667        36.15 
    3   2013-07-10  13:00:00    14.217        45.16
     .....  
    20 2013-09-10   20:00:00    1.517         53.56 
    21 2013-09-10   21:00:00    5.233         69.47
    22 2013-09-10   22:00:00    13.717        14.25
    23 2013-09-10   23:00:00    18.850        10.69 
    

    As you can see in both DataFrames, DF2 starts with 09:00:00 and I want to join with DF1 09:00:00, which is basically the matchind dates and times. So far, I tried many different combination using previous threads and the documentation mentioned above. An example,

    merged_df = DF2.merge(DF1, how = 'left', on = ['date', 'hours'])
    

    This was introduces NAN values for right right DataFrame. I know, I do not have to use both date and hours columns, however, still getting the same result. I tried R quick like this, which works perfectly fine.

    merged_df  <- left_join(DF1, DF2, by = 'date')
    

    Is there anyway in pandas to merge DatFrames just with matching values without getting NaN values?