Join pandas dataframes based on column values

39,896

I think you need merge with default inner join, but is necessary no duplicated combinations of values in both columns:

print (df2)
   item_id  document_id col1  col2  col3
0      337           10    s     4     7
1     1002           11    d     5     8
2     1003           11    f     7     0

df = pd.merge(df1, df2, on=['document_id','item_id'])
print (df)
   item_id  position  document_id col1  col2  col3
0      337         2           10    s     4     7
1     1002         2           11    d     5     8
2     1003         3           11    f     7     0

But if necessary position column in position 3:

df = pd.merge(df2, df1, on=['document_id','item_id'])
cols = df.columns.tolist()
df = df[cols[:2] + cols[-1:] + cols[2:-1]]
print (df)
   item_id  document_id  position col1  col2  col3
0      337           10         2    s     4     7
1     1002           11         2    d     5     8
2     1003           11         3    f     7     0
Share:
39,896
fremorie
Author by

fremorie

Updated on July 09, 2022

Comments

  • fremorie
    fremorie almost 2 years

    I'm quite new to pandas dataframes, and I'm experiencing some troubles joining two tables.

    The first df has just 3 columns:

    DF1:
    item_id    position    document_id
    336        1           10
    337        2           10
    338        3           10
    1001       1           11
    1002       2           11
    1003       3           11
    38         10          146
    

    And the second has exactly same two columns (and plenty of others):

    DF2
    item_id    document_id    col1    col2   col3    ...
    337        10             ...     ...    ...
    1002       11             ...     ...    ...
    1003       11             ...     ...    ...
    

    What I need is to perform an operation which, in SQL, would look as follows:

    DF1 join DF2 on 
    DF1.document_id = DF2.document_id
    and
    DF1.item_id = DF2.item_id
    

    And, as a result, I want to see DF2, complemented with column 'position':

    item_id    document_id    position    col1   col2   col3   ...
    

    What is a good way to do this using pandas?

    Thank you!