JOIN two dataframes on common column in python

python pandas join

60,292

Solution 1

Use merge:

print (pd.merge(df1, df2, left_on='id', right_on='id1', how='left').drop('id1', axis=1))
   id name  count  price  rating
0   1    a     10  100.0     1.0
1   2    b     20  200.0     2.0
2   3    c     30  300.0     3.0
3   4    d     40    NaN     NaN
4   5    e     50  500.0     5.0

Another solution is simple rename column:

print (pd.merge(df1, df2.rename(columns={'id1':'id'}), on='id',  how='left'))
   id name  count  price  rating
0   1    a     10  100.0     1.0
1   2    b     20  200.0     2.0
2   3    c     30  300.0     3.0
3   4    d     40    NaN     NaN
4   5    e     50  500.0     5.0

If need only column price the simpliest is map:

df1['price'] = df1.id.map(df2.set_index('id1')['price'])
print (df1)
   id name  count  price
0   1    a     10  100.0
1   2    b     20  200.0
2   3    c     30  300.0
3   4    d     40    NaN
4   5    e     50  500.0

Another 2 solutions:

print (pd.merge(df1, df2, left_on='id', right_on='id1', how='left')
         .drop(['id1', 'rating'], axis=1))
   id name  count  price
0   1    a     10  100.0
1   2    b     20  200.0
2   3    c     30  300.0
3   4    d     40    NaN
4   5    e     50  500.0

print (pd.merge(df1, df2[['id1','price']], left_on='id', right_on='id1', how='left')
         .drop('id1', axis=1))
   id name  count  price
0   1    a     10  100.0
1   2    b     20  200.0
2   3    c     30  300.0
3   4    d     40    NaN
4   5    e     50  500.0

Solution 2

join utilizes the index to merge on unless we specify a column to use instead. However, we can only specify a column instead of the index for the 'left' dataframe.

Strategy:

set_index on df2 to be id1
use join with df as the left dataframe and id as the on parameter. Note that I could have set_index('id') on df to avoid having to use the on parameter. However, this allowed me leave the column in the dataframe rather than having to reset_index later.

df.join(df2.set_index('id1'), on='id')

   id name  count  price  rating
0   1    a     10  100.0     1.0
1   2    b     20  200.0     2.0
2   3    c     30  300.0     3.0
3   4    d     40    NaN     NaN
4   5    e     50  500.0     5.0

If you only want price from df2

df.join(df2.set_index('id1')[['price']], on='id')


   id name  count  price
0   1    a     10  100.0
1   2    b     20  200.0
2   3    c     30  300.0
3   4    d     40    NaN
4   5    e     50  500.0

60,292

Author by

Shubham R

Updated on July 25, 2022

Comments

Shubham R almost 2 years

I have a dataframe df:

id   name   count
1    a       10
2    b       20
3    c       30
4    d       40
5    e       50

Here I have another dataframe df2:

id1  price   rating
 1     100     1.0
 2     200     2.0
 3     300     3.0
 5     500     5.0

I want to join these two dataframes on column id and id1(both refer same). Here is an example of df3:

id   name   count   price   rating
1    a       10      100      1.0
2    b       20      200      2.0
3    c       30      300      3.0
4    d       40      Nan      Nan
5    e       50      500      5.0

Should I use df.merge or pd.concat?

Shubham R over 7 years

keeping this answer, if from df2 i have to select only 1 column 'price' then?
Shubham R over 7 years

keeping this answer, if from df2 i have to select only 1 column 'price' then?
jezrael over 7 years

I dont understand, can you explain more?
Shubham R over 7 years

final table has id name count id1 price rating but i want only price from df2 not rating, then?
Shubham R over 7 years

the 2 ways that you suggested both are correct,right?
jezrael over 7 years

Yes, all solutions are correct. If need add more column, nicer and better is join (not necessary delete column, left join by default), but if need add only one column map is faster.
Shubham R over 7 years

for two huge dataframe(say each df has around 4 million rows, should i use merge or map? which one would take less time to complete
jezrael over 7 years

do you need add one or 2 columns?
Shubham R over 7 years

just a question, my requirements may change, sometimes one column sometimes 2 columns. which one would be best in both the cases
jezrael over 7 years

hmmm, I try test it, but join or merge is more universal.
jezrael over 7 years

I try test if join or merge is faster.