JOIN two dataframes on common column in python
60,292
Solution 1
Use merge
:
print (pd.merge(df1, df2, left_on='id', right_on='id1', how='left').drop('id1', axis=1))
id name count price rating
0 1 a 10 100.0 1.0
1 2 b 20 200.0 2.0
2 3 c 30 300.0 3.0
3 4 d 40 NaN NaN
4 5 e 50 500.0 5.0
Another solution is simple rename column:
print (pd.merge(df1, df2.rename(columns={'id1':'id'}), on='id', how='left'))
id name count price rating
0 1 a 10 100.0 1.0
1 2 b 20 200.0 2.0
2 3 c 30 300.0 3.0
3 4 d 40 NaN NaN
4 5 e 50 500.0 5.0
If need only column price
the simpliest is map
:
df1['price'] = df1.id.map(df2.set_index('id1')['price'])
print (df1)
id name count price
0 1 a 10 100.0
1 2 b 20 200.0
2 3 c 30 300.0
3 4 d 40 NaN
4 5 e 50 500.0
Another 2 solutions:
print (pd.merge(df1, df2, left_on='id', right_on='id1', how='left')
.drop(['id1', 'rating'], axis=1))
id name count price
0 1 a 10 100.0
1 2 b 20 200.0
2 3 c 30 300.0
3 4 d 40 NaN
4 5 e 50 500.0
print (pd.merge(df1, df2[['id1','price']], left_on='id', right_on='id1', how='left')
.drop('id1', axis=1))
id name count price
0 1 a 10 100.0
1 2 b 20 200.0
2 3 c 30 300.0
3 4 d 40 NaN
4 5 e 50 500.0
Solution 2
join
utilizes the index to merge on unless we specify a column to use instead. However, we can only specify a column instead of the index for the 'left'
dataframe.
Strategy:
-
set_index
ondf2
to beid1
- use
join
withdf
as the left dataframe andid
as theon
parameter. Note that I could haveset_index('id')
ondf
to avoid having to use theon
parameter. However, this allowed me leave the column in the dataframe rather than having to reset_index later.
df.join(df2.set_index('id1'), on='id')
id name count price rating
0 1 a 10 100.0 1.0
1 2 b 20 200.0 2.0
2 3 c 30 300.0 3.0
3 4 d 40 NaN NaN
4 5 e 50 500.0 5.0
If you only want price
from df2
df.join(df2.set_index('id1')[['price']], on='id')
id name count price
0 1 a 10 100.0
1 2 b 20 200.0
2 3 c 30 300.0
3 4 d 40 NaN
4 5 e 50 500.0
Author by
Shubham R
Updated on July 25, 2022Comments
-
Shubham R almost 2 years
I have a dataframe df:
id name count 1 a 10 2 b 20 3 c 30 4 d 40 5 e 50
Here I have another dataframe df2:
id1 price rating 1 100 1.0 2 200 2.0 3 300 3.0 5 500 5.0
I want to join these two dataframes on column id and id1(both refer same). Here is an example of df3:
id name count price rating 1 a 10 100 1.0 2 b 20 200 2.0 3 c 30 300 3.0 4 d 40 Nan Nan 5 e 50 500 5.0
Should I use df.merge or pd.concat?
-
Shubham R over 7 yearskeeping this answer, if from df2 i have to select only 1 column 'price' then?
-
Shubham R over 7 yearskeeping this answer, if from df2 i have to select only 1 column 'price' then?
-
jezrael over 7 yearsI dont understand, can you explain more?
-
Shubham R over 7 yearsfinal table has
id name count id1 price rating
but i want onlyprice
from df2 notrating
, then? -
Shubham R over 7 yearsthe 2 ways that you suggested both are correct,right?
-
jezrael over 7 yearsYes, all solutions are correct. If need add more column, nicer and better is
join
(not necessary delete column, left join by default), but if need add only one columnmap
is faster. -
Shubham R over 7 yearsfor two huge dataframe(say each df has around 4 million rows, should i use merge or map? which one would take less time to complete
-
jezrael over 7 yearsdo you need add one or 2 columns?
-
Shubham R over 7 yearsjust a question, my requirements may change, sometimes one column sometimes 2 columns. which one would be best in both the cases
-
jezrael over 7 yearshmmm, I try test it, but
join
ormerge
is more universal. -
jezrael over 7 yearsI try test if
join
ormerge
is faster.