Pandas: replace column values based on match from another column
Solution 1
Use map
All the logic you need:
def update_type(t1, t2, dropna=False):
    return t1.map(t2).dropna() if dropna else t1.map(t2).fillna(t1)
Let's make 'ItemType2' the index of Dataframe2
update_type(Dataframe1.ItemType1,
Dataframe2.set_index('ItemType2').newType)
0 Tomato
1 Potato
2 Potato
3 greenCauliflower
4 yellowCauliflower
5 Squash
6 Onions
7 Onions
8 Onions
9 yellowCabbage
10 GreenCabbage
Name: ItemType1, dtype: object
update_type(Dataframe1.ItemType1,
Dataframe2.set_index('ItemType2').newType,
dropna=True)
0 Tomato
1 Potato
2 Potato
3 greenCauliflower
4 yellowCauliflower
5 Squash
6 Onions
7 Onions
8 Onions
Name: ItemType1, dtype: object
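To reproduce the two outputs above end to end, here is a minimal sketch. The frames are rebuilt by hand from the question's tables, on the assumption that the empty newType cells for yellowCabbage and GreenCabbage are missing values (written as None here); the names lookup, kept and dropped are only illustrative.

```python
import pandas as pd

# Sample data rebuilt from the question; None marks the empty newType cells (an assumption).
Dataframe1 = pd.DataFrame({'ItemType1': [
    'redTomato', 'whitePotato', 'yellowPotato', 'greenCauliflower',
    'yellowCauliflower', 'yelloSquash', 'redOnions', 'YellowOnions',
    'WhiteOnions', 'yellowCabbage', 'GreenCabbage']})
Dataframe2 = pd.DataFrame({
    'ItemType2': ['whitePotato', 'yellowPotato', 'redTomato', 'yellowCabbage',
                  'GreenCabbage', 'yellowCauliflower', 'greenCauliflower',
                  'YellowOnions', 'WhiteOnions', 'yelloSquash', 'redOnions'],
    'newType': ['Potato', 'Potato', 'Tomato', None, None, 'yellowCauliflower',
                'greenCauliflower', 'Onions', 'Onions', 'Squash', 'Onions']})


def update_type(t1, t2, dropna=False):
    # Look each value of t1 up in the index of t2; unmatched or null lookups become NaN.
    mapped = t1.map(t2)
    # Either drop those rows, or fall back to the original value.
    return mapped.dropna() if dropna else mapped.fillna(t1)


# Series indexed by the old name, holding the new name.
lookup = Dataframe2.set_index('ItemType2')['newType']

kept = update_type(Dataframe1['ItemType1'], lookup)
dropped = update_type(Dataframe1['ItemType1'], lookup, dropna=True)

print(kept)
print(dropped)
```

Note that map needs the lookup index to be unique, so this assumes each ItemType2 value appears only once in Dataframe2.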
Verify
updated = update_type(Dataframe1.ItemType1, Dataframe2.set_index('ItemType2').newType)
pd.concat([Dataframe1, updated], axis=1, keys=['old', 'new'])
Timing
def root(Dataframe1, Dataframe2):
    return Dataframe1['ItemType1'].replace(Dataframe2.set_index('ItemType2')['newType'].dropna())

def piRSquared(Dataframe1, Dataframe2):
    t1 = Dataframe1.ItemType1
    t2 = Dataframe2.set_index('ItemType2').newType
    return update_type(t1, t2)
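The two functions above can be compared with the standard library's timeit. The following is only a sketch on a tiny hand-made sample (the pairs dict is an assumption based on the question's data), so the numbers it prints say little about large frames; rerun it on your own data before drawing conclusions.

```python
import timeit

import pandas as pd

# Small sample based on the question; None marks an empty newType cell (an assumption).
pairs = {'redTomato': 'Tomato', 'whitePotato': 'Potato', 'yellowPotato': 'Potato',
         'yellowCabbage': None, 'GreenCabbage': None, 'redOnions': 'Onions'}
Dataframe1 = pd.DataFrame({'ItemType1': list(pairs)})
Dataframe2 = pd.DataFrame({'ItemType2': list(pairs), 'newType': list(pairs.values())})


def update_type(t1, t2, dropna=False):
    mapped = t1.map(t2)
    return mapped.dropna() if dropna else mapped.fillna(t1)


def root(Dataframe1, Dataframe2):
    # replace-based approach
    return Dataframe1['ItemType1'].replace(
        Dataframe2.set_index('ItemType2')['newType'].dropna())


def piRSquared(Dataframe1, Dataframe2):
    # map-based approach
    t1 = Dataframe1['ItemType1']
    t2 = Dataframe2.set_index('ItemType2')['newType']
    return update_type(t1, t2)


# Both approaches should agree before their speed is compared.
same = root(Dataframe1, Dataframe2).tolist() == piRSquared(Dataframe1, Dataframe2).tolist()

# Total seconds for 100 calls of each function.
t_root = timeit.timeit(lambda: root(Dataframe1, Dataframe2), number=100)
t_map = timeit.timeit(lambda: piRSquared(Dataframe1, Dataframe2), number=100)
print(f"replace: {t_root:.4f}s  map: {t_map:.4f}s  same result: {same}")
```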
Solution 2
You can convert df2 into a Series indexed by 'ItemType2', and then use replace on df1:
# Make df2 a Series indexed by 'ItemType2'.
df2 = df2.set_index('ItemType2')['newType'].dropna()
# Replace values in df1.
df1['ItemType1'] = df1['ItemType1'].replace(df2)
Or in a single line, if you don't want to alter df2:
df1['ItemType1'] = df1['ItemType1'].replace(df2.set_index('ItemType2')['newType'].dropna())
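A small self-contained sketch of the replace approach, assuming a cut-down version of the question's data. The value 'purpleCarrot' is hypothetical and is there only to show what happens to an item with no entry in df2; None stands in for an empty newType cell.

```python
import pandas as pd

# Cut-down sample; 'purpleCarrot' is a made-up item with no mapping at all.
df1 = pd.DataFrame({'ItemType1': ['redTomato', 'whitePotato',
                                  'yellowCabbage', 'purpleCarrot']})
df2 = pd.DataFrame({'ItemType2': ['whitePotato', 'redTomato', 'yellowCabbage'],
                    'newType': ['Potato', 'Tomato', None]})

# Series of old name -> new name, without the rows whose newType is missing.
mapping = df2.set_index('ItemType2')['newType'].dropna()

# Values found in the mapping are replaced; everything else is left untouched.
result = df1['ItemType1'].replace(mapping)
print(result.tolist())
```

Because the null rows are dropped from the mapping first, both 'yellowCabbage' (null newType) and 'purpleCarrot' (no entry) come through unchanged.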
Solution 3
This method requires that you rename the key column to 'type' in both data frames; you can then use merge and np.where:
import numpy as np
df3 = df1.merge(df2, how='inner', on='type')[['type', 'newType']]
df3['newType'] = np.where(df3['newType'].isnull(), df3['type'], df3['newType'])
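If renaming the columns is not convenient, a variation on this idea (not the answer's exact code) is to merge with left_on and right_on and to use a left join, so that rows of df1 without a match are kept. A minimal sketch on a cut-down sample, where 'purpleCarrot' is a hypothetical unmatched item and None marks an empty newType cell:

```python
import numpy as np
import pandas as pd

# Cut-down sample; 'purpleCarrot' is a made-up item with no entry in df2.
df1 = pd.DataFrame({'ItemType1': ['redTomato', 'whitePotato',
                                  'yellowCabbage', 'purpleCarrot']})
df2 = pd.DataFrame({'ItemType2': ['whitePotato', 'redTomato', 'yellowCabbage'],
                    'newType': ['Potato', 'Tomato', None]})

# Left join keeps every row of df1, in df1's order; unmatched rows get NaN in newType.
df3 = df1.merge(df2, how='left', left_on='ItemType1', right_on='ItemType2')

# Fall back to the original name wherever newType is missing.
df3['newType'] = np.where(df3['newType'].isnull(), df3['ItemType1'], df3['newType'])

result = df3[['ItemType1', 'newType']]
print(result)
```

This assumes each ItemType2 value appears only once in df2; duplicate keys would produce extra rows in the merge.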
Anil_M
Updated on September 15, 2022
Comments
-
Anil_M about 1 year
I have a column in the first data frame, df1["ItemType"], as below.
Dataframe1
ItemType1
redTomato
whitePotato
yellowPotato
greenCauliflower
yellowCauliflower
yelloSquash
redOnions
YellowOnions
WhiteOnions
yellowCabbage
GreenCabbage
I need to replace it based on a dictionary created from another data frame.
Dataframe2
ItemType2           newType
whitePotato         Potato
yellowPotato        Potato
redTomato           Tomato
yellowCabbage
GreenCabbage
yellowCauliflower   yellowCauliflower
greenCauliflower    greenCauliflower
YellowOnions        Onions
WhiteOnions         Onions
yelloSquash         Squash
redOnions           Onions
Notice that:
- In dataframe2, some of the ItemType values are the same as the ItemType values in dataframe1.
- Some ItemType entries in dataframe2 have null values, like yellowCabbage.
- The ItemType entries in dataframe2 are out of order with respect to ItemType in dataframe1.
I need to replace the values in the Dataframe1 ItemType column with newType wherever there is a match in the corresponding Dataframe2 ItemType column, keeping the exceptions listed in the bullet points above in mind.
If there is no match, the values need to stay as they are [no change].
This is what I have so far:
import pandas as pd
# read second csv file
df2 = pd.read_csv('mappings.csv', names=["ItemType", "newType"])
# convert to dict
df2 = df2.set_index('ItemType').T.to_dict('list')
The replace-on-match attempts below are not working. They insert NaN values instead of the actual values. These are based on discussions here on SO.
df1.loc[df1['ItemType'].isin(df2['ItemType'])]=df2[['NewType']]
OR
df1['ItemType']=df2['ItemType'].map(df2)
Thanks in advance
EDIT
The two column headers in the two data frames have different names: the dataframe1 column is ItemType1 and the first column in the second data frame is ItemType2. I missed that in the first edit.
-
Anil_M over 7 years
Hi - Thanks for the quick reply. I have made slight changes to the question. The two column headers in the data frames have different names: the dataframe1 column is ItemType1 and the first column in the second dataframe is ItemType2. Also, the above solution is giving the error KeyError: 'type'
-
draco_alpine over 7 years
The error on 'type' and the issues with ItemType1 and ItemType2 are one and the same. Specifically, I'm trying to join on 'type' when actually neither df has a column 'type'; instead they have 'ItemType1' and 'ItemType2'. Personally I would rename the columns to ItemType in both df's and proceed. But the other solutions offered may be more suitable to your specific needs.
-
mechanical_meat over 7 years
Hooray for time measurements... +1
-
Anil_M over 7 years
Hi - thanks for the response, still checking it out. The solution also needs to omit/drop ItemType1 items for which the corresponding values are empty or null in the second dataframe's newType column. So here it should drop yellowCabbage and GreenCabbage, so that they don't appear in the final table.
piRSquared over 7 years
@Anil_M that's easy, I forcibly put that back in. Updated post with optional dropna parameter.
-
Anil_M over 7 years
@piRSquared - I am getting NameError: global name 'dropna' is not defined for return t1.map(t2).dropna() if dropna else t1.map(t2).fillna(t1)
-
piRSquared over 7 years
@Anil_M did you put the dropna=False in the signature of the function definition?
Anil_M over 7 years
Yes, that fixed it.
-
bernando_vialli about 6 years
Two-part question here. Firstly, when I try to run this now on one column, I get a MemoryError; what can be done about that? Secondly, I am trying to use this in the work I am doing now, but I need something more complicated. I want to apply the one-column match to a huge dataframe with a bunch of columns (about 100) and rows. How would I modify the code to achieve that?