Use dictionary to replace a string within a string in Pandas columns
Solution 1
You can create a dictionary and then use replace():
ids = {'Id':['NYC','LA','UK'],
'City':['New York City','Los Angeles','United Kingdom']}
ids = dict(zip(ids['Id'], ids['City']))
print (ids)
{'UK': 'United Kingdom', 'LA': 'Los Angeles', 'NYC': 'New York City'}
df['commentTest'] = df['Comment'].replace(ids, regex=True)
print (df)
  Categories                       Comment  Type  \
0     animal      The NYC tree is very big  tree
1      plant  The cat from the UK is small   dog
2     object     The rock was found in LA.  rock

                                commentTest
0        The New York City tree is very big
1  The cat from the United Kingdom is small
2        The rock was found in Los Angeles.
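One caveat with regex=True is that each dictionary key is treated as a regular expression and can match anywhere in the string, so a key such as LA would also match inside a longer word. Below is a minimal sketch of one way to guard against that, assuming Python 3 and that whole-word matching is what you want. The `patterns` name and the extra 'FLAT' row are illustrative and not part of the original answer.

```python
import re
import pandas as pd

df = pd.DataFrame({'Comment': ['The NYC tree is very big',
                               'The rock was found in LA.',
                               'The FLAT rock is from the UK']})
ids = {'NYC': 'New York City', 'LA': 'Los Angeles', 'UK': 'United Kingdom'}

# Escape each key so it is matched literally, and wrap it in \b word
# boundaries so 'LA' no longer matches inside 'FLAT'.
patterns = {r'\b' + re.escape(key) + r'\b': value for key, value in ids.items()}

df['commentTest'] = df['Comment'].replace(patterns, regex=True)
print(df['commentTest'].tolist())
```

With the plain keys from the answer above, 'FLAT' would be rewritten as well, which is why the boundaries may be worth adding when keys are short.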
Solution 2
It's actually much faster to use str.replace() than replace(), even though str.replace() requires a loop:
ids = {'NYC': 'New York City', 'LA': 'Los Angeles', 'UK': 'United Kingdom'}
for old, new in ids.items():
    df['Comment'] = df['Comment'].str.replace(old, new, regex=False)
#   Categories  Type                                   Comment
# 0     animal  tree        The New York City tree is very big
# 1      plant   dog  The cat from the United Kingdom is small
# 2     object  rock        The rock was found in Los Angeles.
The only time replace() outperforms a str.replace() loop is with small dataframes.
The timing functions for reference:
def Series_replace(df):
    df['Comment'] = df['Comment'].replace(ids, regex=True)
    return df

def Series_str_replace(df):
    for old, new in ids.items():
        df['Comment'] = df['Comment'].str.replace(old, new, regex=False)
    return df
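To check the comparison on your own data, something like the following timeit harness could be used. This is only a sketch, assuming Python 3: the frame sizes and repeat count are arbitrary, and the two helpers are renamed, non-mutating variants of the functions above so that repeated runs start from the same input. No timings are shown because they depend on the machine and the pandas version.

```python
import timeit
import pandas as pd

ids = {'NYC': 'New York City', 'LA': 'Los Angeles', 'UK': 'United Kingdom'}
comments = ['The NYC tree is very big',
            'The cat from the UK is small',
            'The rock was found in LA.']

def series_replace(df):
    # One call; every dictionary key is treated as a regex pattern.
    return df['Comment'].replace(ids, regex=True)

def series_str_replace(df):
    # One literal str.replace pass per dictionary entry.
    out = df['Comment']
    for old, new in ids.items():
        out = out.str.replace(old, new, regex=False)
    return out

# Hypothetical frame sizes; adjust them to match your own data.
for n_rows in (3, 3000):
    df = pd.DataFrame({'Comment': comments * (n_rows // 3)})
    t_replace = timeit.timeit(lambda: series_replace(df), number=5)
    t_str = timeit.timeit(lambda: series_str_replace(df), number=5)
    print(f'{n_rows} rows: replace={t_replace:.4f}s, str.replace loop={t_str:.4f}s')
```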
Note that if ids is a dataframe instead of a dictionary, you can get the same performance with itertuples():
ids = pd.DataFrame({'Id': ['NYC', 'LA', 'UK'], 'City': ['New York City', 'Los Angeles', 'United Kingdom']})
for row in ids.itertuples():
    df['Comment'] = df['Comment'].str.replace(row.Id, row.City, regex=False)
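If the loop over keys is a concern, another option is a single pass with one compiled pattern and a callable replacement, which str.replace() accepts when regex=True. This is a sketch under the assumption of Python 3 and a dictionary of literal keys; it is not the method from the answers above, and whether it is faster than the loop would need to be measured.

```python
import re
import pandas as pd

df = pd.DataFrame({'Comment': ['The NYC tree is very big',
                               'The cat from the UK is small',
                               'The rock was found in LA.']})
ids = {'NYC': 'New York City', 'LA': 'Los Angeles', 'UK': 'United Kingdom'}

# Longest keys first, so a longer key wins over a shorter key it contains.
keys = sorted(ids, key=len, reverse=True)
pattern = re.compile('|'.join(re.escape(key) for key in keys))

# The callable receives each match object and looks up its replacement.
df['Comment'] = df['Comment'].str.replace(
    pattern, lambda match: ids[match.group(0)], regex=True)
print(df['Comment'].tolist())
```

Because every key is replaced in one scan, a replacement value is never re-scanned for other keys, which can differ from the loop when one key appears inside another key's replacement text.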
owwoow14
Updated on June 07, 2022
I am trying to use a dictionary key to replace strings in a pandas column with its values. However, each column contains sentences. Therefore, I must first tokenize the sentences and detect whether a word in the sentence corresponds with a key in my dictionary, then replace the string with the corresponding value.
However, the result that I continue to get is None. Is there a better pythonic way to approach this problem?
Here is my MVC for the moment. In the comments, I specified where the issue is happening.
import pandas as pd

data = {'Categories': ['animal','plant','object'],
        'Type': ['tree','dog','rock'],
        'Comment': ['The NYC tree is very big','The cat from the UK is small','The rock was found in LA.']
       }
ids = {'Id':['NYC','LA','UK'],
       'City':['New York City','Los Angeles','United Kingdom']}

df = pd.DataFrame(data)
ids = pd.DataFrame(ids)

def col2dict(ids):
    data = ids[['Id', 'City']]
    idDict = data.set_index('Id').to_dict()['City']
    return idDict

def replaceIds(data,idDict):
    ids = idDict.keys()
    types = idDict.values()
    data['commentTest'] = data['Comment']
    words = data['commentTest'].apply(lambda x: x.split())
    for (i,word) in enumerate(words):
        #Here we can see that the words appear
        print word
        print ids
        if word in ids:
            #Here we can see that they are not being recognized. What happened?
            print ids
            print word
            words[i] = idDict[word]
            data['commentTest'] = ' '.apply(lambda x: ''.join(x))
            return data

idDict = col2dict(ids)
results = replaceIds(df, idDict)
Results:
None
I am using python2.7, and when I print out the dict, the strings show the u' Unicode prefix.
My expected outcome is:
  Categories                       Comment  Type                               commentTest
0     animal      The NYC tree is very big  tree        The New York City tree is very big
1      plant  The cat from the UK is small   dog  The cat from the United Kingdom is small
2     object     The rock was found in LA.  rock        The rock was found in Los Angeles.
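For comparison, the token-based approach attempted in the question could be sketched as below. In the question's loop, each `word` appears to be the whole list of tokens for a row, so `word in ids` is never true. The sketch checks the tokens one at a time instead. It assumes Python 3, and `replace_tokens` is a hypothetical helper name.

```python
import pandas as pd

df = pd.DataFrame({'Comment': ['The NYC tree is very big',
                               'The cat from the UK is small',
                               'The rock was found in LA.']})
ids = {'NYC': 'New York City', 'LA': 'Los Angeles', 'UK': 'United Kingdom'}

def replace_tokens(sentence, mapping):
    # Split on whitespace and swap any token that is exactly a dictionary key.
    return ' '.join(mapping.get(token, token) for token in sentence.split())

df['commentTest'] = df['Comment'].apply(lambda s: replace_tokens(s, ids))
print(df['commentTest'].tolist())
```

Note that the last row stays unchanged here, because the token is 'LA.' with the trailing period and so is not an exact key. That limitation is one reason the substring-based replace() and str.replace() solutions are simpler for this data.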