Replace multiple substrings in a Pandas series with a value

22,646

Solution 1

You can perform this task by forming a |-separated string. This works because pd.Series.str.replace accepts regex:

Replace occurrences of pattern/regex in the Series/Index with some other string. Equivalent to str.replace() or re.sub().

This avoids the need to create a dictionary.

import pandas as pd

df = pd.DataFrame({'A': ['LOCAL TEST', 'TEST FOREIGN', 'ANOTHER HELLO', 'NOTHING']})

pattern = '|'.join(['LOCAL', 'FOREIGN', 'HELLO'])

df['A'] = df['A'].str.replace(pattern, 'CORP')

#               A
# 0     CORP TEST
# 1     TEST CORP
# 2  ANOTHER CORP
# 3       NOTHING

Solution 2

replace can accept dict , os we just create a dict for those values need to be replaced

dataUS['sec_type'].str.strip().replace(dict(zip(["LOCAL", "FOREIGN", "HELLO"], ["CORP"]*3)),regex=True)

Info of the dict

dict(zip(["LOCAL", "FOREIGN", "HELLO"], ["CORP"]*3))
Out[585]: {'FOREIGN': 'CORP', 'HELLO': 'CORP', 'LOCAL': 'CORP'}

The reason why you receive the error ,

str.replace is different from replace

Solution 3

The answer of @Rakesh is very neat but does not allow for substrings. With a small change however, it does.

  1. Use a replacement dictionary because it makes it much more generic
  2. Add the keyword argument regex=True to Series.replace() (not Series.str.replace) This does two things actually: It changes your replacement to regex replacement, which is much more powerful but you will have to escape special characters. Beware for that. Secondly it will make the replace work on substrings instead of the entire string. Which is really cool!
replacement = {
    "LOCAL": "CORP",
    "FOREIGN": "CORP",
    "HELLO": "CORP"
}

dataUS['sec_type'].replace(replacement, regex=True)

Full code example

dataUS = pd.DataFrame({'sec_type': ['LOCAL', 'Sample text LOCAL', 'Sample text LOCAL sample FOREIGN']})

replacement = {
    "LOCAL": "CORP",
    "FOREIGN": "CORP",
    "HELLO": "CORP"
}

dataUS['sec_type'].replace(replacement, regex=True)

Output

0                            CORP
1                            CORP
2                Sample text CORP
3    Sample text CORP sample CORP
Name: sec_type, dtype: object

Solution 4

Function to replace multiple values in pandas Series:

def replace_values(series, to_replace, value):
    for i in to_replace:
        series = series.str.replace(i, value)
    return series

Hope this helps someone

Solution 5

Try:

dataUS.replace({"sec_type": { 'LOCAL' : "CORP", 'FOREIGN' : "CORP"}})
Share:
22,646
SBad
Author by

SBad

Updated on January 27, 2022

Comments

  • SBad
    SBad over 2 years

    All,

    To replace one string in one particular column I have done this and it worked fine:

    dataUS['sec_type'].str.strip().str.replace("LOCAL","CORP")
    

    I would like now to replace multiple strings with one string say replace ["LOCAL", "FOREIGN", "HELLO"] with "CORP"

    How can make it work? the code below didn't work

    dataUS['sec_type'].str.strip().str.replace(["LOCAL", "FOREIGN", "HELLO"], "CORP")