Better way to remove multiple words from a string?

Solution 1

Here's a solution with regex:

import re

bannedWord = ["Good", "Bad", "Ugly"]

def RemoveBannedWords(toPrint, database):
    # Build one case-insensitive alternation from the banned words.
    pattern = re.compile(r"\b(" + "|".join(database) + r")\W", re.I)
    return pattern.sub("", toPrint)

toPrint = "Hello Ugly Guy, Good To See You."

print(RemoveBannedWords(toPrint, bannedWord))

Solution 2

I split the string on whitespace and keep only the words that are not in the banned list:

bannedWord = ['Good','Bad','Ugly']
toPrint = 'Hello Ugly Guy, Good To See You.'
print(' '.join(i for i in toPrint.split() if i not in bannedWord))
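
One caveat with the split-based approach: the comparison is case-sensitive, and punctuation stays attached to a word, so 'Ugly,' is not equal to 'Ugly'. Below is a minimal sketch of one way around that, assuming it is acceptable to drop a banned word together with its attached punctuation. The helper name remove_banned_tokens is hypothetical and not part of the original answer.

```python
import string

banned_words = ["Good", "Bad", "Ugly"]

def remove_banned_tokens(text, banned):
    """Drop whitespace-separated tokens whose core word is banned (case-insensitive)."""
    banned_lower = {word.lower() for word in banned}
    kept = []
    for token in text.split():
        # Strip surrounding punctuation only for the comparison.
        core = token.strip(string.punctuation).lower()
        if core not in banned_lower:
            kept.append(token)
    return " ".join(kept)

print(remove_banned_tokens("Hello ugly, Guy. Good To See You.", banned_words))
```

Like the original one-liner, this collapses runs of whitespace into single spaces because of split() and join().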

Solution 3

A slight variation on Ajay's code, for the case where one of the strings in the bannedWord list is a substring of another:

bannedWord = ['good', 'bad', 'good guy', 'ugly']

With toPrint = 'good winter good guy', the result would be

RemoveBannedWords(toPrint, database=bannedWord) = 'winter guy'

because the pattern matches good first and never gets to try good guy. The list therefore has to be sorted by length, longest entry first.

import re

bannedWord = ['good', 'bad', 'good guy', 'ugly']

def RemoveBannedWords(toPrint, database):
    # Longest entries first, so 'good guy' is tried before 'good'.
    database_1 = sorted(database, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(database_1) + r")\W", re.I)
    # Pad with a space so the last word has a trailing \W to match, then drop one trailing character.
    return pattern.sub("", toPrint + ' ')[:-1]

toPrint = 'good winter good guy.'

print(RemoveBannedWords(toPrint, bannedWord))
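
To see why the order of the alternatives matters: Python's re module tries the alternatives in a group from left to right and keeps the first one that lets the whole pattern match. The sketch below compares the unsorted list with a longest-first sort. The build_pattern helper is hypothetical, and re.escape is an addition not present in the answer.

```python
import re

def build_pattern(words):
    # Same shape as the pattern above: word boundary, alternation, one non-word character.
    return re.compile(r"\b(" + "|".join(re.escape(w) for w in words) + r")\W", re.I)

words = ["good", "bad", "good guy", "ugly"]
text = "good winter good guy."

# Unsorted: 'good' is listed before 'good guy', so the phrase is never matched.
unsorted_result = build_pattern(words).sub("", text)

# Longest first: 'good guy' is tried before 'good'.
sorted_result = build_pattern(sorted(words, key=len, reverse=True)).sub("", text)

print(repr(unsorted_result))  # 'winter guy.'
print(repr(sorted_result))    # 'winter '
```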

Solution 4

Yet another variation on a theme. If you are going to be calling this a lot, then it is best to compile the regex once to improve the speed:

import re

bannedWord = ['Good', 'Bad', 'Ugly']
# Compiled once at import time and reused by every call.
re_banned_words = re.compile(r"\b(" + "|".join(bannedWord) + r")\W", re.I)

def RemoveBannedWords(toPrint):
    # Reading a module-level name needs no global statement.
    return re_banned_words.sub("", toPrint)

toPrint = 'Hello Ugly Guy, Good To See You.'
print(RemoveBannedWords(toPrint))
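
If a module-level pattern feels awkward, one alternative is to compile the pattern inside a factory function and return a closure that reuses it. This is only a sketch under stated assumptions: make_remover is a hypothetical name, re.escape is added to guard against banned words that contain regex metacharacters, and `(?:\W|$)` is swapped in for `\W` so that a banned word at the very end of the string is also removed.

```python
import re

def make_remover(banned_words):
    """Compile the pattern once and return a function that reuses it."""
    # Longest first, so a phrase wins over a shorter word it starts with.
    ordered = sorted(banned_words, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(w) for w in ordered) + r")(?:\W|$)",
        re.I,
    )

    def remove(text):
        # Each call reuses the already compiled pattern.
        return pattern.sub("", text)

    return remove

remove_banned_words = make_remover(["Good", "Bad", "Ugly"])
print(remove_banned_words("Hello Ugly Guy, Good To See You."))  # Hello Guy, To See You.
print(repr(remove_banned_words("That was bad")))                # 'That was '
```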
Author by

Andy Wong

Independently learning Python3 and Swift

Updated on July 27, 2022

Comments

  • Andy Wong almost 2 years
    bannedWord = ["Good", "Bad", "Ugly"]

    def RemoveBannedWords(toPrint, database):
        statement = toPrint
        for x in range(0, len(database)):
            # Use the database parameter rather than the global list.
            if database[x] in statement:
                statement = statement.replace(database[x] + " ", "")
        return statement

    toPrint = "Hello Ugly Guy, Good To See You."

    print(RemoveBannedWords(toPrint, bannedWord))
    

    The output is Hello Guy, To See You. Knowing Python, I feel like there is a better way to remove several words from a string. I searched for similar solutions using dictionaries, but they didn't seem to fit this situation.

  • questionto42standswithUkraine over 3 years
    Best answer, strange that it has so few votes. Add a star "*" after the \\W if you also need to match banned words embedded at the start of a longer word: re.compile(r"\b(" + "|".join(list_not_for_search) + ")\\W*", re.I). For example, in 'Hello uglyyy guy, good do see you.' it removes the 'ugly' and leaves the 'yy' behind. By the way, re.I stands for re.IGNORECASE.
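
The difference between `\W` and `\W*` can be checked directly. Here is a small sketch using the sentence from the comment above; the variable names strict and loose are illustrative only.

```python
import re

words = ["Good", "Bad", "Ugly"]
text = "Hello uglyyy guy, good do see you."

# \W requires exactly one non-word character after the banned word.
strict = re.compile(r"\b(" + "|".join(words) + r")\W", re.I)
# \W* accepts zero or more, so a banned word at the start of a longer word also matches.
loose = re.compile(r"\b(" + "|".join(words) + r")\W*", re.I)

strict_result = strict.sub("", text)
loose_result = loose.sub("", text)

print(strict_result)  # Hello uglyyy guy, do see you.
print(loose_result)   # Hello yy guy, do see you.
```

Note that `\W*` is greedy, so it also swallows any run of punctuation and spaces that follows a banned word.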