How to filter out words in Python?
Solution 1
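Your loop visits every line once for each word and appends the result of each replacement. As a result, a line is appended once per matching word, and the removals never combine. Switch the two loops so that every word is removed from a line before that line is appended: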
item1 = []
for line in item:
    for w in words:
        line = line.replace(w, '')
    item1.append(line)
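For reuse, the corrected loop can be wrapped in a function. This is a minimal sketch: the name remove_words is hypothetical, and the whitespace cleanup is an extra step that the answer does not include. str.replace leaves two spaces where a word used to be, and splitting and re-joining the line collapses them.

```python
def remove_words(lines, words):
    """Remove every occurrence of each word from each line (plain substring match)."""
    result = []
    for line in lines:           # outer loop over lines: one output entry per line
        for w in words:          # inner loop applies every removal to the same line
            line = line.replace(w, '')
        # str.replace leaves a double space where a word was removed;
        # split()/join() collapses runs of whitespace (extra step, not in the answer)
        result.append(' '.join(line.split()))
    return result

item = ['the dog is gone', 'the dog and cat is gone']
words = ['dog', 'cat']
print(remove_words(item, words))  # ['the is gone', 'the and is gone']
```

Because this is a plain substring match, remove_words(['dogma'], ['dog']) returns ['ma']. The word-boundary caveat that follows is about exactly this case.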
Note: I altered some code:
- changed gg to line
- changed it to item
- removed the check whether line contains w, because replace already handles that case (it does nothing when w is absent)
replace does not know about word boundaries. If you want to remove entire words only, you should try a different approach, such as re.sub:
import re
item1 = []
for line in item:
    for w in words:
        # '\b' is a word boundary; re.escape protects words containing regex metacharacters
        line = re.sub(r'\b%s\b' % re.escape(w), '', line)
    item1.append(line)
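One variation is to combine all the words into a single alternation pattern, compile it once, and tidy the leftover spaces. This is a sketch under those assumptions. The function name remove_whole_words is hypothetical, and the whitespace collapsing is an addition to the answer above. The example includes the word "dogma" to show that \b keeps partial matches intact.

```python
import re

def remove_whole_words(lines, words):
    """Remove whole-word occurrences of any word in `words`, then tidy whitespace."""
    # re.escape guards against words containing regex metacharacters such as '.' or '+'
    pattern = re.compile(r'\b(?:%s)\b' % '|'.join(re.escape(w) for w in words))
    result = []
    for line in lines:
        stripped = pattern.sub('', line)
        result.append(' '.join(stripped.split()))  # collapse the leftover double spaces
    return result

item = ['the dog is gone', 'the dogma and cat is gone']
print(remove_whole_words(item, ['dog', 'cat']))
# ['the is gone', 'the dogma and is gone']
```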
Solution 2
You might use this approach instead:
item = ['the dog is gone', 'the dog and cat is gone']
words = ['dog', 'cat']
item2 = [" ".join([w for w in t.split() if w not in words]) for t in item]
print(item2)
# ['the is gone', 'the and is gone']
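When the word list is long, a set is a reasonable choice for the membership test, because lookups in a set take constant time on average, while lookups in a list scan every element. The sketch below is that variant. The name stop is illustrative only.

```python
item = ['the dog is gone', 'the dog and cat is gone']
# a set gives O(1) average-time membership tests, which helps for long word lists
stop = {'dog', 'cat'}
item2 = [' '.join(w for w in t.split() if w not in stop) for t in item]
print(item2)  # ['the is gone', 'the and is gone']
```

Because t.split() splits on whitespace only, a word with attached punctuation (for example "dog,") is a different token and is not removed.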
Question by user1753878 (updated on June 13, 2022):
For example:
item = ['the dog is gone', 'the dog and cat is gone']
words = ['dog', 'cat']
I want to filter out dog and cat so that it reads: item = ['the is gone', 'the and is gone']
item1 = []
for w in words:
    for line in item:
        if w in line:
            j = gg.replace(it, '')
            item1.append(j)
I get the following:
['the is gone', 'the cat and is gone', 'the and dog is gone']
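The following sketch reproduces why the question's loop returns three entries. It assumes that the undefined names gg and it were meant to be line and w. Each replacement starts again from the original line, and the result is appended once per (word, matching line) pair, so the removals for different words are never combined.

```python
item = ['the dog is gone', 'the dog and cat is gone']
words = ['dog', 'cat']

item1 = []
for w in words:              # outer loop over words...
    for line in item:        # ...so every line is visited once per word
        if w in line:
            j = line.replace(w, '')  # assumption: `gg` meant `line`, `it` meant `w`
            item1.append(j)  # one entry per (word, matching line) pair, not per line
print(item1)
# ['the  is gone', 'the  and cat is gone', 'the dog and  is gone']
```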