How to filter out words in Python?

15,799

Solution 1

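You're looping over all lines once for each word, so every match appends a separate, partially cleaned copy of the line. Swap the loops so that each line has every word removed before it is appended exactly once: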

item1 = [] 
for line in item:
    for w in words:
        line = line.replace(w, '')
    item1.append(line)

Note: I altered some of your code:

  • changed gg to line
  • changed it to item
  • removed the check for whether line contains w, since replace simply leaves the line unchanged when w is absent
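A minimal sketch of the corrected loop wrapped in a function, so each input line yields exactly one output line. The function name remove_words is hypothetical and not part of the original answer.

```python
def remove_words(lines, words):
    """Return a new list with every occurrence of each word removed from each line."""
    result = []
    for line in lines:
        # Apply every replacement to the same line before storing it,
        # so each input line produces exactly one output line.
        for w in words:
            line = line.replace(w, '')
        result.append(line)
    return result


item = ['the dog is gone', 'the dog and cat is gone']
words = ['dog', 'cat']
print(remove_words(item, words))
# ['the  is gone', 'the  and  is gone']
```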

replace does not know about word boundaries, so it also removes matching letters inside longer words (for example, "dog" inside "dogma"). If you want to remove whole words only, use a different approach, such as re.sub:

import re

item1 = [] 
for line in item:
    for w in words:
        line = re.sub(r'\b%s\b' % re.escape(w), '', line)  # \b matches a word boundary; re.escape protects regex metacharacters in w
    item1.append(line)
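A minimal sketch contrasting str.replace with the word-boundary regex on a hypothetical input containing "dogma" and "catalog". As a variation on the loop above, it joins all words into a single alternation pattern so each line is scanned once; the helper name remove_whole_words and the sample sentences are illustrative assumptions.

```python
import re


def remove_whole_words(lines, words):
    """Remove whole-word occurrences of any word in `words` from each line."""
    # Builds one pattern such as r'\b(?:dog|cat)\b'; re.escape keeps characters
    # like '.' or '+' inside a word from being treated as regex syntax.
    pattern = re.compile(r'\b(?:' + '|'.join(map(re.escape, words)) + r')\b')
    return [pattern.sub('', line) for line in lines]


lines = ['the dog chased the cat', 'dogma and catalog']
words = ['dog', 'cat']

# str.replace also strips the letters inside longer words.
print([l.replace('dog', '').replace('cat', '') for l in lines])
# ['the  chased the ', 'ma and alog']

# The word-boundary version leaves 'dogma' and 'catalog' alone.
print(remove_whole_words(lines, words))
# ['the  chased the ', 'dogma and catalog']
```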

Solution 2

You might use this approach instead:

item =['the dog is gone', 'the dog and cat is gone']
words= ['dog','cat'] 

item2 = [" ".join([w for w in t.split() if w not in words]) for t in item]

print(item2)

# ['the is gone', 'the and is gone']
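Because split() and " ".join collapse runs of whitespace, this approach returns single-spaced strings rather than the double spaces shown in the question. A minimal variant, assuming the word list may be long, converts words to a set (the name stop is illustrative) so each membership test is a hash lookup instead of a list scan:

```python
item = ['the dog is gone', 'the dog and cat is gone']
words = ['dog', 'cat']

# A set makes each `w not in stop` check O(1) on average.
stop = set(words)
item2 = [' '.join(w for w in t.split() if w not in stop) for t in item]
print(item2)
# ['the is gone', 'the and is gone']
```

Since tokens are split only on whitespace, attached punctuation stays part of the token, so "dog," would not be removed.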
Author by

user1753878

Updated on June 13, 2022

Comments

  • user1753878 (almost 2 years ago)

    For example:

    item =['the dog is gone', 'the dog and cat is gone']
    words= ['dog','cat'] 
    

    I want to filter out "dog" and "cat" so that it reads:

    item=['the  is gone', 'the  and  is gone']
    

    item1=[] 
    for w in words:
       for line in item:
          if w in line:
             j=gg.replace(it,'')
             item1.append(j)
    

    I get the following:

    ['the  is gone', 'the cat and  is gone', 'the  and dog is gone']
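Assuming gg was meant to be the current line and it the current word, a minimal reconstruction of the loop above shows why the result has three entries: append runs once per matching (word, line) pair, and each replacement starts again from the unmodified line. The exact strings depend on what gg and it actually referred to; under this assumption the loop yields the output in the final comment.

```python
item = ['the dog is gone', 'the dog and cat is gone']
words = ['dog', 'cat']

item1 = []
for w in words:           # outer loop over words
    for line in item:     # inner loop over lines
        if w in line:
            # Each replacement starts from the original line, and every
            # match appends a new entry.
            item1.append(line.replace(w, ''))

print(item1)
# ['the  is gone', 'the  and cat is gone', 'the dog and  is gone']
```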