NLTK Stopword List

python nltk stop-words

44,272

A few things of note.

If you are going to be checking membership against a list over and over, I would use a set instead of a list.
stopwords.words('english') returns a list of lowercase stop words. It is quite likely that your source has capital letters in it and is not matching for that reason.
You aren't reading the file properly: you are iterating over the file object, which yields whole lines (including the trailing newline), not a list of the words split by spaces.
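To see why the last two points matter, here is a small sketch. It assumes an in-memory `io.StringIO` as a stand-in for the real file and a tiny made-up stop set in place of `stopwords.words('english')`, so it runs without the NLTK corpus.

```python
import io

# Stand-in for open("xxx.y.txt", "r"); the contents are made up for illustration.
word_file = io.StringIO("The cat sat\non a mat\n")

# Tiny hypothetical stop set; the real one would come from stopwords.words('english').
stops = {"the", "a", "on"}

# Iterating over a file object yields whole lines, newline included.
lines = list(word_file)
print(lines)

# A whole line is never equal to a single stop word, so nothing matches.
print([line in stops for line in lines])

# Case matters too: the stop set is lowercase, the source text may not be.
first_word = lines[0].split()[0]
print(first_word in stops)          # compared as written
print(first_word.lower() in stops)  # compared after lowercasing
```

Only the last check finds the word, because it compares a single lowercased word rather than a whole line.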

Putting it all together:

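from nltk.corpus import stopwords

# Build the set once so that each membership test is fast.
stops = set(stopwords.words('english'))

# Read the file line by line, then split each line into words.
with open("xxx.y.txt", "r") as word_file:
    for line in word_file:
        for w in line.split():
            # The stop word list is all lowercase, so lowercase before comparing.
            if w.lower() not in stops:
                print(w)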
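If you want a `filtered_words` list as in the question rather than printed output, the same logic fits in a small function. This is only a sketch: `filter_stopwords` is a hypothetical helper name, and the demo uses a made-up stop set and an in-memory file so that it runs without the NLTK corpus. In practice you would pass `set(stopwords.words('english'))` and the opened file.

```python
import io


def filter_stopwords(lines, stops):
    """Return the words from an iterable of lines that are not in `stops`."""
    return [
        w
        for line in lines
        for w in line.split()
        if w.lower() not in stops
    ]


# Made-up stand-ins for the real stop word set and the real file.
demo_stops = {"a", "the", "of"}
demo_file = io.StringIO("The name of a word\nA list\n")

filtered_words = filter_stopwords(demo_file, demo_stops)
print(filtered_words)
```

Passing the stop set as an argument keeps the function independent of NLTK and means the set is built only once, outside the loop.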


Author by

saph_top

Updated on January 09, 2020

Comments

saph_top over 4 years
I have the code below and I am trying to apply a stop word list to a list of words. However, the results still show words such as "a" and "the", which I thought would have been removed by this process. Any ideas about what has gone wrong would be great.
```
import nltk
from nltk.corpus import stopwords

word_list = open("xxx.y.txt", "r")
filtered_words = [w for w in word_list if not w in stopwords.words('english')]
print filtered_words
```
Hooked about 10 years

Note that you still aren't filtering out punctuation; you'll want to remove characters such as ';"{}[]/?.,! as well.
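One way to handle the punctuation mentioned above is to strip it from both ends of each word before the stop word check. The sketch below is an assumption about how that could look, not part of the original answer: `strip_punctuation` is a hypothetical helper, and the stop set and sample text are made up so the example runs without NLTK.

```python
import string


def strip_punctuation(word):
    """Remove leading and trailing ASCII punctuation from a word."""
    return word.strip(string.punctuation)


# Made-up stand-ins for the real stop word set and file contents.
stops = {"the", "a"}
text = 'The cat, a "mat"; the end!'

# Clean each word first, then drop empty strings and stop words.
cleaned = [strip_punctuation(w) for w in text.split()]
filtered = [w for w in cleaned if w and w.lower() not in stops]
print(filtered)
```

`str.strip` only touches the ends of the word, so an apostrophe inside a word such as "don't" is left alone. The `if w` check drops tokens that consisted of punctuation only.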
saph_top about 10 years

Brilliant, that worked. I must have been reading the file incorrectly. Thanks.