Adding words to nltk stoplist


Solution 1

You can simply use the append method to add words to it:

import nltk
stopwords = nltk.corpus.stopwords.words('english')
stopwords.append('newWord')

Or use extend to add a whole list of words, as suggested by Charlie in the comments:

stopwords = nltk.corpus.stopwords.words('english')
newStopWords = ['stopWord1','stopWord2']
stopwords.extend(newStopWords)
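
The difference between the two methods is easy to get wrong, so here is a small sketch of it. It uses a short hard-coded list as a stand-in for nltk.corpus.stopwords.words('english') (an assumption, so that the snippet runs without the NLTK corpus installed); the names nested and split_up are only for illustration.

```python
# Stand-in for nltk.corpus.stopwords.words('english'); assumption for illustration.
stopwords = ['i', 'me', 'my', 'the']

# append adds exactly one element.
stopwords.append('newWord')

# extend adds each element of an iterable.
stopwords.extend(['stopWord1', 'stopWord2'])

# Pitfall: append with a list nests the whole list as a single element.
nested = ['the']
nested.append(['a', 'b'])

# Pitfall: extend with a bare string adds its characters one by one.
split_up = ['the']
split_up.extend('ab')

print(stopwords)
print(nested)
print(split_up)
```

In short: append for one word, extend for a list of words.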

Solution 2

import nltk

stopwords = nltk.corpus.stopwords.words('english')
new_words = ('re', 'name', 'user', 'ct')
# Append each custom word to the stop list
for word in new_words:
    stopwords.append(word)
print(stopwords)

Solution 3

On my Ubuntu machine, I searched the file system from the root for "stopwords". That turned up a folder containing several files. I opened the one named "english", which had barely 128 words, added my own words to it, and saved.

Solution 4

I always do stopset = set(nltk.corpus.stopwords.words('english')) at the top of any module that needs it. Then it's easy to add more words to the set, plus membership checks are faster.
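
A minimal sketch of that approach, applied to the filtering line from the question. The base list here is a short hard-coded stand-in for nltk.corpus.stopwords.words('english'), and word_list is made-up sample input; both are assumptions for illustration.

```python
# Stand-in for nltk.corpus.stopwords.words('english'); assumption for illustration.
base_stopwords = ['i', 'me', 'my', 'the', 'a', 'is']

# Build the set once, at module level, instead of on every membership check.
stopset = set(base_stopwords)

# add takes one word; update takes any iterable of words.
stopset.add('re')
stopset.update(['name', 'user', 'ct'])

# Hypothetical input tokens, with stray whitespace as in the question's code.
word_list = ['the ', ' user', 'likes', 'a', 'parser ', 're']

word_list2 = [w.strip() for w in word_list if w.strip() not in stopset]
print(word_list2)  # ['likes', 'parser']
```

Building the set once also avoids re-reading the stop list for every word, which is what happens when words('english') is called inside the list comprehension.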

Solution 5

The English stop words are stored in a file, nltk/corpus/stopwords/english.txt (I guess it would be there; I don't have nltk on this machine, so the best thing would be to search for 'english.txt' within the nltk repo).

You can just add your new stop words in this file.

Also, try looking at Bloom filters if your stop word list grows to a few hundred words.
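
For readers who have not met Bloom filters, here is a rough sketch of the idea, not a tuned implementation. The class name BloomFilter, the sizes num_bits=1024 and num_hashes=3, and the salted SHA-256 hashing scheme are all choices made for this illustration; none of them come from NLTK.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for simplicity

    def _positions(self, word):
        # Derive num_hashes positions by salting a SHA-256 hash with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f'{i}:{word}'.encode('utf-8')).digest()
            yield int.from_bytes(digest[:8], 'big') % self.num_bits

    def add(self, word):
        for pos in self._positions(word):
            self.bits[pos] = 1

    def __contains__(self, word):
        return all(self.bits[pos] for pos in self._positions(word))


stop_filter = BloomFilter()
for word in ['the', 'a', 'is', 're', 'name', 'user', 'ct']:
    stop_filter.add(word)

print('user' in stop_filter)  # True
```

Note that a Bloom filter can report a word as present when it was never added, so a non-stop word may occasionally be dropped. A plain Python set already gives fast hash-based lookups, so for a list of a few hundred words the set is likely the simpler choice; a Bloom filter mainly helps when memory is the constraint.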

Author by

Alex

Updated on July 09, 2022

Comments

  • Alex almost 2 years

    I have some code that removes stop words from my data set. As the stop list doesn't seem to remove a majority of the words I would like it to, I'm looking to add words to this stop list so that it will remove them for this case. The code I'm using to remove stop words is:

    word_list2 = [w.strip() for w in word_list if w.strip() not in nltk.corpus.stopwords.words('english')]
    

    I'm unsure of the correct syntax for adding words and can't seem to find the correct one anywhere. Any help is appreciated. Thanks.