WordListCorpusReader is not iterable


When you do

from nltk.corpus import stopwords

stopwords is a variable pointing to NLTK's CorpusReader object (here a WordListCorpusReader), not to a collection of words.

The actual stopwords you're looking for (i.e. a list of words) are only loaded when you call:

stop_words = set(stopwords.words("english"))
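
To see why the membership test fails on the reader but works on the set, here is a minimal sketch that needs no NLTK data. FakeCorpusReader is a hypothetical stand-in written for this illustration, not NLTK's actual class; it only mimics the relevant property, namely that the object hands out words through .words() but is not itself a container.

```python
class FakeCorpusReader:
    """Hypothetical stand-in for an NLTK corpus reader: it returns word
    lists through .words(), but is not itself a container of words."""

    def words(self, language):
        # A real reader would load these from corpus files on disk.
        return ["the", "is", "a"] if language == "english" else []


reader = FakeCorpusReader()

try:
    "the" in reader  # membership test on the reader object itself
    reader_supports_in = True
except TypeError:
    # The object defines no __contains__, __iter__ or __getitem__,
    # so Python cannot evaluate `in` against it.
    reader_supports_in = False

# Calling .words() and wrapping the result in a set gives a real collection.
stop_words = set(reader.words("english"))

print(reader_supports_in)
print("the" in stop_words)
```

The same distinction applies to the real object: test against the result of stopwords.words("english"), not against stopwords itself.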

So when checking whether a word in your list of tokens is a stopword, you should do:

from nltk.corpus import stopwords
stop_words = set(stopwords.words("english"))
for w in tokenized_sent:
    if w not in stop_words:
        pass # Do something.

To avoid confusion, I usually name the actual set of stopwords stoplist:

from nltk.corpus import stopwords
stoplist = set(stopwords.words("english"))
for w in tokenized_sent:
    if w not in stoplist:
        pass # Do something.
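
Note that the loops above assume tokenized_sent already exists as a flat list of string tokens. Here is a self-contained sketch that defines it. The sample sentence, the use of str.split() in place of a real tokenizer, and the small fallback stop list (used only when NLTK or its stopwords corpus is not installed) are all assumptions made for illustration.

```python
try:
    from nltk.corpus import stopwords
    stoplist = set(stopwords.words("english"))
except (ImportError, LookupError):
    # Assumption: a tiny hand-written fallback so the sketch still runs
    # when NLTK or its stopwords corpus is unavailable.
    stoplist = {"this", "is", "a", "of", "the", "and"}

# Assumption: whitespace splitting stands in for a proper tokenizer
# such as nltk.tokenize.word_tokenize.
sentence = "this is a sample sentence"
tokenized_sent = sentence.split()

# Keep only the tokens that are not stopwords.
content_words = [w for w in tokenized_sent if w not in stoplist]
print(content_words)
```

Using a set rather than a list for stoplist keeps each membership check cheap, which matters once you loop over many tokens.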
Author: Aarushi Aiyyar

Updated on October 30, 2020

Comments

  • Aarushi Aiyyar (over 3 years ago)

    So, I am new to Python and NLTK. I have a file called reviews.csv containing comments extracted from Amazon. I tokenized the contents of this CSV file and wrote the tokens to a file called csvfile.csv. Here's the code:

    from nltk.tokenize import sent_tokenize, word_tokenize
    from nltk.stem import PorterStemmer
    import csv #CommaSpaceVariable
    from nltk.corpus import stopwords
    ps = PorterStemmer()
    stop_words = set(stopwords.words("english"))
    with open ('reviews.csv') as csvfile:
        readCSV = csv.reader(csvfile,delimiter='.')    
        for lines in readCSV:
            word1 = word_tokenize(str(lines))
            print(word1)
        with open('csvfile.csv','a') as file:
            for word in word1:
                file.write(word)
                file.write('\n')
        with open ('csvfile.csv') as csvfile:
            readCSV1 = csv.reader(csvfile)
        for w in readCSV1:
            if w not in stopwords:
                print(w)
    

    I am trying to perform stemming on csvfile.csv. But I get this error:

      Traceback (most recent call last):
        File "/home/aarushi/test.py", line 25, in <module>
          if w not in stopwords:
      TypeError: argument of type 'WordListCorpusReader' is not iterable
    
  • Aarushi Aiyyar (over 6 years ago)
    Thank you! I changed stopwords to stop_words, but now I get a different error at if w not in stop_words: TypeError: unhashable type: 'list'
  • Mohammad Heydari (almost 5 years ago)
    @alvas, I get this error: NameError: name 'tokenized_sent' is not defined
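
On the unhashable type: 'list' error raised in the comments: csv.reader yields each row as a list of strings, and a list cannot be looked up in a set. A likely fix is to loop over the cells inside each row. The sketch below assumes a small hand-written stop list and in-memory text in place of csvfile.csv, so it runs without NLTK or any files on disk.

```python
import csv
import io

# Assumption: a tiny hand-written stop list standing in for
# set(stopwords.words("english")).
stop_words = {"this", "is", "a", "the"}

# Assumption: in-memory text standing in for a file with one token per line.
token_file = io.StringIO("this\nis\ngreat\n")

# Testing a whole row (a list) against a set fails, because lists
# are not hashable.
try:
    ["this"] in stop_words
    row_lookup_failed = False
except TypeError:
    row_lookup_failed = True

kept = []
for row in csv.reader(token_file):
    # row is a list such as ['this'], so test each cell, not the row.
    for token in row:
        if token not in stop_words:
            kept.append(token)

print(row_lookup_failed)
print(kept)
```

If the file really holds one token per line, reading it with plain line iteration and str.strip() would also work and avoids the nested loop.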