Python word_tokenize

14,282

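Your issue is that you're running nltk.download() in a script, and the downloader GUI is opening hidden somewhere behind your other windows. The "showing info" line in your output is what the downloader prints when that window opens, so the script is just waiting for you to use it.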

Generally, nltk.download() is run from the Python interpreter. It lets you download various datasets and corpora for use with nltk. You usually only have to do this once, and again only if you want to update your corpora. You don't have to run it every single time you run a script.

Assuming you've run nltk.download() in the Python interpreter, you will either get some form of GUI or, if you don't have access to a GUI (for example if you're SSH'd in without X forwarding), a command-line interface. You can use either one to download the data. I'd recommend just downloading it all, unless you're short on disk space.
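If you'd rather keep the download step inside a script, a rough sketch of "download only when it's missing" could look like the following. Passing a package name to nltk.download() fetches that package directly instead of opening the GUI. The helper name ensure_nltk_resource and its find/download parameters are my own (hypothetical) additions so the logic can be tried without NLTK data or a network connection. The "punkt" name is an assumption about what word_tokenize needs; newer NLTK releases may ask for "punkt_tab" instead, so go by whatever the LookupError message says.

```python
def ensure_nltk_resource(resource_path, package, find=None, download=None):
    """Download `package` only if `resource_path` is not already installed.

    `find` and `download` default to nltk.data.find and nltk.download.
    They are parameters only so the logic can be exercised without NLTK.
    Returns True if a download was triggered, False otherwise.
    """
    if find is None or download is None:
        import nltk  # imported lazily, only when the defaults are needed
        find = find or nltk.data.find
        download = download or nltk.download
    try:
        find(resource_path)   # raises LookupError when the data is missing
        return False          # already present, nothing to do
    except LookupError:
        download(package)     # a named download does not open the GUI
        return True

# Typical call (assumed resource names, see the note above):
# ensure_nltk_resource("tokenizers/punkt", "punkt")
```

After the first run the data is on disk, so later runs skip the download entirely.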

Once you've run nltk.download() and downloaded everything you think you'll need, the code below should work.

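import nltk
import os

# Show the working directory; the relative path below is resolved against it.
print(os.getcwd())

# Read the whole file and close it again when done.
with open(r"ecelebi\1.txt", "r") as text_file:
    p = text_file.read()

# Split the text into tokens, then count how often each token occurs.
words = nltk.tokenize.word_tokenize(p)
fdist = nltk.FreqDist(words)
print(fdist)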

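Note that it's nltk.FreqDist, not FreqDist. With a plain "import nltk", the class lives in the nltk namespace, so the bare name FreqDist raises a NameError unless you also write "from nltk import FreqDist".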
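To get a feel for what the fdist object gives you: FreqDist is built on collections.Counter, so a rough stand-in without NLTK looks like the sketch below. The regex split is only a crude placeholder for word_tokenize, and the function name rough_word_counts and the sample sentence are made up for illustration.

```python
import re
from collections import Counter

def rough_word_counts(text):
    # Crude stand-in for nltk.tokenize.word_tokenize:
    # runs of word characters, plus each punctuation mark on its own.
    tokens = re.findall(r"\w+|[^\w\s]", text)
    return Counter(tokens)

counts = rough_word_counts("the cat sat on the mat. the end.")
print(counts["the"])           # 3
print(counts.most_common(2))   # [('the', 3), ('.', 2)]
```

With the real thing you would call fdist.most_common(2) the same way, since FreqDist inherits that method from Counter.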

Author by

Admin

Updated on July 12, 2022

Comments

  • Admin
    Admin almost 2 years

    I'm quite new to Python. I'm trying to find the frequency distribution of my text. Here is the code:

    import nltk
    nltk.download()
    import os
    os.getcwd()
    text_file=open(r"ecelebi\1.txt","r")
    p = text_file.read()
    words = nltk.tokenize.word_tokenize(p)
    fdist= FreqDist(words)
    print(fdist)
    

    The problem is that the program gives neither an error nor a result. It just returns this:

    >>> ================================ RESTART ================================
    >>> 
    showing info http://nltk.github.com/nltk_data/
    

    I think the problem is with word_tokenize(). I would appreciate it if you could help. Thank you.