NLTK data out of date - Python 3.4

Solution 1

In short:

Don't use the GUI; add all packages from within the Python interpreter.

$ python3
>>> import nltk
>>> nltk.download('all')

In long:

The cause may be the recent addition of Open Multilingual WordNet: something seems to be going wrong between the NLTK download GUI and the package indices.

Solution 1:

Simply use the nltk.download() GUI and download the two packages individually, without selecting all. (This may not work, but it's worth a try.)

Solution 2:

Install the packages individually through the Python interpreter:

>>> import nltk
>>> nltk.download('wordnet')
>>> nltk.download('omw') # Open Multilingual WordNet
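
The two calls above can also be wrapped in a small loop that reports which packages failed. This is only a sketch, not part of the original answer: download_packages is a hypothetical helper name, and it assumes nltk.download returns True on success (as the interpreter output quoted in the question shows) and a falsy value otherwise.

```python
def download_packages(names, download=None):
    """Download each named package; return the names that did not succeed.

    `download` defaults to nltk.download. It is a parameter only so that
    the helper can be exercised without NLTK or a network connection.
    """
    if download is None:
        import nltk  # imported lazily, only when the real downloader is needed
        download = nltk.download
    failed = []
    for name in names:
        # Assumption: the downloader returns True when the package installed.
        if not download(name):
            failed.append(name)
    return failed

# Example call (needs NLTK and a network connection):
#   download_packages(["wordnet", "omw"])
```

An empty list back means every package reported success.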

Solution 3:

Let nltk.download('all') check through all the packages in its index and download any that aren't available.

>>> import nltk
>>> nltk.download('all')

Note: If any files were corrupted, possibly because of a broken internet connection, find the directory where NLTK data is stored, remove the affected files, and then proceed with Solution 3.

To find where nltk_data is stored, check nltk.data.path, which lists the possible locations:

>>> import nltk
>>> nltk.data.path
['/home/alvas/nltk_data', '/usr/share/nltk_data', '/usr/local/share/nltk_data', '/usr/lib/nltk_data', '/usr/local/lib/nltk_data']
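
To see which of those locations actually holds a given package, you could check the file system directly. The following is a rough sketch: locate_package is a hypothetical helper, and it assumes the layout suggested by the "Unzipping corpora\omw.zip." message in the question, i.e. that a package sits under <nltk_data>/<category>/ as a zip file, an unzipped folder, or both.

```python
import os

def locate_package(search_paths, category, name):
    """Return every existing path to the package's folder or zip file."""
    hits = []
    for base in search_paths:
        # Assumed layout: <base>/<category>/<name> and/or <name>.zip
        for candidate in (
            os.path.join(base, category, name),
            os.path.join(base, category, name + ".zip"),
        ):
            if os.path.exists(candidate):
                hits.append(candidate)
    return hits

# Example call (needs NLTK installed):
#   import nltk
#   locate_package(nltk.data.path, "corpora", "wordnet")
```

If the same package turns up under more than one search path, that may be worth cleaning up before downloading again.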

The point of downloading the data is to use it, so the real test is whether the components you need are there. If those are wordnet and omw, you can try this:

>>> from nltk.corpus import wordnet as wn
>>> wn.synsets('bank')[0]
Synset('bank.n.01')
>>> wn.synsets('bank')[0].lemma_names('spa')
['margen', 'orilla', 'vera']
>>> wn.synsets('bank')[0].lemma_names('fre')
['rive', 'banque']

Don't worry too much about what the GUI shows. Once nltk.download('all') completes without errors, you have all the corpora and models that NLTK supports.

But as good practice, please raise an issue at https://github.com/nltk/nltk_data/issues so that the developers can check whether the problem can be replicated. Include screenshots of the error, both before and after trying the proposed solutions =)

Solution 2

Don't worry about the "out of date" messages; it's a waste of your time. Just go ahead and use the NLTK.

The NLTK's data resources are almost entirely independent of each other. You might never have reason to use either of the packages that are marked as "out of date", but even if you do, chances are they are in fact fully installed and usable.

Still, it's happened to me too and this is what I found: It seems that the downloader will consider a resource to be "out of date" if it detects files in its download folder that are not in the resource manifest. Perhaps this is sometimes caused by misconfigured resources, but if you've visited the resources in question with a directory browser, you may have caused the mismatch through stray files left behind by your GUI, or your editor, or who knows what. E.g., on a Mac the Finder will leave a .DS_Store file in directories it visits.
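
If you want to test that theory on your own machine, a quick scan for stray files could look like the sketch below. The names in STRAY_NAMES and the trailing "~" rule are my own guesses at typical leftovers, not anything the downloader defines, and find_stray_files is a hypothetical helper name.

```python
import os

# Assumption: typical files left behind by file browsers and editors.
STRAY_NAMES = {".DS_Store", "Thumbs.db", "desktop.ini"}

def find_stray_files(data_dir):
    """Return a sorted list of likely stray files under data_dir."""
    stray = []
    for root, _dirs, files in os.walk(data_dir):
        for filename in files:
            # Editor backup files often end with a tilde.
            if filename in STRAY_NAMES or filename.endswith("~"):
                stray.append(os.path.join(root, filename))
    return sorted(stray)

# Example call, using the Windows path from the question:
#   find_stray_files(r"C:\Users\Shane\AppData\Roaming\nltk_data")
```

This only lists candidates; whether removing them clears the "out of date" status is untested.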

But as I said, the "problem" is not really worth fixing. Enjoy the NLTK!

PS. As far as I know, the best (and really only) way to refresh your nltk_data directory is to delete the whole thing and download again.
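
For what it's worth, that refresh could be scripted roughly as follows. Treat it as a sketch under assumptions: reset_data_dir is a hypothetical helper, and it permanently deletes the directory you pass in, so double-check the path first. It uses the download_dir argument of nltk.download to put the fresh copy back in the same place.

```python
import os
import shutil

def reset_data_dir(data_dir, download=None, package="all"):
    """Delete data_dir if it exists, then download `package` into it again.

    `download` defaults to nltk.download; it is a parameter only so the
    helper can be exercised without NLTK or a network connection.
    """
    if os.path.isdir(data_dir):
        shutil.rmtree(data_dir)  # irreversible: removes the whole tree
    if download is None:
        import nltk  # imported lazily, only when the real downloader is needed
        download = nltk.download
    return download(package, download_dir=data_dir)

# Example call (needs NLTK and a network connection):
#   reset_data_dir("/home/alvas/nltk_data")
```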

Author: pyman

Updated on June 17, 2022

Comments

  • pyman
    pyman almost 2 years

    I'm trying to install NLTK for Python 3.4. The actual NLTK module appears to have installed fine. I then ran

    import nltk
    
    nltk.download()
    

    and chose to download everything. However, after it was done, the window simply says 'out of date'. I tried refreshing and downloading, yet it stays 'out of date' as shown here: [screenshot: NLTK Window 1]

    I looked online and tried various fixes, but I haven't found any that helped my case yet.

    I also tried to manually find the missing parts, which turned out to be 'Open Multilingual Wordnet' and 'Wordnet'. Here's how I found which parts were missing: [screenshot: Open Multilingual Wordnet]

    What should I do? Should I uninstall and reinstall NLTK? I haven't really found a way to delete the packages (except for manually deleting it).

    EDIT: Regarding Solution 2 and Solution 3, here is more detail on the Solution 2 issue:

    If something has successfully downloaded, this is the output:

    >>> nltk.download('subjectivity')
    [nltk_data] Downloading package subjectivity to
    [nltk_data]     C:\Users\Shane\AppData\Roaming\nltk_data...
    [nltk_data]   Package subjectivity is already up-to-date!
    True
    

    However, for 'wordnet' and 'omw', this is what happens when I redownload:

    >>> nltk.download('omw')
    [nltk_data] Downloading package omw to
    [nltk_data]     C:\Users\Shane\AppData\Roaming\nltk_data...
    [nltk_data]   Unzipping corpora\omw.zip.
    True
    
    • alvas
      alvas over 8 years
      Which OS are you using?
  • pyman
    pyman over 8 years
    Thanks for your reply. I had already tried Solution 1, but it didn't fix the issue. I also tried Solution 2, but when I went to the GUI after, it still said 'out of date'. Furthermore, if I tried redownloading 'wordnet' and 'omw' through that method, it would redownload it as if it wasn't there before (as opposed to saying "xyz is already up-to-date"). I have just tried Solution 3. How can I tell if it worked properly? Going to the GUI, it still says 'out of date' for both 'wordnet' and 'omw'. If I apply solution 2 after solution 3, it redownloads it as if it wasn't there as well.
  • pyman
    pyman over 8 years
    Please see above for the solution 2 issue (sorry I can't properly type code here)
  • MartyMacGyver
    MartyMacGyver over 7 years
    The gui and command line checks are identical - when this happens, it's usually because an external package has changed since the nltk index (at github.com/nltk/nltk_data/blob/gh-pages/index.xml) was last generated... thus "the index is out of date" can mean you aren't current because you have old versions of packages OR because you have newer versions.
  • alvas
    alvas over 7 years
    The advice against using the GUI is because the GUI is buggy. In fact, my packages never show as updated in the GUI: even after I have updated all packages against the latest index, it keeps showing them as uninstalled. Thus, avoid the GUI. The GUI also breaks whenever the kernel is killed; tkinter just hangs.