NLTK data out of date - Python 3.4
Solution 1
In short:
Don't use the GUI; download all packages from within the Python interpreter.
$ python3
>>> import nltk
>>> nltk.download('all')
In long:
It might be related to the recent addition of Open Multilingual WordNet: something is not working right between the NLTK download GUI and the package indices.
Solution 1:
Simply use the nltk.download() GUI and download the two packages without selecting all. (May not work, but worth a try.)
Solution 2:
Install the packages individually through the Python interpreter:
>>> import nltk
>>> nltk.download('wordnet')
>>> nltk.download('omw') # Open Multilingual WordNet
Solution 3:
Let nltk.download('all') check through all packages in its index and download any that are not available.
>>> import nltk
>>> nltk.download('all')
Note: If any files were corrupted, possibly by a broken internet connection, locate the directory where NLTK data is stored, remove the damaged files, and then proceed with Solution 3.
To find where nltk_data is stored, check nltk.data.path, which lists the possible locations:
>>> import nltk
>>> nltk.data.path
['/home/alvas/nltk_data', '/usr/share/nltk_data', '/usr/local/share/nltk_data', '/usr/lib/nltk_data', '/usr/local/lib/nltk_data']
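Not every entry in that list necessarily exists on disk. A minimal sketch for narrowing it down to the directories that are actually present; `existing_data_dirs` is a hypothetical helper name, not part of NLTK:

```python
import os


def existing_data_dirs(paths):
    """Return the entries of `paths` that exist as directories, in order."""
    return [p for p in paths if os.path.isdir(p)]


if __name__ == "__main__":
    try:
        import nltk
    except ImportError:
        print("NLTK is not installed")
    else:
        # nltk.data.path is the search list shown above.
        for directory in existing_data_dirs(nltk.data.path):
            print(directory)
```

Whichever directories it prints are the ones worth inspecting for damaged files.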
Since the point of downloading the data is to use it, the practical check is whether the components you need are present. If those are wordnet and omw, you can try this:
>>> from nltk.corpus import wordnet as wn
>>> wn.synsets('bank')[0]
Synset('bank.n.01')
>>> wn.synsets('bank')[0].lemma_names('spa')
['margen', 'orilla', 'vera']
>>> wn.synsets('bank')[0].lemma_names('fre')
['rive', 'banque']
Don't worry too much about what the GUI shows. Once nltk.download('all') completes without errors, you have all the corpora and models that NLTK supports.
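If you'd rather check without opening the GUI at all, here is a rough sketch built on nltk.data.find, which raises LookupError when a resource cannot be located. The helper name `missing_resources` and the `find` parameter are my own additions for illustration, and the resource paths in the usage note are assumed from the package IDs above:

```python
def missing_resources(resource_paths, find=None):
    """Return the resource paths that `find` cannot locate.

    `find` defaults to nltk.data.find, which raises LookupError for a
    resource that is not installed. Passing another callable is only a
    convenience for testing without NLTK.
    """
    if find is None:
        import nltk.data  # imported lazily so the helper loads without NLTK
        find = nltk.data.find
    missing = []
    for path in resource_paths:
        try:
            find(path)
        except LookupError:
            missing.append(path)
    return missing
```

Calling `missing_resources(['corpora/wordnet', 'corpora/omw'])` should return an empty list if both packages are usable, regardless of what the GUI says.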
But as good practice, please raise an issue at https://github.com/nltk/nltk_data/issues so that the developers can check whether the problem can be replicated. Include screenshots of the error, both before and after trying the proposed solutions =)
Solution 2
Don't worry about the "out of date" messages; chasing them is a waste of your time. Just go ahead and use the NLTK.
The NLTK's data resources are almost entirely independent of each other. You might never have reason to use either of the packages that are marked as "out of date", but even if you do, chances are they are in fact fully installed and usable.
Still, it's happened to me too and this is what I found: It seems that the downloader will consider a resource to be "out of date" if it detects files in its download folder that are not in the resource manifest. Perhaps this is sometimes caused by misconfigured resources, but if you've visited the resources in question with a directory browser, you may have caused the mismatch through stray files left behind by your GUI, or your editor, or who knows what. E.g., on a Mac the Finder will leave a .DS_Store file in directories it visits.
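If you want to test the stray-file theory on your own machine, a small sketch like the following can list likely culprits. The set of file names is an assumption on my part (only .DS_Store is mentioned above), and `find_stray_files` is a hypothetical helper, not an NLTK function:

```python
import os

# File names assumed to be desktop debris rather than NLTK data.
# Only .DS_Store comes from the answer; the others are illustrative guesses.
STRAY_NAMES = {".DS_Store", "Thumbs.db", "desktop.ini"}


def find_stray_files(data_dir, stray_names=STRAY_NAMES):
    """Walk data_dir and return the sorted paths of files named in stray_names."""
    hits = []
    for root, _dirs, files in os.walk(data_dir):
        for name in files:
            if name in stray_names:
                hits.append(os.path.join(root, name))
    return sorted(hits)
```

Point it at your nltk_data directory; whether removing the listed files clears the "out of date" status is something I have not verified.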
But as I said, the "problem" is not really worth fixing. Enjoy the NLTK!
PS. As far as I know, the best (and really only) way to refresh your nltk_data directory is to delete the whole thing and download again.
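For completeness, a sketch of that delete-and-redownload step. It assumes you pass the one nltk_data directory you actually want rebuilt; `reset_data_dir` and its `download` parameter are hypothetical names of mine, while `download_dir` is a real keyword argument of nltk.download. It permanently deletes the directory, so double-check the path first:

```python
import os
import shutil


def reset_data_dir(data_dir, download=None, package="all"):
    """Delete data_dir if it exists, then download `package` into it.

    `download` defaults to nltk.download; passing another callable is only
    a convenience for testing without network access.
    """
    if os.path.isdir(data_dir):
        shutil.rmtree(data_dir)  # removes the whole directory tree
    if download is None:
        import nltk  # imported lazily so the helper loads without NLTK
        download = nltk.download
    return download(package, download_dir=data_dir)
```

Something like `reset_data_dir('/home/alvas/nltk_data')` would rebuild the first location from the path list shown earlier.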
pyman
Updated on June 17, 2022
Comments
-
pyman almost 2 years
I'm trying to install NLTK for Python 3.4. The actual NLTK module appears to have installed fine. I then ran
import nltk
nltk.download()
and chose to download everything. However, after it was done, the window simply says 'out of date'. I tried refreshing and downloading, yet it stays 'out of date' as shown here: NLTK Window 1
I looked online and tried various fixes, but I haven't found any that helped my case yet.
I also tried to manually find the missing parts, which turned out to be 'Open Multilingual Wordnet' and 'Wordnet'. Here's how I found which parts were missing: Open Multilingual Wordnet.
What should I do? Should I uninstall and reinstall NLTK? I haven't really found a way to delete the packages (except for manually deleting it).
EDIT: Regarding Solution 2 and Solution 3: For more clarification on the Solution 2 issue:
If something has successfully downloaded, this is the output:
>>> nltk.download('subjectivity')
[nltk_data] Downloading package subjectivity to
[nltk_data]     C:\Users\Shane\AppData\Roaming\nltk_data...
[nltk_data]   Package subjectivity is already up-to-date!
True
However, for 'wordnet' and 'omw', this is what happens when I redownload:
>>> nltk.download('omw')
[nltk_data] Downloading package omw to
[nltk_data]     C:\Users\Shane\AppData\Roaming\nltk_data...
[nltk_data]   Unzipping corpora\omw.zip.
True
-
alvas over 8 years
Which OS are you using?
-
pyman over 8 years
Thanks for your reply. I had already tried Solution 1, but it didn't fix the issue. I also tried Solution 2, but when I went to the GUI afterwards, it still said 'out of date'. Furthermore, if I tried redownloading 'wordnet' and 'omw' through that method, it would redownload them as if they weren't there before (as opposed to saying "xyz is already up-to-date"). I have just tried Solution 3. How can I tell if it worked properly? Going to the GUI, it still says 'out of date' for both 'wordnet' and 'omw'. If I apply Solution 2 after Solution 3, it redownloads them as if they weren't there as well.
-
pyman over 8 years
Please see above for the Solution 2 issue (sorry, I can't properly type code here)
-
MartyMacGyver over 7 years
The GUI and command-line checks are identical. When this happens, it's usually because an external package has changed since the NLTK index (at github.com/nltk/nltk_data/blob/gh-pages/index.xml) was last generated. Thus "the index is out of date" can mean you aren't current because you have old versions of packages OR because you have newer versions.
-
alvas over 7 years
The advice against using the GUI is because the GUI is buggy. Actually, my packages never show as updated in the GUI: no matter how I update all packages against the latest index, it keeps showing them as uninstalled. Thus, avoid the GUI. The GUI also breaks whenever the kernel is killed; tkinter just hangs there.