Change nltk.download() path directory from default ~/nltk_data
Solution 1
This can be configured either from the command line (nltk.download(..., download_dir=...)) or through the GUI. Bizarrely, nltk seems to totally ignore its own environment variable NLTK_DATA and defaults its download directories to a standard set of five paths, regardless of whether NLTK_DATA is defined and where it points, and regardless of whether nltk's five default dirs even exist on the machine or architecture(!). Some of that is documented in Installing NLTK Data, although it's incomplete and kinda buried; reproduced below with much clearer formatting:
Command line installation
The downloader will search for an existing nltk_data directory to install NLTK data. If one does not exist it will attempt to create one in a central location (when using an administrator account) or otherwise in the user's filespace. If necessary, run the download command from an administrator account, or using sudo. The recommended system location is C:\nltk_data (Windows); /usr/local/share/nltk_data (Mac); and /usr/share/nltk_data (Unix). You can use the -d flag to specify a different location (but if you do this, be sure to set the NLTK_DATA environment variable accordingly).
Run the command
python -m nltk.downloader all
To ensure central installation, run the command:
sudo python -m nltk.downloader -d /usr/local/share/nltk_data all
But really they should say:
sudo python -m nltk.downloader -d $NLTK_DATA all
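The same idea can be sketched from Python: read NLTK_DATA yourself and pass it explicitly as download_dir. This is only a minimal sketch, not anything nltk ships. The helper names resolve_download_dir and download_to_configured_dir, and the fallback path, are made up for illustration, and it assumes NLTK_DATA may list several directories separated by os.pathsep, in which case the first one is used.

```python
import os
from pathlib import Path

# Hypothetical fallback; pick whatever shared location suits your machine.
FALLBACK_DIR = "/usr/local/share/nltk_data"


def resolve_download_dir(fallback=FALLBACK_DIR):
    """Return the first directory listed in NLTK_DATA, or the fallback."""
    raw = os.environ.get("NLTK_DATA", "")
    # Assumption: several paths may be joined with os.pathsep; skip empty entries.
    candidates = [p.strip() for p in raw.split(os.pathsep) if p.strip()]
    return Path(candidates[0]) if candidates else Path(fallback)


def download_to_configured_dir(package):
    """Download one package into the directory chosen above."""
    import nltk  # imported lazily so the path helper works without nltk installed

    target = resolve_download_dir()
    target.mkdir(parents=True, exist_ok=True)
    # nltk.download returns True on success and False on failure.
    return nltk.download(package, download_dir=str(target))
```

Calling download_to_configured_dir('stopwords') would then put the corpus wherever NLTK_DATA points, which needs write permission on that directory.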
Now as to what recommended path NLTK_DATA should use, nltk doesn't really give any proper guidance, but it should be a generic standalone path not under any install tree (so not under <python-install-directory>/lib/site-packages) or any user dir. Hence /usr/local/share, /opt/share or similar. On macOS 10.7+, /usr and thus /usr/local/ are hidden by default these days, so /opt/share may well be a better choice. Or do chflags nohidden /usr/local/share.
Solution 2
According to the documentation:
By default, packages are installed in either a system-wide directory (if Python has sufficient access to write to it); or in the current user’s home directory. However, the download_dir argument may be used to specify a different installation target, if desired.
To specify the download directory, use for example:
nltk.download('treebank', download_dir='/mnt/data/treebank')
Solution 3
You may also use nltk.download_shell() and follow the interactive prompts.
Also use nltk.data.path.append('/your/new/data/directory/path') to instruct nltk to load data from the new data path.
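Putting the two steps together, downloading to a custom directory and then telling nltk to look there, might look roughly like the following. The helper names register_data_dir and download_and_register are hypothetical; only nltk.download(..., download_dir=...) and the nltk.data.path list come from nltk itself.

```python
def register_data_dir(search_path, directory):
    """Append directory to the search-path list unless it is already present."""
    directory = str(directory)
    if directory not in search_path:
        search_path.append(directory)
    return search_path


def download_and_register(package, directory):
    """Download a package to `directory`, then make nltk search that directory."""
    import nltk  # imported lazily so the helper above can be used on its own

    ok = nltk.download(package, download_dir=str(directory))
    # nltk.data.path is a plain list of directories searched when loading data.
    register_data_dir(nltk.data.path, directory)
    return ok
```

For example, download_and_register('treebank', '/mnt/data/treebank') mirrors the call from Solution 2 and adds the same directory to the search path. The append only lasts for the current Python process, so it has to be repeated (or NLTK_DATA set) in every new session.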
shenglih
Updated on January 25, 2020
Comments
-
shenglih over 4 years
I was trying to download/update Python nltk packages on a computing server and it returned this error: [Errno 122] Disk quota exceeded. Specifically:
[nltk_data] Downloading package stop words to /home/sh2264/nltk_data...
[nltk_data] Error downloading u'stopwords' from
[nltk_data]     <https://raw.githubusercontent.com/nltk/nltk_data/gh-
[nltk_data]     pages/packages/corpora/stopwords.zip>: [Errno 122]
[nltk_data]     Disk quota exceeded:
[nltk_data]     u'/home/sh2264/nltk_data/corpora/stopwords.zip
False
How could I change the entire path for nltk packages, and what other changes should I make to ensure errorless loading of nltk?
-
user239558 about 6 years
This is not the behavior I see. As root in a Docker container it downloads into /root/nltk_data.
-
smci about 6 years
@user239558: which OS and nltk version?
-
Sashini Hettiarachchi over 5 years
To download only stopwords to a specific directory on Linux:
sudo python -m nltk.downloader -d /usr/local/share/nltk_data stopwords
-
nbeuchat about 5 years
@HansikaHettiarachchi you can specify more than one download:
sudo python -m nltk.downloader -d /usr/local/share/nltk_data stopwords wordnet punkt
-
smci about 5 years
@user239558 et al, if you found a docbug, please report it to nltk.
-
Makan over 3 years
To load packages that were downloaded to a custom download_dir, you may also need to add that directory to nltk's data path: nltk.data.path.append('/mnt/data/treebank')
-
Marijn over 2 years
Just to avoid confusion: you need to specify the path including the nltk_data part, so if you want to install in /usr/share then the command (within the Python shell) is nltk.download('...', download_dir='/usr/share/nltk_data').