Using wget to download select directories from ftp server

Solution 1

Based on this doc, it seems that wget's filtering options are quite limited.

When using the --recursive option, wget will download all linked documents after applying the various filters, such as --no-parent and -I, -X, -A, -R options.

In your example:

wget -r -I /pub/special.requests/cew/2013/county/ ftp://ftp.bls.gov/pub/special.requests/cew/

This won't download anything: the -I option tells wget to follow only links inside /pub/special.requests/cew/2013/county/, but the listing at /pub/special.requests/cew/ contains no such links, so the download stops there. This, however, will work:

wget -r -I /pub/special.requests/cew/2013/county/ ftp://ftp.bls.gov/pub/special.requests/cew/2013/

... because in this case the /pub/special.requests/cew/2013/ page does have a link to county/.

By the way, this doc has more detail than the man page:

http://www.gnu.org/software/wget/manual/html_node/
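The directory filter can also be combined with the filename accept/reject options (-A, -R) mentioned above. A dry-run sketch that just prints the command instead of hitting the server; the '*.zip' accept pattern is my assumption about the file names on the server, not something from the question:

```shell
# Build the command as a string and print it, so the sketch has no side effects.
# -I restricts which directories wget follows; -A restricts which file names
# are kept ('*.zip' is an assumed pattern -- adjust to the real files).
cmd="wget --recursive --no-parent \
-I /pub/special.requests/cew/2013/county/ \
-A '*.zip' \
ftp://ftp.bls.gov/pub/special.requests/cew/2013/"
echo "$cmd"
```

Drop the `echo` and run the command itself once the printed line looks right.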

Solution 2

Can't you simply do the following (adding --timestamping, --no-parent, etc. as needed)?

 wget -r ftp://ftp.bls.gov/pub/special.requests/cew/2013/county

The -I option seems to work one directory level at a time, so going one level up from county/ we can do:

 wget -r -I /pub/special.requests/cew/2013/county/ ftp://ftp.bls.gov/pub/special.requests/cew/2013/

But apparently we can't go any further up and do:

 wget -r -I /pub/special.requests/cew/2013/county/ ftp://ftp.bls.gov/pub/special.requests/cew/
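Since -I only behaves predictably one level below the starting URL, a plain workaround for "a handful of directories" is one wget invocation per target directory. A minimal sketch; the loop and the idea of multiple entries are mine, and county/ is the only directory actually named in the question:

```shell
#!/bin/sh
# Workaround sketch: skip -I entirely and start wget directly inside each
# wanted directory, relying on --no-parent to keep it from climbing up.
base="ftp://ftp.bls.gov/pub/special.requests/cew/2013"

# "county" is from the question; add further subdirectory names as needed.
for dir in county; do
    # Printed as a dry run; drop the echo to actually download.
    echo wget --timestamping --recursive --no-parent "$base/$dir/"
done
```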
Author: Al R.

Open data, dataviz, and statistics with R, SQL, SAS, Python, and JavaScript.

Updated on June 24, 2022

Comments

  • Al R. almost 2 years ago

    I'm trying to understand how to use wget to download specific directories from a bunch of different ftp sites with economic data from the US government.

    As a simple example, I know that I can download an entire directory using a command like:

    wget  --timestamping  --recursive --no-parent ftp://ftp.bls.gov/pub/special.requests/cew/2013/county/
    

    But I envision running more complex downloads, where I might want to limit a download to a handful of directories. So I've been looking at the --include option. But I don't really understand how it works. Specifically, why doesn't this work:

    wget --timestamping --recursive -I /pub/special.requests/cew/2013/county/ ftp://ftp.bls.gov/pub/special.requests/cew/
    

    The following does work, in the sense that it downloads files, but it downloads way more than I need (everything in the 2013 directory, vs just the county subdirectory):

    wget --timestamping --recursive -I /pub/special.requests/cew/2013/ ftp://ftp.bls.gov/pub/special.requests/cew/
    

    I can't tell if I'm misunderstanding something about wget or if my issue is with something more fundamental about FTP server structures.

    Thanks for the help!

  • janos over 10 years ago
    According to the docs, the parameters of -I must be absolute paths...