Using wildcards in wget or curl query

Solution 1

You can't use wildcards in wget URLs over HTTP, but the -A (accept list) flag should work. From the wget manpage:

You want to download all the gifs from a directory on an http server. You tried wget http://www.server.com/dir/*.gif, but that didn't work because http retrieval does not support globbing. In that case, use:

wget -r -l1 --no-parent -A.gif http://www.server.com/dir/
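The same flags can be pointed at the .rpm case from the question. This is only a sketch: it assumes that http://www.download.example.com/dir/ serves a browsable index page linking to the version directories, and the DRY_RUN switch and variable names are made up for illustration. Raising the depth to -l2 lets wget follow whatever version directory is listed, without knowing its name in advance.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: build a recursive wget command that keeps only *.rpm files.
# Assumes http://www.download.example.com/dir/ returns an HTML directory index.
base_url="http://www.download.example.com/dir/"

# -r           recurse through links found in the index pages
# -l2          descend two levels: dir/ -> <version>/ -> files
# --no-parent  never climb above dir/
# -nd          save files flat instead of recreating the directory tree
# -A '*.rpm'   keep only files whose names match *.rpm
cmd=(wget -r -l2 --no-parent -nd -A '*.rpm' "$base_url")

if [ "${DRY_RUN:-1}" = 1 ]; then
    # Default: only print the command, so nothing is downloaded by accident.
    printf '%s\n' "${cmd[*]}"
else
    "${cmd[@]}"
fi
```

Run it with DRY_RUN=0 to actually download. If the server does not expose directory listings, wget has no links to follow and this approach will not find anything.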

Edit: found a related question

Regarding directories:

There's a utility called LFTP that has some support for globbing; take a look at its manpage. Another question on Unix & Linux covers its usage in a scenario similar to yours.

Solution 2

If you can find a pattern in the URLs, you can use bash brace expansion for this task.

For example, in your case, you may use something like:

wget www.download.example.com/dir/{version,old}/package{00..99}.rpm

Also, you may combine this with the -A and -R parameters to filter your results.
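Because the shell performs brace expansion before wget ever runs, it can help to look at the generated list first. A small sketch, reusing the URL pattern above (the variable name is arbitrary, and zero-padded ranges like {00..99} assume bash 4 or newer):

```shell
#!/usr/bin/env bash
# Brace expansion is done by bash itself, before wget starts, so the
# generated URL list can be inspected without touching the network.
# Zero-padded ranges such as {00..99} need bash 4 or newer.
urls=( www.download.example.com/dir/{version,old}/package{00..99}.rpm )

echo "URL count: ${#urls[@]}"
echo "first: ${urls[0]}"
echo "last:  ${urls[${#urls[@]}-1]}"
```

Note that this only works when the directory names are known up front: wget is handed every combination, and each URL that does not exist on the server simply produces an error for that one request.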

Solution 3

Although the above solution works to a degree, it fails when you want to download only certain directories rather than all of them. For example, if you have:

http://site.io/like/
http://site.io/like2/
http://site.io/nolike/

Instead, put the names of the directories you want in a text file, e.g. dirs.txt:

like/
like2/

Then run wget with the options -i dirs.txt -B <base-URL>, like so:

wget -nH -nc -np -r -e robots=off -R "index.html*" -i dirs.txt -B http://site.io/

This is needed because, as far as I can tell, you can't use directories in the -A and -R lists.
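The whole procedure can be sketched as a short script. This is an illustration only: the temporary file stands in for dirs.txt, the DRY_RUN switch is hypothetical, and the sed line merely approximates how -B resolves simple relative entries by prefixing the base URL.

```shell
#!/usr/bin/env bash
# Write the wanted directories, one per line, relative to the base URL.
base_url="http://site.io/"
list_file="$(mktemp)"   # stands in for dirs.txt
printf '%s\n' "like/" "like2/" > "$list_file"

# Preview the absolute URLs: for plain relative entries like these,
# prefixing the base URL approximates what -B does.
resolved="$(sed "s|^|${base_url}|" "$list_file")"
printf '%s\n' "$resolved"

if [ "${DRY_RUN:-1}" != 1 ]; then
    # Same command as above, only with the file and base URL as variables.
    wget -nH -nc -np -r -e robots=off -R "index.html*" -i "$list_file" -B "$base_url"
fi
rm -f "$list_file"
```

By default the script only prints the two resolved URLs; set DRY_RUN=0 to let wget fetch them.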

Author: mart1n

Updated on July 09, 2022

Comments

  • mart1n
    mart1n 3 months

    Is it possible to use wildcards in wget queries when downloading from directories? Basically, I have a site, say, www.download.example.com/dir/version/package.rpm. However, the version directory changes all the time and sometimes contains multiple RPM packages. Is there a single wget query that would get me all the packages in the version directory, and only those, without knowing what that version is? In my ideal world, the query would be something like:

    wget www.download.example.com/dir/*/*.rpm
    

    If there is a way to do this with curl, that would work as well.

  • mart1n
    mart1n about 9 years
    OK, this allows me to specify that I only want the .rpm files. But the issue of the ambiguous directory name remains. How do I tell wget that the directory name is irrelevant and that it should just jump to the directory that contains the RPM files?
  • Admin
    Admin about 9 years
    @mart1n added info on lftp to answer, hopefully it helps