Using wildcards in wget or curl query
Solution 1
You can't use wildcards in wget for HTTP URLs, but the -A flag should work. From the wget manpage:
You want to download all the GIFs from a directory on an HTTP server. You tried wget http://www.server.com/dir/*.gif, but that didn't work because HTTP retrieval does not support globbing. In that case, use:
wget -r -l1 --no-parent -A.gif http://www.server.com/dir/
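To make the manpage recipe reusable, it can be wrapped in a small shell function. This is only a sketch: fetch_by_suffix and the WGET override variable are hypothetical names made up for illustration and are not part of wget; the wget flags themselves are the ones from the manpage example.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of wget): recursively fetch files with a
# given suffix from a directory URL, using the flags from the manpage example.
# Setting WGET=echo prints the arguments instead of downloading, which is
# handy for checking the command before running it for real.
fetch_by_suffix() {
    local suffix=$1   # file suffix without the dot, e.g. gif or rpm
    local url=$2      # directory URL to start from
    local depth=${3:-1}   # recursion depth, defaults to 1 as in the manpage
    "${WGET:-wget}" -r -l"$depth" --no-parent -A".$suffix" "$url"
}

# Usage (uncomment to run against a real server):
# fetch_by_suffix gif http://www.server.com/dir/
```

Called as fetch_by_suffix gif http://www.server.com/dir/, this runs the same command as the manpage example. For a layout where the files sit one directory below the starting URL, a larger depth (presumably 2) would be needed as the third argument, though that is untested here.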
Edit: found a related question
Regarding directories:
There's a utility called lftp, which has some support for globbing. Take a look at its manpage. There's another question on Unix & Linux that covers its usage in a scenario similar to yours.
Solution 2
If you can find a pattern in your URLs, you can use bash brace expansion to do this task.
For example, in your case, you may use something like:
wget www.download.example.com/dir/{version,old}/package{00..99}.rpm
Also, you may combine this with the -A and -R parameters to filter your results.
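Since the braces are expanded by bash before wget ever runs, you can preview the generated URLs without touching the network. A minimal sketch, assuming bash 4 or later for the zero-padded {00..99} range; the urls array name is just for illustration:

```shell
#!/usr/bin/env bash
# Brace expansion happens in the shell, so the full URL list can be
# collected into an array and inspected before anything is downloaded.
urls=(www.download.example.com/dir/{version,old}/package{00..99}.rpm)

echo "${#urls[@]} URLs generated"
printf '%s\n' "${urls[@]:0:3}"   # show the first three as a sanity check

# When the list looks right, hand it to wget:
# wget "${urls[@]}"
```

Keep in mind that wget will request every generated URL, so names that don't exist on the server simply come back as errors. This approach only helps when the directory and file names follow a pattern you can guess.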
Solution 3
Although the above solution kind of works, it fails when you want to download only certain directories, not all of them. For example, suppose the site has the following and you only want the first two:
http://site.io/like/
http://site.io/like2/
http://site.io/nolike/
Instead, put the directory names you want in a text file, e.g. dirs.txt:
like/
like2/
Then run wget with the options -i dirs.txt -B <base-URL>, like so:
wget -nH -nc -np -r -e robots=off -R "index.html*" -i dirs.txt -B http://site.io/
This is needed because, as far as I can tell, you can't use directories in the -A and -R lists.
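The whole procedure can be sketched as a short script that writes dirs.txt and previews which URLs the entries should resolve to. The preview uses plain string concatenation, which is an assumption that only approximates how wget applies -B and holds for simple relative paths like these; the variable names are made up for illustration.

```shell
#!/usr/bin/env bash
# Base URL and directory names are taken from the example above.
base="http://site.io/"
workdir=$(mktemp -d)

# One relative directory per line, exactly as in dirs.txt.
printf '%s\n' "like/" "like2/" > "$workdir/dirs.txt"

# Preview: join each non-empty entry to the base URL.
# (Assumption: simple concatenation, not wget's own URL resolution.)
resolved=()
while IFS= read -r dir; do
    if [ -n "$dir" ]; then
        resolved+=("${base}${dir}")
    fi
done < "$workdir/dirs.txt"
printf '%s\n' "${resolved[@]}"

# Actual download (uncomment to run against a real server):
# wget -nH -nc -np -r -e robots=off -R "index.html*" -i "$workdir/dirs.txt" -B "$base"
```

Checking the preview first makes it easy to spot a missing trailing slash or a typo in dirs.txt before starting a recursive download.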
mart1n
Updated on July 09, 2022
Comments
mart1n 3 months
Is it possible to use wildcards in wget queries when downloading from directories? Basically, I have a site, say, www.download.example.com/dir/version/package.rpm. However, the version directory changes all the time and sometimes contains multiple RPM packages. Is there a single wget query that would get me only and all the packages in the version directory without knowing what that version is? In my ideal world, the query would be something like:
wget www.download.example.com/dir/*/*.rpm
If there is a way to do this with curl, that would work as well.
mart1n about 9 years
Ok, this allows me to specify that I only want the .rpm files. But, the issue of the ambiguous directory name remains. How do I tell wget that the directory name is irrelevant and to just jump to the dir that contains the RPM files.
Admin about 9 years
@mart1n added info on lftp to answer, hopefully it helps