How to use regular expressions in wget for rejecting files?
Solution 1
You cannot specify a regular expression with wget's -R option, but you can specify a glob pattern (like a file name pattern in a shell).
The answer looks like:
$ wget -R 'newsbrief-*' ...
You can also use ? and character classes [].
For more information, see info wget.
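To get a feel for how such a pattern behaves before running a download, it can be previewed with the shell's own glob matching. This is only a rough sketch: the glob_match helper and the sample file names are made up for illustration, and the shell's case matching is assumed to be close enough to wget's pattern matching for simple patterns like these.

```shell
#!/bin/sh
# glob_match PATTERN NAME: succeed if NAME matches the shell glob PATTERN.
# (Hypothetical helper, not part of wget.)
glob_match() {
    case $2 in
        $1) return 0 ;;   # unquoted $1 is interpreted as a glob pattern
        *)  return 1 ;;
    esac
}

# Hypothetical file names, modelled on the "newsbrief-02" example.
for name in newsbrief-02 newsbrief-17 index.html; do
    if glob_match 'newsbrief-[0-9]?' "$name"; then
        echo "reject $name"
    else
        echo "keep   $name"
    fi
done
```

Here [0-9] matches one digit and ? matches any single character, so only names of the form newsbrief- followed by a digit and one more character are marked for rejection.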
Solution 2
Since (apparently) v1.14, wget accepts regular expressions: --reject-regex and --accept-regex (with --regex-type posix by default; it can be set to pcre if wget was compiled with libpcre support).
Beware that it seems you can use --reject-regex only once per wget call. That is, you have to combine the alternatives with | in a single regex if you want to reject on several expressions:
wget --reject-regex 'expr1|expr2|…' http://example.com
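A combined expression can be checked offline against a few sample URLs before handing it to wget. The sketch below uses grep -E, which takes POSIX extended regexes; that this matches wget's default posix regex type closely enough is an assumption, and the URLs and the two alternatives in the expression are invented for illustration.

```shell
#!/bin/sh
# Hypothetical sample URLs; the regex is tested against the complete URL.
urls='http://example.com/newsbrief-02
http://example.com/index.html
http://example.com/archive/string-17
http://example.com/about.html'

# Two illustrative expressions joined with "|" into a single regex.
reject='newsbrief-|/string-[0-9]+$'

# -E selects extended regexes; -v keeps only the lines that do NOT match,
# i.e. the URLs that would survive the reject filter.
kept=$(printf '%s\n' "$urls" | grep -Ev "$reject")
printf '%s\n' "$kept"
```

If the list that comes out is what you expect, the same expression can be passed to --reject-regex in single quotes so the shell does not interpret the | or $.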
Hakim
Updated on July 09, 2022
Comments
-
Hakim 6 months
I am trying to download the contents of a website using the wget tool. I used the -R option to reject some file types, but there are some other files which I don't want to download. These files are named as follows, and don't have any extensions.
string-ID
for example:
newsbrief-02
How can I tell wget not to download these files (the files whose names start with a specified string)?
-
GregHNZ over 8 years
Thanks for the example with several regex.
-
CMCDragonkai almost 6 years
Does reject-regex work with things like . or *? What kind of regex is it, extended regex or PCRE regex?
-
Skippy le Grand Gourou almost 6 years
@CMCDragonkai The second part of your question is already addressed in my answer. It does work with . and *, e.g. wget -r --reject-regex ".*foobar.*" http://example.com will discard http://example.com/foobar/.
-
Skippy le Grand Gourou almost 6 years
Note that it seems regex only works on recursive downloads, though.
-
CMCDragonkai almost 6 years
I used it with --spider, and I had to escape [], so that means [] is also special. I only asked about what kind of regex it is because GNU grep also supports different kinds of regex, like extended regex and PCRE regex.
-
Márton Tamás over 3 years
It may have been true at the date of the answer, but now you can use regular expressions to accept/reject patterns. See the other answer below.
-
Michael Hall almost 3 years
You can also reject multiple patterns at once with commas:
wget -R 'newsbrief-*,string-*,*.txt'
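A comma-separated list like this can be read as "reject the file if any one of the patterns matches". The sketch below imitates that reading in plain shell. The rejected function is a made-up helper, and treating wget's list handling as a simple split on commas is an assumption, not a description of wget's implementation.

```shell
#!/bin/sh
# rejected LIST NAME: succeed if NAME matches any comma-separated glob in LIST.
# (Hypothetical helper, not part of wget.)
rejected() {
    list=$1 name=$2
    old_ifs=$IFS
    IFS=,
    set -f                      # do not expand the globs against the current directory
    for pat in $list; do        # unquoted on purpose: split the list on commas
        case $name in
            $pat) IFS=$old_ifs; set +f; return 0 ;;
        esac
    done
    IFS=$old_ifs
    set +f
    return 1
}

# Hypothetical file names checked against the list from the comment above.
for name in newsbrief-02 string-17 notes.txt index.html; do
    if rejected 'newsbrief-*,string-*,*.txt' "$name"; then
        echo "reject $name"
    else
        echo "keep   $name"
    fi
done
```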