How to use regular expressions in wget for rejecting files?
Solution 1
You cannot specify a regular expression with wget's -R option, but you can specify a wildcard pattern (like a file glob in a shell).
For your case, it looks like:
$ wget -R 'newsbrief-*' ...
You can also use ? and character classes such as [0-9].
For more information, see info wget.
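To check which names a pattern such as 'newsbrief-*' would catch before starting a download, the shell's own glob matching can serve as an offline preview. This is a minimal sketch. It assumes, as the wget manual describes for -R entries containing *, ?, [ or ], that the entry is treated as a shell-style pattern and compared with the file name at the end of each URL. The helper would_reject is hypothetical and only approximates wget's matching.

```shell
#!/bin/sh
# Offline sketch: preview which file names a shell-style -R pattern would hit.
# Uses plain POSIX sh `case` matching as a stand-in for wget's own matching
# (assumption: wget compares the pattern with the file name part of the URL).

# would_reject NAME PATTERN -> prints "reject" if NAME matches PATTERN, else "keep"
would_reject() {
    case "$1" in
        $2) echo reject ;;   # $2 left unquoted so *, ? and [...] act as wildcards
        *)  echo keep ;;
    esac
}

# Try the three kinds of wildcard against a few sample names
for pattern in 'newsbrief-*' 'newsbrief-??' 'newsbrief-[0-9]'; do
    for name in newsbrief-02 newsbrief-7 string-ID; do
        printf '%-16s %-14s %s\n' "$pattern" "$name" "$(would_reject "$name" "$pattern")"
    done
done
```

For example, 'newsbrief-??' matches newsbrief-02 but not newsbrief-7, because each ? stands for exactly one character. Once the preview looks right, the same quoted pattern is what goes after -R on the wget command line.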
Solution 2
Since (apparently) version 1.14, wget accepts regular expressions via --reject-regex and --accept-regex (with --regex-type posix by default; it can be set to pcre if wget was compiled with libpcre support).
Beware that it seems you can use --reject-regex only once per wget call. That is, if you want to match on several expressions, you have to combine them with | in a single regex:
wget --reject-regex 'expr1|expr2|…' http://example.com
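Because a combined alternation is easy to get wrong, it can help to test it offline against a list of URLs first. This is a minimal sketch that uses grep -E as a stand-in for wget's default posix regex type. It assumes that type behaves like POSIX extended regular expressions and, as the wget manual states for --reject-regex, is matched against the complete URL (anywhere in it, not anchored). The pattern, the URLs and the helper print_rejected are hypothetical examples, not taken from the answer.

```shell
#!/bin/sh
# Offline preview of a combined --reject-regex pattern.
# Assumption: wget's default --regex-type posix behaves like POSIX extended
# regex (grep -E) and is searched anywhere in the complete URL.
# The pattern and URLs below are hypothetical examples.

# Three alternatives joined with | into a single regex
REJECT='newsbrief-[0-9]+|/string-|\.txt$'

# print_rejected: read URLs on stdin, print those the pattern would reject
print_rejected() {
    grep -E -- "$REJECT"
}

printf '%s\n' \
    http://example.com/newsbrief-02 \
    http://example.com/string-ID \
    http://example.com/notes.txt \
    http://example.com/index.html \
  | print_rejected
```

The first three URLs are printed and index.html is not. If the preview looks right, the same string can be passed as wget -r --reject-regex "$REJECT" http://example.com.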
Hakim
Updated on July 09, 2022
Comments
-
Hakim (6 months ago): I am trying to download the contents of a website using the wget tool. I used the -R option to reject some file types, but there are some other files which I don't want to download. These files are named as follows and don't have any extensions:
string-ID
For example:
newsbrief-02
How can I tell wget not to download these files (the files whose names start with a specified string)?
-
GregHNZ (over 8 years ago): Thanks for the example with several regexes. -
CMCDragonkai (almost 6 years ago): Does --reject-regex work with things like . or *? What kind of regex is it, extended regex or PCRE? -
Skippy le Grand Gourou (almost 6 years ago): @CMCDragonkai The second part of your question is already addressed in my answer. It does work with . and *; e.g. wget -r --reject-regex ".*foobar.*" http://example.com will discard http://example.com/foobar/. -
Skippy le Grand Gourou (almost 6 years ago): Note that it seems regex only works on recursive downloads, though. -
CMCDragonkai (almost 6 years ago): I used it with --spider, and I had to escape [], so that means [] is also special. I only asked what kind of regex it is because GNU grep also supports different kinds of regex, like extended regex and PCRE. -
Márton Tamás (over 3 years ago): It may have been true at the time of the answer, but now you can use regular expressions to accept/reject patterns. See the other answer (Solution 2). -
Michael Hall (almost 3 years ago): You can also reject multiple patterns at once with commas: wget -R 'newsbrief-*,string-*,*.txt'
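A comma-separated -R list can be read as "reject the file if it matches any one of these patterns". This is a minimal sketch of that reading using only POSIX sh. It splits the list on commas and tries each shell-style pattern in turn. The helper rejected_by_list is hypothetical, and the sketch assumes (as above) that wget compares each pattern with the file name part of the URL, so it only approximates wget's behaviour.

```shell
#!/bin/sh
# Sketch of how a comma-separated -R list such as 'newsbrief-*,string-*,*.txt'
# can be read: split on commas, then reject a file name if it matches any one
# of the shell-style patterns. Approximates wget's behaviour (assumption).

REJLIST='newsbrief-*,string-*,*.txt'

# rejected_by_list NAME -> exit status 0 if NAME matches any pattern, else 1
rejected_by_list() {
    old_ifs=$IFS
    IFS=','                     # split $REJLIST on commas only
    set -f                      # keep patterns like *.txt from globbing files
    for pat in $REJLIST; do
        case "$1" in
            $pat) IFS=$old_ifs; set +f; return 0 ;;
        esac
    done
    IFS=$old_ifs
    set +f
    return 1
}

for name in newsbrief-02 string-ID notes.txt index.html; do
    if rejected_by_list "$name"; then echo "reject $name"; else echo "keep $name"; fi
done
```

The set -f guard matters: without it, an unquoted *.txt in the for loop could be expanded against files in the current directory before it ever reaches the case statement.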