How to use regular expressions in wget for rejecting files?

27,411

Solution 1

You cannot specify a regular expression with wget's -R option, but you can specify a pattern (like a file-name pattern in the shell).

For your files, the command looks like:

$ wget -R 'newsbrief-*' ...

You can also use ? (any single character) and character classes such as [0-9].

For more information, see info wget.
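Because -R takes shell-style patterns rather than regexes, you can sanity-check a pattern before a long crawl by testing sample file names with the shell's own case matching. This is a minimal sketch, assuming wget compares the pattern against the file name at the end of each URL and that its matching behaves like the shell's (*, ? and [...] classes). The function name would_reject and the sample names are hypothetical, and wget itself is not run.

```shell
#!/bin/sh
# Rough preview of which file names a wget -R pattern would reject.
# Assumption: wget's pattern matching behaves like the shell's `case`
# patterns (*, ? and [...]); this only simulates it, wget is not run.
# Real run would be something like: wget -r -R 'newsbrief-*' http://example.com/

would_reject() {
  name=$1
  pattern=$2              # left unquoted below so it acts as a pattern
  case $name in
    $pattern) echo reject ;;
    *) echo keep ;;
  esac
}

for f in newsbrief-02 newsbrief-17 index.html string-ID; do
  printf '%s: %s\n' "$f" "$(would_reject "$f" 'newsbrief-*')"
done

# [0-9] matches one digit, ? matches exactly one character:
would_reject newsbrief-02 'newsbrief-[0-9][0-9]'   # reject
would_reject newsbrief-2 'newsbrief-??'            # keep (only one char after -)
```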

Solution 2

Since (apparently) version 1.14, wget accepts regular expressions through the --reject-regex and --accept-regex options (with --regex-type posix by default; it can be set to pcre if wget was compiled with libpcre support).

Beware that it seems you can use --reject-regex only once per wget call. That is, if you want to select on several patterns, you have to combine them with | in a single regex:

wget --reject-regex 'expr1|expr2|…' http://example.com
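To check a combined regex before crawling, you can run candidate URLs through grep -E. This is a sketch under assumptions: that the default posix regex type behaves like POSIX extended regexes (the dialect grep -E uses), that the regex is searched for anywhere in the complete URL rather than anchored to it, and that the URLs and the /archive/ alternative are made up for illustration.

```shell
#!/bin/sh
# Preview which URLs a combined --reject-regex would drop.
# Assumptions: --regex-type posix behaves like POSIX extended regexes
# (as in grep -E), the regex may match anywhere in the full URL, and
# the URLs below are hypothetical examples.

reject_regex='newsbrief-[0-9]+$|/archive/'

# Corresponding real run (not executed here):
#   wget -r --reject-regex "$reject_regex" http://example.com/

# Print only the URLs that would be kept (grep -v drops the matches).
printf '%s\n' \
  http://example.com/index.html \
  http://example.com/newsbrief-02 \
  http://example.com/archive/2020.html \
  http://example.com/about.html |
  grep -Ev "$reject_regex"
```

Only the index.html and about.html URLs should be printed. If the match is unanchored as assumed, wrapping an alternative in .* (as in .*foobar.*) is harmless but not required.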

Author: Hakim

Updated on July 09, 2022

Comments

  • Hakim 6 months

    I am trying to download the contents of a website using the wget tool. I used the -R option to reject some file types, but there are some other files which I don't want to download. These files are named as follows and don't have any extensions:

    string-ID
    

    for example:

    newsbrief-02
    

    How can I tell wget not to download these files (the files whose names start with a specified string)?

  • GregHNZ over 8 years
    Thanks for the example with several regexes.
  • CMCDragonkai almost 6 years
    Does --reject-regex work with things like . or *? What kind of regex is it: extended regex or PCRE?
  • Skippy le Grand Gourou almost 6 years
    @CMCDragonkai The second part of your question is already addressed in my answer. It does work with . and *, e.g. wget -r --reject-regex ".*foobar.*" http://example.com will discard http://example.com/foobar/.
  • Skippy le Grand Gourou almost 6 years
    Note that it seems regex only works on recursive downloads, though.
  • CMCDragonkai almost 6 years
    I used it with --spider and had to escape [], so [] is also special. I only asked what kind of regex it is because GNU grep also supports different kinds of regex, such as extended regex and PCRE.
  • Márton Tamás over 3 years
    It may have been true at the date of the answer, but you can now use regular expressions to accept/reject patterns. See the other answer below.
  • Michael Hall almost 3 years
    You can also reject multiple patterns at once by separating them with commas: wget -R 'newsbrief-*,string-*,*.txt'
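A comma-separated -R list can be previewed by trying each entry in turn against a file name. This is a minimal sketch, assuming each comma-separated entry is an independent shell-style pattern and a name is rejected as soon as any entry matches. The function rejected_by_list and the sample names are hypothetical, and wget is not run.

```shell
#!/bin/sh
# Simulate a comma-separated reject list such as
#   wget -R 'newsbrief-*,string-*,*.txt'
# Assumption: each entry is a separate shell-style pattern and any match rejects.

rejected_by_list() (          # subshell body, so IFS and set -f stay local
  name=$1
  IFS=,                       # split the list on commas
  set -f                      # keep '*.txt' from expanding against real files
  for pattern in $2; do
    case $name in
      $pattern) exit 0 ;;     # some entry matched: rejected
    esac
  done
  exit 1                      # no entry matched: kept
)

list='newsbrief-*,string-*,*.txt'
for f in newsbrief-02 string-ID notes.txt index.html; do
  if rejected_by_list "$f" "$list"; then
    echo "reject $f"
  else
    echo "keep   $f"
  fi
done
```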