wget mirroring the site fails - 403 Forbidden, even with user-agent


Try

wget -mk -w 20 --user-agent="Mozilla/4.5 (X11; U; Linux x86_64; en-US)" https://rules.emergingthreats.net

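The site may not recognize the newer browser user-agent string, and your command had the site URL in quotes. This version also waits 20 seconds between requests (-w 20) and converts links for local browsing (-k).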
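If only the blockrules directory is needed, a narrower and slower mirror may be less likely to trigger the 403. The following is a minimal sketch, assuming (not confirmed by the site) that the server rejects clients that request pages too quickly. The names TARGET_URL, WAIT_SECONDS, USER_AGENT and the mirror helper are illustrative; the wget options themselves (--mirror, --convert-links, --no-parent, --wait, --random-wait, --user-agent) are standard.

```shell
#!/bin/sh
# Illustrative settings (assumptions, adjust as needed).
TARGET_URL="https://rules.emergingthreats.net/blockrules/"
WAIT_SECONDS=20
USER_AGENT="Mozilla/4.5 (X11; U; Linux x86_64; en-US)"

# Hypothetical helper: runs wget with polite mirroring options.
# Any arguments are placed in front of the command, so `mirror echo`
# prints the command line instead of contacting the server.
mirror() {
    "$@" wget --mirror --convert-links --no-parent \
        --wait="$WAIT_SECONDS" --random-wait \
        --user-agent="$USER_AGENT" "$TARGET_URL"
}

# Dry run: show the command that would be executed.
mirror echo
```

Calling mirror with no arguments starts the real download. --no-parent keeps wget below blockrules/, and --random-wait varies each pause between 0.5 and 1.5 times the --wait value, so the requests look less mechanical.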


Author: superuser

Updated on September 18, 2022

Comments

  • superuser
    superuser over 1 year

    I need to download Suricata's latest rules from this URL:

    http://rules.emergingthreats.net/blockrules/

    I'm trying to compose my own rule set based on various Emerging Threats rules and Snort's rules too, replacing alert with drop.

    I can't even download all the rules, because the Apache 2.2.22 server at rules.emergingthreats.net seems to be configured to block wget.

    My command looks like:

    # wget -m --user-agent="Mozilla/5.0 (X11; U; Linux x86_64; en-US) AppleWebKit/534.3 (KHTML, like Gecko) Chrome/6.0.472.63 Safari/534.3"  "https://rules.emergingthreats.net/"
    

    I'm trying to mirror the whole site.

    --2014-07-22 06:18:49--  https://rules.emergingthreats.net/
    Resolving rules.emergingthreats.net (rules.emergingthreats.net)... 204.12.217.19, 96.43.137.99
    Connecting to rules.emergingthreats.net (rules.emergingthreats.net)|204.12.217.19|:443... connected.
    HTTP request sent, awaiting response... 200 OK
    Length: unspecified [text/html]
    Saving to: ‘rules.emergingthreats.net/index.html’
    
        [ <=>                                                                                                                                                                    ] 1,951       --.-K/s   in 0s      
    
    Last-modified header missing -- time-stamps turned off.
    2014-07-22 06:18:50 (23.3 MB/s) - ‘rules.emergingthreats.net/index.html’ saved [1951]
    

    After that, every subdirectory returns 403 Forbidden:

    --2014-07-22 06:18:51--  https://rules.emergingthreats.net/blockrules/
    Reusing existing connection to rules.emergingthreats.net:443.
    HTTP request sent, awaiting response... 403 Forbidden
    2014-07-22 06:18:51 ERROR 403: Forbidden.
    

    I can browse the site (directory) successfully, and I can download any file with my browser, Chromium (Ubuntu).

    Why does wget fail to fetch the data?

  • superuser
    superuser almost 10 years
    Works. How did you know?
  • superuser
    superuser almost 10 years
    I've set -w 30; it should be finished by tomorrow. Thank you.