How to prevent automatic download of a site with *wget*?

Solution 1

You can't easily stop someone who's patient from using wget to get the content of your website, unless you have a single person in mind, and can block their IP address.

wget has plenty of features built in to ensure it can download anything from the web (random delays between requests, cookie handling, and so on).

You could try rate limiting with iptables or one of the bandwidth modules for Apache, but you don't tell us which OS or web server you're running, so it's hard to recommend a specific solution.
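To make the rate-limiting idea concrete, here is a minimal sketch of a per-client token bucket in Python. It is not how iptables or any Apache module is implemented; the class name, the parameters, and the injectable clock are all assumptions made for illustration. Note that a patient client that spaces out its requests stays under any such limit.

```python
import time


class TokenBucketLimiter:
    """Per-client token bucket: `rate` requests per second, bursts up to `burst`.

    Hypothetical helper for illustration only; not part of any web server.
    """

    def __init__(self, rate, burst, clock=time.monotonic):
        self.rate = rate
        self.burst = burst
        self.clock = clock      # injectable so the behaviour can be tested
        self.buckets = {}       # client_ip -> (tokens_left, last_seen)

    def allow(self, client_ip):
        now = self.clock()
        # A client seen for the first time starts with a full bucket.
        tokens, last = self.buckets.get(client_ip, (self.burst, now))
        # Refill in proportion to elapsed time, capped at the burst size.
        tokens = min(self.burst, tokens + (now - last) * self.rate)
        if tokens >= 1:
            self.buckets[client_ip] = (tokens - 1, now)
            return True
        self.buckets[client_ip] = (tokens, now)
        return False
```

With `rate=1` and `burst=3`, a client can make three requests back to back and is then held to roughly one request per second, while other clients are unaffected.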

Just to be clear, wget allows you to change the user agent string,

-U AGENT-STRING --user-agent=AGENT-STRING

so it can pretend to be any browser. You can also tell it to send any headers you like (including referrers), among many other options.
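As a rough illustration of how these options combine, the sketch below assembles a wget command line in Python without running it. The function name, the cookie file name, and the two-second delay are assumptions chosen for the example; the wget options themselves (`--mirror`, `--wait`, `--random-wait`, `--keep-session-cookies`, `--save-cookies`, `--user-agent`, `--referer`, `--header`) are standard ones.

```python
import shlex


def build_wget_command(url, user_agent, referer=None, extra_headers=()):
    """Return an argv list for a browser-like recursive wget run (not executed)."""
    cmd = [
        "wget",
        "--mirror",                    # recursive download with timestamping
        "--wait=2",                    # base delay between requests, in seconds
        "--random-wait",               # vary that delay from request to request
        "--keep-session-cookies",      # also save session cookies
        "--save-cookies=cookies.txt",  # hypothetical cookie file name
        f"--user-agent={user_agent}",  # any string, e.g. a browser's
    ]
    if referer:
        cmd.append(f"--referer={referer}")
    for header in extra_headers:
        cmd.append(f"--header={header}")
    cmd.append(url)
    return cmd


# Print a shell-quoted version of the command for inspection.
print(shlex.join(build_wget_command(
    "https://example.com/",
    "Mozilla/5.0 (X11; Linux x86_64)",
    referer="https://example.com/start",
    extra_headers=["Accept-Language: en"],
)))
```

From the server's side, requests produced by such a command carry a browser-like user agent and arrive at irregular intervals, which is why header-based blocking is easy to get around.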

The closest you can come is to rely heavily on JavaScript, which wget does not execute. But then you run the risk of alienating users who have JavaScript disabled in their browsers.

Solution 2

If you want to stop wget from grabbing your site's pages, add the code below to the .htaccess file in the public_html directory you want to protect.

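    # Flag any request whose User-Agent starts with "Wget" (case-insensitive)
    SetEnvIfNoCase User-Agent "^Wget" bad_bot

    # Allow everyone except requests flagged as bad_bot
    <Limit GET POST>
    Order Allow,Deny
    Allow from all
    Deny from env=bad_bot
    </Limit>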

Of course, wget has options to spoof the user agent, so this will only slow a determined user down.
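The matching logic of that `SetEnvIfNoCase` line can be approximated in a few lines of Python, which makes the limitation easy to see. This is only a sketch: it assumes Python's `re` behaves like Apache's regex engine for this simple pattern, and the user agent strings in the comments are illustrative examples.

```python
import re

# Rough Python equivalent of: SetEnvIfNoCase User-Agent "^Wget" bad_bot
BAD_BOT = re.compile(r"^Wget", re.IGNORECASE)


def is_blocked(user_agent):
    """Return True if the request would be flagged as bad_bot."""
    return BAD_BOT.search(user_agent) is not None


# wget's default user agent has the form "Wget/<version>", so it is caught:
print(is_blocked("Wget/1.21.3"))
# The same wget run with -U set to a browser string is not:
print(is_blocked("Mozilla/5.0 (X11; Linux x86_64)"))
```

The check only ever sees what the client chooses to send, so a single `-U` option is enough to bypass it.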

Solution 3

You can try this rewrite rule:

    RewriteEngine On
    # Return 403 Forbidden when the User-Agent contains "Wget"
    RewriteCond %{HTTP_USER_AGENT} Wget.*
    RewriteRule .* - [F,L]

This is just a simple rule that matches Wget followed by anything. You can build a more complex rule and adapt it to your requirements.
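To get a feel for what a more complex rule might match before writing it as a `RewriteCond`, the patterns can be tried out in Python first. The sketch below compares the rule above, which is unanchored and case-sensitive because it has no `[NC]` flag, with a hypothetical broader pattern. The list of tool names in `BROADER` is an assumption for illustration, not a recommended blocklist.

```python
import re

# The rule as written: unanchored, case-sensitive.
ORIGINAL = re.compile(r"Wget.*")

# Hypothetical broader pattern: several common command-line clients,
# matched case-insensitively (what [NC] would do in a RewriteCond).
BROADER = re.compile(r"wget|curl|python-requests|libwww-perl", re.IGNORECASE)


def forbidden(pattern, user_agent):
    """Return True if a rule using `pattern` would answer 403 Forbidden."""
    return pattern.search(user_agent) is not None


for agent in ("Wget/1.21.3", "wget/1.21.3", "curl/8.5.0", "Mozilla/5.0"):
    print(agent, forbidden(ORIGINAL, agent), forbidden(BROADER, agent))
```

Even the broader pattern only catches clients that announce themselves honestly, so it shares the spoofing weakness of the simple rule.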

Greetings Izac

Author: sergzach

Updated on September 18, 2022

Comments

  • sergzach, over 1 year

    How can I prevent automatic downloading of a website with wget? Could you give links to articles, please?

    Thanks.

    • MDMarra, over 12 years
      You don't tell us what version of Apache, what OS, or what your larger problem is. As written, there's no way to definitively answer your question.
    • Zoredache, over 12 years
      If you block my wget, I'll just switch over to curl. If you block that I'll use another tool.