How to ban the 360Spider crawler with robots.txt or .htaccess?
Solution 1
Your robots.txt looks right. Some bots simply ignore it (malicious bots can crawl from any IP address, using botnets of hundreds to millions of infected devices around the globe). In that case you can limit the number of requests per second using the mod_security module for Apache 2.x.
Config example here: http://blog.cherouvim.com/simple-dos-protection-with-mod_security/
[EDIT] On Linux, iptables can also restrict TCP connections to a port per (x) second(s) per IP, provided conntrack capabilities are enabled in your kernel. See: https://serverfault.com/questions/378357/iptables-dos-limit-for-all-ports
Solution 2
In your .htaccess file, simply add the following:
RewriteCond %{REMOTE_ADDR} ^(182\.118\.2)
RewriteRule ^.*$ http://182.118.25.209/take_a_hike_moron [R=301,L]
This will catch ALL the bots being launched from the 182.118.2xx.xxx range and send them back to themselves...
The crappy 360 bot is being fired from servers in China... so as long as you don't mind saying bye-bye to traffic from that Chinese IP range, this is guaranteed to stop those puppies from reaching any files on your web site.
The following two lines in your .htaccess file will also pick it off, simply because it is stupid enough to proudly put 360Spider in its user agent string. This can be handy when they use IP ranges other than 182.118.2xx.xxx:
RewriteCond %{HTTP_USER_AGENT} .*(360Spider) [NC]
RewriteRule ^.*$ http://182.118.25.209/take_a_hike_moron [R=301,L]
And yes... I hate them too !
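If you would rather refuse the request outright than redirect it, a minimal sketch of the same user-agent match that answers with 403 Forbidden might look like the following. It assumes mod_rewrite is enabled and allowed in .htaccess; how it interacts with the rest of your rule set is not covered here.

```apache
# Sketch: refuse 360Spider with 403 Forbidden instead of redirecting it.
RewriteEngine On
# Case-insensitive match anywhere in the User-Agent header
RewriteCond %{HTTP_USER_AGENT} 360Spider [NC]
# "-" means no substitution; the F flag returns 403 Forbidden
RewriteRule ^ - [F]
```

Placing these lines near the top of the file, before any other rewrite rules, keeps the block independent of them.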
Solution 3
You can put the following rules into your .htaccess file:
RewriteEngine On
RewriteBase /
# 360Spider identifies itself in the User-Agent header, not in the Referer
SetEnvIfNoCase User-Agent 360Spider block_them
Deny from env=block_them
Note: The Apache module mod_setenvif must be enabled in your server configuration.
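On Apache 2.4, where `Deny from env=` is only available through mod_access_compat, the same idea can be sketched with `Require`. This assumes Apache 2.4 with mod_setenvif and mod_authz_core loaded and with the relevant overrides (FileInfo and AuthConfig) allowed in .htaccess. It matches the User-Agent header, because that is where the logged requests carry the 360Spider token; the variable name block_them is reused from the rules above.

```apache
# Sketch for Apache 2.4 (assumption: mod_setenvif and mod_authz_core are loaded)
# Flag any request whose User-Agent contains "360Spider", case-insensitively
SetEnvIfNoCase User-Agent "360Spider" block_them

<RequireAll>
    # Allow everyone...
    Require all granted
    # ...except requests flagged above
    Require not env block_them
</RequireAll>
```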
Solution 4
The person running the crawler might be ignoring robots.txt. You could block them via IP
Order Deny,Allow
deny from 216.86.192.196
in .htaccess
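The access log in the question shows requests from 182.118.25.208 and 182.118.25.209, so blocking that neighborhood may be closer to what is needed than a single address. A rough sketch in the same Apache 2.2 syntax, where the /24 range is an assumption based only on those two log lines:

```apache
# Sketch: deny the /24 that the two logged addresses fall in (the range is an assumption)
# Allow is evaluated first, then Deny; a matching Deny wins
Order Allow,Deny
Allow from all
Deny from 182.118.25.0/24
```

On Apache 2.4 these directives only work if mod_access_compat is loaded.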
kovpack
Updated on June 09, 2022
I've got a problem with 360Spider: this bot makes too many requests per second to my VPS and slows it down (CPU usage jumps to 10-70%, whereas it is usually 1-2%). I looked into the httpd logs and saw lines like these:
182.118.25.209 - - [06/Sep/2012:19:39:08 +0300] "GET /slovar/znachenie-slova/42957-polovity.html HTTP/1.1" 200 96809 "http://www.hrinchenko.com/slovar/znachenie-slova/42957-polovity.html" "Mozilla/5.0 (Windows; U; Windows NT 5.1; zh-CN; rv:1.8.0.11) Gecko/20070312 Firefox/1.5.0.11; 360Spider
182.118.25.208 - - [06/Sep/2012:19:39:08 +0300] "GET /slovar/znachenie-slova/52614-rospryskaty.html HTTP/1.1" 200 100239 "http://www.hrinchenko.com/slovar/znachenie-slova/52614-rospryskaty.html" "Mozilla/5.0 (Windows; U; Windows NT 5.1; zh-CN; rv:1.8.0.11) Gecko/20070312 Firefox/1.5.0.11; 360Spider
etc.
How can I block this spider completely via robots.txt? Right now my robots.txt looks like this:
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/

User-agent: YoudaoBot
Disallow: /

User-agent: sogou spider
Disallow: /
I've added the lines:
User-agent: 360Spider
Disallow: /
but that does not seem to work. How can I block this angry bot?
If you suggest blocking it via .htaccess, keep in mind that it currently looks like this:
# Turn on URL rewriting
RewriteEngine On

# Installation directory
RewriteBase /

SetEnvIfNoCase Referer ^360Spider$ block_them
Deny from env=block_them

# Protect hidden files from being viewed
<Files .*>
    Order Deny,Allow
    Deny From All
</Files>

# Protect application and system files from being viewed
RewriteRule ^(?:application|modules|system)\b.* index.php/$0 [L]

# Allow any files or directories that exist to be displayed directly
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d

# Rewrite all other URLs to index.php/URL
RewriteRule .* index.php/$0 [PT]
And, in spite of the presence of
SetEnvIfNoCase Referer ^360Spider$ block_them
Deny from env=block_them
this bot still tries to kill my VPS and still shows up in the access logs.