How to block this URL pattern in Varnish VCL?

Solution 1

Have you tried matching URL paths that begin with /http?

# Varnish 3 syntax; place inside vcl_recv
if (req.url ~ "^/https?:") {
  error 404 "Not found";
}
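
The `error` statement used above was removed in Varnish 4.0, where synthetic responses are returned with `synth` instead. A minimal sketch of the same check for Varnish 4 or later, assuming it is merged into an existing `vcl_recv` (the example host in the comment is hypothetical):

```vcl
sub vcl_recv {
    # Reject paths that start with an embedded scheme,
    # e.g. /http:/www.example.com/some/page
    if (req.url ~ "^/https?:") {
        return (synth(404, "Not found"));
    }
}
```

Because the check runs in `vcl_recv`, matching requests are answered by Varnish itself and never reach the backend.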

Solution 2

Have you considered caching your 404s? To block these requests with regexes, you'd need to keep your VCL and your backend in sync to avoid blocking valid URLs. If you cache 404s for a significant amount of time, the backend still has to be hit once per URL, but subsequent requests get the cached 404 page. You can purge manually if you ever add real content at a URL with a cached 404 response.
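
Caching 404s could look roughly like the sketch below. It assumes Varnish 4 or later (in Varnish 3 the equivalent hook is `vcl_fetch`), and the one-hour TTL is an illustrative value, not a recommendation. It also assumes the backend may attach `Set-Cookie` or restrictive `Cache-Control` headers to its 404 pages, which would otherwise make the built-in VCL treat them as uncacheable.

```vcl
sub vcl_backend_response {
    if (beresp.status == 404) {
        # Assumption: 1h is an arbitrary example TTL; tune it to how
        # often new content appears at previously missing URLs.
        set beresp.ttl = 1h;

        # A Set-Cookie header would stop the response from being cached.
        unset beresp.http.Set-Cookie;

        # Skip the built-in logic so a restrictive Cache-Control header
        # from the backend does not mark the object as uncacheable.
        return (deliver);
    }
}
```

Note that requests carrying cookies are passed to the backend by the built-in `vcl_recv`, so this only helps for clients that send no cookies, which is the likely case for most bots.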

Author: iTech

Updated on September 18, 2022

Comments

  • iTech
    iTech almost 2 years

    My website is getting hit hard by spambots and scrapers. I've used Cloudflare, but the problem remains. The spambots request non-existent URLs, which puts a lot of load on my Drupal backend: it goes all the way and bootstraps the database just to serve a 404 error document.

    I can't simply serve non-Drupal 404s for all page-not-found errors, as I need Drupal to catch them. Since Varnish sits in front, it can check whether the bot is behaving and asking for a valid URL; if not, it serves a 404 or 403. These bots are causing errors using this pattern:

    http://www.megaleecher.net/http:/www.megaleecher.net/Using_iPhone_As_USB_Mass_S/Using_iPhone_As_USB_Mass_S/Using_iPhone_As_USB_Mass_S/Using_iPhone_As_USB_Mass_S/Using_iPhone_As_USB_Mass_S/Using_iPhone_As_USB_Mass_S/Using_iPhone_As_USB_Mass_S/Using_iPhone_As_USB_Mass_Storage
    

    Please suggest a regex-based Varnish VCL directive that catches this URL pattern and serves a 404 error from Varnish, preventing the request from reaching Apache/Drupal.

  • iTech
    iTech over 11 years
    Pax, I am not sure how valid URLs would get blocked. What I was thinking of was a Varnish filter that says: "If the request URL has two http:/ (instead of the normal single occurrence), serve a 404 or 503, as this is a bad bot".
  • iTech
    iTech over 11 years
    My website seems to have made it onto some sort of list for automated comment spammers, and hits come from various IPs. I guess they are using dynamic IPs.
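
The "two http:/" filter described in the comments can be sketched in VCL as follows. In Varnish, `req.url` holds only the path and query string, not the scheme and host, so any `http:/` found in the path is already the second occurrence. This sketch assumes Varnish 4 or later and assumes the site has no legitimate paths that contain `http:/` or `https:/`; the pattern deliberately ignores anything after the first `?` so that query parameters carrying URLs are not blocked.

```vcl
sub vcl_recv {
    # Match a scheme anywhere in the path portion of the URL.
    # [^?]* keeps the match before the query string.
    if (req.url ~ "^[^?]*https?:/") {
        return (synth(404, "Not found"));
    }
}
```

Unlike a pattern anchored at `^/https?:`, this also catches a scheme embedded deeper in the path, at the cost of a slightly higher risk of matching a valid URL.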