Why does apache log requests to GET http://www.google.com with code 200?

Solution 1

The Apache wiki page on proxy abuse covers this: http://wiki.apache.org/httpd/ProxyAbuse

See especially the entry "My server is properly configured not to proxy, so why is Apache returning a 200 (Success) status code?", which answers your question, "Shouldn't that be a 404 response, not a 200!?"

If the Apache configuration is fine, the server is simply sending its root page. That is why the status code is 200.

Solution 2

I think this would happen if someone tried to use the server as a proxy. That would make the http://... URL "normal" (as opposed to just the path portion that you would expect from a regular server request.)

As for the 200 status code, that... err.. well, my server does that too. It seems to ignore the http://hostname portion and returns the result from the local server using the remaining path. You'll probably have to dig through the RFCs to figure out why that makes sense; I don't know the answer offhand.
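On the RFC question: RFC 7230, section 5.3.2, calls this the "absolute-form" of the request target and requires servers to accept it, even though clients normally send it only to proxies. The sketch below illustrates how a full URL in the request line can reduce to a host plus a local path. The helper name `split_request_target` is made up for this example, it handles only the origin-form and absolute-form cases, and it is not Apache's actual code.

```python
from urllib.parse import urlsplit


def split_request_target(target):
    """Return (host, path) for an origin-form or absolute-form request target.

    Hypothetical helper for illustration only; it does not handle the
    authority-form (CONNECT) or asterisk-form (OPTIONS *) targets.
    """
    if target.startswith("/"):
        # Origin-form: what a regular client sends, just the path.
        return None, target

    # Absolute-form: what a client sends when it thinks it is talking to a proxy.
    parts = urlsplit(target)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    return parts.netloc, path


print(split_request_target("http://www.google.com/"))  # ('www.google.com', '/')
print(split_request_target("/index.html"))             # (None, '/index.html')
```

If the host part matches none of the server's own names, a non-proxy server can still serve the remaining path from its default site, which would explain a 200 for "/".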

Solution 3

Assuming you are not using your server as a proxy, these are most likely proxy-abuse attempts, which are common on Internet-facing web servers.
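To get a feel for how often this happens, you could scan the access log for request lines whose target is a full URL (or that use CONNECT). This is a rough sketch that assumes the Apache "combined" log format. The function name `find_proxy_attempts` and the second sample line are invented for the example; the first sample line is the one quoted in the question.

```python
import re

# Matches the leading fields of an Apache "combined" log line (assumed format).
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<target>\S+) (?P<proto>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\S+)'
)


def find_proxy_attempts(lines):
    """Return (ip, target, status) for requests that look like proxy attempts."""
    hits = []
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # skip lines that are not in the assumed format
        target = match.group("target")
        is_absolute = target.lower().startswith(("http://", "https://"))
        if is_absolute or match.group("method") == "CONNECT":
            hits.append((match.group("ip"), target, int(match.group("status"))))
    return hits


sample = [
    '59.56.109.181 - - [22/Feb/2010:16:03:35 -0800] '
    '"GET http://www.google.com/ HTTP/1.1" 200 295 "-" '
    '"Mozilla/5.0 (compatible; MSIE 5.01; Win2000)"',
    # Invented line: an ordinary request that should not be flagged.
    '203.0.113.7 - - [22/Feb/2010:16:04:01 -0800] '
    '"GET /index.html HTTP/1.1" 200 295 "-" "curl/7.19"',
]

print(find_proxy_attempts(sample))
# [('59.56.109.181', 'http://www.google.com/', 200)]
```

A 200 in this list is not proof of a problem by itself; what matters is which page was actually served.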

The requests that received a status code of 200 probably returned your index page. You can check this using telnet or curl.

Suppose that:

  • your server name is site.example.org;

  • third parties are trying to connect to news.example.net and news.example.com;

  • your /index.html file contains:

      <!DOCTYPE html>
      <html>
      <head><title>It works!</title></head>
      <body><h1>It works!</h1></body>
      </html>
    

Using curl, you can reconstruct the requests you received like so:

$ curl site.example.org --request-target http://news.example.net/
<!DOCTYPE html>
<html>
<head><title>It works!</title></head>
<body><h1>It works!</h1></body>
</html>

Using telnet, you can reconstruct the requests you received like so:

$ telnet site.example.org 80
> GET http://news.example.com/ HTTP/1.1
> Host: news.example.com
>
HTTP/1.1 200 OK
...
Content-Type: text/html
...

<!DOCTYPE html>
<html>
<head><title>It works!</title></head>
<body><h1>It works!</h1></body>
</html>

If you receive your index.html as a result, that means your server is not configured as a proxy and you should not worry about these requests.

If you instead receive the contents of news.example.com or news.example.net, your web server is acting as a proxy. On Apache, you can turn this off by setting ProxyRequests Off or by disabling mod_proxy. On Nginx, remove or comment out whichever proxy directives (typically proxy_pass) forward these requests.
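The same check can be scripted. Below is a minimal Python sketch that sends an absolute-form GET over a raw socket and returns the status line and body, so you can compare the body with your own index page. The function name `send_absolute_form` is made up for this example. The tiny `IndexOnlyHandler` server is only a local stand-in so the sketch runs without touching the network; it always serves one fixed page and says nothing about how Apache or Nginx behave. To test a real machine, call the function with your own host and port instead.

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlsplit


def send_absolute_form(host, port, target, timeout=5.0):
    """Send 'GET <target> HTTP/1.1' to host:port; return (status_line, body)."""
    request = (
        f"GET {target} HTTP/1.1\r\n"
        f"Host: {urlsplit(target).netloc}\r\n"
        "Connection: close\r\n"
        "\r\n"
    )
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(request.encode("ascii"))
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    head, _, body = b"".join(chunks).partition(b"\r\n\r\n")
    status_line = head.split(b"\r\n", 1)[0].decode("latin-1")
    return status_line, body


# Stand-in for "your" index page; an assumption made for the demo only.
INDEX = b"<h1>It works!</h1>\n"


class IndexOnlyHandler(BaseHTTPRequestHandler):
    """Local stand-in server: answers every GET with the same index page."""

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(INDEX)))
        self.end_headers()
        self.wfile.write(INDEX)

    def log_message(self, *args):
        pass  # keep the demo output quiet


server = HTTPServer(("127.0.0.1", 0), IndexOnlyHandler)  # port 0: pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()

status, body = send_absolute_form(
    "127.0.0.1", server.server_port, "http://news.example.net/"
)
server.shutdown()
server.server_close()

print(status)
print("served local index page:", body == INDEX)
```

Against a real server, a body that matches your own index page means the request was answered locally; a body that looks like the remote site means the server is proxying.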


Author: Paul

Updated on September 17, 2022

Comments

  • Paul
    Paul over 1 year

    I was recently asked 'What causes a line like this in our access.log?'

    59.56.109.181 - - [22/Feb/2010:16:03:35 -0800] "GET http://www.google.com/ HTTP/1.1" 200 295 "-" "Mozilla/5.0 (compatible; MSIE 5.01; Win2000)"

    My immediate answer is that's someone exploring something a little devious.

    But:

    • how? Speculation... a short perl or python script could easily connect and ask for a URL with an invalid host.
    • Vulnerabilities? What is someone looking for when they do this, what have they learned, and should we patch it?
    • Do I need a tin-foil hat to keep them from reading my mind?
    • And for me the real question: Shouldn't that be a 404 response, not a 200!?

    This is on a standard LAMP server (Ubuntu).

    • PP.
      PP. about 14 years
      First question: did you write your entire Apache configuration yourself? Is this a server you've inherited responsibility for?
    • Paul
      Paul about 14 years
      Co-worker's server. The conf mostly comes from the default packaging in Ubuntu. It will be a while if you want a conf file... I'm looking in my servers to see if it replicates... ah, found some. Since it's Debian-style, the conf is split up. What pieces do you need?
    • Paul
      Paul about 14 years
      One server this replicates on is a standard VPS at Linode. The Apache conf is set up for virtual hosting. sites-enabled contains, among other things, a 000-default with a "VirtualHost *:80" declaration. Could that be what lets in requests to google.com?
    • Paul
      Paul about 14 years
      @PP no, I didn't write /etc/apache2/* myself. And I was hoping rich Uncle would leave me his Lamborghini, but all I got was this fracking server full of pr0n that keeps emailing whitehouse.gov . Actually a co-worker did ask me about seeing this on his server and I was looking for something better to say than "I've seen this too. The server won't explode, so who cares!?" After a few hacks get past, you start to care.
    • PP.
      PP. about 14 years
      I only ask because there may be some rewrite rules in the configuration somewhere. I would expect that, with a clean install, you wouldn't see the kind of HTTP request you gave with a 200 response. However Apache is extremely configurable and could well re-write such a URL into an acceptable form.
    • Paul
      Paul about 14 years
      @PP good point on rewrite, I don't tend to use those and so had forgotten about that. +1. I suspect the virtualhost * + human nature. Still don't know all the answers though.
    • user1686
      user1686 about 14 years
      If you want codegolf - printf 'GET http://www.google.com/ HTTP/1.1\r\n\r\n'|nc yourserver.com 80
    • Paul
      Paul about 14 years
      +1 nc looks cool as replacement for telnet