Why does apache log requests to GET http://www.google.com with code 200?
Solution 1
You may want to read http://wiki.apache.org/httpd/ProxyAbuse,
especially this point: "My server is properly configured not to proxy, so why is Apache returning a 200 (Success) status code?", which addresses your question "Shouldn't that be a 404 response, not a 200!?"
If the Apache configuration is correct, the server is just sending its root page, which is why you see a 200 status code.
Solution 2
I think this would happen if someone tried to use the server as a proxy. That would make the http://... URL "normal" (as opposed to just the path portion that you would expect from a regular server request.)
As for the 200 status code, that... err.. well, my server does that too. It seems to ignore the http://hostname portion and returns the result from the local server using the remaining path. You'll probably have to dig through the RFCs to figure out why that makes sense; I don't know the answer offhand.
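On the RFC question: HTTP/1.1 (RFC 7230, section 5.3.2) requires servers to accept a full URL as the request-target even when they are not proxies, which is why the request is not rejected outright. The split into "hostname part" and "remaining path" described above can be sketched as follows. This is only an illustration of the decomposition, not Apache's actual code, and the function name `split_request_target` is made up for this example.

```python
from typing import Optional, Tuple
from urllib.parse import urlsplit


def split_request_target(target: str) -> Tuple[Optional[str], str]:
    """Return (host, local_path) for an HTTP request-target.

    Absolute-form targets ("http://host/path?q") carry a host.
    Origin-form targets ("/path?q") do not, so host is None for them.
    """
    parts = urlsplit(target)
    path = parts.path or "/"          # "http://host" with no path means "/"
    if parts.query:
        path += "?" + parts.query
    return parts.hostname, path


# The request from the access log, and an ordinary request for comparison.
print(split_request_target("http://www.google.com/"))
print(split_request_target("/index.html"))
```

A non-proxying server that resolves only the path part ends up serving its own `/`, hence the 200.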
Solution 3
Assuming you are not using your server as a proxy, these are likely proxy-abuse attempts of the kind regularly seen on internet-facing web servers.
The requests that received a status code of 200 probably returned your index page.
You can check this using telnet or curl.
Suppose that:
- your server name is site.example.org;
- third parties are trying to connect to news.example.net and search.example.com;
- your /index.html file contains:
<!DOCTYPE html>
<html>
<head><title>It works!</title></head>
<body><h1>It works!</h1></body>
</html>
Using curl, you can reconstruct the requests you received like so:
$ curl site.example.org --request-target http://news.example.net/
<!DOCTYPE html>
<html>
<head><title>It works!</title></head>
<body><h1>It works!</h1></body>
</html>
Using telnet, you can reconstruct the requests you received like so:
$ telnet site.example.org 80
> GET http://news.example.net/ HTTP/1.1
> Host: news.example.net
>
HTTP/1.1 200 OK
...
Content-Type: text/html
...
<!DOCTYPE html>
<html>
<head><title>It works!</title></head>
<body><h1>It works!</h1></body>
</html>
If you receive your index.html as a result, your server is not configured as a proxy and you should not worry about these requests.
If you actually receive the contents of the requested third-party site (news.example.net or search.example.com in this example), your web server is configured as a proxy.
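The same check can be scripted. Below is a minimal Python sketch, assuming the example names used in this answer (site.example.org, news.example.net, and the "It works!" index page). The helper names are hypothetical. The body comparison is a simple substring test, so it is a rough indicator rather than a definitive proxy test.

```python
import socket


def build_probe_request(target_url: str, host_header: str) -> bytes:
    """Build a raw HTTP/1.1 request whose request-target is a full URL."""
    lines = [
        f"GET {target_url} HTTP/1.1",
        f"Host: {host_header}",
        "Connection: close",  # ask the server to close so we can read to EOF
        "",
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def fetch_raw(server: str, port: int, request: bytes, timeout: float = 5.0) -> bytes:
    """Send the request to server:port and return the raw response bytes."""
    with socket.create_connection((server, port), timeout=timeout) as sock:
        sock.sendall(request)
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)


def served_local_page(raw_response: bytes, local_marker: bytes) -> bool:
    """True if the response body contains text known to be in the local index page."""
    _, _, body = raw_response.partition(b"\r\n\r\n")
    return local_marker in body


def main() -> None:
    # Assumed example values from this answer; replace with your own server.
    request = build_probe_request("http://news.example.net/", "news.example.net")
    raw = fetch_raw("site.example.org", 80, request)
    if served_local_page(raw, b"It works!"):
        print("Local index page returned: not acting as a proxy for this request.")
    else:
        print("Response does not look like the local index page: check proxy settings.")
```

Calling `main()` performs the network request; the helper functions can be used on their own without touching the network.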
You can deactivate this by commenting out the proxy directives (for example proxy_pass) in your Nginx configs, or by disabling mod_proxy (or setting ProxyRequests Off) in your Apache configs.
Some interesting references about this:
- Proxy Abuse on Apache's wiki
- Unusal GET requests in my nodejs journal - has my nginx/node been hacked?
- Server getting probed: why can I see successful GET requests to other sites?
Paul
Updated on September 17, 2022
Comments
-
Paul over 1 year
I was recently asked 'What causes a line like this in our access.log?'
59.56.109.181 - - [22/Feb/2010:16:03:35 -0800] "GET http://www.google.com/ HTTP/1.1" 200 295 "-" "Mozilla/5.0 (compatible; MSIE 5.01; Win2000)"
My immediate answer is that's someone exploring something a little devious.
But:
- how? Speculation... a short perl or python script could easily connect and ask for a URL with an invalid host.
- Vulnerabilities? What is someone looking for when they do this, what have they learned, and should we patch it?
- Do I need a tin-foil hat to keep them from reading my mind?
- And for me the real question: Shouldn't that be a 404 response, not a 200!?
This is on a standard LAMP server (Ubuntu).
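To see how often this happens, the access log can be scanned for requests whose target is a full URL instead of a path. The sketch below assumes the log uses the combined format shown in the sample line above. The second sample line is a made-up ordinary request added for contrast, and the function name is hypothetical.

```python
import re

# Combined Log Format: client ident user [time] "METHOD target PROTO" status size ...
LOG_LINE = re.compile(
    r'^(?P<client>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<target>\S+) (?P<proto>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\S+)'
)


def absolute_form_hits(lines):
    """Yield (client, target, status) for requests whose target is a full URL."""
    for line in lines:
        m = LOG_LINE.match(line)
        if m and re.match(r'(?i)https?://', m.group("target")):
            yield m.group("client"), m.group("target"), int(m.group("status"))


sample = [
    # Line quoted in the question.
    '59.56.109.181 - - [22/Feb/2010:16:03:35 -0800] "GET http://www.google.com/ HTTP/1.1" 200 295 "-" "Mozilla/5.0 (compatible; MSIE 5.01; Win2000)"',
    # Hypothetical ordinary request, for contrast.
    '192.0.2.10 - - [22/Feb/2010:16:04:01 -0800] "GET /index.html HTTP/1.1" 200 295 "-" "-"',
]

for client, target, status in absolute_form_hits(sample):
    print(client, target, status)
```

In practice the lines would come from the log file itself, for example `absolute_form_hits(open("access.log"))`, where the file path is whatever your server uses.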
-
PP. about 14 years: First question: did you write your entire Apache configuration yourself? Is this a server you've inherited responsibility for?
-
Paul about 14 years: Co-worker's server. Conf mostly comes from default packaging in Ubuntu. It will be a while if you want a conf file... I'm looking in my servers to see if it replicates... ah, found some. Since it's Debian style, the conf is split up -- what pieces do you need?
-
Paul about 14 years: One server this replicates on is a standard VPS at Linode. The Apache conf is set up for virtual hosting. sites-enabled contains, among other things, a 000-default with a "VirtualHost *:80" declaration. Could that be what lets in requests to google.com?
-
Paul about 14 years: @PP no, I didn't write /etc/apache2/* myself. And I was hoping rich Uncle would leave me his Lamborghini, but all I got was this fracking server full of pr0n that keeps emailing whitehouse.gov. Actually a co-worker did ask me about seeing this on his server and I was looking for something better to say than "I've seen this too. The server won't explode, so who cares!?" After a few hacks get past, you start to care.
-
PP. about 14 years: I only ask because there may be some rewrite rules in the configuration somewhere. I would expect that, with a clean install, you wouldn't see the kind of HTTP request you gave with a 200 response. However, Apache is extremely configurable and could well rewrite such a URL into an acceptable form.
-
Paul about 14 years: @PP good point on rewrite, I don't tend to use those and so had forgotten about that. +1. I suspect the VirtualHost * + human nature. Still don't know all the answers though.
-
user1686 about 14 years: If you want codegolf -
printf 'GET http://www.google.com/ HTTP/1.1\r\n\r\n'|nc yourserver.com 80
-
Paul about 14 years: +1 nc looks cool as a replacement for telnet