Redirect URLs with a # in .htaccess
According to the mod_rewrite documentation, you need to use the NE (noescape) flag when your rewrite rule has a hash:
RewriteRule #(.+)$ /? [L,R=301,NE]
You commented that the NE flag may only apply to the target URL and not the rewrite pattern. If that is the case, another approach would be to escape the # sign. mod_rewrite supports \x style escape sequences. The escape sequence for # would be \x23. So your rewrite rule could be:
RewriteRule \x23.+$ / [L,R=301]
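As a quick sanity check of the escape sequence itself, here is a minimal sketch in Python. It uses Python's re module rather than the PCRE engine inside Apache, so it only shows that \x23 is read as a literal # in a regular expression. It says nothing about whether Apache passes the fragment to the RewriteRule pattern in the first place. The sample path is the one from the question.

```python
import re

# Same pattern as in the RewriteRule above; \x23 is the hex escape for "#".
pattern = re.compile(r"\x23.+$")

# Request path as reported in the question, with an unencoded "#".
path = "/2014/how-to-manage-wordpress-multisite-imports-with-wp-cli/#comments"

match = pattern.search(path)
fragment = match.group(0) if match else None
print(fragment)
```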
If you want to test a solution, you can do so with telnet on the command line. Run telnet example.com 80 to open a socket to your web server. Then make a simple request like this:
GET /#test HTTP/1.0
Host: example.com
Follow it with an extra blank line to end the request.
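If telnet is not available, the same raw request can be sent from a short script. The following is a minimal sketch in Python, assuming a plain HTTP server on port 80. The function names build_request and send_raw_request are made up for this example, and example.com is the same placeholder host as in the telnet command. The path is written to the socket exactly as given, so the # is not stripped or encoded the way a browser or most HTTP client libraries would do it.

```python
import socket


def build_request(host: str, path: str) -> bytes:
    """Build a minimal HTTP/1.0 request; the path is sent exactly as given."""
    # The two empty strings produce the blank line that ends the request.
    lines = [f"GET {path} HTTP/1.0", f"Host: {host}", "", ""]
    return "\r\n".join(lines).encode("ascii")


def send_raw_request(host: str, path: str, port: int = 80,
                     timeout: float = 10.0) -> str:
    """Send the request over a plain TCP socket and return the status line."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_request(host, path))
        response = b""
        # HTTP/1.0 servers close the connection after the response,
        # so read until recv() returns an empty chunk.
        while chunk := sock.recv(4096):
            response += chunk
    return response.split(b"\r\n", 1)[0].decode("latin-1")
```

Calling send_raw_request("example.com", "/#test") with your own host name returns the first line of the response, which shows whether the server answered with a 301, 400 or 404.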
fuxia
Updated on September 18, 2022
Comments
-
fuxia almost 2 years
On one of my sites I get 404s because some scripts fetch all URLs from the start page as raw URLs, so they include the # in the URL. Normal browsers never send that part to the server, but these scripts do. This is what a typical request URL looks like:
/2014/how-to-manage-wordpress-multisite-imports-with-wp-cli/#comments
The # is not URL encoded. I tried both of the following methods:
RedirectMatch 301 \#comments /
and
RewriteRule #(.+)$ /? [L,R=301]
Both without success: the rules don’t catch these requests, because the # starts a comment. The referer and the user-agent fields are empty. What should I do?
-
MrWhite over 9 years
Since this is a "script" (not a browser) which is sending these malformed URLs, is a 404 not the correct response?
-
Stephen Ostermiller over 9 years
I was thinking that "400 Bad Request" would be the appropriate response.
-
DisgruntledGoat over 9 years
"Normal browsers don’t ever send that part to the server, but these scripts do." - Are you absolutely sure? Just because you tell JS to request that URL doesn't mean the browser will do it. Have you checked the Network tab in Firefox/Chrome Dev Tools?
-
fuxia over 9 years
I have read that as if [NE] is for the target URL only, not for the rewrite pattern. Will try it.
-
MrWhite over 9 years
...and, as the link suggests, flags apply to mod_rewrite, not mod_alias. However, I suspect the "fragment identifier" part is stripped from the URL that is matched by the RewriteRule pattern (just like the query string). You could try RewriteCond and test against %{QUERY_STRING} or %{THE_REQUEST}?
-
Stephen Ostermiller over 9 years
You are right, w3d. I put the flag on the wrong rule. I've edited my answer now.
-
fuxia over 9 years
\x23 doesn’t seem to work either. I will contact the hoster and report back when I get some results.
-
MrWhite over 9 years
@toscho How are you testing this? Or are you just waiting for one of these "invalid" requests to come through? You say the "# is not URL encoded" (presumably looking at your logs); however, an unencoded # in the URL is strictly invalid, so even if you are able to match the # it would not be correct to return a valid response (but also, who/what are you returning a response to?). You say your server is responding with a 404... what URL is your script seeing in your (custom) 404, if your custom 404 is being called at all?
-
fuxia over 9 years
@w3d Unfortunately, I haven’t found a way to make such requests, so I have to wait. The URL is that in my question. The 404 is served by WordPress. I want to avoid that and catch these requests before WordPress is loaded. A simple 400 with just one line of output should be enough.
-
Stephen Ostermiller over 9 years
I've added info on how you can test it.