Redirect URLs with a # in .htaccess


According to the mod_rewrite documentation, you need the NE (noescape) flag when your rewrite rule contains a hash:

RewriteRule #(.+)$ /? [L,R=301,NE]

You commented that the NE flag may only apply to the target URL and not the rewrite pattern. If that is the case, another approach would be to escape the # sign. mod_rewrite supports \x style escape sequences. The escape sequence for # would be \x23. So your rewrite rule could be:

RewriteRule \x23.+$ / [L,R=301]
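
If neither rule matches, the fragment may be stripped before the RewriteRule pattern ever sees the path, as MrWhite suspects in the comments. A minimal, untested sketch of his suggestion is to test the raw request line through %{THE_REQUEST} instead. The pattern below is an assumption for illustration, and whether such a request reaches mod_rewrite at all may depend on the Apache version and configuration:

```apache
# Sketch only: match the raw request line instead of the parsed path.
# THE_REQUEST looks like: GET /some/path/#comments HTTP/1.1
RewriteEngine On
# \x23 is the hash sign; \S* allows any path (and query string) before it
RewriteCond %{THE_REQUEST} ^\S+\s\S*\x23
# Redirect to the front page; the trailing "?" drops any query string
RewriteRule ^ /? [L,R=301]

# Alternative (assumption: a bare 400 is preferred over a redirect):
# RewriteRule ^ - [R=400]
```

The commented-out alternative relies on the documented behaviour that an R status outside the 300-399 range drops the substitution and stops rewriting, which would give the plain "400 Bad Request" discussed in the comments.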

To test a solution, you can use telnet on the command line. Run telnet example.com 80 to open a connection to your web server, then type a simple request like this:

GET /#test HTTP/1.0
Host: example.com

Finish the request with an extra blank line.

Author by

fuxia


Updated on September 18, 2022

Comments

  • fuxia
    fuxia almost 2 years

    On one of my sites I get 404s because some scripts are fetching all URLs from the start page as raw URLs, so they include the # in the URL. Normal browsers don’t ever send that part to the server, but these scripts do.

    This is what a typical request URL looks like:

    /2014/how-to-manage-wordpress-multisite-imports-with-wp-cli/#comments
    

    The # is not URL encoded.

    I tried both of the following methods:

    RedirectMatch 301 \#comments       /
    

    and

    RewriteRule #(.+)$ /? [L,R=301]
    

    Neither worked: the rules don’t catch these requests, because the # starts a comment. The referer and user-agent fields are empty.

    What should I do?

    • MrWhite
      MrWhite over 9 years
      Since this is a "script" (not a browser) which is sending these malformed URLs, is a 404 not the correct response?
    • Stephen Ostermiller
      Stephen Ostermiller over 9 years
      I was thinking that "400 Bad Request" would be the appropriate response.
    • DisgruntledGoat
      DisgruntledGoat over 9 years
      "Normal browsers don’t ever send that part to the server, but these scripts do." - Are you absolutely sure? Just because you tell JS to request that URL doesn't mean the browser will do it. Have you checked the Network tab in Firefox/Chrome Dev Tools?
  • fuxia
    fuxia over 9 years
    I read that as saying [NE] applies only to the target URL, not to the rewrite pattern. I will try it.
  • MrWhite
    MrWhite over 9 years
    ...and, as the link suggests, flags apply to mod_rewrite, not mod_alias. However, I suspect the "fragment identifier" part is stripped from the URL that is matched by the RewriteRule pattern (just like the query string). You could try RewriteCond and test against %{QUERY_STRING} or %{THE_REQUEST}?
  • Stephen Ostermiller
    Stephen Ostermiller over 9 years
    You are right, w3d. I put the flag on the wrong rule. I've edited my answer now.
  • fuxia
    fuxia over 9 years
    \x23 doesn’t seem to work either. I will contact the hoster and report back when I get some results.
  • MrWhite
    MrWhite over 9 years
    @toscho How are you testing this? Or are you just waiting for one of these "invalid" requests coming through? You say the "# is not URL encoded" (presumably looking at your logs), however, an unencoded # in the URL is strictly invalid, so even if you are able to match the # it would not be correct to return a valid response (but also, who/what are you returning a response to?). You say your server is responding with a 404... what URL is your script seeing in your (custom) 404; if your custom 404 is being called at all?
  • fuxia
    fuxia over 9 years
    @w3d Unfortunately, I haven’t found a way to make such requests, so I have to wait. The URL is that in my question. The 404 is served by WordPress. I want to avoid that and catch these requests before WordPress is loaded. A simple 400 with just one line of output should be enough.
  • Stephen Ostermiller
    Stephen Ostermiller over 9 years
    I've added info on how you can test it.