how to rewrite '%25' in url

Here's your original rule, with [L] added to denote "last":

RewriteRule ^(.*)%25(.*)$ $1%$2 [L,R=301]

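There are a few problems here. First, percent signs have a special meaning in mod_rewrite: in a substitution string or a RewriteCond test string, a percent sign followed by a digit (%N) is a back reference to a RewriteCond match. Inside the regex pattern itself a percent sign is an ordinary character, so escaping it with a backslash is optional, but it is harmless and makes the intent explicit: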

RewriteRule ^(.*)\%25(.*)$ $1%$2 [L,R=301]

Second, when you insert a % into the replacement, it is not treated as the start of a URI-encoded sequence; it is a literal percent sign. In the original URL you are receiving, the first %25 is converted into a literal percent sign as well. So the above rule leaves a literal %25 or a literal %2b in the URL instead of resolving it to % or +, and you have to resolve these yourself:

RewriteRule ^(.*)\%25(.*)$ $1%$2
RewriteRule ^(.*)\%2b(.*)$ $1+$2 [L,R=301]

Finally, since you don't just have a single 25 after the initial %, but potentially many, use [N] to denote "next". This basically means "start the process over from the beginning, but use my new url as the input". So this will deal with any number of 25s after the percent:

RewriteRule ^(.*)\%25(.*)$ $1%$2 [N]
RewriteRule ^(.*)\%2b(.*)$ $1+$2 [L,R=301]
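
The effect of [N] followed by a single [L,R=301] rule can be traced in a rough Python sketch. It assumes that Python's re module behaves like Apache's regex engine for these simple patterns and that the input is the path exactly as the rules see it; apply_rules is a hypothetical helper name, and the redirect itself is not modelled.

```python
import re

RULE_25 = re.compile(r"^(.*)%25(.*)$")
RULE_2B = re.compile(r"^(.*)%2b(.*)$")


def apply_rules(path):
    """Trace one pass of the two rules above on an already-extracted path."""
    # [N]: keep restarting while the first rule still matches.
    while True:
        m = RULE_25.match(path)
        if m is None:
            break
        path = m.group(1) + "%" + m.group(2)
    # The second rule fires at most once; [L,R=301] would then redirect.
    m = RULE_2B.match(path)
    if m is not None:
        path = m.group(1) + "+" + m.group(2)
    return path


print(apply_rules("/How%2525252bto"))
print(apply_rules("/How%252bto%252bX"))
```

In this sketch the first call yields /How+to, while the second yields /How%2bto+X: only one %2b is replaced before the redirect would happen, so every additional %2b would need another pass.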

Note: This should work if you are setting up your rule in the main Apache config (server or virtual host context). If you are setting it up in an .htaccess file, the leading slash is stripped from the string checked against the regex, in which case you have to add it back in the substitution yourself:

RewriteRule ^(.*)\%25(.*)$ /$1%$2 [N]
RewriteRule ^(.*)\%2b(.*)$ /$1+$2 [L,R=301]

UPDATE: I don't have the ability to test right now, but looking at the docs, I just saw an option NE for "no escape" that makes percents work as regular encoding markers in the result. If I understand correctly, that means the rule can be simplified to this:

RewriteRule ^(.*)\%25(.*)$ $1%$2 [NE,N,L,R=301]

But again, this is untested, and I've never actually used the NE flag so I may be misunderstanding it. If you test this and find that it works, let me know and I'll remove this UPDATE and just fix the above answer to include this simpler version.

Author: nn4l

Updated on September 18, 2022

Comments

  • nn4l
    nn4l almost 2 years

    My website software replaces space characters with '+' characters in the URL. A proper link would look like 'http://www.schirmacher.de/display/INFO/How+to+reattach+a+disk+to+XenServer', for example.

    Some websites link to that article but somehow their embedded editor can't handle the encoding, so what I see in the httpd log files is actually

    GET /display/INFO/How%2525252bto%2525252breattach%2525252ba%2525252bdisk%2525252bto%2525252bXenServer
    

    which of course leads to a 404 error. It seems that the '+' character is encoded as '%2b' and then the '%' character is encoded as '%25' - several times.

    Since there are many such references to different pages from different websites, I would like to rewrite the url so that the visitors get the correct page.

    Here's my attempt which does not work:

    RewriteRule ^(.*)%25(.*)$ $1%$2 [R=301]
    

    What it is supposed to do is: take everything before the %25 string and everything after it, concat those strings with a '%' in between, then redirect.

    With the example input URL the rule should rewrite to

    /display/INFO/How%25252bto%2525252breattach%2525252ba%2525252bdisk%2525252bto%2525252bXenServer
    

    followed by a redirect, then it should rewrite to

    /display/INFO/How%252bto%2525252breattach%2525252ba%2525252bdisk%2525252bto%2525252bXenServer
    

    and again to

    /display/INFO/How%2bto%2525252breattach%2525252ba%2525252bdisk%2525252bto%2525252bXenServer
    

    and so on. Finally, after a lot of redirects I should have left

    /display/INFO/How%2bto%2breattach%2ba%2bdisk%2bto%2bXenServer
    

    which is a valid url equivalent to /display/INFO/How+to+reattach+a+disk+to+XenServer.
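
One detail of the expected sequence above can be checked with a small Python sketch: because the first (.*) is greedy, the pattern picks the last %25 in the string, not the first. This assumes Python's re module matches these patterns the same way Apache's regex engine does, and it uses a hypothetical shortened path; the lazy variant is shown only for contrast.

```python
import re

# Hypothetical shortened path, written as the rule pattern would see it.
path = "/How%2525252bto%2525252breattach"

# Greedy first group, as in the rule: the LAST %25 is the one replaced.
greedy = re.match(r"^(.*)%25(.*)$", path)
greedy_result = greedy.group(1) + "%" + greedy.group(2)

# Lazy first group, for contrast: the FIRST %25 is the one replaced.
lazy = re.match(r"^(.*?)%25(.*)$", path)
lazy_result = lazy.group(1) + "%" + lazy.group(2)

print(greedy_result)
print(lazy_result)
```

Either way one level of encoding is peeled off per match, so the end result after enough passes is the same; only the order of the intermediate URLs differs.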

    My problem is that the expression does not match at all, so it does not even replace a single occurrence of %25.

    I understand that there is a limit on the number of redirects and that I should really use the [N] flag; however, I don't even get the first step right.


    @Ben Lee: thanks for your detailed answer. I have now spent several hours on that problem. Here's what I have found out:

    1. Any '%25' string in the url is converted to '%' before mod_rewrite sees it. So the RewriteRule ^(.*)%25(.*)$ does not match '%25' in the url, it actually matches '%2525'.

    2. The presence of a backslash does not make a difference. It seems that the '%' sign is not interpreted as a backreference in my case, perhaps because there is no RewriteCond statement before. But it is probably better to use it, just to be sure.

    3. The line having [L,R=301] is incorrect. It will attempt to redirect for every %2b match but there is a limit of allowed redirects and it will fail if there are more.
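
Finding 1 can be reproduced outside Apache with a short Python sketch. It assumes that urllib.parse.unquote is a fair stand-in for the single decoding pass the server applies before mod_rewrite sees the path; decode_passes is a hypothetical helper name and the path is a shortened version of the logged request.

```python
from urllib.parse import unquote


def decode_passes(path):
    """Return the successive results of percent-decoding until nothing changes."""
    steps = [path]
    while True:
        decoded = unquote(steps[-1])
        if decoded == steps[-1]:
            return steps
        steps.append(decoded)


for step in decode_passes("/How%2525252bto"):
    print(step)
```

The second entry in the list, /How%25252bto, corresponds to what the rule pattern is matched against for a logged request of /How%2525252bto, which is why a pattern containing %25 effectively matches %2525 in the raw URL.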

    Here are the mod_rewrite lines I am using:

    RewriteRule ^(.*)\%25(.*\%25.*)$ $1%$2 [N]
    RewriteRule ^(.*)\%25(.*)$ $1%$2 [R=301,L]
    
    RewriteRule ^(.*)\%2b(.*\%2b.*)$ $1+$2 [N]
    RewriteRule ^(.*)\%2b(.*)$ $1+$2 [R=301,L]
    

    The third line will replace all but one %2b sequences with a '+' character. When there is only one %2b sequence left, the fourth line will match, forcing a redirect.

    The first and second lines do basically the same thing, but for the %25 sequence. It is necessary to have a rule with an [R] flag for each possible character sequence because I am also using mod_proxy / mod_jk, and the redirect makes sure that the resulting url is fed to each module again. Otherwise httpd would attempt to fetch the url from disk, which would fail in my case.
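
The interplay of the four rules above can be traced with a rough Python sketch. It assumes Python's re module matches these patterns like Apache's regex engine, takes the path exactly as the rules would see it, and ignores URL decoding and any escaping applied when the redirect is issued; RULES and one_request are hypothetical names.

```python
import re

# (pattern, replacement character, flag) in the same order as the four rules.
RULES = [
    (re.compile(r"^(.*)%25(.*%25.*)$"), "%", "N"),
    (re.compile(r"^(.*)%25(.*)$"), "%", "R"),
    (re.compile(r"^(.*)%2b(.*%2b.*)$"), "+", "N"),
    (re.compile(r"^(.*)%2b(.*)$"), "+", "R"),
]


def one_request(path):
    """Run the rule list for one request; return (new_path, redirected)."""
    i = 0
    while i < len(RULES):
        pattern, char, flag = RULES[i]
        m = pattern.match(path)
        if m is None:
            i += 1
            continue
        path = m.group(1) + char + m.group(2)
        if flag == "N":
            i = 0  # [N]: start again from the first rule
        else:
            return path, True  # [R=301,L]: stop and redirect
    return path, False


print(one_request("/How%2bto%2breattach%2ba"))
print(one_request("/a%25b%25c"))
```

In this trace all but the last %2b are replaced by the looping rule and the final one triggers the single redirect, giving /How+to+reattach+a for the first call and /a%b%c for the second.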

    • cyberx86
      cyberx86 about 12 years
      Your current rule will only replace a single occurrence of %25. You need to 'loop' until you have gotten rid of all instances (use the N (next) flag instead of R=301) and only do the redirect on the last match. The problem is trying to avoid the redirect loop detection (>10 by default). If you know that %2525252b always needs to become +, you could try: RewriteRule ^(.*)%2525252b(.*%2525252b.*)$ $1+$2 [N] and RewriteRule ^(.*)%2525252b(.*)$ $1+$2 [R=301] (use both together). There is an online rule tester that might help as well.
    • David Schwartz
      David Schwartz about 12 years
      And don't ever use a literal % character in any of your URLs.