Disabling URL decoding in nginx proxy

Solution 1

Quoting Valentin V. Bartenev (who should get the full credit for this answer):

A quote from the documentation:

  • If proxy_pass is specified with URI, when passing a request to the server, part of a normalized request URI matching the location is replaced by a URI specified in the directive

  • If proxy_pass is specified without URI, a request URI is passed to the server in the same form as sent by a client when processing an original request

The correct configuration in your case would be:

location /foo {
   proxy_pass http://localhost:8080;
}

Solution 2

Note that URL decoding, referred to as $uri "normalisation" in the nginx documentation, is applied before the request reaches the backend if and only if:

  • either any URI is specified within proxy_pass itself, even if it is just a trailing slash,

  • or the URI is changed during processing, e.g., through a rewrite.


Both conditions are explicitly documented at http://nginx.org/r/proxy_pass (emphasis mine):

  • If the proxy_pass directive is specified with a URI, then when a request is passed to the server, the part of a normalized request URI matching the location is replaced by a URI specified in the directive

  • If proxy_pass is specified without a URI, the request URI is passed to the server in the same form as sent by a client when the original request is processed, or the full normalized request URI is passed when processing the changed URI
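The difference between the two documented cases can be modelled in a few lines of Python. This is only an illustrative sketch, not nginx's actual implementation: the function name `forwarded_path` and its parameters are hypothetical, and "normalization" is reduced to percent-decoding (real nginx also resolves `.` and `..` segments, and may merge adjacent slashes and re-escape some characters before proxying).

```python
from typing import Optional
from urllib.parse import unquote


def forwarded_path(raw_path: str, location: str,
                   proxy_pass_uri: Optional[str] = None) -> str:
    """Toy model of the path sent upstream (hypothetical helper, not an nginx API)."""
    if proxy_pass_uri is None:
        # proxy_pass without a URI: the client's path is passed on unchanged.
        return raw_path
    # proxy_pass with a URI: the part of the *normalized* path that matches
    # the location is replaced by the URI from the directive.
    normalized = unquote(raw_path)  # assumption: normalization == percent-decoding
    return proxy_pass_uri + normalized[len(location):]


# The request from the question, without and with a URI in proxy_pass.
print(forwarded_path("/foo/%5B-%5D", "/foo"))          # proxy_pass http://localhost:8080;
print(forwarded_path("/foo/%5B-%5D", "/foo", "/foo"))  # proxy_pass http://localhost:8080/foo;
```

Under these assumptions the first call keeps `/foo/%5B-%5D` intact, while the second yields `/foo/[-]`, which mirrors the two request lines reported in the question.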


The solution is either to omit the URI, as in the OP's case, or, indeed, to use a clever rewrite rule:

# map `/foo` to `/foo`:
location /foo {
    proxy_pass  http://localhost:8080;  # no URI -- not even just a slash
}

# map `/foo` to `/bar`:
location /foo {
    rewrite  ^  $request_uri;            # get original URI
    rewrite  ^/foo(/.*)  /bar$1  break;  # drop /foo, put /bar
    return 400;   # if the second rewrite won't match
    proxy_pass    http://localhost:8080$uri;
}
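The second block operates on `$request_uri`, which holds the original, undecoded request URI, so the percent-encoding survives the prefix swap. A rough Python model of that logic is sketched below; `rewrite_foo_to_bar` is a hypothetical name, and returning `None` stands in for the `return 400` branch. It also suggests why that `return 400` matters: a request whose raw URI does not match the pattern is rejected rather than forwarded.

```python
import re
from typing import Optional

# Same pattern as the second rewrite: /foo followed by a slash and anything.
_FOO_PREFIX = re.compile(r"^/foo(/.*)")


def rewrite_foo_to_bar(request_uri: str) -> Optional[str]:
    """Return the upstream path, or None where the config would return 400."""
    match = _FOO_PREFIX.match(request_uri)
    if match is None:
        return None  # models `return 400;`
    # Drop /foo, put /bar; the captured remainder is never decoded.
    return "/bar" + match.group(1)


print(rewrite_foo_to_bar("/foo/%5B-%5D"))  # encoded brackets are preserved
print(rewrite_foo_to_bar("/foobar/x"))     # no match, so the request is refused
```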

You can see it live in a related Stack Overflow answer, including control group.

Author: Tomasz Nurkiewicz

Updated on September 18, 2022

Comments

  • Tomasz Nurkiewicz
    Tomasz Nurkiewicz over 1 year

    When I browse to this URL: http://localhost:8080/foo/%5B-%5D the server (nc -l 8080) receives it as-is:

    GET /foo/%5B-%5D HTTP/1.1
    

    However when I proxy this application via nginx (1.1.19):

    location /foo {
            proxy_pass    http://localhost:8080/foo;
    }
    

    The same request routed through the nginx port is forwarded with the path decoded:

    GET /foo/[-] HTTP/1.1
    

    The decoded square brackets in the GET path cause errors in the target server (HTTP Status 400 - Illegal character in path...) because they arrive unescaped.

    Is there a way to disable URL decoding or encode it back so that the target server gets the exact same path when routed through nginx? Some clever URL rewrite rule?

  • herrtim
    herrtim over 10 years
    I had to change http://localhost:8080/ to http://localhost:8080 in case anyone has the same situation as I did.
  • platypus
    platypus over 10 years
    Why does Nginx decode the URI before passing it to the backend server? Wouldn't it make more sense if it kept the URI untouched?
  • Congmin
    Congmin about 6 years
    @platypus, it is kept untouched, until you explicitly start performing the substitutions
  • Michael Hampton
    Michael Hampton about 6 years
    The documentation is confusing here. Both forms contain a URI. It is the path component that is present in one and missing in the other.
  • Congmin
    Congmin about 6 years
    @MichaelHampton, I disagree: the path is generally called the URI, so the form without the path doesn't contain a URI.
  • Michael Hampton
    Michael Hampton about 6 years
    A relative path alone can also be a valid URL, of course. The point is, the remainder is also a valid URI (e.g. http://localhost:8080). If you disagree, you can take it up with the authors of RFC 3986.
  • Norman Xu
    Norman Xu over 5 years
    @MichaelHampton Unfortunately, it seems the scheme and path are mandatory for a URI, while the authority, arguments, and fragment are optional.
  • Marc
    Marc almost 4 years
    Is it just me or is the standard behaviour wacky? We don't want URLs changed just because we happen to rewrite to a path instead of to the root!!
  • Congmin
    Congmin almost 4 years
    @Marc just you. The standard behaviour is to preemptively address many security pitfalls, and ensure you can't blame your security issues on nginx. P.S. Did you notice the return 400 in this answer? I bet most folks don't bother to understand what it's for, or deem it unnecessary, even though it's pretty essential for security.
  • Marc
    Marc almost 4 years
    If I pass /foo%20bar to NGINX and it passes literally /yo/foo bar (an invalid URL containing a space) downstream, which then fails, then the behaviour is wrong/buggy. See trac.nginx.org/nginx/ticket/1930
  • Congmin
    Congmin almost 4 years
    @Marc no, you're incorrect, and your comment is very misleading — nginx will never pass a space upstream if you use the correct configuration as has been pointed out in that trac issue you link to; you have an incorrect use of regular expression captures that's causing your problem; your configuration sample is not the best practice even if it'd have worked as you may expect; I agree 100% with the nginx devs in that trac issue that the defect report is invalid.
  • Marc
    Marc almost 4 years
    OK, I will look at the suggestions there. I still think there should be a way to get escaped URL components like foo%20bar - NGINX seems to think we only need unescaped values foo bar.
  • Congmin
    Congmin almost 4 years
    @Marc again, your statement is incorrect; the devs have pointed out where your problem was and what the correct solution and best practice should be; you never explained why the proposed solution wouldn't work for your usecase; so, frankly, I don't even understand what you're trying to argue here anymore, because your proposed solution (that would let you use configuration that you've been already told is suboptimal in the first place) would break other usecases.
  • Marc
    Marc almost 4 years
    That's an arrogant statement. I have taken their feedback on board. Perhaps you can explain why you think it makes sense to decode URL elements into $1? I certainly have illustrated cases where it is a problem and BREAKS HTTP, and I don't see anyone offering examples where this is a good idea. Why would NGINX decode URIs into variables?? How can we re-encode them??
  • Congmin
    Congmin almost 4 years
    Marc, your configuration is just wrong. It has been explained in Trac, as well as here, several times. The proper solution has been explained as well; again, you never once indicated why the proper solution that has been suggested wouldn't work for you. If you don't want to follow the proper solution, that doesn't mean that it's nginx that's broken. Please stop posting misleading statements about nginx. What do you think happens when nginx receives a request for GET /../../../../../../etc/passwd? Which non-regex location would catch it? What about GET /this%20is%20a%20test.txt?