Why does Java automatically decode %2F in URI encoded filenames?


Solution 1

The new File(URI) constructor builds the file from the path returned by URI#getPath(), not, as you expected, from URI#getRawPath(). getPath() decodes every escaped octet in the path, including %2F, so this looks like behavior "by design".
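A quick way to see the difference is to parse a file: URI that contains %2F and print both accessors. This is a minimal sketch; the class name RawPathDemo and the /tmp path are illustrative only.

```java
import java.net.URI;

class RawPathDemo {
    // Returns the decoded path and the raw (still-escaped) path of a URI string.
    static String[] paths(String s) {
        URI uri = URI.create(s); // throws IllegalArgumentException on bad syntax
        return new String[] { uri.getPath(), uri.getRawPath() };
    }

    public static void main(String[] args) {
        String[] p = paths("file:/tmp/Top+1%2F2.pdf");
        // getPath() turns %2F back into '/', so "2.pdf" now looks like a child of "Top+1"
        System.out.println(p[0]); // /tmp/Top+1/2.pdf
        // getRawPath() keeps the escape sequence intact
        System.out.println(p[1]); // /tmp/Top+1%2F2.pdf
    }
}
```

Note that '+' passes through getPath() unchanged: URI decoding only handles %XX escapes, and reading '+' as a space is a rule of form encoding (URLEncoder/URLDecoder), not of URIs.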

You have 2 options:

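  1. Run URLEncoder#encode() on fn twice (note: encode(), not encoder()), so that the single decoding step inside File(URI) still leaves one level of encoding in the name.
  2. Use new File(String) (or new File(File, String)) with the encoded name instead, bypassing file: URIs entirely.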
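Both options can be sketched side by side. The class name EncodeOptions, the helper names enc, viaUri and viaString, and the /tmp directory are hypothetical; the code uses the Charset overload of URLEncoder.encode, which requires Java 10 or later.

```java
import java.io.File;
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class EncodeOptions {
    // Form-encodes a string as UTF-8 (space -> '+', '/' -> %2F, '%' -> %25, ...).
    static String enc(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    // Option 1: encode twice. File(URI) decodes the path once (via getPath()),
    // which removes only the outer layer and leaves "Top+1%2F2.pdf" as the name.
    static File viaUri(URI dirUri, String fn) {
        return new File(URI.create(dirUri.toASCIIString() + enc(enc(fn))));
    }

    // Option 2: skip file: URIs and hand the once-encoded name to File directly.
    static File viaString(File dir, String fn) {
        return new File(dir, enc(fn));
    }

    public static void main(String[] args) {
        String fn = "Top 1/2.pdf";
        System.out.println(viaUri(URI.create("file:/tmp/"), fn).getName()); // Top+1%2F2.pdf
        System.out.println(viaString(new File("/tmp"), fn).getName());      // Top+1%2F2.pdf
    }
}
```

Either way the name stored on disk is the once-encoded string, so a single URLDecoder.decode turns it back into "Top 1/2.pdf".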

Solution 2

I think that @BalusC has nailed the direct problem in your code. I'd just like to point out some other issues.

The dir.toURI().toASCIIString() and URLEncoder.encode(fn, "UTF-8") expressions actually do rather different things.

  • The first one encodes the URI as a string, applying the quoting rules of the URI grammar to each component. For example, a '/' in the path component is left alone because it separates path segments, while characters that are illegal there, such as spaces, are percent-encoded (a space becomes %20).

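  • The second one encodes fn with the application/x-www-form-urlencoded rules, without regard to where the string will end up: every character other than letters, digits and . - * _ is escaped, so a space becomes '+' and '/' becomes %2F.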
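A small sketch of the two encodings applied to the same path. It uses the four-argument URI(scheme, host, path, fragment) constructor, which is what OpenJDK's File.toURI() calls internally, so the result does not depend on the local file system; the class name TwoEncodings and the sample path are illustrative, and the Charset overload of URLEncoder.encode needs Java 10 or later.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class TwoEncodings {
    // URI-grammar quoting: only characters illegal in a path are escaped,
    // so '/' survives and ' ' becomes %20.
    static String asUriPath(String path) {
        try {
            return new URI("file", null, path, null).toASCIIString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Form encoding: everything except letters, digits and . - * _ is escaped,
    // so ' ' becomes '+' and '/' becomes %2F.
    static String asFormValue(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(asUriPath("/tmp/Top 1/2.pdf"));   // file:/tmp/Top%201/2.pdf
        System.out.println(asFormValue("/tmp/Top 1/2.pdf")); // %2Ftmp%2FTop+1%2F2.pdf
    }
}
```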

The File(URI) constructor's mapping from a file: URI to a File is system-dependent and largely undocumented. I'm a bit surprised that it decodes the %2F, but it does what it does, and @BalusC explains why. The take-away is that relying on a mechanism that is explicitly system-dependent (file: URIs) is potentially problematic.
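The effect can be reproduced without creating any files: new File(URI) works from the decoded path, and File.toURI() rebuilds a URI from that path, so the %2F does not survive the round trip. This is a minimal sketch with a hypothetical helper roundTrips and an illustrative /tmp path. The exact URI printed by toURI() differs between Windows and Unix-like systems, which is the system dependence described above, but the comparison fails on both.

```java
import java.io.File;
import java.net.URI;

class RoundTrip {
    // True if URI -> File -> URI gives back an equal URI.
    static boolean roundTrips(URI uri) {
        return uri.equals(new File(uri).toURI());
    }

    public static void main(String[] args) {
        URI encoded = URI.create("file:/tmp/Top+1%2F2.pdf");
        // The rebuilt URI contains a real '/' where %2F used to be
        // (and, on Windows, a drive letter as well).
        System.out.println(new File(encoded).toURI());
        System.out.println(roundTrips(encoded)); // false
    }
}
```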

Finally, it is wrong to combine those URI component strings like that. It should be either

URI uri = new URI(
        dir.toURI().toString() +
        URLEncoder.encode(fn, "UTF-8"));

or

URI uri = new URI(
        dir.toURI().toASCIIString() +
        URLEncoder.encode(fn, "ASCII"));
Author: Lucas

Updated on June 04, 2022

Comments

  • Lucas

    I have a servlet that needs to write out files that have a user-configurable name. I am trying to use URI encoding to properly escape special characters, but the JRE appears to automatically convert encoded forward slashes %2F into path separators.

    Example:

    import java.io.File;
    import java.net.URI;
    import java.net.URLEncoder;

    File   dir = new File("C:\\Documents and Settings\\username\\temp");
    String fn  = "Top 1/2.pdf";
    URI    uri = new URI( dir.toURI().toASCIIString() + URLEncoder.encode( fn, "ASCII" ) );
    File   out = new File( uri );
    
    System.out.println( dir.toURI().toASCIIString() );
    System.out.println( URLEncoder.encode( fn, "ASCII" ) );
    System.out.println( uri.toASCIIString() );
    System.out.println( out.toURI().toASCIIString() );
    

    The output is:

    file:/C:/Documents%20and%20Settings/username/temp/
    Top+1%2F2.pdf   
    file:/C:/Documents%20and%20Settings/username/temp/Top+1%2F2.pdf
    file:/C:/Documents%20and%20Settings/username/temp/Top+1/2.pdf
    

    After the new File object is instantiated, the %2F sequence is automatically converted to a forward slash and I end up with an incorrect path. Does anybody know the proper way to approach this issue?

    The core of the problem seems to be that

    uri.equals( new File(uri).toURI() ) == false
    

    when there is a %2F in the URI.

    I'm planning to just use the URLEncoded string verbatim rather than trying to use the File(uri) constructor.