How to get Java to match JavaScript encodeURIComponent() method?

Solution 1

According to the Mozilla Developer Network documentation, encodeURIComponent() encodes characters as UTF-8. When I run it on your string, I get tester%C3%A6%C3%B8%C3%A5, as expected. When I run the following Java code:

    System.out.println(URLEncoder.encode("testeræøå", "UTF-8"));

It also prints tester%C3%A6%C3%B8%C3%A5. I also ran your test and got:

    ------ START TESTING WITH USER ID = 'dummy' ----------------------
    Test URLEncoder.encode(userId): dummy
    Test URLEncoder.encode(userId,"UTF-8"): dummy
    Test URLEncoder.encode(userId,"UTF-16"): dummy
    Test URLEncoder.encode(userId,"UTF-16LE"): dummy
    Test URLEncoder.encode(userId,"UTF-16BE"): dummy
    Test engine.eval("encodeURIComponent(\""+userId+"\")"): dummy
    Test encodeURIComponent(userId): dummy
    TEST new URI(userId).toASCIIString(): dummy
    ------ END TESTING WITH USER ID = 'dummy' ----------------------


    ------ START TESTING WITH USER ID = 'testeræøå' ----------------------
    Test URLEncoder.encode(userId): tester%C3%A6%C3%B8%C3%A5
    Test URLEncoder.encode(userId,"UTF-8"): tester%C3%A6%C3%B8%C3%A5
    Test URLEncoder.encode(userId,"UTF-16"): tester%FE%FF%00%E6%00%F8%00%E5
    Test URLEncoder.encode(userId,"UTF-16LE"): tester%E6%00%F8%00%E5%00
    Test URLEncoder.encode(userId,"UTF-16BE"): tester%00%E6%00%F8%00%E5
    Test engine.eval("encodeURIComponent(\""+userId+"\")"): tester%C3%A6%C3%B8%C3%A5
    Test encodeURIComponent(userId): tester%C3%A6%C3%B8%C3%A5
    TEST new URI(userId).toASCIIString(): tester%C3%A6%C3%B8%C3%A5
    ------ END TESTING WITH USER ID = 'testeræøå' ----------------------


    ------ START TESTING WITH USER ID = 'tester%C3%A6%C3%B8%C3%A5' ----------------------
    Test URLEncoder.encode(userId): tester%25C3%25A6%25C3%25B8%25C3%25A5
    Test URLEncoder.encode(userId,"UTF-8"): tester%25C3%25A6%25C3%25B8%25C3%25A5
    Test URLEncoder.encode(userId,"UTF-16"): tester%FE%FF%00%25C3%FE%FF%00%25A6%FE%FF%00%25C3%FE%FF%00%25B8%FE%FF%00%25C3%FE%FF%00%25A5
    Test URLEncoder.encode(userId,"UTF-16LE"): tester%25%00C3%25%00A6%25%00C3%25%00B8%25%00C3%25%00A5
    Test URLEncoder.encode(userId,"UTF-16BE"): tester%00%25C3%00%25A6%00%25C3%00%25B8%00%25C3%00%25A5
    Test engine.eval("encodeURIComponent(\""+userId+"\")"): tester%25C3%25A6%25C3%25B8%25C3%25A5
    Test encodeURIComponent(userId): tester%25C3%25A6%25C3%25B8%25C3%25A5
    TEST new URI(userId).toASCIIString(): tester%C3%A6%C3%B8%C3%A5
    ------ END TESTING WITH USER ID = 'tester%C3%A6%C3%B8%C3%A5' ----------------------

This is what I would expect.

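I think you need to check the file encoding of your Java source file. Your own output points that way: the UTF-16BE result in your question (tester%0E%46%0E%58%0E%45) shows that the string literal was compiled into the Thai characters U+0E46, U+0E58 and U+0E45 rather than æ, ø and å, which is what you get when the bytes E6 F8 E5 (æøå in Latin-1/cp1252) are read with a Thai code page such as windows-874. If you are using Eclipse, it defaults to the operating system's encoding (for example cp1252 on a Western-European Windows install) rather than UTF-8. The first thing I do when I install Eclipse is change the default encoding to UTF-8; if you compile with javac directly, pass -encoding UTF-8.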
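
One way to take the source file encoding out of the equation is to write the non-ASCII characters as Unicode escapes: the escapes are plain ASCII, so they compile to the same string under any ASCII-compatible source encoding. The sketch below is illustrative only: the class EncodingCheck and its nonAsciiCodePoints helper are hypothetical names, and it assumes Java 10 or later for the URLEncoder.encode(String, Charset) overload. The helper also makes the diagnosis visible: if a literal has been misread, it lists something other than U+00E6 U+00F8 U+00E5 (for the string in the question's output it would list U+0E46 U+0E58 U+0E45).

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.stream.Collectors;

public class EncodingCheck {
    // U+00E6, U+00F8, U+00E5 written as Unicode escapes. The escapes are plain ASCII,
    // so javac produces the same string whatever ASCII-compatible encoding it reads the file with.
    public static final String USER_ID = "tester\u00e6\u00f8\u00e5";

    // Lists the non-ASCII code points, so you can see what the compiler really put in the string
    public static String nonAsciiCodePoints(String s) {
        return s.codePoints()
                .filter(cp -> cp > 0x7F)
                .mapToObj(cp -> String.format("U+%04X", cp))
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        System.out.println(nonAsciiCodePoints(USER_ID));                        // U+00E6 U+00F8 U+00E5
        System.out.println(URLEncoder.encode(USER_ID, StandardCharsets.UTF_8)); // tester%C3%A6%C3%B8%C3%A5
    }
}
```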

Solution 2

For others stumbling upon this question: note that a space is encoded as + by Java's URLEncoder but as %20 by JavaScript's encodeURIComponent().

One possible solution is to use org.apache.commons.httpclient.util.URIUtil#encodeQuery

If you're using the latest httpclient 4, then URIParserUtil#escapeChars can be used instead.

Sample code:

    URIUtil.encodeQuery(strQuery);        // httpclient 3.x
    URIParserUtil.escapeChars(strQuery);  // httpclient 4.x
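
If you would rather not pull in a library, another option is to post-process the output of the JDK's own URLEncoder. This is a different technique from the URIUtil/URIParserUtil calls above and is only a sketch: JsCompatEncoder is a hypothetical class, it assumes Java 10+ for URLEncoder.encode(String, Charset), and it relies on the fact that URLEncoder leaves only a-z, A-Z, 0-9, '.', '-', '*' and '_' unescaped, whereas encodeURIComponent() additionally leaves ! ' ( ) ~ alone and writes a space as %20 instead of +.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class JsCompatEncoder {
    // Approximates JavaScript's encodeURIComponent() on top of URLEncoder (form encoding).
    public static String encodeURIComponent(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8)
                .replace("+", "%20")   // only spaces became '+'; a literal '+' is already "%2B"
                .replace("%21", "!")   // the remaining replacements undo escapes of the
                .replace("%27", "'")   // characters encodeURIComponent leaves unescaped
                .replace("%28", "(")
                .replace("%29", ")")
                .replace("%7E", "~");
    }

    public static void main(String[] args) {
        System.out.println(encodeURIComponent("a b+c"));                    // a%20b%2Bc
        System.out.println(encodeURIComponent("tester\u00e6\u00f8\u00e5")); // tester%C3%A6%C3%B8%C3%A5
    }
}
```

The replacements cannot corrupt other escapes: every % in URLEncoder's output starts a two-digit escape (a literal % becomes %25), so "%21" only ever matches an escaped "!". Unpaired surrogates are not handled the same way as in JavaScript, which throws a URIError for them.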

Author: mattssmith

Updated on September 15, 2022

Comments

  • mattssmith
    mattssmith 4 months

    I am trying to pass these strings, which contain special characters, in a URL, and the only way I can get it to work is with JavaScript's encodeURIComponent('testeræøå'), which produces "tester%C3%A6%C3%B8%C3%A5".

    Everything I try in Java produces a different encoding, and none of them work on the other end... Any idea how I can get testeræøå encoded to tester%C3%A6%C3%B8%C3%A5 in Java? Thanks in advance!

    package com.mastercard.cp.sdng.domain.user;
    
    import org.apache.commons.lang.StringUtils;
    
    import javax.script.ScriptEngine;
    import javax.script.ScriptEngineManager;
    import javax.script.ScriptException;
    import java.io.UnsupportedEncodingException;
    import java.net.URI;
    import java.net.URISyntaxException;
    import java.net.URLEncoder;
    
    public class UrlEncodingSample
    {
        public static void main(String[] args)
        {
            String userId = "dummy";
            try
            {
                validateEncoding(userId);
    
                userId = "testeræøå";
    
                validateEncoding(userId);
    
                userId = URLEncoder.encode(userId);
    
                validateEncoding(userId);
            }
            catch (UnsupportedEncodingException e)
            {
                e.printStackTrace();
            }
    
        }
    
        private static void validateEncoding(String userId) throws UnsupportedEncodingException
        {
            System.out.println("------ START TESTING WITH USER ID = '"+userId+"' ----------------------");
            System.out.println("Test URLEncoder.encode(userId): " + URLEncoder.encode(userId));
            System.out.println("Test URLEncoder.encode(userId,\"UTF-8\"): " + URLEncoder.encode(userId, "UTF-8"));
            System.out.println("Test URLEncoder.encode(userId,\"UTF-16\"): " + URLEncoder.encode(userId,"UTF-16"));
            System.out.println("Test URLEncoder.encode(userId,\"UTF-16LE\"): " + URLEncoder.encode(userId,"UTF-16LE"));
            System.out.println("Test URLEncoder.encode(userId,\"UTF-16BE\"): " + URLEncoder.encode(userId,"UTF-16BE"));
    
            ScriptEngine engine = new ScriptEngineManager().getEngineByName("JavaScript");
            try
            {
                System.out.println("Test engine.eval(\"encodeURIComponent(\\\"\"+userId+\"\\\")\"): " +
                        engine.eval("encodeURIComponent(\""+userId+"\")"));
            }
            catch (ScriptException e)
            {
                e.printStackTrace();
            }
            System.out.println("Test encodeURIComponent(userId): " + encodeURIComponent(userId));
            try
            {
                System.out.println("TEST new URI(userId).toASCIIString(): " + new URI(userId).toASCIIString());
            }
            catch (URISyntaxException e)
            {
                e.printStackTrace();
            }
            System.out.println("------ END TESTING WITH USER ID = '"+userId+"' ----------------------\n\n");
    
        }
    
    
    
        public static String encodeURIComponent(String input) {
            if(StringUtils.isEmpty(input)) {
                return input;
            }
    
            int l = input.length();
            StringBuilder o = new StringBuilder(l * 3);
            try {
                for (int i = 0; i < l; i++) {
                    String e = input.substring(i, i + 1);
                    if (ALLOWED_CHARS.indexOf(e) == -1) {
                        byte[] b = e.getBytes("utf-8");
                        o.append(getHex(b));
                        continue;
                    }
                    o.append(e);
                }
                return o.toString();
            } catch(UnsupportedEncodingException e) {
                e.printStackTrace();
            }
            return input;
        }
    
        private static String getHex(byte buf[]) {
            StringBuilder o = new StringBuilder(buf.length * 3);
            for (int i = 0; i < buf.length; i++) {
                int n = (int) buf[i] & 0xff;
                o.append("%");
                if (n < 0x10) {
                    o.append("0");
                }
                o.append(Long.toString(n, 16).toUpperCase());
            }
            return o.toString();
        }
    
        public static final String ALLOWED_CHARS = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-_.!~*'()";
    }
    

    The output of the class above is:

        ------ START TESTING WITH USER ID = 'dummy' ----------------------
        Test URLEncoder.encode(userId): dummy
        Test URLEncoder.encode(userId,"UTF-8"): dummy
        Test URLEncoder.encode(userId,"UTF-16"): dummy
        Test URLEncoder.encode(userId,"UTF-16LE"): dummy
        Test URLEncoder.encode(userId,"UTF-16BE"): dummy
        Test engine.eval("encodeURIComponent(\""+userId+"\")"): dummy
        Test encodeURIComponent(userId): dummy
        TEST new URI(userId).toASCIIString(): dummy
        ------ END TESTING WITH USER ID = 'dummy' ----------------------
    
    
        ------ START TESTING WITH USER ID = 'testerๆ๘ๅ' ----------------------
        Test URLEncoder.encode(userId): tester%E6%F8%E5
        Test URLEncoder.encode(userId,"UTF-8"): tester%E0%B9%86%E0%B9%98%E0%B9%85
        Test URLEncoder.encode(userId,"UTF-16"): tester%FE%FF%0E%46%0E%58%0E%45
        Test URLEncoder.encode(userId,"UTF-16LE"): tester%46%0E%58%0E%45%0E
        Test URLEncoder.encode(userId,"UTF-16BE"): tester%0E%46%0E%58%0E%45
        Test engine.eval("encodeURIComponent(\""+userId+"\")"): tester%e0%b9%86%e0%b9%98%e0%b9%85
        Test encodeURIComponent(userId): tester%E0%B9%86%E0%B9%98%E0%B9%85
        TEST new URI(userId).toASCIIString(): tester%E0%B9%86%E0%B9%98%E0%B9%85
        ------ END TESTING WITH USER ID = 'testerๆ๘ๅ' ----------------------
    
    
        ------ START TESTING WITH USER ID = 'tester%E6%F8%E5' ----------------------
        Test URLEncoder.encode(userId): tester%25E6%25F8%25E5
        Test URLEncoder.encode(userId,"UTF-8"): tester%25E6%25F8%25E5
        Test URLEncoder.encode(userId,"UTF-16"): tester%FE%FF%00%25E6%FE%FF%00%25F8%FE%FF%00%25E5
        Test URLEncoder.encode(userId,"UTF-16LE"): tester%25%00E6%25%00F8%25%00E5
        Test URLEncoder.encode(userId,"UTF-16BE"): tester%00%25E6%00%25F8%00%25E5
        Test engine.eval("encodeURIComponent(\""+userId+"\")"): tester%25E6%25F8%25E5
        Test encodeURIComponent(userId): tester%25E6%25F8%25E5
        TEST new URI(userId).toASCIIString(): tester%E6%F8%E5
        ------ END TESTING WITH USER ID = 'tester%E6%F8%E5' ----------------------
    
    

    Note: As I was writing this up, it occurred to me that I could use URLEncoder.encode(userId, "UTF-8") as long as I used the proper decoder on the other side... but I was still trying to find a way to encode it to match the JavaScript encodeURIComponent function, which apparently works without the need to decode it on the other side. :)

  • markbernard
    markbernard over 8 years
    If you like the answer, can you please select it as the correct one? Thanks
  • Daniil Iaitskov
    Daniil Iaitskov about 5 years
    That's not true. The JS method encodes a space as %20, but the Java method uses +.
  • markbernard
    markbernard about 5 years
    @DaneelS.Yaitskov No one has said anything about how either encodes spaces.
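
The hand-rolled encodeURIComponent in the question walks the string one char (UTF-16 code unit) at a time with substring(i, i + 1). That works for æ, ø and å, but a character outside the Basic Multilingual Plane, such as an emoji, is split into two surrogate halves, and getBytes("utf-8") turns each half into '?' (%3F). Below is a minimal sketch of the same allow-list approach that iterates by code point instead; the class name CodePointEncoder is hypothetical, and it is not a complete reimplementation of the JavaScript function (a lone surrogate makes JavaScript throw a URIError, while this version encodes it as %3F).

```java
import java.nio.charset.StandardCharsets;

public class CodePointEncoder {
    // Characters that encodeURIComponent leaves unescaped (same set as ALLOWED_CHARS in the question)
    private static final String ALLOWED_CHARS =
            "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-_.!~*'()";

    public static String encodeURIComponent(String input) {
        if (input == null || input.isEmpty()) {
            return input;
        }
        StringBuilder out = new StringBuilder(input.length() * 3);
        // Walk the string by code point so a surrogate pair is encoded as one 4-byte UTF-8 sequence
        input.codePoints().forEach(cp -> {
            if (ALLOWED_CHARS.indexOf(cp) >= 0) {
                out.append((char) cp); // allowed characters are all ASCII, so the cast is safe
            } else {
                byte[] bytes = new String(Character.toChars(cp)).getBytes(StandardCharsets.UTF_8);
                for (byte b : bytes) {
                    out.append('%').append(String.format("%02X", b & 0xFF));
                }
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(encodeURIComponent("tester\u00e6\u00f8\u00e5")); // tester%C3%A6%C3%B8%C3%A5
        System.out.println(encodeURIComponent("smile \uD83D\uDE00"));       // smile%20%F0%9F%98%80
    }
}
```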