Is Java 8 java.util.Base64 a drop-in replacement for sun.misc.BASE64?

Solution 1

Here's a small test program that illustrates a difference in the encoded strings:

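// JDK 8 only: sun.misc.BASE64Encoder was removed in JDK 9.
// Needs: import java.nio.charset.StandardCharsets;
byte[] bytes = new byte[57]; // 57 zero bytes encode to exactly 76 Base64 characters
String enc1 = new sun.misc.BASE64Encoder().encode(bytes);
String enc2 = new String(java.util.Base64.getMimeEncoder().encode(bytes),
                         StandardCharsets.UTF_8);

System.out.println("enc1 = <" + enc1 + ">");
System.out.println("enc2 = <" + enc2 + ">");
System.out.println(enc1.equals(enc2));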

Its output is:

enc1 = <AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
>
enc2 = <AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA>
false

Note that the encoded output of sun.misc.BASE64Encoder has a newline at the end. It doesn't always append a newline, but it happens to do so if the encoded string has exactly 76 characters on its last line. (The author of java.util.Base64 considered this to be a small bug in the sun.misc.BASE64Encoder implementation – see the review thread).

This might seem like a triviality, but if you had a program that relied on this specific behavior, switching encoders might result in malformed output. Therefore, I conclude that java.util.Base64 is not a drop-in replacement for sun.misc.BASE64Encoder.

Of course, the intent of java.util.Base64 is that it's a functionally equivalent, RFC-conformant, high-performance, fully supported and specified replacement that's intended to support migration of code away from sun.misc.BASE64Encoder. You need to be aware of some edge cases like this when migrating, though.
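If existing consumers depend on the exact legacy output, one option is to configure the MIME encoder with a LF line separator and put the trailing newline back. The following is a minimal sketch, not a verified replica: SunStyleBase64 is a hypothetical helper, and it assumes (based on the output above and on a comment below) that sun.misc.BASE64Encoder ends every full 76-character line with "\n" and writes nothing after a shorter final line.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

class SunStyleBase64 {
    // 57 input bytes become one 76-character Base64 line.
    static final int BYTES_PER_LINE = 57;

    // MIME encoder with 76-character lines, but "\n" instead of the default "\r\n".
    private static final Base64.Encoder MIME_LF =
            Base64.getMimeEncoder(76, "\n".getBytes(StandardCharsets.US_ASCII));

    // Hypothetical helper approximating sun.misc.BASE64Encoder.encode(byte[]).
    // ASSUMPTION: every full line ends with "\n"; nothing follows a short last line.
    static String encode(byte[] data) {
        String body = MIME_LF.encodeToString(data);
        // java.util.Base64 never adds a separator after the last line, so append one
        // when the last line is full (input length is a non-zero multiple of 57).
        if (data.length > 0 && data.length % BYTES_PER_LINE == 0) {
            return body + "\n";
        }
        return body;
    }

    public static void main(String[] args) {
        // 57 zero bytes: 76 'A' characters followed by a newline.
        System.out.println("<" + encode(new byte[57]) + ">");
    }
}
```

The legacy encoder may have used the platform line separator rather than "\n", so it is worth checking a sample of previously stored output before switching; if that data contains "\r\n", pass that to getMimeEncoder instead.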

Solution 2

I had the same issue when I moved from sun.misc to java.util.Base64, but org.apache.commons.codec.binary.Base64 solved my problem.

Solution 3

There are no changes to the base64 specification between rfc1521 and rfc2045.

All base64 implementations could be considered drop-in replacements for one another; the only differences between base64 implementations are:

  1. the alphabet used.
  2. the APIs provided (e.g. some might only act on a full input buffer, while others might be finite state machines that let you keep pushing chunks of input through them until you are done).
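The second point can be seen within java.util.Base64 itself, which offers both a one-shot encodeToString(byte[]) and a streaming Encoder.wrap(OutputStream). This sketch (StreamingDemo is a hypothetical name) pushes the input through the wrapped stream in small chunks and checks that the result matches the one-shot encoding.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

class StreamingDemo {
    // One-shot API: the whole input must be available in memory.
    static String oneShot(byte[] data) {
        return Base64.getEncoder().encodeToString(data);
    }

    // Streaming API: the input is pushed through in chunks of chunkSize (> 0) bytes.
    static String streamed(byte[] data, int chunkSize) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        // Closing the wrapping stream flushes the final partial group and its padding.
        try (OutputStream b64 = Base64.getEncoder().wrap(sink)) {
            for (int off = 0; off < data.length; off += chunkSize) {
                b64.write(data, off, Math.min(chunkSize, data.length - off));
            }
        }
        return new String(sink.toByteArray(), StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "streaming versus one-shot".getBytes(StandardCharsets.US_ASCII);
        System.out.println(oneShot(data).equals(streamed(data, 5))); // true
    }
}
```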

The MIME base64 alphabet has remained constant between RFC versions (it has to, or older software would break) and is: ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz+/

As Wikipedia notes, only the last 2 characters may change between base64 implementations.

As an example of a base64 implementation that does change the last 2 characters, the IMAP MUTF-7 specification uses the following base64 alphabet: ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz+,

The reason for the change is that the / character is often used as a path delimiter and since the MUTF-7 encoding is used to flatten non-ASCII directory paths into ASCII, the / character needed to be avoided in encoded segments.
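java.util.Base64 ships one variant that changes exactly those last two characters: the URL-safe encoder (RFC 4648) uses - and _ instead of + and /. The sketch below compares the two and adds a hypothetical helper, imapStyleAlphabet, that applies only the '/' to ',' swap described above; real IMAP modified UTF-7 has further rules (such as the & and - shift characters) that it ignores.

```java
import java.util.Base64;

class AlphabetDemo {
    // Standard alphabet: 6-bit value 62 -> '+', 63 -> '/'.
    static String standard(byte[] data) {
        return Base64.getEncoder().encodeToString(data);
    }

    // URL-safe alphabet (RFC 4648): 62 -> '-', 63 -> '_'.
    static String urlSafe(byte[] data) {
        return Base64.getUrlEncoder().encodeToString(data);
    }

    // Hypothetical helper: only the IMAP-style alphabet swap ('/' -> ','),
    // not a full modified UTF-7 encoder.
    static String imapStyleAlphabet(byte[] data) {
        return standard(data).replace('/', ',');
    }

    public static void main(String[] args) {
        // 0xFB 0xFF splits into the 6-bit values 62, 63 and 60 (plus padding),
        // so it hits both alphabet positions that differ between variants.
        byte[] data = {(byte) 0xFB, (byte) 0xFF};
        System.out.println(standard(data));          // +/8=
        System.out.println(urlSafe(data));           // -_8=
        System.out.println(imapStyleAlphabet(data)); // +,8=
    }
}
```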

Solution 4

Assuming both encoders are bug-free, the RFC requires distinct encodings for every 0-, 1-, 2- and 3-byte sequence. Longer sequences are broken down into as many 3-byte sequences as needed, followed by a final sequence. Hence if the two implementations handle all 16,843,009 (1 + 256 + 65,536 + 16,777,216) possible sequences correctly, then the two implementations are also identical.

These tests only take a few minutes to run. By slightly changing your test code, I have done that, and my Java 8 installation passed all the tests. Hence the public implementation can be used to safely replace the sun.misc implementation.

Here is my test code:

import java.io.IOException;
import java.util.Arrays;
import java.util.Base64;

// Compares sun.misc and java.util MIME Base64 on every input of 0 to 3 bytes.
// JDK 8 only: sun.misc.BASE64Encoder/Decoder were removed in JDK 9.
public class Base64EncodingDecodingRoundTripTest {

    // Created once and reused for all ~16.8 million inputs.
    static final sun.misc.BASE64Encoder unsupportedEncoder = new sun.misc.BASE64Encoder();
    static final sun.misc.BASE64Decoder unsupportedDecoder = new sun.misc.BASE64Decoder();
    static final Base64.Encoder mimeEncoder = Base64.getMimeEncoder();
    static final Base64.Decoder mimeDecoder = Base64.getMimeDecoder();

    public static void main(String[] args) throws IOException {
        System.out.println("Testing zero byte encoding");
        encodeDecode(new byte[0]);

        System.out.println("Testing single byte encodings");
        byte[] test = new byte[1];
        for (int i = 0; i < 256; i++) {
            test[0] = (byte) i;
            encodeDecode(test);
        }

        System.out.println("Testing double byte encodings");
        test = new byte[2];
        for (int i = 0; i < 65536; i++) {
            test[0] = (byte) i;
            test[1] = (byte) (i >>> 8);
            encodeDecode(test);
        }

        System.out.println("Testing triple byte encodings");
        test = new byte[3];
        for (int i = 0; i < 16777216; i++) {
            test[0] = (byte) i;
            test[1] = (byte) (i >>> 8);
            test[2] = (byte) (i >>> 16);
            encodeDecode(test);
        }
        System.out.println("All tests passed");
    }

    static void encodeDecode(final byte[] testInput) throws IOException {
        String sunEncoded = unsupportedEncoder.encode(testInput);
        String mimeEncoded = mimeEncoder.encodeToString(testInput);

        // Check that both encoders produce the same string
        if (!sunEncoded.equals(mimeEncoded)) {
            throw new IOException("Input " + Arrays.toString(testInput) + " produced different encodings (sun=\"" + sunEncoded + "\", mime=\"" + mimeEncoded + "\")");
        }

        // Check that cross-decoding gives the same bytes (the encoded forms are identical)
        byte[] mimeDecoded = mimeDecoder.decode(sunEncoded);
        byte[] sunDecoded = unsupportedDecoder.decodeBuffer(mimeEncoded); // throws IOException
        if (!Arrays.equals(mimeDecoded, sunDecoded)) {
            throw new IOException("Input " + Arrays.toString(testInput) + " was encoded as \"" + sunEncoded + "\", but decoded as sun=" + Arrays.toString(sunDecoded) + " and mime=" + Arrays.toString(mimeDecoded));
        }
    }
}
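The argument above rests on longer inputs being encoded as independent 3-byte groups followed by a final shorter group. That property can be checked directly with java.util.Base64's basic encoder, which never inserts line breaks; ChunkingDemo and encodeByGroups are hypothetical names used only for this sketch.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

class ChunkingDemo {
    // Encodes each 3-byte group (and a shorter final group) separately and concatenates.
    static String encodeByGroups(byte[] data) {
        Base64.Encoder basic = Base64.getEncoder(); // basic encoder: no line breaks
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < data.length; i += 3) {
            byte[] group = Arrays.copyOfRange(data, i, Math.min(i + 3, data.length));
            out.append(basic.encodeToString(group));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        byte[] data = "Hello, Base64!".getBytes(StandardCharsets.US_ASCII);
        String whole = Base64.getEncoder().encodeToString(data);
        System.out.println(whole.equals(encodeByGroups(data))); // true
    }
}
```

Only the final group can be short, so only it carries padding, which is why the concatenation matches. Note that this is a property of the basic encoder: the MIME encoder used in the test above also inserts "\r\n" after every 76 output characters (57 input bytes), which inputs of at most 3 bytes never trigger.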

Author: Ivo Mori


Updated on July 09, 2022

Comments

  • Ivo Mori, almost 2 years ago

    Question

    Are the Java 8 java.util.Base64 MIME Encoder and Decoder a drop-in replacement for the unsupported, internal Java API sun.misc.BASE64Encoder and sun.misc.BASE64Decoder?

    EDIT (Clarification): By drop-in replacement I mean that I can switch legacy code using sun.misc.BASE64Encoder and sun.misc.BASE64Decoder to the Java 8 MIME Base64 Encoder/Decoder transparently for any other existing client code.

    What I think so far and why

    Based on my investigation and quick tests (see code below) it should be a drop-in replacement because

    • sun.misc.BASE64Encoder based on its JavaDoc is a BASE64 Character encoder as specified in RFC1521. This RFC is part of the MIME specification...
    • java.util.Base64 based on its JavaDoc Uses the "The Base64 Alphabet" as specified in Table 1 of RFC 2045 for encoding and decoding operation... under MIME

    Assuming there are no significant changes between RFC 1521 and RFC 2045 (I could not find any), and based on my quick test, using the Java 8 Base64 MIME Encoder/Decoder should be fine.

    What I am looking for

    • an authoritative source confirming or disproving the "drop-in replacement" point OR
    • a counterexample showing a case where java.util.Base64 behaves differently from the sun.misc.BASE64Encoder/BASE64Decoder OpenJDK Java 8 implementation (8u40-b25) OR
    • whatever you think answers the above question definitively

    For reference

    My test code

    import java.io.IOException;
    import java.nio.charset.StandardCharsets;
    import java.util.Base64;
    import java.util.Objects;

    // JDK 8 only: sun.misc.BASE64Encoder/Decoder were removed in JDK 9.
    public class Base64EncodingDecodingRoundTripTest {

        public static void main(String[] args) throws IOException {
            String test1 = " ~!@#$%^& *()_+=`| }{[]\\;: \"?><,./ ";
            String test2 = test1 + test1;

            encodeDecode(test1);
            encodeDecode(test2);
        }

        static void encodeDecode(final String testInputString) throws IOException {
            sun.misc.BASE64Encoder unsupportedEncoder = new sun.misc.BASE64Encoder();
            sun.misc.BASE64Decoder unsupportedDecoder = new sun.misc.BASE64Decoder();

            Base64.Encoder mimeEncoder = Base64.getMimeEncoder();
            Base64.Decoder mimeDecoder = Base64.getMimeDecoder();

            // Use UTF-8 explicitly so the encoding matches the UTF-8 decoding below
            byte[] inputBytes = testInputString.getBytes(StandardCharsets.UTF_8);

            String sunEncoded = unsupportedEncoder.encode(inputBytes);
            System.out.println("sun.misc encoded: " + sunEncoded);

            String mimeEncoded = mimeEncoder.encodeToString(inputBytes);
            System.out.println("Java 8 Base64 MIME encoded: " + mimeEncoded);

            // Cross-decode: each implementation decodes the other's output
            byte[] mimeDecoded = mimeDecoder.decode(sunEncoded);
            String mimeDecodedString = new String(mimeDecoded, StandardCharsets.UTF_8);

            byte[] sunDecoded = unsupportedDecoder.decodeBuffer(mimeEncoded); // throws IOException
            String sunDecodedString = new String(sunDecoded, StandardCharsets.UTF_8);

            System.out.println(String.format("sun.misc decoded: %s | Java 8 Base64 decoded:  %s", sunDecodedString, mimeDecodedString));

            System.out.println("Decoded results are both equal: " + Objects.equals(sunDecodedString, mimeDecodedString));
            System.out.println("Mime decoded result is equal to test input string: " + Objects.equals(testInputString, mimeDecodedString));
            System.out.println("\n");
        }
    }
    
    • Cubic, about 8 years ago
      What do you mean by drop-in replacement? Are you just talking about the encoding/decoding behavior?
    • Ivo Mori, about 8 years ago
      @Cubic: By drop-in replacement I mean that I can switch legacy code using sun.misc.BASE64Encoder and sun.misc.BASE64Decoder to the Java 8 MIME Base64 Encoder/Decoder transparently for any other existing client code. This seems to be the case, but I'd like to have an authoritative reference confirming this or, otherwise, a "proof" that this is not the case.
    • jstedfast, about 8 years ago
      Yes, you can switch the legacy code to the new Java 8 Base64 Encoder/Decoder. They will always produce the same output.
    • Raedwald, almost 5 years ago
      Relevant for asking which encoder class to use?
    • Ivo Mori, almost 4 years ago
      @Raedwald I don't think so. This question and answer document the problem of legacy code that uses the unofficial Java internal APIs (supposedly-never-to-be-used-by-anyone) sun.misc.BASE64Encoder and sun.misc.BASE64Decoder, and they are about migrating such legacy code to the official Java 8 Base64 APIs. The answer to which encoder class to use already suggests using the Java 8 Base64 APIs and doesn't point you to those legacy sun.misc APIs.
  • Ivo Mori, about 8 years ago
    Up-voting your explanation as it makes perfect sense and matches what I figured out from the start. I'm still hoping for some "official" reference, if one even exists. I'd expect the JDK 8 Adoption Guide or JEP 135 to state clearly that the Java 8 Base64 Encoder/Decoder replace the internal sun.misc.BASE64 implementations. But maybe it's just too obvious... Anyway, this Q&A then becomes that "official" reference.
  • jstedfast, about 8 years ago
    It seems that sun.* packages should not be used (oracle.com/technetwork/java/faq-sun-packages-142232.html), which suggests that the java.util Base64 classes were added to appease developers who needed Base64 support and otherwise had to implement their own classes or use third-party solutions.
  • jstedfast, about 8 years ago
    You could also take as official proof the fact that RFC 2045 explicitly states that it obsoletes RFC 1521.
  • Ivo Mori, about 8 years ago
    Perfect, you found a counterexample!
  • Ivo Mori, about 8 years ago
    I like your approach. However, Stuart's answer includes a counterexample which shows an edge case where the two resulting encodings are not identical.
  • Ivo Mori, about 8 years ago
    Indeed, it would have been really nice if this and any other corner cases (if they exist) were documented properly. Do you know of any other edge cases?
  • Stuart Marks, about 8 years ago
    @IvoMori I'm not aware of any other edge cases, though there probably are some. I doubt they'll be documented. The problem, and this applies to the sun.misc stuff in general, is that it was never formally specified, and there is no suite of conformance and regression tests like there is for the java.* APIs. The sun.misc.BASE64 stuff is just a lump of code that "did what it did" and so it's quite possible or even likely that there are odd edge case behaviors or even bugs lurking there.
  • shareef, about 5 years ago
    On Android this causes a problem: "Call requires API level 26 (current min is 21): java.util.Base64#getMimeEncoder".
  • Christopher Connery, almost 5 years ago
    This comment should be made more visible as org.apache.commons.codec.binary is more versatile than the java.util library
  • Ivo Mori, almost 4 years ago
    I'm sorry, but I fail to see how your answer answers my question. I asked specifically whether the official Java 8 Base64 API can be used as a drop-in replacement (no change in any behaviour) for legacy code that uses the sun.misc Base64 API. Stuart Marks already gave the final answer by showing a counterexample that proves this isn't the case. Your late answer would fit a different question; instead, it seems to "hijack" my question to promote the use of a third-party library.
  • Ivo Mori, almost 4 years ago
    Did you actually verify that the org.apache.commons.codec.binary.Base64 API is a drop-in replacement for the sun.misc API? According to you any sun.misc Base64 encoding can be reproduced using the org.apache.commons.codec.binary.Base64 API – right? As people are interested in the Apache Commons Codec library, I'll follow up with a new question which is specific to that third-party library.
  • Ivo Mori, almost 4 years ago
    To follow up on your off-topic answer, I created a separate question: Is Apache Commons Codec Base64 a drop-in replacement for sun.misc.BASE64?. Also note that I wasn't able to confirm your answer here to be correct. Please provide your proof or code example for your claim by answering conclusively Is Apache Commons Codec Base64 a drop-in replacement for sun.misc.BASE64?.
  • jonny99, over 3 years ago
    My own observation is that sun.misc.BASE64Encoder inserts a newline after every 76 characters, not just the specific case of the encoded string being exactly 76 characters in length. I discovered this while troubleshooting an incompatibility with the PEM_read_bio_RSAPublicKey() function in the OpenSSL library, which is expecting those newlines to be included.
  • madx, almost 3 years ago
    Base64.encodeBase64String(passwdstring.getBytes()) successfully replaced my new sun.misc.BASE64Encoder().encode(passwdstring.getBytes()).
  • JW-Munich, almost 2 years ago
    I had an issue with a newline at the start of the String before decoding. The sun.misc.BASE64Decoder happily removed newline characters before decoding, but the java.util.Base64.Decoder threw an IllegalArgumentException.
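One likely explanation, assuming the decoder in question came from Base64.getDecoder() (the comment doesn't say which one was used): the basic decoder rejects any character outside the Base64 alphabet, while the MIME decoder ignores line separators and other non-alphabet characters. LenientDecodeDemo below is a hypothetical name for a minimal sketch of the difference.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

class LenientDecodeDemo {
    // Basic decoder: rejects anything outside the Base64 alphabet, including '\n'.
    static byte[] decodeStrict(String s) {
        return Base64.getDecoder().decode(s);
    }

    // MIME decoder: ignores line separators and other non-alphabet characters.
    static byte[] decodeLenient(String s) {
        return Base64.getMimeDecoder().decode(s);
    }

    public static void main(String[] args) {
        String input = "\nSGVsbG8="; // "Hello" with a leading newline
        System.out.println(new String(decodeLenient(input), StandardCharsets.US_ASCII)); // Hello
        try {
            decodeStrict(input);
        } catch (IllegalArgumentException e) {
            System.out.println("basic decoder rejected input: " + e.getMessage());
        }
    }
}
```

Switching such call sites to Base64.getMimeDecoder() is usually the closer match to the old sun.misc decoding behaviour for input that contains newlines.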