javax.xml.bind's Base64 encoder/decoder eats last two characters of string


Solution 1

hello is not a valid Base64 string, so it cannot be parsed correctly. First convert the string into a byte array (try String(text).getBytes('UTF-8')), then call DC.printBase64Binary() on the byte array to get the data in Base64.

DC.parseBase64Binary() will then convert this Base64 encoded data back into the byte array (which you can then convert back into a string).
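
A minimal sketch of that round trip, written for plain CPython rather than Jython. Python's standard base64 module stands in for DatatypeConverter here; that substitution and the helper names encode_text and decode_text are assumptions for illustration, not part of the original code. The order of operations is the same: text to bytes, bytes to Base64, and back again.

```python
import base64


def encode_text(text):
    """Encode arbitrary text as a Base64 string (hypothetical helper)."""
    raw = text.encode("utf-8")                    # text -> bytes
    return base64.b64encode(raw).decode("ascii")  # bytes -> Base64 text


def decode_text(encoded):
    """Decode a Base64 string back into the original text (hypothetical helper)."""
    raw = base64.b64decode(encoded)               # Base64 text -> bytes
    return raw.decode("utf-8")                    # bytes -> text


encoded = encode_text("hello")
print(encoded)               # aGVsbG8=
print(decode_text(encoded))  # hello
```

Because the input is encoded before it is decoded, punctuation and spaces survive the round trip unchanged.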

Solution 2

A few findings after spending time resolving a similar problem on a GAE platform (the Base64 decoder eats the last (two) characters when decoding a Base64 string from Facebook).

If the length of the encoded string is not a multiple of 4, the method DatatypeConverter.parseBase64Binary might drop some trailing characters (rendering the JSON payload syntactically wrong). My solution was to add the following code:

while (payload.length() % 4 != 0) payload += "=";
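
The same padding idea can be sketched in Python, again using the standard base64 module as a stand-in for DatatypeConverter. The helper name pad_base64 and the sample payload are assumptions chosen for illustration.

```python
import base64


def pad_base64(payload):
    """Append '=' until the length is a multiple of 4 (hypothetical helper)."""
    missing = -len(payload) % 4   # 0, 1, 2 or 3 characters short
    return payload + "=" * missing


unpadded = "aGVsbG8"              # Base64 for "hello" with its '=' stripped
padded = pad_base64(unpadded)
print(padded)                     # aGVsbG8=
print(base64.b64decode(padded))   # b'hello'
```

Note that a length of 4n+1 can never be valid Base64, so padding alone cannot repair an input of that length.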

With regard to the code example in the question, I would suggest a change where the test string is first encoded and then decoded, i.e.:

return DC.parseBase64Binary(DC.printBase64Binary(String(text).getBytes()))

Solution 3

You're not giving it complete base64 (including final padding) to start with. If you give it a complete base64 string, it should be fine.

You should only try to interpret data as if it's base64 if it really is base64 to start with. Doing it with arbitrary character sequences is a bad idea.
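
One way to act on that advice is to check the input before decoding it. The sketch below uses Python's standard base64 module with validate=True (not DatatypeConverter, which offers no such check that I am aware of); the function name is_base64 is a hypothetical helper.

```python
import base64
import binascii


def is_base64(candidate):
    """Return True if candidate is well-formed, padded standard Base64."""
    try:
        # validate=True rejects characters outside the Base64 alphabet
        # instead of silently discarding them.
        base64.b64decode(candidate, validate=True)
        return True
    except (binascii.Error, ValueError):
        return False


print(is_base64("aGVsbG8="))                           # True
print(is_base64("hello"))                              # False
print(is_base64("This, it's a punctuated sentence."))  # False
```

A check like this only confirms that the text is well-formed Base64; ordinary text can still pass by coincidence, as "foobar12" would.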

It's unclear what you're really trying to do, if you're not actually starting with base64 data. You talk about "converting some strings" - are they base64 or not?

Author by

tsm


Updated on November 06, 2020

Comments

  • tsm, over 3 years ago

    I need to convert some strings using Base64 encoding, and was delighted to see that I didn't have to roll my own converter--Java provides one with javax.xml.bind.DatatypeConverter. However, it has some problems. Here's the output of my time with a Jython REPL:

    >>> import javax.xml.bind.DatatypeConverter as DC
    >>> import java.lang.String as String
    >>> def foo(text):
    ...   return DC.printBase64Binary(DC.parseBase64Binary(String(text)))
    ... 
    >>> foo("hello")
    'hell'
    >>> foo("This, it's a punctuated sentence.")
    'Thisitsapunctuatedsenten'
    >>> foo("\"foo\" \"bar\"")
    'foob'
    >>> foo("\"foo\" \"bar\"12")
    'foobar12'
    >>> foo("\"foo\" \"bar\"1")
    'foob'
    

    As you can see, it doesn't handle non-alphanumeric characters at all, and also frequently--but not always--truncates the string by two characters.

    I guess it might be time to just write my own class, but now I'm bothered that either (a) I'm misreading the Javadoc or (b) the class doesn't work as expected.

    So any help is much appreciated; thanks in advance.