Strange Base64 encode/decode problem

Solution 1

Whatever populates params expects the request body to be a URL-encoded form (specifically, application/x-www-form-urlencoded, where "+" means space), but you didn't URL-encode it. I don't know what functions your language provides, but in pseudocode, queryString should be constructed from

concat(uri_escape("data"), "=", uri_escape(base64_encode(rawBytes)))

which simplifies to

concat("data=", uri_escape(base64_encode(rawBytes)))

The "+" characters will be replaced with "%2B".
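For the JVM (which Groovy runs on), the pseudocode above could look roughly like the following. This is only a sketch: the class and method names are made up for illustration, and it assumes Java 8 or later for java.util.Base64.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.Base64;

class FormBodySketch {
    // Hypothetical helper: builds "data=<percent-encoded Base64>" for a form POST body.
    static String buildBody(byte[] rawBytes) {
        // Standard Base64 may contain '+', '/' and '='.
        String base64 = Base64.getEncoder().encodeToString(rawBytes);
        try {
            // URLEncoder produces application/x-www-form-urlencoded output,
            // so '+' becomes "%2B", '/' becomes "%2F" and '=' becomes "%3D".
            return "data=" + URLEncoder.encode(base64, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required on every JVM, so this should not happen.
            throw new IllegalStateException(e);
        }
    }
}
```

In the Groovy code from the question, the same call should work directly when building queryString, e.g. "data=" + URLEncoder.encode(encoded, "UTF-8"). The server then decodes "%2B" back to "+" and no replaceAll workaround is needed.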

Solution 2

You have to use a URL-safe variant of Base64. The problem is that standard Base64 output can contain the characters +, / and =, which have special meaning in URLs and form data and therefore have to be replaced by their percent-encoded versions.

http://en.wikipedia.org/wiki/Base64#URL_applications

I'm using the following code in PHP:

    /**
     * Custom base64 encoding. Replace unsafe url chars
     *
     * @param string $val
     * @return string
     */
    static function base64_url_encode($val) {

        return strtr(base64_encode($val), '+/=', '-_,');

    }

    /**
     * Custom base64 decode. Replace custom url safe values with normal
     * base64 characters before decoding.
     *
     * @param string $val
     * @return string
     */
    static function base64_url_decode($val) {

        return base64_decode(strtr($val, '-_,', '+/='));

    }
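On the JVM something similar is available out of the box. The sketch below assumes Java 8 or later; the class and method names are invented for illustration. Note that it uses the standard URL-safe alphabet ("-" and "_") and simply drops the "=" padding, so its output is not interchangeable with the PHP functions above, which map "=" to ",".

```java
import java.util.Base64;

class UrlSafeBase64Sketch {
    // Encodes with the URL-safe alphabet ('-' and '_' instead of '+' and '/')
    // and omits the trailing '=' padding.
    static String encode(byte[] raw) {
        return Base64.getUrlEncoder().withoutPadding().encodeToString(raw);
    }

    // Decodes URL-safe Base64; the JDK decoder does not require padding.
    static byte[] decode(String text) {
        return Base64.getUrlDecoder().decode(text);
    }
}
```

Because the result contains only letters, digits, "-" and "_", it passes through form decoding unchanged, with or without percent-encoding.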

Solution 3

Because it is a parameter to a POST, you must URL-encode the data.

See http://en.wikipedia.org/wiki/Percent-encoding

Solution 4

To paraphrase the Wikipedia link:

The encoding used by default is based on a very early version of the general URI percent-encoding rules, with a number of modifications such as newline normalization and replacing spaces with "+" instead of "%20".

Another hidden pitfall that everyday web developers like myself know little about.
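The effect is easy to reproduce in isolation. The following is a small illustration in Java, assuming the server's form parser behaves like java.net.URLDecoder (the class and method names here are hypothetical):

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

class PlusDecodingDemo {
    // Decodes a value the way an application/x-www-form-urlencoded parser would:
    // a literal '+' is read as a space, and %XX sequences are unescaped.
    static String formDecode(String rawValue) {
        try {
            return URLDecoder.decode(rawValue, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    // Encodes first, then decodes, which is what a correct client/server pair does.
    static String roundTrip(String value) {
        try {
            return formDecode(URLEncoder.encode(value, "UTF-8"));
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Passing the raw Base64 text "ab+/cd==" to formDecode yields "ab /cd==": the length is unchanged but the "+" has become a space, which matches the symptom described in the question. With roundTrip the original string comes back intact.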

Author: Rich Sadowsky

Updated on July 09, 2022

Comments

  • Rich Sadowsky
    Rich Sadowsky almost 2 years

    I'm using Grails 1.3.7. I have some code that uses the built-in base64Encode function and base64Decode function. It all works fine in simple test cases where I encode some binary data and then decode the resulting string and write it to a new file. In this case the files are identical.

    But then I wrote a web service that took the base64 encoded data as a parameter in a POST call. Although the length of the base64 data is identical to the string I passed into the function, the contents of the base64 data are being modified. I spent DAYS debugging this and finally wrote a test controller that passed the data in base64 to post and also took the name of a local file with the correct base64 encoded data, as in:

    data=AAA-base-64-data...&testFilename=/name/of/file/with/base64data
    

    Within the test function I compared every byte in the incoming data parameter with the appropriate byte in the test file. I found that somehow every "+" character in the input data parameter had been replaced with a " " (space, ordinal ascii 32). Huh? What could have done that?

    To be sure I was correct, I added a line that said:

    data = data.replaceAll(' ', '+')
    

    and sure enough the data decoded exactly right. I tried it with arbitrarily long binary files and it now works every time. But I can't figure out for the life of me what would be modifying the data parameter in the post to convert the ord(43) character to ord(32)? I know that the plus sign is one of the 2 somewhat platform dependent characters in the base64 spec, but given that I am doing the encoding and decoding on the same machine for now I am super puzzled what caused this. Sure I have a "fix" since I can make it work, but I am nervous about "fixes" that I don't understand.

    The code is too big to post here, but I get the base64 encoding like so:

    def inputFile = new File(inputFilename)
    def rawData =  inputFile.getBytes()
    def encoded = rawData.encodeBase64().toString()
    

    I then write that encoded string out to a new file so I can use it for testing later. If I load that file back in like so, I get the same rawData:

    def encodedFile = new File(encodedFilename)
    String encoded = encodedFile.getText()
    byte[] rawData = encoded.decodeBase64()
    

    So all that is good. Now assume I take the "encoded" variable and add it to a param to a POST function like so:

    String queryString = "data=$encoded"
    String url = "http://localhost:8080/some_web_service"
    
    def results = urlPost(url, queryString)
    
    def urlPost(String urlString, String queryString) {
        def url = new URL(urlString)
        def connection = url.openConnection()
        connection.setRequestMethod("POST")
        connection.doOutput = true
    
        def writer = new OutputStreamWriter(connection.outputStream)
        writer.write(queryString)
        writer.flush()
        writer.close()
        connection.connect()
    
        return (connection.responseCode == 200) ? connection.content.text : "error $connection.responseCode, $connection.responseMessage"
    }
    

    on the web service side, in the controller I get the parameter like so:

    String data = params?.data
    println "incoming data parameter has length of ${data.size()}" //confirm right size
    
    //unless I run the following line, the data does not decode to the same source
    data = data.replaceAll(' ', '+')
    
    //as long as I replace spaces with plus, this decodes correctly, why?
    byte[] bytedata = data.decodeBase64()
    

    Sorry for the long rant, but I'd really love to understand why I had to do the "replace space with plus sign" to get this to decode correctly. Is there some problem with the plus sign being used in a request parameter?

  • Sean the Bean
    Sean the Bean over 6 years
    Can you explain what you mean by "url encode... doesn't work properly"? AFAIK, it works just fine. (The only complaint others seem to have about it is that it creates slightly longer strings due to replacing = with %3D, + with %2B, and / with %2F.)
  • Polak
    Polak over 6 years
    Sure, sorry, I meant you can't use ONLY base64 encode/decode for urls due to that issue with '+', '/' and '=' symbols which must be replaced. That's the trick.