Decoding base64-encoded data from xml document

Solution 1

For a first shot I didn't use any programming language, just Notepad++.

I opened the XML file in it and copied the raw base64 content into a new file (without the square brackets).

Then I selected everything (Ctrl+A) and used Plugins > MIME Tools > Base64 Decode. This threw an error about the text length (it must be a multiple of 4), so I added two equal signs ('=') at the end as padding to get the correct length.

On the next try it decoded successfully into 'something'. Save that file as .jpg and it opens fine in any picture viewer.

So I would say there IS something wrong with the data you get: it doesn't have the right number of equal signs at the end to pad the content to a length that can be split into groups of 4 characters.

The 'easy' way is to add equal signs until the decoding no longer throws an error. The better way is to count the characters (excluding CR/LFs!) and add the required padding in one step.

Further investigations

After some coding and reading up on the Convert function, it turns out the problem is a wrongly attached equal sign from the producer. Notepad++ has no problem with tons of equal signs, but Microsoft's Convert function only accepts zero, one, or two of them. So if you pad the already existing one with additional equal signs, you get an error too! To get this damn thing to work, you have to cut off all existing equal signs, calculate how many are needed, and add them again.

Just for the bounty, here is my code (not absolutely perfect, but a good starting point): ;-)

    // Needs: using System; using System.Linq;
    //        using System.Xml.Linq; using System.Xml.XPath;

    static void Main(string[] args)
    {
        var elements = XElement
            .Load("test.xml")
            .XPathSelectElements("//media/media-object[@encoding='base64']");
        foreach (XElement element in elements)
        {
            // Decoded image bytes; save or process them as needed.
            var image = AnotherDecode64(element.Value);
        }
    }

    static byte[] AnotherDecode64(string base64Encoded)
    {
        // Cut off whatever padding the producer attached.
        string temp = base64Encoded.Trim().TrimEnd('=');
        // Count only the data characters; the decoder ignores whitespace such as CR/LF.
        int asciiChars = temp.Length - temp.Count(c => Char.IsWhiteSpace(c));
        int padding;
        switch (asciiChars % 4)
        {
            case 0:
                padding = 0;
                break;
            case 2:
                padding = 2;
                break;
            case 3:
                padding = 1;
                break;
            default:
                // A remainder of 1 can never be valid, regardless of
                // what (or what not) you attach to the string.
                throw new FormatException("Invalid base64 length.");
        }
        temp += new String('=', padding);

        return Convert.FromBase64String(temp);
    }
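
As a quick sanity check of the behaviour described above, here is a small self-contained sketch. It shows Convert.FromBase64String rejecting a string with three equal signs and accepting it once the padding has been stripped and recomputed. The class name PaddingDemo, its helpers, and the sample string are made up for illustration. File.WriteAllBytes is just one straightforward way to save the result, and the output path is whatever you choose.

```csharp
using System;
using System.IO;
using System.Linq;

public static class PaddingDemo
{
    // Hypothetical helper: removes whitespace and existing '=' signs,
    // then appends exactly as many '=' as the data length requires.
    public static string RepairPadding(string base64)
    {
        string data = new string(base64.Where(c => !char.IsWhiteSpace(c)).ToArray()).TrimEnd('=');
        if (data.Length % 4 == 1)
            throw new FormatException("A remainder of 1 can never be valid base64.");
        return data + new string('=', (4 - data.Length % 4) % 4);
    }

    // Returns true if Convert.FromBase64String accepts the input as-is.
    public static bool Decodes(string base64)
    {
        try { Convert.FromBase64String(base64); return true; }
        catch (FormatException) { return false; }
    }

    // Decodes a possibly mis-padded string and writes the bytes to disk.
    public static void SaveDecoded(string base64, string path)
    {
        File.WriteAllBytes(path, Convert.FromBase64String(RepairPadding(base64)));
    }
}
```

For example, PaddingDemo.Decodes("QUJDRA===") is false, while PaddingDemo.RepairPadding("QUJDRA===") gives "QUJDRA==", which decodes to the four ASCII bytes of "ABCD".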

Solution 2

The base64 string is not valid. As Oliver has already said, the string length must be a multiple of 4 after removing whitespace characters. If you look at the end of the base64 string (see below), you will see that the last line is shorter than the rest.

RRRRRRRRRRRRRRRRRRRRRRRRRRRRX//Z=

If you remove this line, your program will work, but the resulting image will have a missing section in the bottom right-hand corner. You need to pad this line so the overall string length is correct. By my calculations, if you add 3 characters it should work.

RRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRX//Z=
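
To see which case you are in before patching anything by hand, a small diagnostic along these lines may help. It counts the data characters, the trailing equal signs, and the total length modulo 4, all after removing whitespace. Base64Inspector and Inspect are hypothetical names, and this is only a sketch.

```csharp
using System;
using System.Linq;

public static class Base64Inspector
{
    // Returns (data characters, trailing '=' count, total length mod 4),
    // all measured after whitespace has been removed.
    public static (int DataChars, int PaddingChars, int Remainder) Inspect(string base64)
    {
        string compact = new string(base64.Where(c => !char.IsWhiteSpace(c)).ToArray());
        int dataChars = compact.TrimEnd('=').Length;
        int paddingChars = compact.Length - dataChars;
        return (dataChars, paddingChars, compact.Length % 4);
    }
}
```

A remainder other than 0 means Convert.FromBase64String will reject the string. For the five characters X//Z= it reports 4 data characters, 1 padding character, and a remainder of 1.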

Solution 3

Remove the last 2 characters repeatedly until the string decodes properly.

    public Image Base64ToImage(string base64String)
    {
        // Convert Base64 String to byte[]
        byte[] imageBytes = null;
        bool iscatch = true;
        while (iscatch)
        {
            try
            {
                imageBytes = Convert.FromBase64String(base64String);
                iscatch = false;
            }
            catch (FormatException)
            {
                // Not valid yet: drop the last 2 characters and try again
                int length = base64String.Length;
                base64String = base64String.Substring(0, length - 2);
            }
        }

        // Convert byte[] to Image
        MemoryStream ms = new MemoryStream(imageBytes, 0, imageBytes.Length);
        Image image = Image.FromStream(ms, true);
        pictureBox1.Image = image;
        return image;
    }
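
Because this loop throws characters away, the decoded bytes may be a truncated image even when decoding succeeds. Assuming the payload is a JPEG (as the image/jpeg mime type in the question suggests), one rough check is to look for the JPEG start-of-image (FF D8) and end-of-image (FF D9) markers. JpegCheck and LooksComplete are hypothetical names, and this is a heuristic sketch, not a full validation.

```csharp
using System;

public static class JpegCheck
{
    // Heuristic: a complete JPEG starts with FF D8 and ends with FF D9.
    public static bool LooksComplete(byte[] bytes)
    {
        return bytes != null
            && bytes.Length >= 4
            && bytes[0] == 0xFF && bytes[1] == 0xD8
            && bytes[bytes.Length - 2] == 0xFF && bytes[bytes.Length - 1] == 0xD9;
    }
}
```

If LooksComplete returns false for the bytes produced by the loop above, too much was probably cut off, and repairing the padding instead of truncating is the safer route.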
Author by

Kjensen

Updated on September 27, 2020

Comments

  • Kjensen over 3 years

    I receive some XML files with embedded base64-encoded images that I need to decode and save as files.

    An unmodified (other than zipped) example of such a file can be downloaded below:

    20091123-125320.zip (60KB)

    However, I get errors like "Invalid length for a Base-64 char array" and "Invalid character in a Base-64 string". I have marked the line in the code where I get the error.

    A file could look like this:

    <?xml version="1.0" encoding="windows-1252"?>
    <mediafiles>
        <media media-type="image">
          <media-reference mime-type="image/jpeg"/>
          <media-object encoding="base64"><![CDATA[/9j/4AAQ[...snip...]P4Vm9zOR//Z=]]></media-object>
          <media.caption>What up</media.caption>
        </media>
    </mediafiles>
    

    And the code to process like this:

    var xd = new XmlDocument();
    xd.Load(filename);
    var nodes = xd.GetElementsByTagName("media");
    
    foreach (XmlNode node in nodes)
    {
        var mediaObjectNode = node.SelectSingleNode("media-object");
        //The line below is where the errors occur
        byte[] imageBytes = Convert.FromBase64String(mediaObjectNode.InnerText);
        //Do stuff with the bytearray to save the image
    }
    

    The XML data is from an enterprise newspaper system, so I am pretty sure the files are OK, and there must be something wrong in the way I process them. Maybe a problem with the encoding?

    I have tried writing out the contents of mediaObjectNode.InnerText, and it is the base64-encoded data, so navigating the XML document is not the issue.

    I have been googling, binging, stackoverflowing and crying - and found no solution... Help!

    Edit:

    Added an actual example file (and a bounty). Please note the downloadable file uses a slightly different schema, since I simplified it in the above example by removing irrelevant stuff...

  • Kjensen over 14 years
    I tried saving the contents of mediaObjectNode.InnerText to a text file (after outputting it to a console), and no CDATA stuff is included. I tried your suggestion anyway, but it makes no difference.
  • Kjensen over 14 years
    Well - maybe there is an error there, but this is how I get the file (with this encoding). How can I check if it is the correct one?
  • IanW about 12 years
    @Oliver, I know this is an old answer but I was struggling with this problem. Turns out it was because there were three "=" signs in there. Who knew? Thanks!