C# base64 encoding/decoding with serialization of objects issue

14,413

Solution 1

The file declares itself as UTF-8, so why are you using ASCII to encode it into bytes? Many characters that UTF-8 can represent cannot be represented in ASCII. Do you even need the file in memory as text to start with? Why not load it as binary data from the beginning (e.g. File.ReadAllBytes)?
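As a rough sketch of the binary route suggested above: the bytes on disk go straight into Base64 and back, so no text encoding is involved at all. The `FileBase64` class and its method names are hypothetical, chosen only for illustration; `File.ReadAllBytes`, `File.WriteAllBytes`, `Convert.ToBase64String` and `Convert.FromBase64String` are the standard .NET calls.

```csharp
using System;
using System.IO;
using System.Linq;

// Hypothetical helper (not from the original post): Base64 over raw file bytes.
public static class FileBase64
{
    // Read the file as bytes and Base64-encode them; no string decoding happens.
    public static string EncodeFile(string path) =>
        Convert.ToBase64String(File.ReadAllBytes(path));

    // Decode the Base64 text and write the resulting bytes back to disk.
    public static void DecodeToFile(string base64, string path) =>
        File.WriteAllBytes(path, Convert.FromBase64String(base64));
}

public static class Program
{
    public static void Main()
    {
        string source = Path.GetTempFileName();
        string copy = Path.GetTempFileName();
        try
        {
            // A tiny stand-in file: UTF-8 byte order mark followed by "<a/>".
            File.WriteAllBytes(source, new byte[] { 0xEF, 0xBB, 0xBF, (byte)'<', (byte)'a', (byte)'/', (byte)'>' });

            FileBase64.DecodeToFile(FileBase64.EncodeFile(source), copy);

            // Compare the two files byte for byte.
            Console.WriteLine(File.ReadAllBytes(source).SequenceEqual(File.ReadAllBytes(copy)));
        }
        finally
        {
            File.Delete(source);
            File.Delete(copy);
        }
    }
}
```

Because the byte order mark is never interpreted as a character here, it cannot be turned into a "?".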

If you do need to start with a string, use Encoding.UTF8 (or Encoding.Unicode, although that will probably result in a bigger byte array) and everything should be fine. That extra character is a byte order mark, which can't be represented in ASCII, hence the "?" replacement character.
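To make the replacement visible, here is a minimal sketch that encodes a string starting with the byte order mark (U+FEFF) both ways. The `BomDemo` class and its method names are made up for this example, and the sample string is an assumption standing in for the serialized XML.

```csharp
using System;
using System.Text;

// Hypothetical demo class (not from the original post).
public static class BomDemo
{
    // Sample text: a byte order mark followed by the XML declaration.
    public const string Sample = "\uFEFF<?xml version=\"1.0\" encoding=\"utf-8\"?>";

    // ASCII cannot represent U+FEFF, so the encoder substitutes '?' (byte 63).
    public static byte FirstAsciiByte(string text) =>
        Encoding.ASCII.GetBytes(text)[0];

    // UTF-8 encodes U+FEFF as the three bytes EF BB BF.
    public static string FirstThreeUtf8Bytes(string text) =>
        BitConverter.ToString(Encoding.UTF8.GetBytes(text), 0, 3);
}

public static class Program
{
    public static void Main()
    {
        Console.WriteLine(BomDemo.FirstAsciiByte(BomDemo.Sample));      // 63, i.e. '?'
        Console.WriteLine(BomDemo.FirstThreeUtf8Bytes(BomDemo.Sample)); // EF-BB-BF
    }
}
```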

Solution 2

At a guess, the ? represents the byte order mark, which is a character that cannot be represented in ASCII. Why are you not using the UTF-8 encoding?

byte[] toEncodeAsBytes = System.Text.Encoding.UTF8.GetBytes(toEncode);
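The line above only covers the encoding side; the decoding side has to use the same encoding for the round trip to work. A minimal sketch of both directions, reusing the variable names from the question (the `Base64Text` class and its method names are hypothetical):

```csharp
using System;
using System.Text;

// Hypothetical helper (not from the original post): Base64 over UTF-8 text.
public static class Base64Text
{
    // Text -> UTF-8 bytes -> Base64 string.
    public static string Encode(string toEncode)
    {
        byte[] toEncodeAsBytes = Encoding.UTF8.GetBytes(toEncode);
        return Convert.ToBase64String(toEncodeAsBytes);
    }

    // Base64 string -> bytes -> text, decoded with the same encoding.
    public static string Decode(string encodedData)
    {
        byte[] encodedDataAsBytes = Convert.FromBase64String(encodedData);
        return Encoding.UTF8.GetString(encodedDataAsBytes);
    }
}

public static class Program
{
    public static void Main()
    {
        // Assumed sample input: a byte order mark followed by the XML declaration.
        string original = "\uFEFF<?xml version=\"1.0\" encoding=\"utf-8\"?>";
        string roundTripped = Base64Text.Decode(Base64Text.Encode(original));

        // Ordinal comparison, so the byte order mark is not ignored.
        Console.WriteLine(string.Equals(original, roundTripped, StringComparison.Ordinal)); // True
    }
}
```

Note that the byte order mark is preserved by this round trip rather than removed; it simply no longer degrades into a "?".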
Author by

MysticEarth

Webdeveloper and -designer.

Updated on June 14, 2022

Comments

  • MysticEarth almost 2 years

    I'm using serialization and deserialization in C# for my Project (which is a class). Project instances are serialized and saved to an XML file. When loading a Project, all goes well.

    Now I'm trying to encode the serialized Project to Base64 and then save the file, which goes well too. The first lines of the file (before encoding) look like this:

    <?xml version="1.0" encoding="utf-8"?>
      <Project xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
    

    When I decode the file, there's a ? added in front of the line:

    ?<?xml version="1.0" encoding="utf-8"?>
      <Project xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
    

    The code I use to encode:

    byte[] toEncodeAsBytes = System.Text.ASCIIEncoding.ASCII.GetBytes(toEncode);
    string returnValue = System.Convert.ToBase64String(toEncodeAsBytes);
    return returnValue;
    

    And the code for decoding:

    byte[] encodedDataAsBytes = System.Convert.FromBase64String(encodedData);
    string returnValue = System.Text.ASCIIEncoding.ASCII.GetString(encodedDataAsBytes);
    return returnValue;
    

    What can this be and how can I fix this?