Use CDATA to store raw binary streams?

Solution 1

You can store it as CDATA, but there is a risk that some byte sequence in the data will contain "]]>" and close the CDATA section early. After a quick look at http://www.w3.org/TR/2006/REC-xml-20060816/#sec-cdata-sect, it seems a CDATA section can hold any sequence of chars except "]]>". Have a look at what counts as a valid XML char too, because not every byte value qualifies.
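For the "]]>" problem specifically, a common workaround is to split the sequence across two adjacent CDATA sections. A minimal sketch, assuming Python's standard xml.etree.ElementTree parser; the helper name wrap_cdata and the <data> element are made up for illustration, and this does nothing about characters that are invalid in XML:

```python
import xml.etree.ElementTree as ET

def wrap_cdata(text: str) -> str:
    """Wrap text in CDATA, splitting every "]]>" across two sections (hypothetical helper)."""
    # "]]>" may not appear inside a CDATA section, so close the section
    # right after "]]" and open a new one just before ">".
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"

payload = "a[b[0]]>c and <tags> & ampersands"
doc = "<data>" + wrap_cdata(payload) + "</data>"

# The parser joins the adjacent CDATA sections back into one text node.
recovered = ET.fromstring(doc).text
print(recovered == payload)
```

Without the split, the first "]]>" in the payload would end the section and the rest would be parsed as markup.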

Solution 2

The NUL character ('\0' in C) is not valid anywhere in XML, even as a character reference (&#0;).
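This is easy to confirm with a parser. A small sketch, assuming Python's standard xml.etree.ElementTree (which is backed by expat); the parses helper is just for illustration:

```python
import xml.etree.ElementTree as ET

def parses(doc: bytes) -> bool:
    """Return True if the document is accepted by the parser."""
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False

tab_escaped_ok = parses(b"<a>&#9;</a>")                 # tab is a legal character
nul_escaped_ok = parses(b"<a>&#0;</a>")                 # NUL as a character reference
nul_raw_ok = parses(b"<a>\x00</a>")                     # NUL as a raw byte
nul_in_cdata_ok = parses(b"<a><![CDATA[\x00]]></a>")    # NUL inside CDATA

print(tab_escaped_ok, nul_escaped_ok, nul_raw_ok, nul_in_cdata_ok)
```

Only the tab variant is accepted; the three NUL variants are all rejected as not well-formed.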

Solution 3

No, you can't use CDATA alone to embed binary data in an XML file.

In XML 1.0 (XML 1.1 is more permissive, but not about control chars), the following restrictions apply to CDATA characters:

CData      ::=      (Char* - (Char* ']]>' Char*)) 
Char       ::=      #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]

That means several characters are illegal, among them:

  • XML control characters 0x00 to 0x1F, except line feeds, carriage returns and tabs
  • illegal UTF-8 sequences, such as the byte 0xFF or the non-canonical (overlong) form 0b1100000x 0b10xxxxxx

In addition to that, in ordinary element content outside CDATA:

  • "<" is illegal, and ">" is illegal when it directly follows "]]"
  • "&" use is restricted (&eacute; is OK if the entity is declared, &zajdalkdza; is not)

So CDATA is just a way to allow "<", ">" and "&" by forbidding "]]>" instead. It doesn't solve the main problem, which is data that is illegal as XML characters, as Unicode or as UTF-8.
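To make the Char production above concrete, here is a rough sketch of a checker. The function names are mine, and treating each byte as a code point of the same value (as in Latin-1) is an assumption made for illustration; in a UTF-8 document, bytes of 0x80 and above would additionally have to form valid sequences.

```python
def is_xml10_char(cp: int) -> bool:
    """True if the code point matches the XML 1.0 Char production."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )

def can_store_in_cdata(text: str) -> bool:
    """True if text fits the CData production: only Chars and no "]]>"."""
    return "]]>" not in text and all(is_xml10_char(ord(ch)) for ch in text)

# Byte values that are illegal even when read as single code points.
illegal_bytes = [b for b in range(256) if not is_xml10_char(b)]

print(len(illegal_bytes))                # 29 (0x00-0x1F minus tab, LF, CR)
print(can_store_in_cdata("<&>"))         # True: markup characters are fine in CDATA
print(can_store_in_cdata("a\x00b"))      # False: NUL is not a Char
print(can_store_in_cdata("x]]>y"))       # False: contains the terminator
```

So more than one byte value in ten has no legal representation, which is why arbitrary binary data cannot survive the round trip.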

Solutions:

  1. Use Base64: 33% overhead, but it is a standard with broad support in all programming languages
  2. Use BaseXML: only 20% overhead, but implementations are still limited
  3. Don't encode binary data within XML if possible; transfer it separately

Solution 4

XML is a plain-text format - don't use it to store binary data. Put the binary blobs in separate files and add an element to your XML which references these files. If you want to store all binary blobs in a single file, add an offset attribute or something like that...
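A rough sketch of the single-file variant, assuming Python. The <blobs>/<blob> elements and the file, name, offset and length attributes are invented for illustration; the length attribute is my addition so the reader knows where each blob ends.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

def pack(blobs: dict, blob_path: str) -> str:
    """Write all blobs into one file and return an XML index that references them."""
    root = ET.Element("blobs", file=os.path.basename(blob_path))
    with open(blob_path, "wb") as f:
        for name, data in blobs.items():
            # Record where this blob starts and how long it is.
            ET.SubElement(root, "blob", name=name,
                          offset=str(f.tell()), length=str(len(data)))
            f.write(data)
    return ET.tostring(root, encoding="unicode")

def unpack(index_xml: str, blob_path: str) -> dict:
    """Read each blob back using the offsets stored in the XML index."""
    out = {}
    with open(blob_path, "rb") as f:
        for el in ET.fromstring(index_xml).iter("blob"):
            f.seek(int(el.get("offset")))
            out[el.get("name")] = f.read(int(el.get("length")))
    return out

blobs = {"icon": b"\x00\x01\xff]]>", "sound": bytes(range(256))}
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "blobs.bin")
    index_xml = pack(blobs, path)
    restored = unpack(index_xml, path)

print(restored == blobs)
```

The XML stays plain text and the binary data is never interpreted by the XML parser, so no byte value is a problem.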

Author by

Robin Rodricks

Updated on June 04, 2022

Comments

  • Robin Rodricks
    Robin Rodricks almost 2 years

    Instead of the overhead with saving binary as Base64, I was wondering if you could directly store double-byte binary streams into XML files, using CDATA, or commenting it out, or something?

  • Robin Rodricks
    Robin Rodricks over 15 years
    I suppose because XML files are null-terminated.
  • Robin Rodricks
    Robin Rodricks over 15 years
    Yeah, and exactly how is Flash supposed to connect to and read raw binary off files? (ActionScript 2 only)
  • Robin Rodricks
    Robin Rodricks over 15 years
    Don't give me the ByteArray or URLLoader talk. This is AS2 only.
  • Christoph
    Christoph over 15 years
    @Jeremy: and where exactly did you state that in your question?
  • Christoph
    Christoph over 15 years
    @Jeremy: They aren't. Null is just not a valid XML character, likely because of null-terminated strings in a popular programming language...
  • Aaron Digulla
    Aaron Digulla over 15 years
    Note that the standard is not 100% clear about this. The character range definitions exclude the 0 byte but some other texts say that any character below 127 is valid.
  • Robin Rodricks
    Robin Rodricks over 15 years
    I didn't because I wanted to be open about AS2/AS3 so hopefully get more answers.
  • David Sykes
    David Sykes about 14 years
    Doesn't that mean no, you can't, since 0x0-0x8, 0xB, 0xC, 0xE-0x1F, 0xFFFE and 0xFFFF are invalid characters?
  • rwong
    rwong almost 11 years
    Also, carriage returns are substituted in XML CDATA. stackoverflow.com/questions/1437874/…
  • rwong
    rwong almost 11 years
    Downvoting because there are many byte sequences that won't be preserved. By "not preserving", I mean that it's not possible to recover the original binary data from CDATA-encoded data. See Pete's answer and comments.
  • Pete Kirkham
    Pete Kirkham almost 11 years
    @rwong they are preserved if escaped, but you cannot do that with nul.