What usable alternatives to XML syntax do you know?

Solution 1

YAML is a superset of JSON (officially so as of YAML 1.2), so it doesn't make sense to reject YAML and then consider JSON instead. YAML does everything JSON does, but YAML gives you so much more too (like references).

I can't think of anything XML can do that YAML can't, except to validate a document with a DTD, which in my experience has never been worth the overhead. But YAML is so much faster and easier to type and read than XML.

As for attributes or properties, if you think about it, they don't truly "add" anything... it's just a notational shortcut to write something as an attribute of the node instead of putting it in its own child node. But if you like that convenience, you can often emulate it with YAML's inline lists/hashes. For example:

<!-- XML -->
<Director name="Spielberg">
    <Movies>
        <Movie title="Jaws" year="1975"/>
        <Movie title="E.T." year="1982"/>
    </Movies>
</Director>


# YAML
Director: 
    name: Spielberg
    Movies:
      - Movie: {title: Jaws, year: 1975}
      - Movie: {title: E.T., year: 1982}

For me, the luxury of not having to write each node tag twice, combined with the freedom from all the angle-bracket litter, makes YAML the preferred choice. I also actually like the lack of formal tag attributes, as that always seemed to me like a gray area of XML that needlessly introduced two sets of syntax (both when writing and traversing) for essentially the same concept. YAML does away with that confusion altogether.
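To illustrate the point that attributes and child nodes carry essentially the same information, here is a minimal Python sketch using only the standard library. The element-only XML variant and the helper names `to_dict` and `movies` are assumptions made up for this illustration; they are not part of the original answer.

```python
import xml.etree.ElementTree as ET

# The attribute-style document from the example above.
ATTRIBUTE_STYLE = """
<Director name="Spielberg">
    <Movies>
        <Movie title="Jaws" year="1975"/>
        <Movie title="E.T." year="1982"/>
    </Movies>
</Director>
"""

# Hypothetical element-only variant of the same data.
ELEMENT_STYLE = """
<Director>
    <name>Spielberg</name>
    <Movies>
        <Movie><title>Jaws</title><year>1975</year></Movie>
        <Movie><title>E.T.</title><year>1982</year></Movie>
    </Movies>
</Director>
"""


def to_dict(element):
    """Merge attributes and simple leaf children into one plain dict."""
    result = dict(element.attrib)
    for child in element:
        # Only leaf children without attributes are treated as properties.
        if len(child) == 0 and not child.attrib:
            result[child.tag] = (child.text or "").strip()
    return result


def movies(xml_text):
    """Return every <Movie> in the document as a plain dict."""
    root = ET.fromstring(xml_text)
    return [to_dict(movie) for movie in root.iter("Movie")]


# Both notations collapse to the same data once the syntax is stripped away.
assert movies(ATTRIBUTE_STYLE) == movies(ELEMENT_STYLE)
```

Note that `to_dict` has to look in two places (`element.attrib` and the child elements) to recover one kind of information, which is the "two sets of syntax" cost described above.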

Solution 2

JSON is a very good alternative, and there are tools for it in multiple languages. It's also really easy to use in web clients, as it is essentially native JavaScript syntax.

Solution 3

TL;DR

Prolog wasn't mentioned here, but it is the best format I know of for representing data. Prolog programs essentially describe databases, with complex relationships between entities. Prolog is dead simple to parse; probably its only rival in this domain is S-expressions.

Full version

Programmers often "forget" what XML actually consists of and usually refer to only a very small subset of it. XML is a very complex format with at least these parts: the DTD schema language, the XSD schema language, the XSLT transformation language, the RNG schema language, and the XPath (plus XQuery) query languages. They are all part and parcel of the XML family of standards. Plus, there are some apocrypha like E4X. Each of them has its own versions, quite a bit of overlap, incompatibilities, and so on. Very few XML parsers in the wild implement all of them. That is not to mention the multiple quirks and bugs of the popular parsers, some leading to notable security issues like https://en.wikipedia.org/wiki/XML_external_entity_attack .

Therefore, looking for an XML alternative is not a very good idea. You probably don't want to deal with the likes of XML at all.

YAML is probably the second worst option. It's not as big as XML, but it was also designed in an attempt to cover all bases... more than ten times each... in different and unique ways nobody could ever conceive of. I have yet to hear of a properly working YAML parser. Ruby, a language that uses YAML a lot, famously ran into trouble because of it. All the YAML parsers I've seen to date are copies of libyaml, which is itself a hand-written parser (not one generated from a formal description), with code that is very difficult to verify for correctness (functions that span hundreds of lines with convoluted control flow). As was already mentioned, it completely contains JSON... on top of a handful of Unicode encoding techniques... inside the same document, and probably a bunch of other stuff you don't want to hear about.

JSON, on the other hand, is completely unlike the other two. You can probably write a JSON parser while waiting for a JSON parser artifact to download from your Maven Nexus. It can do very little, but at least you know what it's capable of. No surprises (except for some discrepancies related to character escaping in strings and the encoding of doubles). No covert exploits. On the downside, you cannot write comments in it, and multiline strings look bad. Whatever distinction you want to make between properties and attributes, you can implement it with more nested dictionaries.
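As a rough sketch of the "more nested dictionaries" idea, the snippet below keeps attribute-like values under a separate key and round-trips the structure through Python's standard `json` module. The key names `attributes` and `movies` are just one possible convention chosen for this illustration, not something JSON prescribes.

```python
import json

# Attribute-like values live under an "attributes" key (an assumed convention),
# while nested content gets its own keys.
director = {
    "attributes": {"name": "Spielberg"},
    "movies": [
        {"attributes": {"title": "Jaws", "year": 1975}},
        {"attributes": {"title": "E.T.", "year": 1982}},
    ],
}

# Serialize to JSON text and parse it back again.
encoded = json.dumps(director, indent=2)
decoded = json.loads(encoded)

# The attribute/property distinction survives as ordinary nesting.
titles = [movie["attributes"]["title"] for movie in decoded["movies"]]
assert decoded == director
```

Whether the extra level of nesting is worth it depends on whether consumers of the data actually need to tell the two kinds of values apart.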

Suppose, though, that you wanted to right what XML wronged... well, then popular stuff like YAML or JSON won't do it. Somehow fashion and rational thinking parted ways in programming some time in the mid-seventies. So you'll have to go back to where it all began, with McCarthy, Hoare, Codd and Kowalski, figure out what it is you are trying to represent, and then see what the best representation technique is for it :)

Solution 4

I have found S-Expressions to be a great way to represent structured data. It's a very simple format which is easy to generate and parse. It doesn't support attributes, but like YAML & JSON, it doesn't need to. Attributes are simply a way for XML to limit verbosity. Simpler, cleaner formats just don't need them.
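To give a feel for how little machinery parsing needs, here is a minimal S-expression reader sketched in Python. It assumes a simplified dialect (parentheses, double-quoted strings, bare symbols, non-negative integers) and leaves escape sequences inside strings untouched; the names `TOKEN` and `parse` and the sample document are made up for this illustration.

```python
import re

# One token is "(", ")", a double-quoted string, or a bare atom.
TOKEN = re.compile(r'\(|\)|"(?:[^"\\]|\\.)*"|[^\s()"]+')


def parse(text):
    """Parse one S-expression into nested Python lists, strings and ints."""
    tokens = TOKEN.findall(text)
    position = 0

    def read():
        nonlocal position
        if position >= len(tokens):
            raise ValueError("unexpected end of input")
        token = tokens[position]
        position += 1
        if token == "(":
            items = []
            while position < len(tokens) and tokens[position] != ")":
                items.append(read())
            if position >= len(tokens):
                raise ValueError("missing closing parenthesis")
            position += 1  # consume the ")"
            return items
        if token == ")":
            raise ValueError("unexpected closing parenthesis")
        if token.startswith('"'):
            return token[1:-1]  # drop the quotes; escapes are left as-is
        return int(token) if token.isdigit() else token

    result = read()
    if position != len(tokens):
        raise ValueError("trailing tokens after expression")
    return result


SAMPLE = '(director (name "Spielberg") (movies (movie (title "Jaws") (year 1975))))'
tree = parse(SAMPLE)
```

Here what would be attributes in XML simply become nested lists such as `(year 1975)`, so no separate attribute syntax is needed.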

Solution 5

Jeff wrote about this here and here. That should help you get started.

Author by

aku

I am an avid full-stack software developer/manager who was lucky enough to work on a broad range of projects and technologies – from embedded software to modern web applications.

Updated on December 26, 2020

Comments

  • aku
    aku over 3 years

    For me usable means that:

    • it's being used in the real world
    • it has tools support. (at least some simple editor)
    • it has human readable syntax (no angle brackets please)

    Also I want it to be as close to XML as possible, i.e. there must be support for attributes as well as for properties. So, no YAML please. Currently, only one matching language comes to my mind - JSON. Do you know any other alternatives?