YAML output format of python

Solution 1

TL;DR: The solution is in the two lines commented "LOOK HERE!". You can work with the data as dicts inside your program and still have a fixed ordering in the stored file/text, if you accept that the output will be lists of lists.

If you don't mind horribly ugly explicit types like !!python/ordered_dict or !!omap littering your file, then you can go that route as well. My vote goes to !!omap, but I'm unsure how many tools/libs support it (I'm pretty sure fewer tools support !!python/ordered_dict, though). Ultimately you are dealing with two independent sets of data: the dict itself, and metadata that defines the ordering of the keys.

(There are semi-magical ways of coercing an ordered dict in YAML without the !!python/ordered_dict or !!omap mess everywhere, but they are fragile, contradict the very definition of dictionaries, and will likely break as the underlying YAML library evolves. This situation is identical for JSON, by the way, as YAML is a superset of JSON and neither guarantees the order of keys -- which means the workarounds break the first time a standards-compliant tool/user messes with the file.)

The rest of this post is example/verification code and an explanation of why things are this way.

from __future__ import print_function
import yaml

# Setting up some example data
d = {'name': 'A Project',
     'version': {'major': 1, 'minor': 4, 'patch': 2},
     'add-ons': ['foo', 'bar', 'baz']}

# LOOK HERE!
# Note that 'papayas' isn't a key. You can establish an ordering that
# includes optional keys by using 'if' as a guard in the list comprehension.
# If you only care about a few keys, you can also tack the unspecified
# ones onto the end (not done here).
ordering = ['name', 'version', 'add-ons', 'papayas']
ordered_set = [[x, d[x]] for x in ordering if x in d]

# Demonstration: write one plain dict and one ordered list of lists
things = {'unordered.yaml': d, 'ordered.yaml': ordered_set}
for filename, data in things.items():
    with open(filename, 'w') as f:
        f.write(yaml.dump(data, default_flow_style=False, allow_unicode=True))

# Let's check the result
output = []
for filename in things:
    with open(filename, 'r') as f:
        # safe_load only builds plain Python types; dict() turns the
        # list of [key, value] lists back into a dictionary
        output.append(dict(yaml.safe_load(f)))

# Should print 'OK'
if output[0] == output[1]:
    print('OK')
else:
    print('Something is wrong')

The files created look like this:

ordered.yaml:

- - name
  - A Project
- - version
  - major: 1
    minor: 4
    patch: 2
- - add-ons
  - - foo
    - bar
    - baz

unordered.yaml:

add-ons:
- foo
- bar
- baz
name: A Project
version:
  major: 1
  minor: 4
  patch: 2

This doesn't produce as pretty a YAML document as you might hope. That said, it can take pretty YAML as initial input (yay!), and scripting the conversion from un-pretty, ordered YAML to pretty, still-ordered, dict-style YAML is straightforward (which I leave as an exercise for you).

If you have an ordering of keys you want preserved, write that ordering into a list/tuple. Use it to generate an ordered list of lists (not a list of tuples, though, because you'll get the !!python/tuple type in YAML, and that sucks). Dump that to YAML. To read it back in, load it as normal, then pass that structure to dict() and you're back to the original dictionary you started with. You may have to recursively descend the structure if you have nested dicts whose order must also be preserved (this is easier to do in code than to explain in prose -- which is something you probably already know).
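The recursive descent mentioned above could look something like the following. This is only a sketch, not the code I actually run: the names to_pairs, from_pairs and looks_like_pairs are made up for illustration, it assumes all keys are strings, and the read-back step relies on a heuristic (any non-empty list of two-item lists whose first item is a string is treated as a former dict). That heuristic will misread genuine data of that shape and cannot restore an empty dict, so adapt it to your own records.

```python
def to_pairs(value, ordering=()):
    """Recursively replace every dict with a list of [key, value] lists."""
    if isinstance(value, dict):
        # Keys named in `ordering` come first; the rest follow alphabetically.
        keys = [k for k in ordering if k in value]
        keys += sorted(k for k in value if k not in ordering)
        return [[k, to_pairs(value[k], ordering)] for k in keys]
    if isinstance(value, list):
        return [to_pairs(item, ordering) for item in value]
    return value


def looks_like_pairs(value):
    """Heuristic (an assumption): a non-empty list of [str, anything] lists."""
    return (isinstance(value, list) and len(value) > 0 and
            all(isinstance(p, list) and len(p) == 2 and isinstance(p[0], str)
                for p in value))


def from_pairs(value):
    """Inverse of to_pairs, relying on the looks_like_pairs heuristic."""
    if looks_like_pairs(value):
        return {k: from_pairs(v) for k, v in value}
    if isinstance(value, list):
        return [from_pairs(item) for item in value]
    return value


d = {'name': 'A Project',
     'version': {'major': 1, 'minor': 4, 'patch': 2},
     'add-ons': ['foo', 'bar', 'baz']}

pairs = to_pairs(d, ordering=['name', 'version', 'add-ons', 'papayas'])
restored = from_pairs(pairs)
print(restored == d)
```

The pairs structure is what you would hand to yaml.dump(), and from_pairs() is what you would call on the result of loading it back.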

In this example I want the project 'name' to come first in the file, then the 'version' number elements, then 'add-ons'. By default PyYAML sorts dictionary keys alphanumerically when you call dump(), so 'add-ons' comes before 'name' and I have an ordering problem. I can't rely on that sorting either: it might change in the future, and nothing in the YAML standard requires it, so I have no guarantee that a different YAML utility will do things this way. So I define my order, pack an ordered list of lists, and dump that.
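For completeness: PyYAML 5.1 and later accept a sort_keys argument to dump(), and Python 3.7+ dicts remember insertion order. A minimal sketch, assuming both of those versions. Note that this only controls what this one library writes; the file still contains a plain YAML mapping, so any other tool that loads and re-dumps it is free to reorder the keys, which is exactly the caveat described above.

```python
import yaml

d = {'name': 'A Project',
     'version': {'major': 1, 'minor': 4, 'patch': 2},
     'add-ons': ['foo', 'bar', 'baz']}

# sort_keys=False asks PyYAML (5.1 or later) to keep the dict's own order
# instead of sorting the keys.
text = yaml.safe_dump(d, default_flow_style=False, sort_keys=False)
print(text)
```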

You are asking for order out of something that is inherently unordered. A dictionary is a hash table, internally ordered exclusively for search speed. That order is something you're not supposed to be able to mess with, because if a faster way of implementing dictionaries is discovered tomorrow, the runtime needs to be able to adopt it without breaking everyone's code that depended on dictionaries being a helpful abstraction of a hash table. (Python 3.7 and later do guarantee that dicts keep insertion order, but that is a promise made by one language; YAML mappings and the hash types of most other tools make no such promise.)

In the same way, YAML is not a markup language (after all, the name is a recursive acronym for "YAML Ain't Markup Language"); it is a data format. The difference is important. Some data is ordered, like tuples and lists; some isn't, like bags of key-value pairs (slightly different from a hash table, but conceptually similar).

I use a recursive version of this sort of solution to guarantee YAML output across different YAML implementations, not for human readability, but because I do a lot of data passing in YAML and each record has to be signed with a key, and indefinite order prevents uniform signatures whenever dicts/hashes are in use.
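To make the signing use case concrete, here is a rough sketch of the idea, not my production code. The helper names canonical and sign are hypothetical, the choice of HMAC-SHA256 is an assumption made for illustration, and the secret is a placeholder. Every dict is replaced by a key-sorted list of lists before dumping, so two records with the same content produce the same text, and therefore the same signature, no matter how their keys were ordered in memory.

```python
import hashlib
import hmac

import yaml


def canonical(value):
    """Recursively turn dicts into key-sorted lists of [key, value] lists."""
    if isinstance(value, dict):
        return [[k, canonical(value[k])] for k in sorted(value)]
    if isinstance(value, list):
        return [canonical(item) for item in value]
    return value


def sign(record, secret):
    """Return a hex HMAC-SHA256 over the canonical YAML text of `record`."""
    text = yaml.safe_dump(canonical(record), default_flow_style=False,
                          allow_unicode=True)
    return hmac.new(secret, text.encode('utf-8'), hashlib.sha256).hexdigest()


secret = b'not-a-real-key'  # placeholder for illustration only
a = {'name': 'A Project', 'version': {'major': 1, 'minor': 4}}
b = {'version': {'minor': 4, 'major': 1}, 'name': 'A Project'}

# Same content, different key order: the signatures should match.
same = sign(a, secret) == sign(b, secret)
print(same)
```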

Solution 2

YAML mappings are unordered, and so were Python dicts before Python 3.7. The official way to read in a file and keep the ordering is to use !!omap, but in PyYAML those get converted to a list of tuples, which is not as easy to update as a dict/ordereddict/OrderedDict.
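A small illustration of the !!omap behaviour described above, assuming PyYAML's safe_load and an example document made up for this purpose:

```python
import yaml

text = """\
!!omap
- name: A Project
- version: "1.4.2"
- add-ons: [foo, bar, baz]
"""

# PyYAML constructs an !!omap as a list of (key, value) tuples.
loaded = yaml.safe_load(text)
print(loaded)

# Updating a value means rebuilding the list (or converting to a dict first),
# which is the inconvenience mentioned above.
updated = [(k, '1.4.3' if k == 'version' else v) for k, v in loaded]
```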

If you already have a YAML file that you read in and update, you can use my ruamel.yaml library. In round-trip mode it reads mappings in as ordereddict and writes them out as normal mappings (it also preserves comments).

An example of usage was given as an answer to another question.

Author: Samuel

Updated on June 17, 2022

Comments

  • Samuel
    Samuel almost 2 years

    I use PyYAML to output a YAML file, but it reorders my items, like the following:

    >>> yaml.dump({'3':5, '1':3})
    "{'1': 3, '3': 5}\n"
    

    I want to get "{'3': 5, '1': 3}\n". Can I do that?

    PS. I have tried collections.OrderedDict. Its output is not good, like the following:

    >>> a= collections.OrderedDict()
    >>> a['3']=1
    >>> a['1']=2
    >>> a['5']=2
    >>> yaml.dump(a)
    "!!python/object/apply:collections.OrderedDict\n- - ['3', 1]\n  - ['1', 2]\n  - ['5', 2]\n"