PyYAML dump format

Solution 1

Below, ruamel.yaml is used instead.

ruamel.yaml is actively maintained. Unlike PyYAML, ruamel.yaml supports:

  • YAML <= 1.2. PyYAML only supports YAML <= 1.1. This matters because YAML 1.2 intentionally breaks backward compatibility with YAML 1.1 in several edge cases. That would usually be a bad thing, but here it is exactly what makes YAML 1.2 a strict superset of JSON, which YAML 1.1 is not.
  • Roundtrip preservation. When calling yaml.dump() to dump a dictionary loaded by a prior call to yaml.load():
    • PyYAML naively ignores all input formatting – including comments, ordering, quoting, and whitespace. Discarded like so much digital refuse into the nearest available bit bucket.
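    • ruamel.yaml cleverly respects input formatting: comments, ordering, and whitespace. The whole stylistic enchilada. The entire literary shebang. Quote style is the one exception: it is kept only when preserve_quotes=True is specified during loading (see the comments).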

Library Migration

Switching from PyYAML to ruamel.yaml in existing applications is typically as simple as changing the library import to:

from ruamel import yaml

This works because ruamel.yaml is a PyYAML fork that conforms to the PyYAML API.

No other changes should be needed on ruamel.yaml releases that still provide the PyYAML-style module-level functions. There, the yaml.load() and yaml.dump() functions should continue to behave as expected.

Roundtrip Preservation and What It Can Do for You

For backward compatibility with PyYAML, the yaml.load() and yaml.dump() functions do not perform roundtrip preservation by default. To enable it, explicitly pass:

  • The optional Loader=ruamel.yaml.RoundTripLoader keyword parameter to yaml.load().
  • The optional Dumper=ruamel.yaml.RoundTripDumper keyword parameter to yaml.dump().

An example kindly "borrowed" from ruamel.yaml documentation:

import ruamel.yaml

inp = """\
# example
name:
  # Yet another Great Duke of Hell. He's not so bad, really.
  family: TheMighty
  given: Ashtaroth
"""

code = ruamel.yaml.load(inp, Loader=ruamel.yaml.RoundTripLoader)
code['name']['given'] = 'Astarte'  # Oh no you didn't.

print(ruamel.yaml.dump(code, Dumper=ruamel.yaml.RoundTripDumper), end='')

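It is done. Comments, ordering, and whitespace will now be preserved intact. Quote style is preserved only if preserve_quotes=True is specified during loading (ruamel.yaml 0.11.12 and later).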

Solution 2

In my case, I want double quotes (") if the value contains a { or a }, and no quotes otherwise. For example:

 en:
   key1: value is 1
   key2: "value is {1}"

To do that, copy the represent_str() function from representer.py in PyYAML (shown here in its Python 2 form) and set a different style if the string contains a { or a }:

def represent_str(self, data):
    tag = None
    style = None
    # Add these two lines:
    if '{' in data or '}' in data:
        style = '"'
    try:
        data = unicode(data, 'ascii')
        tag = u'tag:yaml.org,2002:str'
    except UnicodeDecodeError:
        try:
            data = unicode(data, 'utf-8')
            tag = u'tag:yaml.org,2002:str'
        except UnicodeDecodeError:
            data = data.encode('base64')
            tag = u'tag:yaml.org,2002:binary'
            style = '|'
    return self.represent_scalar(tag, data, style=style)

To use it in your code:

import yaml

def represent_str(self, data):
  ...

yaml.add_representer(str, represent_str)

With this approach there is no difference between keys and values, and that's enough for me. If you want a different style for keys and values, do the same thing with the represent_mapping function.
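The represent_str() shown above relies on Python 2 built-ins (unicode, the base64 codec). On Python 3 the same idea can be sketched much more briefly, since str is already Unicode. This is a minimal sketch under that assumption, not a copy of PyYAML's own code; the sample dictionary is made up for illustration.

```python
import yaml


def represent_str(dumper, data):
    # Double-quote only strings that contain a brace; leave the rest plain.
    style = '"' if ('{' in data or '}' in data) else None
    return dumper.represent_scalar('tag:yaml.org,2002:str', data, style=style)


# Registers the representer on PyYAML's default Dumper (global effect).
yaml.add_representer(str, represent_str)

doc = {'en': {'key1': 'value is 1', 'key2': 'value is {1}'}}
result = yaml.dump(doc, default_flow_style=False)
print(result, end='')
```

Note that yaml.add_representer() changes the default Dumper for the whole process. Registering the function on a Dumper subclass instead would keep the change local.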

Solution 3

First

PyYAML uses the following code to represent dictionary data:

mapping = list(mapping.items())
try:
    mapping = sorted(mapping)
except TypeError:
    pass

This is why the ordering changes.

Second

Information about how a scalar was presented (with double quotes or not) is lost when reading. This is a fundamental design choice of the library.

Summary

You can create your own class based on Dumper and override its represent_mapping method to change how dictionaries are presented.

To preserve the information about double quotes for scalars, you would also have to create your own class based on Loader, but I am afraid that this would affect other classes as well and would be difficult to do.
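One way to get the ordering part of this, sketched here as an assumption rather than as the answer's own code, is a Dumper subclass with a dict representer that hands represent_mapping() the items instead of the dict. The sorting branch quoted above only runs for objects that have an items() method, so it is skipped. The OrderedDumper name and the sample data are hypothetical, and the sketch relies on dictionaries keeping insertion order (Python 3.7+).

```python
import yaml


class OrderedDumper(yaml.SafeDumper):
    """Hypothetical Dumper subclass that keeps dictionary insertion order."""


def represent_dict_in_order(dumper, data):
    # data.items() has no items() method of its own, so represent_mapping()
    # does not sort it and the pairs are emitted in insertion order.
    return dumper.represent_mapping('tag:yaml.org,2002:map', data.items())


OrderedDumper.add_representer(dict, represent_dict_in_order)

data = {
    'en': {
        'test': {'new': 'Bye', 'index': {'few': 'Hello'}},
        'anothertest': 'Something',
    }
}
result = yaml.dump(data, Dumper=OrderedDumper, default_flow_style=False)
print(result, end='')
```

This keeps the key order but not the quotes, which are still lost at load time. On PyYAML 5.1 and later, passing sort_keys=False to yaml.dump() should give the same ordering without a custom class.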

Author: nicosantangelo

Updated on March 24, 2021

Comments

  • nicosantangelo
    nicosantangelo about 3 years

    I know there are a few questions about this on SO, but I couldn't find what I was looking for.

    I'm using pyyaml to read (.load()) a .yml file, modify or add a key, and then write it (.dump()) again. The problem is that I want to keep the file format post-dump, but it changes.

    For example, I edit the key en.test.index.few to say "Bye" instead of "Hello"

    Python:

    with open(path, 'r', encoding = "utf-8") as yaml_file:
        self.dict = pyyaml.load(yaml_file)
    

    Then, after changing the key:

    with open(path, 'w', encoding = "utf-8") as yaml_file:
        dump = pyyaml.dump(self.dict, default_flow_style = False, allow_unicode = True, encoding = None)
        yaml_file.write( dump )
    

    Yaml:

    Before:

    en:
      test:
        new: "Bye"
        index:
          few: "Hello"
      anothertest: "Something"
    

    After:

    en:
      anothertest: Something
      test:
        index:
          few: Hello
        new: Bye
    

    Is there a way to keep the same format, for example the quotes and order? Am I using the wrong tool for this?

    I know the original file may not be entirely correct, but I have no control over it (it's a Ruby on Rails i18n file).

    Thank you very much.

  • nicosantangelo
    nicosantangelo about 8 years
    I must say this is a wonderful answer. I'm currently not developing the project that used PyYAML but I'll definitely give ruamel.yaml a try when I have some spare time and accept the answer if it works. Thanks!
  • sjmh
    sjmh almost 8 years
    It actually doesn't maintain all formatting. For instance, quotes are not kept.
  • Cecil Curry
    Cecil Curry almost 8 years
    @sjmh If ruamel.yaml fails to roundtrip preserve any formatting (including but not limited to quote style), that's a bug. Consider submitting an issue at ruamel.yaml's issue tracker. The principal maintainer of ruamel.yaml is quite receptive and friendly, thankfully. Cheers!
  • Anthon
    Anthon almost 8 years
    @sjmh Starting with ruamel.yaml 0.11.12 you can specify preserve_quotes=True during loading, which will wrap the strings loaded with information needed for dumping. Also see this answer
  • Alex Harvey
    Alex Harvey over 7 years
    @Anthon Will it somehow preserve blank lines between data?
  • Anthon
    Anthon over 7 years
    @AlexHarvey Yes, it will. Empty lines are considered comments without # and are preserved. But in case you'll be inserting new values, beware that the comments/empty lines are currently attached to the mapping keys/sequence elements before the comment (i.e. comment gathering is greedy), even though visually they might be "closer" to the following key/element.
  • Alex Harvey
    Alex Harvey over 7 years
    That's great to hear. I have a bunch of issues and I'll raise them in Bitbucket. Your project is fantastic, it's almost unbelievable no one has thought of it before!
  • AlexLordThorsen
    AlexLordThorsen about 6 years
    Is this comment still up to date with the current state of PyYAML?
  • wim
    wim almost 6 years
    PyYAML has new maintainers now, and had a v4.1 release recently. The answer is outdated and silly content such as "PyYAML is a fetid corpse rotting.." should probably be edited out.
  • Arne
    Arne over 5 years
    @wim aaaand PyYAML development ground to a halt again, with no meaningful changes in 5 months. The 4.1 release broke many projects, and we are currently sitting on 4.2b4, which is pretty embarrassing. There is little hope that it ever gets a proper release and lots of regret for not choosing ruamel in the first place.
  • wim
    wim over 5 years
    3.x to 4.x is a major version number bump, so backwards incompat changes should be expected. I'm not denying PyYAML maintainership has problems and politics, but the language used in this answer is a bit excessive. It reads like an advertisement for ruamel.yaml or propaganda.
  • Alex H.
    Alex H. over 5 years
    Still dead. Never trust one release, only a pattern of releases. Long live ruamel.yaml
  • gatopeich
    gatopeich about 5 years
    I am afraid the "new" PyYAML is very lacking from the point of view of usability in a Python environment. Some "maintainers" are tinkering around while they admittedly "learn Python". And they fight fiercely against obvious requests like preserving dictionary order just like Python does. It is a pity for such a relevant project.
  • lifeofguenter
    lifeofguenter over 4 years
    While ruamel works better, it's also faulty. The state of YAML loading + dumping in Python is unfortunately quite bad.
  • Dan Niero
    Dan Niero over 4 years
    ruamel.yaml is significantly slower than PyYAML. Something to keep in mind.
  • Dan Niero
    Dan Niero over 4 years
    This answer should be corrected. It immediately starts with false statements. Maybe they were true at the time it was written, but that is no longer the case. PyYAML is not dead and the website is up. At the moment of writing this comment, PyYAML appears quite alive and kicking. Look at the latest releases: - 2019-07-30: PyYAML 5.1.2 is released. - 2019-06-06: PyYAML 5.1.1 is released. - 2019-03-13: LibYAML 0.2.2 and PyYAML 5.1 are released. - 2018-07-05: PyYAML 3.13 is released. - 2018-06-24: LibYAML 0.2.1 is released. This answer is misleading.
  • 9769953
    9769953 over 4 years
    The PyYAML source code can now be found at github.com/yaml/pyyaml , and shows quite some activity (in commits, issues & responses, and pull requests & merges). Perhaps these are all bad, but PyYAML is currently not unmaintained and dead.
  • scravy
    scravy over 3 years
    While PyYAML is maintained and works well, the other library advertised here is questionable at best. The link is not even valid anymore, the project migrated from bitbucket to sourceforge and has very little activity over there.
  • Rob
    Rob about 3 years
    I only edited it to remove language that was not appropriate for SO. Some of it showed a terrible lack of understanding of history. I can't endorse the library, or the recommendations in this post.
  • Torsten Bronger
    Torsten Bronger over 2 years
    FWIW, PyYAML managed to convert our Git commit 45433e8 into 4543300000000 and broke our CI. The respective PR github.com/yaml/pyyaml/pull/174 is stalled since 2018. openhub.net/p/… gave us enough confidence to make the switch to ruamel.yaml.