How to parse a YAML file with multiple documents?
Solution 1
The error message is quite specific: a document needs to start with a document start marker. Your first document doesn't have such a marker, although it has a document end marker. Once you explicitly end the first document with ... PyYAML no longer accepts a following document without document boundary markers; you have to start it explicitly with ---
The end of your file should look like:
  kind: UndergroundDistributionLineSegment
...
---
ingests:
- timestamp: 1970-01-01T00:00:00.000Z
  id: OverheadDistributionLineSegment_31168454
You can leave out the explicit document start marker from the first document, but you need to include a start marker for every following document. Document end markers are optional.
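As a minimal sketch of that rule, assuming PyYAML is installed: the `STREAM` string below is a shortened stand-in for the corrected file, and the names `STREAM` and `docs` are illustrative, not taken from the question.

```python
import yaml  # PyYAML

# Shortened stand-in for the corrected file: the first document has no
# start marker, the second one begins with an explicit "---".
STREAM = """\
ingests:
- id: UndergroundDistributionLineSegment_34806014
  kind: UndergroundDistributionLineSegment
...
---
ingests:
- id: OverheadDistributionLineSegment_31168454
"""

# safe_load_all() returns a lazy generator; list() forces every document
# to be parsed and collects the results in order.
docs = list(yaml.safe_load_all(STREAM))

for doc in docs:
    print(doc["ingests"][0]["id"])
```

Because the generator is lazy, parse errors surface while iterating, not at the call itself. That is why the traceback in the question points at the `for` loop and not at the `try` block around `load_all()`.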
If you don't have complete control over the input, using .load_all() is not safe. There is normally no reason to take that risk: use .safe_load_all() instead, and extend the SafeLoader to handle any specific tags that your YAML might contain.
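One way to extend the SafeLoader could look like the sketch below. The `!vector` tag, the `VectorSafeLoader` class and the `construct_vector` function are hypothetical names chosen for illustration; the sample file in the question uses plain scalars and no tags at all.

```python
import yaml  # PyYAML


class VectorSafeLoader(yaml.SafeLoader):
    """SafeLoader subclass, so the extra tag is not registered globally."""


def construct_vector(loader, node):
    # Turn a tagged sequence node into a plain tuple of floats.
    return tuple(float(x) for x in loader.construct_sequence(node))


# "!vector" is a made-up tag used only for this example.
VectorSafeLoader.add_constructor("!vector", construct_vector)

STREAM = """\
loc: !vector [43.05292, -76.1268, 0.0]
---
loc: !vector [43.052942, -76.126716, 0.0]
"""

# load_all() with a SafeLoader subclass keeps the safe behaviour and
# only adds the one constructor registered above.
docs = list(yaml.load_all(STREAM, Loader=VectorSafeLoader))
print(docs[0]["loc"])
```

Registering the constructor on a subclass, and not on `yaml.SafeLoader` itself, keeps the change local to the code that needs it.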
Apart from that you should start your YAML documents with an explicit version directive before the document start indicator (which you should also add to the first document):
%YAML 1.1
---
This is for the benefit of future editors of your YAML files, because you are using PyYAML, which only supports (most of) YAML 1.1 and not the YAML 1.2 specification (from 2009). The alternative is of course to upgrade your YAML parser to e.g. ruamel.yaml, which would also have warned you about your use of the unsafe load_all() (disclaimer: I am the author of that parser). ruamel.yaml doesn't allow you to have a bare document after an explicit end-of-document marker (which is allowed, as @flyx pointed out), which is a bug.
Solution 2
I think you have invalid YAML.
Look at the second document in the sample: it begins with ... instead of ---
...
ingests:
- timestamp: 1970-01-01T00:00:00.000Z
  id: OverheadDistributionLineSegment_31168454
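If the file cannot be fixed at the source, one possible workaround is to repair the text before handing it to the parser. The helper below is a sketch using only the standard library; `add_missing_start_markers` and `BROKEN` are hypothetical names, and it assumes that document end markers appear as a bare `...` at the start of a line with no trailing comment.

```python
def add_missing_start_markers(text):
    """Insert '---' after each '...' line that is not already followed by one."""
    lines = text.splitlines()
    fixed = []
    for index, line in enumerate(lines):
        fixed.append(line)
        if line.rstrip() != "...":
            continue
        # Look at the next non-blank line, if there is one.
        following = next((l for l in lines[index + 1:] if l.strip()), None)
        # Leave the stream alone when a start marker or a directive
        # (such as %YAML) already follows the end marker.
        if following is not None and not following.startswith(("---", "%")):
            fixed.append("---")
    return "\n".join(fixed) + "\n"


# Shortened stand-in for the file in the question.
BROKEN = "kind: UndergroundDistributionLineSegment\n...\ningests:\n- id: x\n"
print(add_missing_start_markers(BROKEN))
```

The repaired text can then be passed to `yaml.safe_load_all()` as usual.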
Comments
-
BigBoy1337 almost 2 years
Here is my parsing code:
import yaml

def yaml_as_python(val):
    """Convert YAML to dict"""
    try:
        return yaml.load_all(val)
    except yaml.YAMLError as exc:
        return exc

with open('circuits-small.yaml','r') as input_file:
    results = yaml_as_python(input_file)
    print results
    for value in results:
        print value
Here is a sample of the file:
ingests:
- timestamp: 1970-01-01T00:00:00.000Z
  id: SwitchBank_35496721
  attrs:
    Feeder: Line_928
    Switch.normalOpen: 'true'
    IdentifiedObject.description: SwitchBank
    IdentifiedObject.mRID: SwitchBank_35496721
    PowerSystemResource.circuit: '928'
    IdentifiedObject.name: SwitchBank_35496721
    IdentifiedObject.aliasName: SwitchBank_35496721
  loc: vector [43.05292, -76.126800000000003, 0.0]
  kind: SwitchBank
- timestamp: 1970-01-01T00:00:00.000Z
  id: UndergroundDistributionLineSegment_34862802
  attrs:
    Feeder: Line_928
    status: de-energized
    IdentifiedObject.description: UndergroundDistributionLineSegment
    IdentifiedObject.mRID: UndergroundDistributionLineSegment_34862802
    PowerSystemResource.circuit: '928'
    IdentifiedObject.name: UndergroundDistributionLineSegment_34862802
  path:
  - vector [43.052942000000002, -76.126716000000002, 0.0]
  - vector [43.052585000000001, -76.126515999999995, 0.0]
  kind: UndergroundDistributionLineSegment
- timestamp: 1970-01-01T00:00:00.000Z
  id: UndergroundDistributionLineSegment_34806014
  attrs:
    Feeder: Line_928
    status: de-energized
    IdentifiedObject.description: UndergroundDistributionLineSegment
    IdentifiedObject.mRID: UndergroundDistributionLineSegment_34806014
    PowerSystemResource.circuit: '928'
    IdentifiedObject.name: UndergroundDistributionLineSegment_34806014
  path:
  - vector [43.05292, -76.126800000000003, 0.0]
  - vector [43.052928999999999, -76.126766000000003, 0.0]
  - vector [43.052942000000002, -76.126716000000002, 0.0]
  kind: UndergroundDistributionLineSegment
...
ingests:
- timestamp: 1970-01-01T00:00:00.000Z
  id: OverheadDistributionLineSegment_31168454
In the traceback, note that it starts having a problem at the ...
Traceback (most recent call last):
  File "convert.py", line 29, in <module>
    for value in results:
  File "/Users/conduce-laptop/anaconda2/lib/python2.7/site-packages/yaml/__init__.py", line 82, in load_all
    while loader.check_data():
  File "/Users/conduce-laptop/anaconda2/lib/python2.7/site-packages/yaml/constructor.py", line 28, in check_data
    return self.check_node()
  File "/Users/conduce-laptop/anaconda2/lib/python2.7/site-packages/yaml/composer.py", line 18, in check_node
    if self.check_event(StreamStartEvent):
  File "/Users/conduce-laptop/anaconda2/lib/python2.7/site-packages/yaml/parser.py", line 98, in check_event
    self.current_event = self.state()
  File "/Users/conduce-laptop/anaconda2/lib/python2.7/site-packages/yaml/parser.py", line 174, in parse_document_start
    self.peek_token().start_mark)
yaml.parser.ParserError: expected '<document start>', but found '<block mapping start>'
  in "circuits-small.yaml", line 42, column 1
What I would like is for it to parse each of these documents as a separate object, perhaps all of them in the same list, or pretty much anything else that would work with the PyYAML module. I believe the ... is actually valid YAML, so I am surprised that it doesn't handle it automatically.
-
flyx about 7 years
... ends the previous document. The scalar ingests then starts a new document implicitly. Using --- instead would also work, because that explicitly starts a new document, while it implicitly ends the previous document.
-
flyx about 7 years
Addendum: That's only valid for YAML 1.2. In YAML 1.1, you indeed need a ---.
-
flyx about 7 years
You should change your links to lead to the YAML 1.1 specification, because in YAML 1.2, it is perfectly valid to have an implicit document after a document suffix. And Example 9.3, which you linked, in the 1.2 spec directly shows that.
-
Anthon about 7 years
@flyx thanks for pointing that out. I updated the answer; fixing ruamel.yaml to conform to that takes a bit more. I think you can argue that in YAML 1.1 this is not needed either ('A line beginning with "---" may be used to explicitly denote the beginning of a new YAML document' (emphasis mine)).
-
flyx about 7 years
The relevant production in YAML 1.1 is l-yaml-stream, which captures all documents after the first one as l-next-document, which resolves to an l-explicit-document, and that must start with ---.