Python: namespaces in xml ElementTree (or lxml)
Solution 1
Have a look at the lxml tutorial section on namespaces. Also this article about namespaces in ElementTree.
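Problem 1: Put up with it, like everybody else does. Instead of "%(ns)Event" % {'ns': NS}, try NS + "Event". (As written, "%(ns)Event" is also missing the s conversion after %(ns), so it raises an error instead of producing the qualified name.)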
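As a rough illustration of the NS + "Event" approach, here is a minimal sketch using the standard library. The sample document, its Traffic root element, and the helper name q are made up for the example; only the namespace URI and the tag names come from the question.

```python
import xml.etree.ElementTree as ET

# Namespace in the "{uri}" form that ElementTree uses for qualified tag names.
NS = "{http://www.somedomain.com/XI/Traffic/10}"

# Hypothetical sample input; the real file's layout is an assumption.
SAMPLE = (
    '<Traffic xmlns="http://www.somedomain.com/XI/Traffic/10">'
    "<Event><EventDetail>"
    "<EventDescription>road closed</EventDescription>"
    "</EventDetail></Event>"
    "</Traffic>"
)


def q(tag):
    """Return the namespace-qualified form of a local tag name (hypothetical helper)."""
    return NS + tag


root = ET.fromstring(SAMPLE)

# Plain concatenation instead of %-formatting.
events = root.findall(q("Event"))

# Multi-step paths need the namespace on every step.
path = "/".join(q(t) for t in ("EventDetail", "EventDescription"))
descriptions = [event.find(path).text for event in events]
```

A small helper like q keeps the call sites short, but the namespace still has to be applied to every step of a path.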
Problem 2: By default, the XML declaration is written only if it is required. You can force it by passing xml_declaration=True in your write() call (supported by lxml, and by the standard library's ElementTree since Python 2.7).
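To see the difference, the sketch below writes the same tree twice with the standard library's ElementTree (Python 3), once with the default behaviour and once with xml_declaration=True. It writes to in-memory buffers instead of "out.xml" so it can be run anywhere; the one-element document is made up for the example.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical minimal document, just enough to have something to write.
tree = ET.ElementTree(ET.fromstring("<Traffic><Event/></Traffic>"))

# Default: for utf-8 output the declaration is not required, so it is left out.
plain = io.BytesIO()
tree.write(plain, encoding="utf-8")

# Forced: xml_declaration=True always emits the declaration.
declared = io.BytesIO()
tree.write(declared, encoding="utf-8", xml_declaration=True)

plain_bytes = plain.getvalue()
declared_bytes = declared.getvalue()
first_line = declared_bytes.split(b"\n", 1)[0]
```

The exact quoting style of the declaration may differ between the standard library and lxml, so compare on content, not byte-for-byte, if that matters to you.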
Problem 3: The nsmap arg appears to be lxml-only. AFAICT it needs a mapping, not a string. Try nsmap={None: NS}, where the value should be the plain namespace URI, not the "{...}"-wrapped form used for tag names. The effbot article has a section describing a workaround for this.
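For the standard library, where nsmap is not available, one option is ET.register_namespace with an empty prefix, which makes the serializer emit the URI as the default namespace instead of ns0. This is only a sketch and not necessarily the workaround the effbot article describes; the Traffic root element is made up, and note that the registration is global to the process.

```python
import xml.etree.ElementTree as ET

# Plain namespace URI (no braces), taken from the question.
URI = "http://www.somedomain.com/XI/Traffic/10"

# Hypothetical sample input using that URI as its default namespace.
SAMPLE = '<Traffic xmlns="%s"><Event/></Traffic>' % URI

root = ET.fromstring(SAMPLE)

# Without any registration the serializer invents a prefix such as ns0.
before = ET.tostring(root, encoding="unicode")

# Register the URI under the empty prefix so it is written as xmlns="...".
ET.register_namespace("", URI)
after = ET.tostring(root, encoding="unicode")
```

Because the registry is module-wide, this affects every tree serialized afterwards in the same process.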
Solution 2
To answer your questions in order:
- You can't just ignore the namespace, not in the path syntax that .findall() uses, and not in "real" xpath (supported by lxml) either: there you'd still be forced to use a prefix, and still need to provide some prefix-to-uri mapping.
- Use xml_declaration=True as well as encoding='utf-8' with the .write() call (available in lxml, but in stdlib xml.etree only since python 2.7 I believe).
- I believe lxml will behave like you want.
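The prefix-to-uri mapping mentioned in the first point can be sketched with the standard library, whose find() and findall() accept a namespaces dictionary. The prefix "t", the sample document, and its Traffic root element are arbitrary choices for this example.

```python
import xml.etree.ElementTree as ET

# Prefix-to-URI mapping; the prefix "t" is arbitrary and local to these queries.
NSMAP = {"t": "http://www.somedomain.com/XI/Traffic/10"}

# Hypothetical sample input; the real file's layout is an assumption.
SAMPLE = (
    '<Traffic xmlns="http://www.somedomain.com/XI/Traffic/10">'
    "<Event><EventDetail>"
    "<EventDescription>road closed</EventDescription>"
    "</EventDetail></Event>"
    "<Event><EventDetail>"
    "<EventDescription>lane blocked</EventDescription>"
    "</EventDetail></Event>"
    "</Traffic>"
)

root = ET.fromstring(SAMPLE)

# Every step still carries a prefix, but the URI is spelled out only once.
matches = root.findall("t:Event/t:EventDetail/t:EventDescription", NSMAP)
texts = [element.text for element in matches]
```

The prefix used in the query does not have to match whatever prefix (if any) the file itself uses; only the URI matters.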
Hellnar
Updated on May 28, 2020
I want to retrieve a legacy xml file, manipulate and save it.
Here is my code:
    from xml.etree import cElementTree as ET

    NS = "{http://www.somedomain.com/XI/Traffic/10}"

    def fix_xml(filename):
        f = ET.parse(filename)
        root = f.getroot()
        eventlist = root.findall("%(ns)Event" % {'ns':NS })
        xpath = "%(ns)sEventDetail/%(ns)sEventDescription" % {'ns':NS }
        for event in eventlist:
            desc = event.find(xpath)
            desc.text = desc.text.upper() # do some editing to the text.

        ET.ElementTree(root, nsmap=NS).write("out.xml", encoding="utf-8")

    shorten_xml("test.xml")
The file I load contains:
xmlns="http://www.somedomain.com/XI/Traffic/10" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.somedomain.com/XI/Traffic/10 10.xds"
at the root tag.
I have the following problems, related to namespace:
- As you see, for each tag call, I have to give the namespace at the beginning to retrieve a child.
- The generated xml file doesn't have <?xml version="1.0" encoding="utf-8"?> at the beginning.
- The tags in the output contain a prefix, such as <ns0:eventDescription>, while I need the output as in the original, <eventDescription>, without the namespace at the beginning.
How can these be solved?