lxml (or lxml.html): print tree structure
Maybe just run some XSLT over the source XML to strip everything but the tags, it's then easy enough to use etree.tostring
to get a string you could hash...
from lxml import etree as ET
def pp(e):
print ET.tostring(e, pretty_print=True)
print
root = ET.XML("""\
<project id="8dce5d94-4273-47ef-8d1b-0c7882f91caa" kpf_version="4">
<livefolder id="8744bc67-1b9e-443d-ba9f-96e1d0007ba8" idref="707cd68a-33b5-4051-9e40-8ba686c2fdb8">Mooo</livefolder>
<livefolder id="8744bc67-1b9e-443d-ba9f" idref="707cd68a-33b5-4051-9e40-8ba686c2fdb8" />
<preference-set idref="8dce5d94-4273-47ef-8d1b-0c7882f91caa">
<boolean id="import_live">0</boolean>
</preference-set>
</project>
""")
pp(root)
xslt = ET.XML("""\
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="*">
<xsl:copy>
<xsl:apply-templates select="*"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
""")
tr = ET.XSLT(xslt)
doc2 = tr(root)
root2 = doc2.getroot()
pp(root2)
Gives you the output:
<project id="8dce5d94-4273-47ef-8d1b-0c7882f91caa" kpf_version="4">
<livefolder id="8744bc67-1b9e-443d-ba9f-96e1d0007ba8" idref="707cd68a-33b5-4051-9e40-8ba686c2fdb8">Mooo</livefolder>
<livefolder id="8744bc67-1b9e-443d-ba9f" idref="707cd68a-33b5-4051-9e40-8ba686c2fdb8"/>
<preference-set idref="8dce5d94-4273-47ef-8d1b-0c7882f91caa">
<boolean id="import_live">0</boolean>
</preference-set>
</project>
<project>
<livefolder/>
<livefolder/>
<preference-set>
<boolean/>
</preference-set>
</project>
Related videos on Youtube
Comments
-
lajarre over 1 year
I'd like to print out the tree structure of an etree (formed from an html document) in a differentiable way (means that two etrees should print out differently).
What I mean by structure is the "shape" of the tree, which basically means all the tags but no attribute and no text content.
Any idea? Is there something in lxml to do that?
If not, I guess I have to iterate through the whole tree and construct a string from that. Any idea how to represent the tree in a compact way? (the "compact" feature is less relevant)
FYI it is not intended to be looked at, but to be stored and hashed to be able to make differences between several html templates.
Thanks
-
kindall over 11 yearsIs there something that the
.tostring()
method isn't doing for you? -
Fred Foo over 11 yearsI don't think LXML has this functionality built-in, so you'll have to walk the tree.
-
-
lajarre over 11 yearsPrecisely I didn't know much about XSLT and it seems to be the right and standard way of doing what I want
-
spiralx over 11 yearsOnce you get into the habit of it then it's really useful for anything where you start with lots of structure and want to turn it into something more manageable. Just remember the default rules are the same as this stylesheet - pastebin.com/b3WHMjPx - so it copies elements and attributes, but nothing else.
-
spiralx over 11 yearsThis place has a very good tutorial, and even better reference material for all things XML: zvon.org/comp/m/tutorial.html