Convert XML to CSV file

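You don't need findall here. Your inner loop calls att.findall('attval') on the same element you just read, so it finds the parent's own attval again instead of descending into children. Just iterate the tree from top to bottom: each top-level att holds the first column in its attval, and each att nested under its children element holds the second.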

from xml.etree import ElementTree

tree = ElementTree.parse('input.xml')
root = tree.getroot()

# Each top-level <att> supplies the first column.
for att in root:
    first = att.find('attval').text
    # Each <att> nested under <children> supplies the second column.
    for subatt in att.find('children'):
        second = subatt.find('attval').text
        print('{},{}'.format(first, second))

Which gives:

$ python process.py 
Data,Studyval
Data,Site
Info,age
Info,gender
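
To write an actual CSV file instead of printing to the terminal, the same top-to-bottom traversal can feed the standard csv module. This is a minimal sketch, assuming the XML has the structure shown in the question; the helper names xml_to_rows and write_csv, the inline SAMPLE string, and the file name output.csv are illustrative and not part of the original answer.

```python
import csv
import io
from xml.etree import ElementTree

# Inline copy of the question's XML so the sketch runs on its own.
SAMPLE = """
<hierachy>
    <att>
        <Order>1</Order>
        <attval>Data</attval>
        <children>
            <att><Order>1</Order><attval>Studyval</attval></att>
            <att><Order>2</Order><attval>Site</attval></att>
        </children>
    </att>
    <att>
        <Order>2</Order>
        <attval>Info</attval>
        <children>
            <att><Order>1</Order><attval>age</attval></att>
            <att><Order>2</Order><attval>gender</attval></att>
        </children>
    </att>
</hierachy>
"""


def xml_to_rows(root):
    """Yield (parent, child) pairs, one per nested <att> element."""
    for att in root:
        parent = att.findtext('attval')
        children = att.find('children')
        if children is None:
            # A top-level <att> without <children> contributes no rows.
            continue
        for subatt in children:
            yield parent, subatt.findtext('attval')


def write_csv(rows, fileobj):
    """Write the rows to an open text file object as CSV."""
    csv.writer(fileobj, lineterminator='\n').writerows(rows)


root = ElementTree.fromstring(SAMPLE)
rows = list(xml_to_rows(root))

# Written to an in-memory buffer here; a real file works the same way.
buffer = io.StringIO()
write_csv(rows, buffer)
print(buffer.getvalue(), end='')
```

To write to disk, open the target with open('output.csv', 'w', newline='') and pass that file object to write_csv. Using csv.writer rather than string formatting means values that contain commas or quotes are escaped correctly.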
Authored by pam

Updated on September 17, 2020

Comments

  • pam
    pam, over 3 years ago

    I have an XML file like this:

    <hierachy>
        <att>
            <Order>1</Order>
            <attval>Data</attval>
            <children>
                <att>
                    <Order>1</Order>
                    <attval>Studyval</attval>
                </att>
                <att>
                    <Order>2</Order>
                    <attval>Site</attval>
                </att>
            </children>
        </att>
        <att>
            <Order>2</Order>
            <attval>Info</attval>
            <children>
                <att>
                    <Order>1</Order>
                    <attval>age</attval>
                </att>
                <att>
                    <Order>2</Order>
                    <attval>gender</attval>
                </att>
            </children>
        </att>
    </hierachy>
    

    I'm trying to convert it to a CSV file like this:

    Data,Studyval
    Data,Site
    Info,age
    Info,gender
    

    My problem is that the parent and child elements use the same tag names, 'att' and 'attval'. How do I tell Python to distinguish between them and produce this output?

    I tried this:

    import xml.etree.cElementTree as ET
    
    tree = ET.parse('input.xml')
    rebase = tree.getroot()
    
    list = []
    
    for att in rebase.findall('att'):
            name = att.find('attval').text
            for each_att in att.findall('attval'):
                try:
                    val = att.find('attval').text
                    print name, val
                except AttributeError:
                    print name
    

    and it printed the same things twice.

  • pam
    pam, almost 9 years ago
    That is perfect! Thanks a ton!