Python extract data from xml and save it to excel

35,491

Solution 1

The following should work:

import xml.etree.ElementTree as ET
import arcpy

xmlfile = 'D:/Working/Test/Test.xml'
element_tree = ET.parse(xmlfile)
root = element_tree.getroot()
agreement = root.find(".//agreementid").text
arcpy.AddMessage(agreement)

The root.find() call uses an XPath expression (quick cheatsheet is in the Python docs here) to find the first tag at any level under the current level named agreementid. If there are multiple tags named that in your file, you can use root.findall() and iterate over the results. If, for example, there are three fields named agreementid, and you know you want the second one, then root.findall(".//agreementid")[1] should work.

Solution 2

MattDMo has given a sufficient answer to the problem, but I just want to remind you that python has a csv module which makes it easier to write comma separated data, which is typically then read into applications such as databases or spreadsheets.

From the docs:

import csv
with open('eggs.csv', 'wb') as csvfile:
    spamwriter = csv.writer(csvfile, delimiter=' ',
                        quotechar='|', quoting=csv.QUOTE_MINIMAL)
    spamwriter.writerow(['Spam'] * 5 + ['Baked Beans'])
    spamwriter.writerow(['Spam', 'Lovely Spam', 'Wonderful Spam'])
Share:
35,491
Olga K.
Author by

Olga K.

Updated on January 09, 2020

Comments

  • Olga K.
    Olga K. over 4 years

    I would like to extract some data from an XML file and save it in a table format, such as XLS or DBF.

    Here is XML file i have:

    <?xml version="1.0" encoding="utf-8"?>
    <SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/">
      <SOAP-ENV:Header />
      <SOAP-ENV:Body>
        <ADD_LandIndex_001>
          <CNTROLAREA>
            <BSR>
              <VERB>ADD</VERB>
              <NOUN>LandIndex</NOUN>
              <REVISION>001</REVISION>
            </BSR>
          </CNTROLAREA>
          <DATAAREA>
            <LandIndex>
              <reportId>AMI100031</reportId>
              <requestKey>R3278458</requestKey>
              <SubmittedBy>EN4871</SubmittedBy>
              <submittedOn>2015/01/06 4:20:11 PM</submittedOn>
              <LandIndex>
                <agreementdetail>
                  <agreementid>001       4860</agreementid>
                  <agreementtype>NATURAL GAS</agreementtype>
                  <currentstatus>
                    <status>ACTIVE</status>
                    <statuseffectivedate>1965/02/18</statuseffectivedate>
                    <termdate>1965/02/18</termdate>
                  </currentstatus>
                  <designatedrepresentative>
                  </designatedrepresentative>
                </agreementdetail>
              </LandIndex>
            </LandIndex>
          </DATAAREA>
        </ADD_LandIndex_001>
      </SOAP-ENV:Body>
    </SOAP-ENV:Envelope>
    

    I am interested in information inside the agreementdetail tag which is under DATAAREA/LandIndex/LandIndex/

    UPDATE:

    Thanks to MattDMo this task has moved a bit from its dead point. So I made this script below. It iterates the file and gets all instances of the agreementdetail and outputs agreementid and agreementtype for each.

    import xml.etree.ElementTree as ET
    import arcpy
    
    xmlfile = 'D:/Working/Test/Test.xml'
    element_tree = ET.parse(xmlfile)
    root = element_tree.getroot()
    agreement = root.findall(".//agreementdetail")
    result = []
    elements = ('agreementid', 'agreementtype')
    
    for a in agreement:
        obj = {}
        for e in elements:
            obj[e] = a.find(e).text
        result.append(obj)
    
    arcpy.AddMessage(result)
    

    The output I am receiving consists of a bunch of this strings: {'agreementid': '001 4860', 'agreementtype': 'NATURAL GAS'}

    Now I need to convert this output into a table format (.csv, .dbf, .xls etc.) so that agreementid and agreementtype are columns:

    agreementid    | agreementtype 
    001       4860 | NATURAL GAS
    

    I will be very grateful if you could guide me on how to accomplish it. Or maybe any example?

    P.S. Python version is 2.7