Using XPath in ElementTree

Solution 1

You have two problems.

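1) element is the root Element (ItemSearchResponse), not an ElementTree. Paths passed to findall are relative to that element, so a path that begins with the root's own tag, such as 'ItemSearchResponse/Items/...', matches nothing.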

2) Your search string needs to use namespaces if you keep the namespace in the XML.

To fix problem #1:

You need to change:

element = ET.parse(fp).getroot()

to:

element = ET.parse(fp)
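
To see why the original path returned nothing, here is a minimal sketch using an inline, namespace-free copy of the document. The XML string and the variable names are illustrative assumptions, not part of the original post. It suggests that findall paths are evaluated relative to the object they are called on, so the root's own tag should not appear in the path:

```python
from xml.etree import ElementTree as ET

# Illustrative, namespace-free sample; not the asker's actual file.
XML = """<ItemSearchResponse>
  <Items>
    <Item>
      <ItemAttributes>
        <ListPrice><Amount>2260</Amount></ListPrice>
      </ItemAttributes>
    </Item>
  </Items>
</ItemSearchResponse>"""

root = ET.fromstring(XML)      # an Element: the <ItemSearchResponse> root
tree = ET.ElementTree(root)    # an ElementTree wrapping that root

# This path starts with the root's own tag, so it matches nothing.
with_root_tag = root.findall(
    'ItemSearchResponse/Items/Item/ItemAttributes/ListPrice/Amount')

# Paths written relative to the root match from either object.
from_root = root.findall('Items/Item/ItemAttributes/ListPrice/Amount')
from_tree = tree.findall('Items/Item/ItemAttributes/ListPrice/Amount')

print(len(with_root_tag))
print([a.text for a in from_root])
print([a.text for a in from_tree])
```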

To fix problem #2:

You can remove the xmlns attribute from the XML document so it looks like this:

<?xml version="1.0"?>
<ItemSearchResponse>
  <Items>
    <Item>
      <ItemAttributes>
        <ListPrice>
          <Amount>2260</Amount>
        </ListPrice>
      </ItemAttributes>
      <Offers>
        <Offer>
          <OfferListing>
            <Price>
              <Amount>1853</Amount>
            </Price>
          </OfferListing>
        </Offer>
      </Offers>
    </Item>
  </Items>
</ItemSearchResponse>

With this document you can use the following search string:

e = element.findall('Items/Item/ItemAttributes/ListPrice/Amount')

The full code:

from xml.etree import ElementTree as ET

# Binary mode lets the parser handle the file's encoding itself.
with open("output.xml", "rb") as fp:
    element = ET.parse(fp)
e = element.findall('Items/Item/ItemAttributes/ListPrice/Amount')
for i in e:
    print(i.text)

Alternate fix to problem #2:

Otherwise, you need to specify the namespace inside the search string for each element.

The full code:

from xml.etree import ElementTree as ET

with open("output.xml", "rb") as fp:
    element = ET.parse(fp)

# Every tag in the path must carry the namespace in {uri} form.
namespace = "{http://webservices.amazon.com/AWSECommerceService/2008-08-19}"
e = element.findall('{0}Items/{0}Item/{0}ItemAttributes/{0}ListPrice/{0}Amount'.format(namespace))
for i in e:
    print(i.text)

Both print:

2260
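
A less repetitive variant of the namespaced search is to pass a prefix-to-URI mapping as the second argument of findall, which xml.etree.ElementTree supports. The sketch below assumes an inline sample document, and the "aws" prefix is an arbitrary name chosen for illustration; only the URI has to match the document:

```python
from xml.etree import ElementTree as ET

# Illustrative sample with the default namespace from the question.
XML = """<ItemSearchResponse xmlns="http://webservices.amazon.com/AWSECommerceService/2008-08-19">
  <Items>
    <Item>
      <ItemAttributes>
        <ListPrice><Amount>2260</Amount></ListPrice>
      </ItemAttributes>
    </Item>
  </Items>
</ItemSearchResponse>"""

root = ET.fromstring(XML)

# "aws" is a hypothetical prefix; it does not need to appear in the XML.
ns = {"aws": "http://webservices.amazon.com/AWSECommerceService/2008-08-19"}
amounts = root.findall(
    "aws:Items/aws:Item/aws:ItemAttributes/aws:ListPrice/aws:Amount", ns)

for amount in amounts:
    print(amount.text)
```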

Solution 2

from xml.etree import ElementTree as ET
tree = ET.parse("output.xml")
namespace = tree.getroot().tag[1:].split("}")[0]
amount = tree.find(".//{%s}Amount" % namespace).text
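
The slicing above relies on the root tag having the form {uri}name. As a hedged sketch, the hypothetical helper namespace_of below (not part of the original answer) does the same extraction but returns an empty string when the tag has no namespace, a case in which the bare slice would return a mangled tag name instead:

```python
from xml.etree import ElementTree as ET

def namespace_of(element):
    """Return the namespace URI of element's tag, or '' if it has none."""
    tag = element.tag
    if tag.startswith("{"):
        # "{uri}name" -> "uri"
        return tag[1:].split("}")[0]
    return ""

with_ns = ET.fromstring(
    '<ItemSearchResponse '
    'xmlns="http://webservices.amazon.com/AWSECommerceService/2008-08-19"/>')
without_ns = ET.fromstring('<ItemSearchResponse/>')

print(with_ns.tag)
print(namespace_of(with_ns))
print(repr(namespace_of(without_ns)))
```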

Also, consider using lxml. It's way faster.

from lxml import etree as ET

Solution 3

ElementTree stores each namespaced tag in {uri}name form, so every element in your XML has a name like {http://webservices.amazon.com/AWSECommerceService/2008-08-19}Items

So make the search include the namespace e.g.

search = '{http://webservices.amazon.com/AWSECommerceService/2008-08-19}Items/{http://webservices.amazon.com/AWSECommerceService/2008-08-19}Item/{http://webservices.amazon.com/AWSECommerceService/2008-08-19}ItemAttributes/{http://webservices.amazon.com/AWSECommerceService/2008-08-19}ListPrice/{http://webservices.amazon.com/AWSECommerceService/2008-08-19}Amount'
element.findall( search )

This returns a list containing the Amount element whose text is 2260.
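
Writing the qualified path by hand is error-prone, so one option is to build it from the plain tag names. This is only a sketch: the names NS and tags and the inline sample document are assumptions made for illustration:

```python
from xml.etree import ElementTree as ET

NS = "http://webservices.amazon.com/AWSECommerceService/2008-08-19"

# Plain tag names along the path, from the root's child down to the target.
tags = ["Items", "Item", "ItemAttributes", "ListPrice", "Amount"]

# Prefix each tag with {uri} and join the steps with "/".
search = "/".join("{%s}%s" % (NS, tag) for tag in tags)

# Illustrative sample document using the same default namespace.
XML = """<ItemSearchResponse xmlns="%s">
  <Items>
    <Item>
      <ItemAttributes>
        <ListPrice><Amount>2260</Amount></ListPrice>
      </ItemAttributes>
    </Item>
  </Items>
</ItemSearchResponse>""" % NS

element = ET.fromstring(XML)
found = element.findall(search)
print([a.text for a in found])
```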

Solution 4

I ended up stripping out the xmlns from the raw xml like this:

import re

def strip_ns(xml_string):
    return re.sub(r'xmlns="[^"]+"', '', xml_string)

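Be very careful with this: the pattern removes only default-namespace declarations written with double quotes and leaves prefixed ones (xmlns:prefix="...") in place. It worked well for me, though.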
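
For context, here is a small sketch of how the stripped string might then be parsed and searched with plain tag names. The RAW sample is an assumption made for illustration, and the approach only suits documents whose sole namespace is a double-quoted default xmlns:

```python
import re
from xml.etree import ElementTree as ET

def strip_ns(xml_string):
    # Remove default-namespace declarations of the form xmlns="...".
    return re.sub(r'xmlns="[^"]+"', '', xml_string)

# Illustrative raw document; in practice this would be read from a file.
RAW = """<ItemSearchResponse xmlns="http://webservices.amazon.com/AWSECommerceService/2008-08-19">
  <Items>
    <Item>
      <ItemAttributes>
        <ListPrice><Amount>2260</Amount></ListPrice>
      </ItemAttributes>
    </Item>
  </Items>
</ItemSearchResponse>"""

# Parse the cleaned string; tags no longer carry a {uri} prefix.
root = ET.fromstring(strip_ns(RAW))
amounts = root.findall('Items/Item/ItemAttributes/ListPrice/Amount')

print(root.tag)
print([a.text for a in amounts])
```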

Solution 5

One of the most straightforward approaches, which also works with Python 3, is shown below:

It takes the root and searches its descendants until it finds the first "Amount" tag.

from xml.etree import ElementTree as ET
tree = ET.parse('output.xml')
root = tree.getroot()
# find returns the first matching descendant in document order
e = root.find(".//{http://webservices.amazon.com/AWSECommerceService/2008-08-19}Amount")
print(e.text)
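
Note that the sample document contains two Amount elements (the list price and the offer price), and find only returns the first. The sketch below, built on an inline copy of the question's XML, contrasts find with findall and narrows the search to ListPrice. The last query uses the {*} namespace wildcard, which as far as I know requires Python 3.8 or later; treat that line as an assumption about your interpreter version:

```python
from xml.etree import ElementTree as ET

NS = "http://webservices.amazon.com/AWSECommerceService/2008-08-19"

# Inline copy of the document from the question, for illustration.
XML = """<ItemSearchResponse xmlns="%s">
  <Items>
    <Item>
      <ItemAttributes>
        <ListPrice><Amount>2260</Amount></ListPrice>
      </ItemAttributes>
      <Offers>
        <Offer>
          <OfferListing><Price><Amount>1853</Amount></Price></OfferListing>
        </Offer>
      </Offers>
    </Item>
  </Items>
</ItemSearchResponse>""" % NS

root = ET.fromstring(XML)

# First Amount anywhere below the root, in document order.
first = root.find(".//{%s}Amount" % NS)

# Every Amount below the root, list price and offer price alike.
every = root.findall(".//{%s}Amount" % NS)

# Only Amount elements that are direct children of a ListPrice.
list_prices = root.findall(".//{%s}ListPrice/{%s}Amount" % (NS, NS))

# Namespace wildcard (assumes Python 3.8+): match Amount in any namespace.
any_ns = root.findall(".//{*}Amount")

print(first.text)
print([a.text for a in every])
print([a.text for a in list_prices])
print([a.text for a in any_ns])
```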

Author by

Ryan R. Rosario

Ph.D., Machine Learning (Statistics), UCLA Alum: Machine Learning Engineer at Facebook I wear many hats, the largest being Computer Scientist hat. My languages: R, Python, C++, Java Applications: text mining, data mining, machine learning.

Updated on March 31, 2021

Comments

  • Ryan R. Rosario
    Ryan R. Rosario about 3 years

    My XML file looks like the following:

    <?xml version="1.0"?>
    <ItemSearchResponse xmlns="http://webservices.amazon.com/AWSECommerceService/2008-08-19">
      <Items>
        <Item>
          <ItemAttributes>
            <ListPrice>
              <Amount>2260</Amount>
            </ListPrice>
          </ItemAttributes>
          <Offers>
            <Offer>
              <OfferListing>
                <Price>
                  <Amount>1853</Amount>
                </Price>
              </OfferListing>
            </Offer>
          </Offers>
        </Item>
      </Items>
    </ItemSearchResponse>
    

    All I want to do is extract the ListPrice.

    This is the code I am using:

    >> from elementtree import ElementTree as ET
    >> fp = open("output.xml","r")
    >> element = ET.parse(fp).getroot()
    >> e = element.findall('ItemSearchResponse/Items/Item/ItemAttributes/ListPrice/Amount')
    >> for i in e:
    >>    print i.text
    >>
    >> e
    >>
    

    Absolutely no output. I also tried

    >> e = element.findall('Items/Item/ItemAttributes/ListPrice/Amount')
    

    No difference.

    What am I doing wrong?

  • Ryan R. Rosario
    Ryan R. Rosario over 14 years
    Thank you so much. Was about to bang my head against a wall repeatedly.
  • Brian R. Bondy
    Brian R. Bondy over 14 years
    No problem, they should give an example with namespaces in their documentation for find and findall.
  • jorrebor
    jorrebor over 10 years
    well, they could have made this more clear in the documentation... thanks!
  • Hugo Koopmans
    Hugo Koopmans about 10 years
    i just moved from xml to lxml and wooo what a difference in speed... lxml is way faster and handles namespaces better.
  • Florent Roques
    Florent Roques over 3 years
    had to use from lxml import etree as ET