How do I use empty namespaces in an lxml XPath query?

Solution 1

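XPath 1.0 has no notion of a default namespace, so an unprefixed name such as entry only matches elements in no namespace. Bind the Atom namespace URI to a prefix of your choice and use that prefix in the query. Something like this should work: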

import lxml.etree as et

ns = {"atom": "http://www.w3.org/2005/Atom"}
tree = et.fromstring(xml)
for node in tree.xpath('//atom:entry', namespaces=ns):
    print(node)

See also http://lxml.de/xpathxslt.html#namespaces-and-prefixes.
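
To make the snippet above runnable end to end, here is a minimal sketch. The sample feed is a hypothetical, trimmed-down version of the one in the question (the second entry's value is made up), and the prefix names are arbitrary; only the URIs have to match the document.

```python
import lxml.etree as et

# Hypothetical sample feed, trimmed down from the structure in the question.
xml = b"""<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:gsa="http://schemas.google.com/gsa/2007">
  <entry><gsa:content name="type">DirectoryContentData</gsa:content></entry>
  <entry><gsa:content name="type">FileContentData</gsa:content></entry>
</feed>"""

tree = et.fromstring(xml)

# The prefix "atom" is arbitrary; only the URI has to match the document.
ns = {
    "atom": "http://www.w3.org/2005/Atom",
    "gsa": "http://schemas.google.com/gsa/2007",
}

# An unprefixed name in XPath 1.0 means "no namespace", so this finds nothing.
unprefixed = tree.xpath("//entry")

# With the prefix bound, both entries are found.
entries = tree.xpath("//atom:entry", namespaces=ns)

# Prefixes can be mixed within a single expression.
types = tree.xpath(
    "//atom:entry/gsa:content[@name='type']/text()", namespaces=ns
)

print(len(unprefixed), len(entries), types)
```

The prefix used in the query does not need to appear anywhere in the XML itself, which is why this works even though the document declares Atom as its default namespace.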

Alternatively, match on the local name and ignore the namespace altogether:

for node in tree.xpath("//*[local-name() = 'entry']"):
    print(node)
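
One caveat with local-name() is that it matches an entry element in any namespace. The sketch below, which assumes a made-up document containing a second entry in an unrelated namespace (http://example.com/other is hypothetical), shows how adding namespace-uri() narrows the match back to Atom. The URI is passed in as an XPath variable through a keyword argument.

```python
import lxml.etree as et

ATOM = "http://www.w3.org/2005/Atom"

# Hypothetical document: one <entry> in the Atom namespace and another
# <entry> in an unrelated, made-up namespace.
xml = b"""<feed xmlns="http://www.w3.org/2005/Atom">
  <entry/>
  <other:entry xmlns:other="http://example.com/other"/>
</feed>"""

tree = et.fromstring(xml)

# local-name() ignores the namespace, so both elements match.
any_ns = tree.xpath("//*[local-name() = 'entry']")

# Adding namespace-uri() restricts the match to Atom entries only.
atom_only = tree.xpath(
    "//*[local-name() = 'entry' and namespace-uri() = $uri]", uri=ATOM
)

print(len(any_ns), len(atom_only))
```

If the document only ever uses one namespace for entry, the shorter local-name() form is probably enough.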

Solution 2

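Use the findall method with the namespace URI written in braces before the tag name. Note that a path without a leading .// only matches direct children of the element it is called on.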

for item in tree.findall('{http://www.w3.org/2005/Atom}entry'):
    print(item)
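
As a rough illustration of how findall behaves, the sketch below compares the braces form with a prefix mapping, and a direct-child search with a descendant search. The sample document, including the group wrapper element and the id values, is hypothetical.

```python
import lxml.etree as et

# Hypothetical feed; <group> is a made-up wrapper used to show nesting.
xml = b"""<feed xmlns="http://www.w3.org/2005/Atom">
  <entry><id>a</id></entry>
  <group><entry><id>b</id></entry></group>
</feed>"""

tree = et.fromstring(xml)
ns = {"atom": "http://www.w3.org/2005/Atom"}

# Namespace URI in braces: matches direct children of the root only.
direct = tree.findall("{http://www.w3.org/2005/Atom}entry")

# Same search written with a prefix mapping instead of the braces form.
prefixed = tree.findall("atom:entry", namespaces=ns)

# A leading .// searches all descendants, so the nested entry is found too.
nested = tree.findall(".//atom:entry", namespaces=ns)

# Child lookups need the namespace as well.
ids = [e.findtext("atom:id", namespaces=ns) for e in nested]

print(len(direct), len(prefixed), ids)
```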
Author by

ewok

Software engineer in the Greater Boston Area. Primary areas of expertise include Java, Python, web-dev, and general OOP, though I have dabbled in many other technologies.

Updated on June 23, 2022

Comments

  • ewok, almost 2 years ago

    I have an xml document in the following format:

    <feed xmlns="http://www.w3.org/2005/Atom" xmlns:openSearch="http://a9.com/-/spec/opensearchrss/1.0/" xmlns:gsa="http://schemas.google.com/gsa/2007">
      ...
      <entry>
        <id>https://ip.ad.dr.ess:8000/feeds/diagnostics/smb://ip.ad.dr.ess/path/to/file</id>
        <updated>2011-11-07T21:32:39.795Z</updated>
        <app:edited xmlns:app="http://purl.org/atom/app#">2011-11-07T21:32:39.795Z</app:edited>
        <link rel="self" type="application/atom+xml" href="https://ip.ad.dr.ess:8000/feeds/diagnostics"/>
        <link rel="edit" type="application/atom+xml" href="https://ip.ad.dr.ess:8000/feeds/diagnostics"/>
        <gsa:content name="entryID">smb://ip.ad.dr.ess/path/to/directory</gsa:content>
        <gsa:content name="numCrawledURLs">7</gsa:content>
        <gsa:content name="numExcludedURLs">0</gsa:content>
        <gsa:content name="type">DirectoryContentData</gsa:content>
        <gsa:content name="numRetrievalErrors">0</gsa:content>
      </entry>
      <entry>
        ...
      </entry>
      ...
    </feed>
    

    I need to retrieve all entry elements using xpath in lxml. My problem is that I can't figure out how to use an empty namespace. I have tried the following examples, but none work. Please advise.

    import lxml.etree as et
    
    tree=et.fromstring(xml)    
    

    The various things I have tried are:

    for node in tree.xpath('//entry'):
    

    or

    namespaces = {None:"http://www.w3.org/2005/Atom" ,"openSearch":"http://a9.com/-/spec/opensearchrss/1.0/" ,"gsa":"http://schemas.google.com/gsa/2007"}
    
    for node in tree.xpath('//entry', namespaces=namespaces):
    

    or

    for node in tree.xpath('//\"{http://www.w3.org/2005/Atom}entry\"'):
    

    At this point I just don't know what to try. Any help is greatly appreciated.