Python Xpath: lxml.etree.XPathEvalError: Invalid predicate

16,610
print(tree.xpath('//div[@class="center-stack"]//*/a[@class="name"]/@href'))

You were missing the closing ] after "center-stack", so the predicate [@class="center-stack" never ends and lxml reports it as an invalid predicate.

You can also pull the a[@class="name"] tags straight from div[@class="content"]:

 tree.xpath('//div[@class="content"]//a[@class="name"]/@href')
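The two expressions are not equivalent in general. In //div[@class="center-stack"]//*/a, the <a> must have a parent that is itself a descendant of the center-stack div, so a link sitting directly inside that div is skipped, while //a matches at any depth. On this page they happen to return the same links. The sketch below uses made-up markup (not the real iTunes HTML) to show where they differ:

```python
from lxml import html

# Hypothetical markup, only loosely modelled on the page in the question.
doc = html.fromstring("""
<div class="content">
  <div class="center-stack">
    <a class="name" href="/direct">direct child of center-stack</a>
    <ul><li><a class="name" href="/nested">nested deeper</a></li></ul>
  </div>
</div>
""")

# //*/a: the <a>'s parent must be *below* center-stack, so /direct is skipped.
stack_links = doc.xpath('//div[@class="center-stack"]//*/a[@class="name"]/@href')
# //a: any matching <a> at any depth below the content div.
content_links = doc.xpath('//div[@class="content"]//a[@class="name"]/@href')

print(stack_links)    # ['/nested']
print(content_links)  # ['/direct', '/nested']
```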

Both will give you the hrefs you want:

In [19]: import requests

In [20]: from lxml.html import fromstring

In [21]: r = requests.get("https://itunes.apple.com/us/app/candy-crush-saga/id553834731")

In [22]: tree = fromstring(r.content)

In [23]: a = tree.xpath('//div[@class="content"]//a[@class="name"]/@href')

In [24]: b = tree.xpath('//div[@class="center-stack"]//*/a[@class="name"]/@href')

In [25]: print(a == b)
True

In [26]: print(a)
['https://itunes.apple.com/us/app/word-search-puzzles/id609067187?mt=8', 'https://itunes.apple.com/us/app/cookie-jam/id727296976?mt=8', 'https://itunes.apple.com/us/app/jewel-mania/id561326449?mt=8', 'https://itunes.apple.com/us/app/jelly-splash/id645949180?mt=8', 'https://itunes.apple.com/us/app/bubble-island/id531354582?mt=8']

In [27]: print(b)
['https://itunes.apple.com/us/app/word-search-puzzles/id609067187?mt=8', 'https://itunes.apple.com/us/app/cookie-jam/id727296976?mt=8', 'https://itunes.apple.com/us/app/jewel-mania/id561326449?mt=8', 'https://itunes.apple.com/us/app/jelly-splash/id645949180?mt=8', 'https://itunes.apple.com/us/app/bubble-island/id531354582?mt=8']
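To experiment without requesting the live page (which may no longer serve the same markup), you can reproduce both the error and the fix on a local snippet. This is a sketch: the HTML and the example.com URL are hypothetical stand-ins for the real page. It also precompiles the working expression with lxml.etree.XPath, which reports a malformed expression when the XPath object is created rather than when it is first run against a downloaded page.

```python
from lxml import etree, html

# Hypothetical stand-in for the downloaded page; no network access needed.
page = html.fromstring(
    '<html><body><div class="center-stack"><ul><li>'
    '<a class="name" href="https://example.com/app1">App 1</a>'
    '</li></ul></div></body></html>'
)

broken = '//div[@class="center-stack"//*/a[@class="name"]/@href'
fixed = '//div[@class="center-stack"]//*/a[@class="name"]/@href'

# Evaluating the broken string raises a subclass of etree.XPathError
# (the question saw XPathEvalError: Invalid predicate).
error = None
try:
    page.xpath(broken)
except etree.XPathError as exc:
    error = exc
print(type(error).__name__, error)

# etree.XPath compiles the expression up front; a typo here would raise
# immediately. The compiled object is then called with the tree.
find_links = etree.XPath(fixed)
print(find_links(page))  # ['https://example.com/app1']
```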
Author: Michael Martinez

Wannabe full-stack dev, entrepreneur and all-round god-among-mortals... Actual BI reporting analyst, corporate sell-out and extremely human.

Updated on June 14, 2022

Comments

  • Michael Martinez, almost 2 years

    I'm trying to learn how to scrape web pages, and in the tutorial I'm following, the code below throws this error:

    lxml.etree.XPathEvalError: Invalid predicate
    

    The website I'm querying is (don't judge me, it was the one used in the training vid :/ ): https://itunes.apple.com/us/app/candy-crush-saga/id553834731

    The XPath expression that causes the error is:

    links = tree.xpath('//div[@class="center-stack"//*/a[@class="name"]/@href')
    

    I'm using the lxml and requests libraries.

    If you need any additional info I'm happy to provide!

  • Michael Martinez, about 8 years
    Thanks for your help, Padraic. You and @Splash58 both noticed that I missed the closing "]" on the center-stack div predicate.
  • Padraic Cunningham, about 8 years
    @MichaelMartinez, no worries, you only want those five links, yes?
  • Michael Martinez, about 8 years
    Later on, the tutorial walks you through crawling those links and getting the same info from those pages too, but I haven't got to that video yet. So that's all I need at the mo. Thanks so much for your help. Such a lame error to miss!
  • Padraic Cunningham, about 8 years
    @MichaelMartinez, no prob, I have done it myself plenty of times.