scrapy xpath select elements by classname

xpath web-scraping scrapy scrapy-shell

12,174

If you check response.body in the shell, you will see that it does not contain an element with class="zoomWindow":

In [3]: "zoomWindow" in response.body
Out[3]: False

However, if you open the page in a browser and inspect the rendered HTML, you will see that the element is there. This means that the page builds part of its content with JavaScript or additional AJAX requests. Scrapy is not a browser and has no built-in JavaScript engine. In other words, it downloads only the initial HTML of the page; it does not fetch the JS and CSS files and "execute" them.
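The raw-body check can also be reproduced outside the shell. This is a minimal sketch, assuming Python 3, where response.body is a bytes object and the search term therefore has to be encoded first. The `raw_body` markup and the `body_contains` helper are hypothetical stand-ins, not the real page or a Scrapy API.

```python
# Hypothetical stand-in for response.body: the raw bytes Scrapy downloads.
# The markup is invented for illustration; it is not the real page.
raw_body = b"""
<html>
  <body>
    <img id="PreviewImage" src="/images/example.png">
    <script src="/js/zoom.js"></script>
  </body>
</html>
"""


def body_contains(body: bytes, needle: str, encoding: str = "utf-8") -> bool:
    """Return True if `needle` occurs anywhere in the raw response bytes."""
    return needle.encode(encoding) in body


# The class name would only be added later by the script, so it is absent
# from the downloaded bytes, while the static img id is present.
has_zoom_window = body_contains(raw_body, "zoomWindow")
has_preview_image = body_contains(raw_body, "PreviewImage")
print(has_zoom_window, has_preview_image)
```

If the string is missing from the raw body, no XPath or CSS selector can match it, however the selector is written.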

What you can try, for starters, is the scrapyjs download handler and middleware (the project has since been renamed scrapy-splash).
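The scrapyjs project was later renamed scrapy-splash. The settings fragment below is a rough sketch of how that successor package is typically enabled, assuming scrapy-splash is installed and a Splash rendering service is running at `http://localhost:8050` (an assumed address). The middleware names and priorities follow the scrapy-splash README as far as I recall; verify them against the version you install.

```python
# settings.py (sketch; assumes the scrapy-splash package, not the old scrapyjs)

# Assumed address of a locally running Splash rendering service.
SPLASH_URL = "http://localhost:8050"

# Downloader middlewares that route requests through Splash.
DOWNLOADER_MIDDLEWARES = {
    "scrapy_splash.SplashCookiesMiddleware": 723,
    "scrapy_splash.SplashMiddleware": 725,
    "scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware": 810,
}

# Spider middleware that deduplicates repeated Splash arguments.
SPIDER_MIDDLEWARES = {
    "scrapy_splash.SplashDeduplicateArgsMiddleware": 100,
}

# Duplicate filter that takes Splash arguments into account.
DUPEFILTER_CLASS = "scrapy_splash.SplashAwareDupeFilter"
```

With this in place, a spider would issue requests through Splash so that the page's JavaScript runs before the response reaches the selectors.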

The image you want to extract is also available in the img tag with id="PreviewImage":

In [4]: response.xpath("//img[@id='PreviewImage']/@src").extract()
Out[4]: [u'/WebRoot/NICEIC/Shops/NICShop/547F/0D9A/F434/5E4C/0759/0A0A/124C/58F7/5708.png']
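The extracted src is a site-relative path, so it has to be resolved against the page URL before the image can be downloaded. Here is a minimal sketch using the standard library; the variable names are illustrative, and inside a spider `response.urljoin(src)` should give the same result.

```python
from urllib.parse import urljoin

# URL that was passed to the scrapy shell.
page_url = (
    "http://www.niceicdirect.com/epages/NICShop.sf/secAlIVFGjzzf2/"
    "?ObjectPath=/Shops/NICShop/Products/5696"
)

# Relative path returned by the XPath query on the img tag.
relative_src = (
    "/WebRoot/NICEIC/Shops/NICShop/547F/0D9A/F434/5E4C/"
    "0759/0A0A/124C/58F7/5708.png"
)

# A path starting with "/" replaces the whole path and query of the base URL.
absolute_src = urljoin(page_url, relative_src)
print(absolute_src)
```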


Author by

secuaz

Updated on June 04, 2022

Comments

secuaz almost 2 years
I followed "How can I find an element by CSS class with XPath?", which gives the selector to use for selecting elements by class name. The problem is that when I use it, it returns an empty result ("[]"), and I know for a fact that there is a div with class "zoomWindow" at the URL fed to the Scrapy shell.

My attempt:
```
scrapy shell "http://www.niceicdirect.com/epages/NICShop.sf/secAlIVFGjzzf2/?ObjectPath=/Shops/NICShop/Products/5696"
response.xpath("//*[contains(@class, 'zoomWindow')]")
```
I have looked at many resources that provide various selectors. In my case the element has only one class, so I also tried the versions that use "concat", but they didn't work and I discarded them.

I installed Ubuntu and Scrapy in a virtual machine just to make sure it was not a bug in my installation on Windows, but my attempt on Ubuntu gave the same results.

I don't know what else to try. Can you see any typo in the selector?