Scrapy grab div with multiple classes?

python html xpath web-scraping scrapy

11,539

Solution 1

You should consider using a CSS selector for this part of your query.

from scrapy import Selector

# .product matches any element whose class list contains "product"
sel = Selector(text='<div class="product product-small">I am a product!</div>')
print(sel.css('.product').extract())

If you need to, you can chain CSS and XPath selectors, because each call returns a selector list that supports both css() and xpath().

This can also be solved with XPath. You just need to use contains() and match the class name as a whole space-separated token:

//div[contains(concat(' ', normalize-space(@class), ' '), ' product ')]

Though, yes, the CSS selector option is more compact and readable.


Updated on June 05, 2022

user1835351 almost 2 years
I am trying to grab divs with the class 'product'. The problem is that some of the divs with class 'product' also have the class 'product-small'. So when I use xpath('//div[@class="product"]'), it only captures the divs whose class attribute is exactly 'product', not the ones with multiple classes. How can I do this with Scrapy?

Example:
- Catches: <div class='product'>
- Doesn't catch: <div class='product product-small'>
Capi Etheriel almost 9 years

Your XPath selector would also pick up elements with the not-a-product class.
alecxe almost 9 years

@barraponto yes, but the input to the current problem doesn't contain elements with not-a-product class. Thanks.
oschlueter almost 8 years

The selector has been edited to perform exact matching of class names (cf. doc.scrapy.org/en/1.1/topics/…)
sherlock about 4 years

I have seen this doc, but I do not think it's useful, because the content in the div is dynamically loaded. What can we do in this situation?