Scrapy: extracting data from an html tag that uses an "id" Selector instead of a "class"
Solution 1
This is one way.
>>> HTML = '''
... <span id="id_A">Hello, Earth</span>
... <span id="id_B">Hello, Universe</span>
... '''
>>> from scrapy.selector import Selector
>>> selector = Selector(text=HTML)
>>> selector.css('[id="id_A"]::text').extract()
['Hello, Earth']
Alternatively,
>>> HTML = '''
... <span id="id_A">Hello, Earth</span>
... <span id="id_B">Hello, Universe</span>
... '''
>>> from scrapy.selector import Selector
>>> selector = Selector(text=HTML)
>>> selector.css('span#id_A::text').extract()
['Hello, Earth']
Scrapy uses cssselect, which follows the W3C Selectors Level 3 specification.
Solution 2
The problem is that you're using a "class selector" (`.`). You should use an "id selector" (`#`) instead; this should work:
response.css('#id_A::text').extract()
Author: RF_956
Updated on June 09, 2022
RF_956 asked:
I am new to web scraping and Scrapy. I hope you can help me.
I am trying to extract data from a web page that uses <span> tags. Usually, if the span tag uses a class, for example:
<span class="class_A">Hello, World!</span>
I would use the following code to retrieve the text:
response.css('span.class_A::text').extract()
However, when the HTML uses an "id" instead of a "class", for example,
<span id="id_A">Hello, Universe!</span>
the code below no longer works:
response.css('span.id_A::text').extract()
Please help! What is the correct way to extract data using an "id"?
Thank you for your help!