Newbie in Scrapy: how to scrape only the text with response.css?


Here are two options, one using css() and another using xpath():

>>> response.css("td.c1 > span::text").extract()
[u'1', u'2', u'3', u'4', u'5', u'6', u'7', u'8', u'9', u'10', u'11', u'12', u'13', u'14', u'15', u'16', u'17', u'18', u'19', u'20']
>>> response.xpath("//td[@class='c1']/span/text()").extract()
[u'1', u'2', u'3', u'4', u'5', u'6', u'7', u'8', u'9', u'10', u'11', u'12', u'13', u'14', u'15', u'16', u'17', u'18', u'19', u'20']
Author: user2492364

Updated on June 04, 2022

Comments

  • user2492364, almost 2 years ago

    While practicing, I want to extract only the text (1, 2, 3, 4, 5...) without the surrounding HTML markup. How should I write response.css("td[class='c1']")?

    scrapy shell "https://tw.movies.yahoo.com/chart.html"
    response.css("td[class='c1']")

  • user2492364, almost 10 years ago
    Wow, it's amazing! But is there a way to remove the 'u'?
  • alecxe, almost 10 years ago
    @user2492364 It is just the prefix of a unicode string literal; don't worry about it.
  • Eefret, over 8 years ago
    You can remove the u by calling .encode('utf8') on each extracted string afterwards.
  • Aakash Saxena, over 5 years ago
    In your code, response.css("td[class='c1']"), you can append ::text to the selector, just before the closing quote: response.css("td[class='c1']::text")