Passing an argument to a callback function
Solution 1
This is what you'd use the `meta` keyword for.
def parse(self, response):
    for sel in response.xpath('//tbody/tr'):
        item = HeroItem()
        # Item assignment here
        url = 'https://' + item['server'] + '.battle.net/' + sel.xpath('td[@class="cell-BattleTag"]//a/@href').extract()[0].strip()
        yield Request(url, callback=self.parse_profile, meta={'hero_item': item})

def parse_profile(self, response):
    item = response.meta.get('hero_item')
    item['weapon'] = response.xpath('//li[@class="slot-mainHand"]/a[@class="slot-link"]/@href').extract()[0].split('/')[4]
    yield item
Also note: doing `sel = Selector(response)` is a waste of resources and differs from what you did earlier, so I changed it. The selector is automatically mapped onto the response as `response.selector`, which also has the convenience shortcut `response.xpath`.
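To see why the `meta` hand-off works, here is a minimal stand-in sketch. `FakeRequest` and `FakeResponse` are toy classes, not Scrapy's real ones; the point is only that the response exposes its request's `meta` dict, so a partially filled item attached in the first callback is available in the second:

```python
class FakeRequest:
    """Toy stand-in for scrapy.Request: carries a meta dict."""
    def __init__(self, url, callback, meta=None):
        self.url = url
        self.callback = callback
        self.meta = meta or {}

class FakeResponse:
    """Toy stand-in for scrapy.http.Response: exposes request.meta as meta."""
    def __init__(self, request):
        self.request = request
        self.meta = request.meta

def parse_profile(response):
    # Retrieve the partially filled item that the first callback attached
    item = response.meta.get('hero_item')
    item['weapon'] = 'echoing-fury'  # hypothetical value scraped from the page
    return item

# In parse(): fill some fields, then attach the item to the request
item = {'server': 'eu', 'rank': '1'}
request = FakeRequest('https://eu.battle.net/profile',
                      callback=parse_profile,
                      meta={'hero_item': item})

# Later, the engine calls the callback with the response; the item
# now holds the fields from both callbacks
completed = parse_profile(FakeResponse(request))
```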
Solution 2
Here's a better way to pass arguments to a callback function: `cb_kwargs` (available since Scrapy 1.7):
def parse(self, response):
    request = scrapy.Request('http://www.example.com/index.html',
                             callback=self.parse_page2,
                             cb_kwargs=dict(main_url=response.url))
    request.cb_kwargs['foo'] = 'bar'  # add more arguments for the callback
    yield request

def parse_page2(self, response, main_url, foo):
    yield dict(
        main_url=main_url,
        other_url=response.url,
        foo=foo,
    )
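Applied to the original question, the partially filled item can travel through `cb_kwargs` the same way and arrive as a plain keyword argument. A minimal stand-in sketch (a plain dict instead of `HeroItem`, a string instead of a real `Response`; the `**` call below mimics how Scrapy unpacks `cb_kwargs` into the callback):

```python
def parse_profile(response, item):
    # `item` arrives as an ordinary keyword argument, already holding
    # the fields filled in by the first callback
    item['weapon'] = 'doombringer'  # hypothetical value scraped from response
    return item

# In parse(): fields scraped from the table
item = {'server': 'eu', 'battle_tag': 'Foo#1234'}

# What parse() would attach to the Request via cb_kwargs=...
cb_kwargs = {'item': item}

# Scrapy unpacks cb_kwargs into the callback exactly like this:
completed = parse_profile('fake-response', **cb_kwargs)
```

The final yielded item then carries both the table fields and the profile fields.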
Admin
Updated on June 26, 2022

Comments
-
Admin, about 2 years ago
def parse(self, response):
    for sel in response.xpath('//tbody/tr'):
        item = HeroItem()
        item['hclass'] = response.request.url.split("/")[8].split('-')[-1]
        item['server'] = response.request.url.split('/')[2].split('.')[0]
        item['hardcore'] = len(response.request.url.split("/")[8].split('-')) == 3
        item['seasonal'] = response.request.url.split("/")[6] == 'season'
        item['rank'] = sel.xpath('td[@class="cell-Rank"]/text()').extract()[0].strip()
        item['battle_tag'] = sel.xpath('td[@class="cell-BattleTag"]//a/text()').extract()[1].strip()
        item['grift'] = sel.xpath('td[@class="cell-RiftLevel"]/text()').extract()[0].strip()
        item['time'] = sel.xpath('td[@class="cell-RiftTime"]/text()').extract()[0].strip()
        item['date'] = sel.xpath('td[@class="cell-RiftTime"]/text()').extract()[0].strip()
        url = 'https://' + item['server'] + '.battle.net/' + sel.xpath('td[@class="cell-BattleTag"]//a/@href').extract()[0].strip()
        yield Request(url, callback=self.parse_profile)

def parse_profile(self, response):
    sel = Selector(response)
    item = HeroItem()
    item['weapon'] = sel.xpath('//li[@class="slot-mainHand"]/a[@class="slot-link"]/@href').extract()[0].split('/')[4]
    return item
Well, I'm scraping a whole table in the main parse method and I have taken several fields from that table. One of these fields is a URL, and I want to explore it to get a whole new bunch of fields. How can I pass my already created item object to the callback function so the final item keeps all the fields?
As shown in the code above, I'm able to save either the fields from inside the URL (the code as it stands) or only the ones from the table (by simply writing `yield item`), but I can't yield a single object with all the fields together. I have tried this, but obviously it doesn't work:
yield Request(url, callback=self.parse_profile(item))

def parse_profile(self, response, item):
    sel = Selector(response)
    item['weapon'] = sel.xpath('//li[@class="slot-mainHand"]/a[@class="slot-link"]/@href').extract()[0].split('/')[4]
    return item