FormRequest Scrapy
Solution 1
Try using the FormRequest.from_response method, which pre-fills the form found in the response and submits it:
import scrapy


class LoginSpider(scrapy.Spider):
    name = 'example.com'
    start_urls = ['http://www.example.com/users/login.php']

    def parse(self, response):
        # Fill in the login form found in the response and submit it
        return scrapy.FormRequest.from_response(
            response,
            formdata={'username': 'john', 'password': 'secret'},
            callback=self.after_login
        )

    def after_login(self, response):
        # Check that the login succeeded before going on.
        # response.text is a str; response.body is bytes on Python 3.
        if "authentication failed" in response.text:
            self.logger.error("Login failed")
            return
Solution 2
Additionally, to answer @Uday's question: if you have multiple forms on a page, use formid or formname to select the right form:
def parse(self, response):
    return scrapy.FormRequest.from_response(
        response,
        formid='form_id_of_the_form',
        formdata={'username': 'john', 'password': 'secret'},
        callback=self.after_login
    )
Without either of them, FormRequest.from_response uses the first form on the page by default.
Author: Admin
Updated on June 04, 2022
Comments
-
Admin, almost 2 years ago:
I'm new to Scrapy and Python. I'm trying to use FormRequest from the Scrapy example, but it seems that the formdata parameter is not parsing the '[]' from "Air". Any ideas on a workaround for this? Here is the code:
import scrapy
import re
import json
from scrapy.http import FormRequest


class AirfareSpider(scrapy.Spider):
    name = 'airfare'
    start_urls = [
        'http://www.viajanet.com.br/busca/voos-resultados#/POA/MEX/RT/01-03-2017/15-03-2017/-/-/-/1/0/0/-/-/-/-'
    ]

    def parse(self, response):
        return [FormRequest(
            url='http://www.viajanet.com.br/busca/resources/api/AvailabilityStatusAsync',
            formdata={
                "Partner": {
                    "Token": "p0C6ezcSU8rS54+24+zypDumW+ZrLkekJQw76JKJVzWUSUeGHzltXDhUfEntPPLFLR3vJpP7u5CZZYauiwhshw==",
                    "Key": "OsHQtrHdMZPme4ynIP4lcsMEhv0=",
                    "Id": "52",
                    "ConsolidatorSystemAccountId": "80",
                    "TravelAgencySystemAccountId": "80",
                    "Name": "B2C"
                },
                "Air": [{
                    "Arrival": {
                        "Iata": "MEX",
                        "Date": "2017-03-15T15:00:00.000Z"
                    },
                    "Departure": {
                        "Iata": "POA",
                        "Date": "2017-03-01T15:00:00.000Z"
                    },
                    "InBoundTime": "0",
                    "OutBoundTime": "0",
                    "CiaCodeList": "[]",
                    "BookingClass": "-1",
                    "IsRoundTrip": "true",
                    "Stops": "-1",
                    "FareType": "-"
                }],
                "Pax": {
                    "adt": "1",
                    "chd": "0",
                    "inf": "0"
                },
                "DisplayTotalAmount": "false",
                "GetDeepLink": "false",
                "GetPriceMatrixOnly": "false",
                "PageLength": "10",
                "PageNumber": "2"
            },
            callback=self.parse_airfare)]

    def parse_airfare(self, response):
        data = json.loads(response.body)
-
Zeugma, over 7 years ago: While this link may answer the question, it is better to include the essential parts of the answer here and provide the link for reference. Link-only answers can become invalid if the linked page changes. - From Review
-
Noah.Kim, over 7 years ago: Thank you for your kindness.
-
Lhassan Baazzi, over 6 years ago: When scraping with the Scrapy framework and the web page contains a form, always use FormRequest.from_response to submit the form, and use FormRequest to send AJAX request data.
-
Uday Posia, about 3 years ago: What should I do if there are multiple forms on the page and none of them has an id or name attribute? How would I select a particular form for FormRequest?
-
Leonardo Maffei, over 2 years ago: This answer surely deserves more upvotes. It is especially useful when you have multiple login forms. In my case, it helped me with GitLab.