Sending an ASP.net POST with Python's Requests
You are passing too much to the request: you should not set the Content-Type, Content-Length, Host, Origin, or Connection headers yourself; leave those to requests to set.
You are also doubling up the URL parameters; either add the vr parameter to the URL manually or use params, but don't do both.
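To see the doubling concretely, here is a minimal sketch that only prepares the requests (nothing is sent over the network) and prints the final URLs. The vr value is the one shown in the Chrome capture in the question; base_url, doubled and single are names made up for this illustration.

```python
import requests

base_url = "http://www.example.com/EN/items/Pages/yourrates.aspx"

# vr in the URL *and* in params: requests appends params to the
# existing query string, so the parameter ends up in the URL twice.
doubled = requests.Request(
    "POST", base_url + "?vr=123456789", params={"vr": 123456789}
).prepare()

# vr only in params: the query string is built once.
single = requests.Request("POST", base_url, params={"vr": 123456789}).prepare()

print(doubled.url)
print(single.url)
```

Whether the server tolerates the duplicated parameter depends on the application, so sending it once is the safer choice.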
It may well be that some of the parameters in the POST body are generated by the ASP application and tied to a session. I'd use a Session object to send a GET request to the valuation_url, then parse the form in that page to extract the __CALLBACKID parameter. The requests Session will then store any cookies the server sets and reuse those:
import requests
from bs4 import BeautifulSoup

# item_number and valuation_url are defined as in the question's code
item_request_headers = {
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1985.125 Safari/537.36",
    "Accept": "*/*",
    "Accept-Encoding": "gzip,deflate,sdch",
    "Accept-Language": "en-US,en;q=0.8"
}
payload = {"vr": int(item_number[0])}

# Session() takes no arguments; set the shared headers on the instance
session = requests.Session()
session.headers.update(item_request_headers)

# Get form page
form_response = session.get(valuation_url, params=payload)

# parse form page; BeautifulSoup could do this for example
soup = BeautifulSoup(form_response.content, 'html.parser')
callbackid = soup.select('input[name="__CALLBACKID"]')[0]['value']
# assumption: __VIEWSTATE is a hidden input on the same page
viewstate = soup.select('input[name="__VIEWSTATE"]')[0]['value']

item_request_body = {
    "__SPSCEditMenu": "true",
    "MSOWebPartPage_PostbackSource": "",
    "MSOTlPn_SelectedWpId": "",
    "MSOTlPn_View": 0,
    "MSOTlPn_ShowSettings": "False",
    "MSOGallery_SelectedLibrary": "",
    "MSOGallery_FilterString": "",
    "MSOTlPn_Button": "none",
    "__EVENTTARGET": "",
    "__EVENTARGUMENT": "",
    "MSOAuthoringConsole_FormContext": "",
    "MSOAC_EditDuringWorkflow": "",
    "MSOSPWebPartManager_DisplayModeName": "Browse",
    "MSOWebPartPage_Shared": "",
    "MSOLayout_LayoutChanges": "",
    "MSOLayout_InDesignMode": "",
    "MSOSPWebPartManager_OldDisplayModeName": "Browse",
    "MSOSPWebPartManager_StartWebPartEditingName": "false",
    "__VIEWSTATE": viewstate,
    "keywords": "Search our site",
    "__CALLBACKID": callbackid,
    "__CALLBACKPARAM": "startvr"
}

item_url = 'http://www.example.com/EN/items/Pages/yourrates.aspx'
response = session.post(url=item_url, params=payload, data=item_request_body,
                        headers={'Referer': form_response.url})
The session supplies the shared headers (the user agent and the Accept headers); only the POST adds a Referer header on top of those.
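If several hidden fields need to be carried over from the form page, the parsing step can be generalized. The following is a rough sketch using only the standard library's html.parser instead of BeautifulSoup; HiddenFieldParser, hidden_fields and the sample markup are made up for illustration, and the real page may well be structured differently.

```python
from html.parser import HTMLParser


class HiddenFieldParser(HTMLParser):
    """Collect name/value pairs from <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attributes = dict(attrs)
        is_hidden = (attributes.get("type") or "").lower() == "hidden"
        if is_hidden and attributes.get("name"):
            # A missing value attribute is treated as an empty string.
            self.fields[attributes["name"]] = attributes.get("value") or ""


def hidden_fields(html):
    """Return a dict of all hidden input fields found in the HTML text."""
    parser = HiddenFieldParser()
    parser.feed(html)
    return parser.fields


# Made-up sample markup standing in for form_response.text
sample_page = """
<form method="post" action="yourrates.aspx?vr=123456789">
  <input type="hidden" name="__VIEWSTATE" value="/wEPD..." />
  <input type="hidden" name="__EVENTTARGET" value="" />
  <input type="text" name="keywords" value="Search our site" />
</form>
"""

fields = hidden_fields(sample_page)
print(fields)
```

The resulting dict could then be merged into the POST body, with the page-specific values such as __CALLBACKPARAM set on top of it.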
David K.
Updated on June 04, 2022
Comments
-
David K. almost 2 years
I'm scraping an old ASP.net website using Python's requests module.
I've spent 5+ hours trying to figure out how to simulate this POST request, to no avail. Doing it the way I do below, I essentially get a message saying "No item matches this item reference."
Any help would be deeply appreciated. Here's the request and my code; a few things are modified for the sake of brevity and/or privacy:
My own code:
import requests

# Scraping the item number from the website, I have confirmed this is working.
# Then use the newly acquired item number to request the data.
item_url = "http://www.example.com/EN/items/Pages/yourrates.aspx?vr=" + item_number[0]
viewstate = r'/wEPD...'  # Truncated for brevity.

# Create the appropriate request and payload.
payload = {"vr": int(item_number[0])}
item_request_body = {
    "__SPSCEditMenu": "true",
    "MSOWebPartPage_PostbackSource": "",
    "MSOTlPn_SelectedWpId": "",
    "MSOTlPn_View": 0,
    "MSOTlPn_ShowSettings": "False",
    "MSOGallery_SelectedLibrary": "",
    "MSOGallery_FilterString": "",
    "MSOTlPn_Button": "none",
    "__EVENTTARGET": "",
    "__EVENTARGUMENT": "",
    "MSOAuthoringConsole_FormContext": "",
    "MSOAC_EditDuringWorkflow": "",
    "MSOSPWebPartManager_DisplayModeName": "Browse",
    "MSOWebPartPage_Shared": "",
    "MSOLayout_LayoutChanges": "",
    "MSOLayout_InDesignMode": "",
    "MSOSPWebPartManager_OldDisplayModeName": "Browse",
    "MSOSPWebPartManager_StartWebPartEditingName": "false",
    "__VIEWSTATE": viewstate,
    "keywords": "Search our site",
    "__CALLBACKID": "ctl00$SPWebPartManager1$g_dbb9e9c7_fe1d_46df_8789_99a6c9db4b22",
    "__CALLBACKPARAM": "startvr"
}

# Write the appropriate headers for the property information.
item_request_headers = {
    "Host": home_site,
    "Connection": "keep-alive",
    "Content-Length": len(encoded_valuation_request),
    "Cache-Control": "max-age=0",
    "Origin": home_site,
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1985.125 Safari/537.36",
    "Content-Type": "application/x-www-form-urlencoded; charset=UTF-8",
    "Cookie": "__utma=48409910.1174413745.1405662151.1406402487.1406407024.17; __utmb=48409910.7.10.1406407024; __utmc=48409910; __utmz=48409910.1406178827.13.3.utmcsr=ratesandvallandingpage|utmccn=landingpages|utmcmd=button",
    "Accept": "*/*",
    "Referer": valuation_url,
    "Accept-Encoding": "gzip,deflate,sdch",
    "Accept-Language": "en-US,en;q=0.8"
}

response = requests.post(url=item_url, params=payload, data=item_request_body, headers=item_request_headers)
print(response.text)
What Chrome is telling me the request looks like:
Remote Address:202.55.96.131:80
Request URL:http://www.example.com/EN/items/Pages/yourrates.aspx?vr=123456789
Request Method:POST
Status Code:200 OK

Request Headers
Accept:*/*
Accept-Encoding:gzip,deflate,sdch
Accept-Language:en-US,en;q=0.8
Cache-Control:max-age=0
Connection:keep-alive
Content-Length:21501
Content-Type:application/x-www-form-urlencoded; charset=UTF-8
Cookie:__utma=48409910.1174413745.1405662151.1406402487.1406407024.17; __utmb=48409910.7.10.1406407024; __utmc=48409910; __utmz=48409910.1406178827.13.3.utmcsr=ratesandvallandingpage|utmccn=landingpages|utmcmd=button
Host:www.site.com
Origin:www.site.com
Referer:http://www.example.com/EN/items/Pages/yourrates.aspx?vr=123456789
User-Agent:Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1985.125 Safari/537.36

Query String Parameters
vr:123456789

Form Data
__SPSCEditMenu:true
MSOWebPartPage_PostbackSource:
MSOTlPn_SelectedWpId:
MSOTlPn_View:0
MSOTlPn_ShowSettings:False
MSOGallery_SelectedLibrary:
MSOGallery_FilterString:
MSOTlPn_Button:none
__EVENTTARGET:
__EVENTARGUMENT:
MSOAuthoringConsole_FormContext:
MSOAC_EditDuringWorkflow:
MSOSPWebPartManager_DisplayModeName:Browse
MSOWebPartPage_Shared:
MSOLayout_LayoutChanges:
MSOLayout_InDesignMode:
MSOSPWebPartManager_OldDisplayModeName:Browse
MSOSPWebPartManager_StartWebPartEditingName:false
__VIEWSTATE:/wEPD...(Omitted for length)
keywords:Search our site
__CALLBACKID:ctl00$SPWebPartManager1$g_dbb9e9c7_fe1d_46df_8789_99a6c9db4b22
__CALLBACKPARAM:startvr
-
David K. almost 10 years
Super helpful, Martijn, thank you! I'm still working through things, but as soon as I finish implementing and testing the solution I will be sure to confirm :)
-
David K. almost 10 years
Also, do you know how I would encode this type of thing?
__CALLBACKID=ctl00%24SPWebPartManager1%24g_dbb9e9c7_fe1d_46df_8789_99a6c9db4b22
It gave me an error in the callback, presumably due to the unusual percent signs.
-
Martijn Pieters almost 10 years
Include the decoded value; leave encoding to requests. %24 is an encoded $, for example.
-
David K. almost 10 years
You are so helpful Martijn! Thanks again! Works well, now I just need to automate retrieving a few of the bits of information via BeautifulSoup.
-
Martijn Pieters almost 10 years
@DavidK.: you can combine requests and BeautifulSoup with robobrowser; it'll help you fill in the forms too.
-
David K. almost 10 years
Good idea, but I'm trying to do it purely with HTTP requests, no browser layer. I need maximal speed for a sizable dataset, and none of the browsers I've tried are fast enough. However, if you've had good experiences with robobrowser I may have to try it!
-
Martijn Pieters almost 10 years
@DavidK.: robobrowser is not a browser layer. It is requests plus BeautifulSoup plus a little glue to handle forms.
-
David K. almost 10 years
Ah, well that sounds even better then! I'm just using bs4 and requests myself at the moment, but that may be a convenient package.
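As a footnote to the encoding exchange in the comments, here is a small sketch using only the standard library's urllib.parse. It decodes the %24 value copied from the browser and shows what happens when an already-encoded value is encoded a second time. As far as I know, requests form-encodes a data dict in a comparable way, but treat that as an assumption rather than a guarantee; the variable names are made up for this example.

```python
from urllib.parse import unquote, urlencode

encoded = "ctl00%24SPWebPartManager1%24g_dbb9e9c7_fe1d_46df_8789_99a6c9db4b22"

# Decode the value copied from the browser's raw request body.
decoded = unquote(encoded)
print(decoded)

# Encoding the decoded value once reproduces the wire format.
body = urlencode({"__CALLBACKID": decoded})
print(body)

# Passing the already-encoded value gets it encoded again: % becomes %25.
double_encoded = urlencode({"__CALLBACKID": encoded})
print(double_encoded)
```

This is why the decoded form, with the literal $ characters, is the one to put in the data dict.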