Python beautiful soup form input parsing
Solution 1
You cannot submit a form with BeautifulSoup
, but here's how you can get the list of name,value pairs:
print [(element['name'], element['value']) for element in html_proc.find_all('input')]
prints:
[('qw1NWJOJi/E8IyqHSHA==', 'gDcZHY+nV'),
('sfqwWJOJi/E8DFDHSHB==', 'kgDcZHY+n'),
('Jsfqw1NdddfDDSDKKSL==', 'rNg4pUhnV')]
Solution 2
d = {e['name']: e.get('value', '') for e in html_proc.find_all('input', {'name': True})}
print(d)
prints:
{'sfqwWJOJi/E8DFDHSHB==': 'kgDcZHY+n',
'qw1NWJOJi/E8IyqHSHA==': 'gDcZHY+nV',
'Jsfqw1NdddfDDSDKKSL==': 'rNg4pUhnV'}
Building on @alecxe, this avoids KeyErrors, and parses the form into a dictionary, more ready for requests.
url = 'http://example.com/' + html_proc.form['action']
requests.post(url , data=d)
Though if this gets any more complicated (cookies, scripts) you might want to Mechanize.
The reason for the TypeError is confusion over the first parameter to find() being 'name'. Instead html_proc.find("input", attrs={'name': True})
. Also for the attrs parameter, instead of the set {'value'} use the dictionary {'value': True}
.
sarasimple
Updated on July 21, 2022Comments
-
sarasimple almost 2 years
My goal is to grab a list of all input names and values. To pair them up and submit the form. The names and values are randomised.
from bs4 import BeautifulSoup # parsing html = """ <html> <head id="Head1"><title>Title Page</title></head> <body> <form id="formS" action="login.asp?dx=" method="post"> <input type=hidden name=qw1NWJOJi/E8IyqHSHA== value='gDcZHY+nV' > <input type=hidden name=sfqwWJOJi/E8DFDHSHB== value='kgDcZHY+n' > <input type=hidden name=Jsfqw1NdddfDDSDKKSL== value='rNg4pUhnV' > </form> </body> </html> """ html_proc = BeautifulSoup(html)
This bit works fine:
print html_proc.find("input", value=True)["value"] > gDcZHY+nV
However the following statements don't work or don't work as hoped:
print html_proc.find("input", name=True)["name"] > TypeError: find() got multiple values for keyword argument 'name' print html_proc.findAll("input", value=True, attrs={'value'}) > [] print html_proc.findAll('input', value=True) > <input name="qw1NWJOJi/E8IyqHSHA==" type="hidden" value="gDcZHY+nV"> > <input name="sfqwWJOJi/E8DFDHSHB==" type="hidden" value="kgDcZHY+n"> > <input name="Jsfqw1NdddfDDSDKKSL==" type="hidden" value="rNg4pUhnV"> > </input></input></input>, <input name="sfqwWJOJi/E8DFDHSHB==" type="hidden" > value="kgDcZHY+n"> > <input name="Jsfqw1NdddfDDSDKKSL==" type="hidden" value="rNg4pUhnV"> > </input></input>, <input name="Jsfqw1NdddfDDSDKKSL==" type="hidden" value="rNg4p > UhnV"></input>
-
sarasimple about 10 yearsThank you. Brilliant. Parsimonious solutions to code are intoxicating.
-
sarasimple about 10 yearsWhen my reputation is higher I'll come back and rate your answer up.
-
alecxe about 10 years@sarasimple thanks, but don't worry about it, just glad it helped. Happy web-scraping!
-
An old man in the sea. over 2 yearsalcxe, what if I need to submit a form? What would you recommend?