Python BeautifulSoup form input parsing


Solution 1

You cannot submit a form with BeautifulSoup, but here's how you can get the list of (name, value) pairs:

print([(element['name'], element['value']) for element in html_proc.find_all('input')])

prints:

[('qw1NWJOJi/E8IyqHSHA==', 'gDcZHY+nV'), 
 ('sfqwWJOJi/E8DFDHSHB==', 'kgDcZHY+n'), 
 ('Jsfqw1NdddfDDSDKKSL==', 'rNg4pUhnV')]
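Here is a self-contained Python 3 sketch of the one-liner above, run against the HTML from the question. It assumes bs4 is installed and passes "html.parser" explicitly rather than relying on BeautifulSoup's default parser choice. Like the one-liner, it assumes every input has both a name and a value, which holds for this form.

```python
from bs4 import BeautifulSoup

# HTML copied from the question: three hidden inputs with randomised names/values
html = """
<form id="formS" action="login.asp?dx=" method="post">
<input type=hidden name=qw1NWJOJi/E8IyqHSHA== value='gDcZHY+nV' >
<input type=hidden name=sfqwWJOJi/E8DFDHSHB== value='kgDcZHY+n' >
<input type=hidden name=Jsfqw1NdddfDDSDKKSL== value='rNg4pUhnV' >
</form>
"""

html_proc = BeautifulSoup(html, "html.parser")

# element['name'] / element['value'] raise KeyError if an input lacks that attribute
pairs = [(element['name'], element['value']) for element in html_proc.find_all('input')]
print(pairs)
```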

Solution 2

d = {e['name']: e.get('value', '') for e in html_proc.find_all('input', {'name': True})}
print(d)

prints:

{'qw1NWJOJi/E8IyqHSHA==': 'gDcZHY+nV',
 'sfqwWJOJi/E8DFDHSHB==': 'kgDcZHY+n',
 'Jsfqw1NdddfDDSDKKSL==': 'rNg4pUhnV'}

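Building on @alecxe's answer, this avoids KeyErrors (inputs without a name attribute are skipped, and a missing value defaults to an empty string) and parses the form into a dictionary that can be passed straight to requests.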

import requests
from urllib.parse import urljoin
url = urljoin('http://example.com/', html_proc.form['action'])  # resolve the relative action
requests.post(url, data=d)
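For a flat dict of strings, requests form-encodes data=d as application/x-www-form-urlencoded, which should match what urllib.parse.urlencode produces. The stdlib-only sketch below shows roughly what the POST body and target URL would look like without sending anything; http://example.com/ is just the placeholder base URL used above.

```python
from urllib.parse import urlencode, urljoin

# Name/value pairs as produced by Solution 2 (copied from its output)
d = {
    'qw1NWJOJi/E8IyqHSHA==': 'gDcZHY+nV',
    'sfqwWJOJi/E8DFDHSHB==': 'kgDcZHY+n',
    'Jsfqw1NdddfDDSDKKSL==': 'rNg4pUhnV',
}

# The form's action attribute is relative, so resolve it against the page URL
url = urljoin('http://example.com/', 'login.asp?dx=')

# Form-encode the body: '/', '=' and '+' in the random names/values get percent-escaped
body = urlencode(d)

print(url)
print(body)
```

Because the names contain characters such as '/', '=' and '+', letting the library do the escaping is safer than building the body string by hand.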

Though if this gets any more complicated (cookies, scripts), you might want to use Mechanize.
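To illustrate the KeyError point, here is a sketch with a hypothetical form (not the question's HTML) that adds a text input with no value and a submit button with no name. It assumes bs4 is installed. Indexing a missing attribute, as Solution 1 does, raises KeyError; the Solution 2 comprehension skips unnamed inputs and defaults missing values to an empty string.

```python
from bs4 import BeautifulSoup

# Hypothetical form: a normal input, an input without value, a submit without name
html = """
<form action="login.asp?dx=" method="post">
  <input type="hidden" name="token" value="abc">
  <input type="text" name="user">
  <input type="submit" value="Go">
</form>
"""
soup = BeautifulSoup(html, "html.parser")

# Solution 1 style: tag['value'] raises KeyError when the attribute is missing
solution1_error = None
try:
    [(e['name'], e['value']) for e in soup.find_all('input')]
except KeyError as exc:
    solution1_error = exc
    print('Solution 1 failed with KeyError:', exc)

# Solution 2 style: only inputs that have a name; a missing value becomes ''
d = {e['name']: e.get('value', '') for e in soup.find_all('input', {'name': True})}
print(d)
```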


The reason for the TypeError is that the first parameter of find() is itself called 'name' (it holds the tag name), so html_proc.find("input", name=True) supplies 'name' twice. Use html_proc.find("input", attrs={'name': True}) instead. Also, for the attrs parameter, pass the dictionary {'value': True} rather than the set {'value'}.
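The clash can be reproduced without BeautifulSoup. The hypothetical find() below only mimics the shape of bs4's signature (the tag name is the first parameter, called name, followed by attrs and extra keyword filters); it is not bs4's implementation. Passing name=True as a keyword collides with the positional "input", while putting the filter inside the attrs dictionary does not.

```python
# Hypothetical stand-in mimicking the shape of bs4's find() signature:
# the first parameter is the tag name; attribute filters go in attrs or **kwargs.
def find(name=None, attrs=None, **kwargs):
    return {'tag': name, 'attrs': attrs or {}, 'kwargs': kwargs}

# "input" already fills the name parameter, so name=True supplies it twice
clash = None
try:
    find("input", name=True)
except TypeError as exc:
    clash = exc
    print('TypeError:', exc)

# Putting the attribute filter in the attrs dictionary avoids the clash
ok = find("input", attrs={'name': True})
print(ok)

# {'value'} is a set literal, not a mapping; attrs expects {'value': True}
print(type({'value'}).__name__, type({'value': True}).__name__)
```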

Author: sarasimple

Updated on July 21, 2022

Comments

  • sarasimple
    sarasimple almost 2 years

    My goal is to grab a list of all input names and values, pair them up, and submit the form. The names and values are randomised.

    from bs4 import BeautifulSoup # parsing
    
    html = """
    <html>
    <head id="Head1"><title>Title Page</title></head>
    <body>
        <form id="formS" action="login.asp?dx=" method="post">
    
        <input type=hidden name=qw1NWJOJi/E8IyqHSHA== value='gDcZHY+nV' >
        <input type=hidden name=sfqwWJOJi/E8DFDHSHB== value='kgDcZHY+n' >
        <input type=hidden name=Jsfqw1NdddfDDSDKKSL== value='rNg4pUhnV' >
        </form>
    
    </body>
    
    </html>
    """
    
    html_proc = BeautifulSoup(html)
    

    This bit works fine:

    print html_proc.find("input", value=True)["value"]
    > gDcZHY+nV
    

    However, the following statements don't work, or don't work as hoped:

    print html_proc.find("input", name=True)["name"]
    > TypeError: find() got multiple values for keyword argument 'name'
    
    print html_proc.findAll("input", value=True, attrs={'value'})
    > []  
    
    print html_proc.findAll('input', value=True)
    > <input name="qw1NWJOJi/E8IyqHSHA==" type="hidden" value="gDcZHY+nV">
    > <input name="sfqwWJOJi/E8DFDHSHB==" type="hidden" value="kgDcZHY+n">
    > <input name="Jsfqw1NdddfDDSDKKSL==" type="hidden" value="rNg4pUhnV">
    > </input></input></input>, <input name="sfqwWJOJi/E8DFDHSHB==" type="hidden" value="kgDcZHY+n">
    > <input name="Jsfqw1NdddfDDSDKKSL==" type="hidden" value="rNg4pUhnV">
    > </input></input>, <input name="Jsfqw1NdddfDDSDKKSL==" type="hidden" value="rNg4pUhnV"></input>
    
  • sarasimple
    sarasimple about 10 years
    Thank you. Brilliant. Parsimonious solutions to code are intoxicating.
  • sarasimple
    sarasimple about 10 years
    When my reputation is higher I'll come back and rate your answer up.
  • alecxe
    alecxe about 10 years
    @sarasimple thanks, but don't worry about it, just glad it helped. Happy web-scraping!
  • An old man in the sea.
    An old man in the sea. over 2 years
    alecxe, what if I need to submit a form? What would you recommend?