How to get all input elements in a form with HtmlAgilityPack without getting a null reference error

You can do the following:

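// Parse <form> as a normal container element. This is a static setting,
// so call it before loading the document.
HtmlNode.ElementsFlags.Remove("form");

HtmlDocument doc = new HtmlDocument();
doc.Load(@"D:\test.html");

HtmlNode secondForm = doc.GetElementbyId("form2");

// Elements("input") returns only the direct children of the form.
foreach (HtmlNode node in secondForm.Elements("input"))
{
    HtmlAttribute valueAttribute = node.Attributes["value"];

    // Skip inputs that have no value attribute.
    if (valueAttribute != null)
    {
        Console.WriteLine(valueAttribute.Value);
    }
}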

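By default, HTML Agility Pack parses form elements as empty nodes because they are allowed to overlap other HTML elements. With that default the form has no child nodes, so a query such as SelectNodes(".//input") finds nothing and returns null. The first line, HtmlNode.ElementsFlags.Remove("form");, disables this behavior, so the input elements become children of the second form and can be retrieved.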
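As a rough sketch of the same idea with extra null guards, the helper below collects the value of every input under a form. The class FormInputs and the method GetInputValues are hypothetical names made up for illustration, not part of HTML Agility Pack. It assumes the HtmlAgilityPack package is referenced, and it uses Descendants("input") instead of Elements("input") so that inputs nested deeper inside the form are also found.

```csharp
using System.Collections.Generic;
using HtmlAgilityPack;

static class FormInputs
{
    // Hypothetical helper (not an HTML Agility Pack API): returns the value
    // attribute of every <input> found anywhere under the form with the given id.
    public static List<string> GetInputValues(string html, string formId)
    {
        // Static, process-wide setting; it must be applied before parsing.
        // Removing a key that is already absent is harmless.
        HtmlNode.ElementsFlags.Remove("form");

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var values = new List<string>();

        HtmlNode form = doc.GetElementbyId(formId);
        if (form == null)
        {
            // No element with that id: return an empty list instead of throwing.
            return values;
        }

        // Descendants never returns null, unlike SelectNodes, which returns
        // null when the XPath expression matches nothing.
        foreach (HtmlNode input in form.Descendants("input"))
        {
            HtmlAttribute valueAttribute = input.Attributes["value"];
            if (valueAttribute != null)
            {
                values.Add(valueAttribute.Value);
            }
        }

        return values;
    }
}
```

Calling FormInputs.GetInputValues(html, "form2") with the page markup as a string returns the collected values, or an empty list when the form or its inputs are missing, so the caller can loop over the result without a null check.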

Update: Example of form elements overlap:

<table>
<form>
<!-- Other elements -->
</table>
</form>

The form element begins inside the table but is closed outside it. Browsers tolerate this kind of markup and it appears in real pages, so HTML Agility Pack has to deal with it.
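To see how a given version of the library actually handles overlapping markup like this, it can help to print the parsed tree. The sketch below is only an inspection aid: TreeDump and Outline are hypothetical names, and the exact tree produced for overlapping tags may differ between HTML Agility Pack versions, so no particular output is assumed here.

```csharp
using System.Text;
using HtmlAgilityPack;

static class TreeDump
{
    // Hypothetical helper (not an HTML Agility Pack API): builds an indented
    // outline of the element names below the given node.
    public static string Outline(HtmlNode node, int depth = 0)
    {
        var sb = new StringBuilder();

        foreach (HtmlNode child in node.ChildNodes)
        {
            // Ignore text and comment nodes; only elements are listed.
            if (child.NodeType != HtmlNodeType.Element)
            {
                continue;
            }

            // Two spaces of indentation per nesting level.
            sb.Append(' ', depth * 2).AppendLine(child.Name);
            sb.Append(Outline(child, depth + 1));
        }

        return sb.ToString();
    }
}
```

For example, load the overlapping snippet with doc.LoadHtml(...) and write TreeDump.Outline(doc.DocumentNode) to the console, once with and once without the HtmlNode.ElementsFlags.Remove("form") call, to compare where the inputs end up in each case.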

Author: Bill Li

Updated on September 26, 2020

Comments

  • Bill Li over 3 years

    Example HTML:

     <html><body>
         <form id="form1">
           <input name="foo1" value="bar1" />
           <!-- Other elements -->
         </form>
         <form id="form2">
           <input name="foo2" value="bar2" />
           <!-- Other elements -->
         </form>   
     </body></html>
    

    Test code:

    HtmlDocument doc = new HtmlDocument();
    doc.Load(@"D:\test.html");
    foreach (HtmlNode node in doc.GetElementbyId("form2").SelectNodes(".//input"))
    {
        Console.WriteLine(node.Attributes["value"].Value);            
    }
    

    The statement doc.GetElementbyId("form2").SelectNodes(".//input") gives me a null reference.

    Did I do anything wrong? Thanks.