HTML Agility Pack

The usage is very similar to XmlDocument, so the MSDN documentation on XmlDocument works as a broad overview. It is also worth learning XPath syntax (also covered on MSDN), since node selection relies on it.

Example:

HtmlDocument doc = new HtmlDocument();
doc.Load(path); // or .LoadHtml(html);
HtmlNode node = doc.DocumentNode.SelectSingleNode("//meta[@name='description']");
if (node != null) {
    string desc = node.GetAttributeValue("content", "");
    // TODO: write desc somewhere
}

The second argument to GetAttributeValue is the default returned in case the attribute is not found.
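
To make the default-value behaviour concrete, here is a minimal sketch that parses an in-memory string with LoadHtml rather than a file. The sample markup, the class name MetaDescriptionDemo, and the "lang" attribute lookup are made up for illustration; it assumes the HtmlAgilityPack package is referenced in the project.

```csharp
using System;
using HtmlAgilityPack;

class MetaDescriptionDemo
{
    static void Main()
    {
        // Hypothetical sample markup; in practice this would come from a file or a download.
        string html = "<html><head>"
                    + "<meta name=\"description\" content=\"A short summary\" />"
                    + "</head><body></body></html>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectSingleNode returns null when nothing matches the XPath expression.
        HtmlNode node = doc.DocumentNode.SelectSingleNode("//meta[@name='description']");
        if (node != null)
        {
            // The attribute exists, so its value is returned.
            string desc = node.GetAttributeValue("content", "");

            // The attribute does not exist on this node, so the supplied default is returned.
            string lang = node.GetAttributeValue("lang", "(none)");

            Console.WriteLine(desc);
            Console.WriteLine(lang);
        }
    }
}
```

Passing an empty string as the default, as in the example above, means the caller never has to null-check the result, at the cost of not being able to tell a missing attribute from an empty one.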

Author by

jay_t55

Updated on June 04, 2022

Comments

  • jay_t55
    jay_t55 almost 2 years

    I'm trying to use HTML Agility Pack to get the description text from inside this tag:

    <meta name="description" content="**this is the text i want to extract and store in a string**" />
    

    Someone on Stack Overflow suggested a little while ago that I use HTML Agility Pack. But I don't know how to use it, and all the documentation I've found (including the docs contained in the downloads) has invalid links, so I cannot view it.

    Can somebody please help me solve this?

  • Alex
    Alex over 13 years
    What is the second string argument (the empty one) used for in node.GetAttributeValue("content", "");?
  • Marc Gravell
    Marc Gravell over 13 years
    @AlexW - I don't have that library "to hand" at the moment; what is the parameter called?
  • Alex
    Alex over 13 years
    Not sure of the parameter name... Will follow the definition path later to find out. Thanks for the answer here, very useful.
  • Brian
    Brian over 10 years
    "def" stands for default. It is the value to return if the attribute is not found. Commenting here because this is the top result when googling the answer.