HTMLAgilityPack iterate all text nodes only

Solution 1

Something like this:

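    // Requires: using System; using HtmlAgilityPack;
    HtmlDocument doc = new HtmlDocument();
    doc.Load(yourHtmlFile);

    // SelectNodes returns null, not an empty collection, when nothing matches.
    HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes("//text()[normalize-space(.) != '']");
    if (nodes != null)
    {
        foreach (HtmlNode node in nodes)
        {
            // Print each non-blank text node without surrounding whitespace.
            Console.WriteLine(node.InnerText.Trim());
        }
    }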

For the HTML in the question, this outputs:

Select your Age:
0 to 10
20 and above
Help/Hints:
This is required field.
Make sure select the right age.
Learn More
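
The same idea can be wrapped in a reusable helper. The following is a minimal sketch, assuming the HtmlAgilityPack NuGet package is referenced; the class name `TextNodeDemo` and the method name `GetTextNodes` are made up for illustration and are not part of the original answer. It guards against `SelectNodes` returning null and decodes entities such as `&amp;` with `HtmlEntity.DeEntitize`.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

public static class TextNodeDemo
{
    // Hypothetical helper: returns the trimmed, entity-decoded text
    // of every non-blank text node in the given HTML string.
    public static List<string> GetTextNodes(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var result = new List<string>();

        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection nodes =
            doc.DocumentNode.SelectNodes("//text()[normalize-space(.) != '']");
        if (nodes == null)
        {
            return result;
        }

        foreach (HtmlNode node in nodes)
        {
            // InnerText keeps entities like &amp; as written; decode them first.
            result.Add(HtmlEntity.DeEntitize(node.InnerText).Trim());
        }
        return result;
    }
}
```

Calling `TextNodeDemo.GetTextNodes(html)` and looping over the returned list with `Console.WriteLine` gives the same kind of line-by-line output as above.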

Solution 2

I tested @Simon Mourier's answer on the Google home page and got a lot of CSS and JavaScript in the output, so I added an extra step that removes the script and style nodes first:

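    // Requires: using System.Linq; using System.Text; using HtmlAgilityPack;
    public string getBodyText(string html)
    {
        HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
        doc.LoadHtml(html);

        // Remove script & style nodes. ToList() snapshots the sequence so that
        // removing nodes does not disturb the enumeration.
        doc.DocumentNode.Descendants()
            .Where(n => n.Name == "script" || n.Name == "style")
            .ToList()
            .ForEach(n => n.Remove());

        // Simon Mourier's answer. SelectNodes returns null when nothing matches,
        // so check for that instead of swallowing the resulting exception.
        var nodes = doc.DocumentNode.SelectNodes("//text()[normalize-space(.) != '']");
        StringBuilder sb = new StringBuilder();
        if (nodes != null)
        {
            foreach (HtmlNode node in nodes)
            {
                sb.Append(node.InnerText.Trim()).Append(' ');
            }
        }

        return sb.ToString();
    }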
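
If modifying the document is undesirable, the script and style filter can probably be pushed into the XPath expression instead. This is a sketch under that assumption, not part of the original answer; `VisibleTextDemo` and `GetBodyTextNonDestructive` are hypothetical names. It relies on script and style content being exposed as text nodes whose parent is the `script` or `style` element.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

public static class VisibleTextDemo
{
    // Hypothetical helper: joins all non-blank text nodes with single spaces,
    // skipping text inside <script> and <style>, without removing any nodes.
    public static string GetBodyTextNonDestructive(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Exclude text whose parent is <script> or <style>, and whitespace-only text.
        const string xpath =
            "//text()[not(parent::script) and not(parent::style) and normalize-space(.) != '']";

        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes(xpath);
        if (nodes == null)
        {
            // Nothing matched: SelectNodes returns null rather than an empty collection.
            return string.Empty;
        }

        var parts = new List<string>();
        foreach (HtmlNode node in nodes)
        {
            parts.Add(node.InnerText.Trim());
        }
        return string.Join(" ", parts);
    }
}
```

Unlike `getBodyText` above, this version leaves the parsed document untouched and does not append a trailing space.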
Author by

Sha Le

Updated on June 27, 2022

Comments

  • Sha Le (almost 2 years ago)

    Here is an HTML snippet. All I want is to get only the text nodes and iterate over them. Please let me know. Thanks.

    <div>
       <div>
          Select your Age:
          <select>
              <option>0 to 10</option>
              <option>20 and above</option>
          </select>
       </div>
       <div>
           Help/Hints:
           <ul>
              <li>This is required field.
              <li>Make sure select the right age.
           </ul>
          <a href="#">Learn More</a>
       </div>
    </div>
    

    Expected result:

    1. Select your Age:
    2. 0 to 10
    3. 20 and above
    4. Help/Hints:
    5. This is required field.
    6. Make sure select the right age.
    7. Learn More
  • 8oris (about 2 years ago)
    Trying to implement your code, I got a "BC30491: Expression does not produce a value" error on n.Remove().