Getting HtmlDocument from string without using browser control
Solution 1
You can use HtmlAgilityPack, which parses HTML directly from a string and does not need a browser control. For example:
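// Requires the HtmlAgilityPack NuGet package and: using System.Linq;
HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(html); // html is the page source as a string
var results = doc.DocumentNode
    .Descendants("div")
    .Select(n => n.InnerText); // inner text of every <div> element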
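As a slightly fuller sketch of the same idea, the snippet below pulls the link targets out of an HTML string. It assumes the HtmlAgilityPack NuGet package is referenced; the class name HtmlStringParsing, the helper ExtractLinks, and the sample markup are made up for illustration and are not part of the library or of the original answer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public static class HtmlStringParsing
{
    // Hypothetical helper: returns the href value of every <a> element
    // that has a non-empty href attribute.
    public static List<string> ExtractLinks(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html); // parses the string in memory, no browser involved

        return doc.DocumentNode
            .Descendants("a")
            .Select(a => a.GetAttributeValue("href", string.Empty))
            .Where(href => href.Length > 0)
            .ToList();
    }

    public static void Main()
    {
        // In the question's scenario this string would come from
        // WebClient.DownloadString(url); a literal keeps the sketch offline.
        const string html =
            "<html><body><div>First</div>" +
            "<a href=\"/a\">A</a><a>no href</a></body></html>";

        foreach (var link in ExtractLinks(html))
        {
            Console.WriteLine(link);
        }
    }
}
```

Note that this gives you an HtmlAgilityPack.HtmlDocument, not a System.Windows.Forms.HtmlDocument, so the DOM API differs from the one the WebBrowser control exposes.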
Solution 2
I know this is an old post, but my reply is for others who come here like me.
If you want to do it in code with the .NET WebBrowser class, without placing a control on a form, here is what you have to do:
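// Requires: using System.Windows.Forms;
// WebBrowser wraps an ActiveX control, so this must run on an STA thread.
public System.Windows.Forms.HtmlDocument GetHtmlDocument(string html)
{
    WebBrowser browser = new WebBrowser();
    browser.ScriptErrorsSuppressed = true;
    browser.DocumentText = html;
    // Open a fresh document and write the markup into it directly.
    browser.Document.OpenNew(true);
    browser.Document.Write(html);
    browser.Refresh();
    return browser.Document;
}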
Author by
Aabela
Updated on June 05, 2022
Aabela:
I obtain a webpage's HTML (as a string) using a WebClient.
However, I want to turn it into an HtmlDocument object so I can use the DOM features this class offers. Currently, the only way I know to do this is with a WebBrowser control, as follows:
string pageHtml = client.DownloadString(url);
browser.ScriptErrorsSuppressed = true;
browser.DocumentText = pageHtml;
do
{
    Application.DoEvents();
} while (browser.ReadyState != WebBrowserReadyState.Complete);
return browser.Document;
Is there another way of doing it? I know there are other browser controls available, but is there a simpler way?