Does the .NET Framework offer methods to parse an HTML string?
Solution 1
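You can create a dummy HTML document by loading the string into a WinForms WebBrowser control (System.Windows.Forms), which exposes it as an HtmlDocument.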
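using System;
using System.Windows.Forms; // WebBrowser, HtmlDocument; run this from a Main method marked [STAThread]

WebBrowser w = new WebBrowser();
w.Navigate(String.Empty); // load a blank page so that w.Document is available
HtmlDocument doc = w.Document;
doc.Write("<html><head></head><body><img id=\"myImage\" src=\"c:\"/><a id=\"myLink\" href=\"myUrl\"/></body></html>");
Console.WriteLine(doc.Body.Children.Count); // number of child elements of <body>
Console.WriteLine(doc.GetElementById("myImage").GetAttribute("src"));
Console.WriteLine(doc.GetElementById("myLink").GetAttribute("href"));
Console.ReadKey();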
Output:
2
file:///c:
about:myUrl
Editing elements:
HtmlElement imageElement = doc.GetElementById("myImage");
string newSource = "d:";
// Swap the src value by rewriting the element's markup; this assumes OuterHtml contains the literal text src="c:"
imageElement.OuterHtml = imageElement.OuterHtml.Replace(
"src=\"c:\"",
"src=\"" + newSource + "\"");
Console.WriteLine(doc.GetElementById("myImage").GetAttribute("src"));
Output:
file:///d:
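Instead of rewriting OuterHtml, HtmlElement also has a SetAttribute method, and HtmlDocument.CreateElement together with HtmlElement.AppendChild can add new elements. The sketch below is untested and assumes the same setup as above (Windows, .NET Framework, STA thread); the class name, the ids "newLink" and the attribute values are hypothetical, chosen only for illustration.

```csharp
using System;
using System.Windows.Forms;

static class WebBrowserEditDemo
{
    // Loads the markup into a blank WebBrowser document, edits and creates attributes,
    // appends a new element, and returns the title attribute read back from it.
    public static string EditAndReadBack(string html)
    {
        using (WebBrowser w = new WebBrowser())
        {
            w.Navigate(String.Empty); // blank page so that w.Document is available
            HtmlDocument doc = w.Document;
            doc.Write(html);

            // Change an existing attribute and add a new one without touching OuterHtml.
            HtmlElement image = doc.GetElementById("myImage");
            image.SetAttribute("src", "d:");
            image.SetAttribute("alt", "my image");

            // Create a new element, give it attributes, and add it to <body>.
            HtmlElement link = doc.CreateElement("a");
            link.Id = "newLink";
            link.SetAttribute("href", "otherUrl");
            link.SetAttribute("title", "hello");
            doc.Body.AppendChild(link);

            return doc.GetElementById("newLink").GetAttribute("title");
        }
    }

    [STAThread] // WebBrowser wraps an ActiveX control and must run on an STA thread
    static void Main()
    {
        string html = "<html><head></head><body><img id=\"myImage\" src=\"c:\"/>"
                    + "<a id=\"myLink\" href=\"myUrl\"/></body></html>";
        Console.WriteLine(EditAndReadBack(html));
    }
}
```

This still depends on the WebBrowser control, so it only helps where WinForms is available.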
Solution 2
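Assuming you're dealing with well-formed HTML (every element closed and every attribute quoted, as in XHTML), you could simply treat the text as an XML document. System.Xml.XmlDocument can load the string with LoadXml, find elements by tag name or XPath, and read, change, or create attributes with SetAttribute.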
http://msdn.microsoft.com/en-us/library/system.xml.xmldocument.aspx
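A minimal sketch of that approach, assuming the input is well-formed XML; it reuses the sample markup from Solution 1, and the class name, the new id "newLink" and the attribute values are hypothetical. Note that XmlDocument.GetElementById only recognizes attributes declared as IDs in a DTD, so an XPath query is used to find an element by its id attribute.

```csharp
using System;
using System.Xml;

static class XmlDocumentDemo
{
    // Loads well-formed markup, edits and creates attributes, adds an element,
    // and returns the resulting markup.
    public static string Edit(string markup)
    {
        var doc = new XmlDocument();
        doc.LoadXml(markup); // throws XmlException if the markup is not well-formed XML

        // Find an element by its id attribute with XPath (no DTD needed).
        XmlElement image = doc.SelectSingleNode("//*[@id='myImage']") as XmlElement
            ?? throw new ArgumentException("No element with id 'myImage'.");
        image.SetAttribute("src", "d:");       // modify an existing attribute
        image.SetAttribute("alt", "my image"); // create a new attribute

        // Find an element by tag name and append a newly created child element to it.
        XmlNodeList bodies = doc.GetElementsByTagName("body");
        XmlNode body = bodies.Count > 0 ? bodies[0] : doc.DocumentElement;
        XmlElement link = doc.CreateElement("a");
        link.SetAttribute("id", "newLink");
        link.SetAttribute("href", "otherUrl");
        body.AppendChild(link);

        return doc.OuterXml;
    }

    static void Main()
    {
        string html = "<html><head></head><body><img id=\"myImage\" src=\"c:\"/>"
                    + "<a id=\"myLink\" href=\"myUrl\"/></body></html>";
        Console.WriteLine(Edit(html));
    }
}
```

Unlike Solution 1, this needs no UI control, but LoadXml rejects ordinary HTML that is not also valid XML.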
Solution 3
Aside from the HTML Agility Pack and porting HtmlUnit to C#, these sound like solid solutions:
- Most obviously, use regular expressions (System.Text.RegularExpressions).
- Use an XML parser (since HTML is a system of tags, why not treat it as an XML document?).
- LINQ?
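If the LINQ option is read as LINQ to XML (System.Xml.Linq), a rough sketch looks like this. It has the same well-formedness requirement as the XML parser option; the class and method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

static class LinqToXmlDemo
{
    // Finds elements by tag name: returns the href of every <a> element.
    public static List<string> GetLinkHrefs(string markup)
    {
        XDocument doc = XDocument.Parse(markup); // throws XmlException if not well-formed
        return doc.Descendants("a")
                  .Select(a => (string)a.Attribute("href"))
                  .Where(href => href != null)
                  .ToList();
    }

    // Finds an element by its id attribute and sets (or creates) another attribute on it.
    public static string SetAttributeById(string markup, string id, string name, string value)
    {
        XDocument doc = XDocument.Parse(markup);
        XElement target = doc.Descendants()
                             .FirstOrDefault(e => (string)e.Attribute("id") == id);
        if (target == null)
            throw new ArgumentException("No element with id '" + id + "'.");

        // SetAttributeValue updates an existing attribute or adds it if it is missing.
        target.SetAttributeValue(name, value);
        return doc.ToString(SaveOptions.DisableFormatting);
    }

    static void Main()
    {
        string html = "<html><head></head><body><img id=\"myImage\" src=\"c:\"/>"
                    + "<a id=\"myLink\" href=\"myUrl\"/></body></html>";
        Console.WriteLine(string.Join(", ", GetLinkHrefs(html)));
        Console.WriteLine(SetAttributeById(html, "myImage", "src", "d:"));
    }
}
```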
One thing I do know is that parsing HTML as if it were XML may cause problems, because XML and HTML are not the same. Read about it here.
Also, here is a post about LINQ vs. Regex.
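For comparison, the regex option from the list above might look like the following naive sketch. It is deliberately limited: it assumes double-quoted attributes and breaks on single or missing quotes, '>' inside attribute values, comments, and scripts, which is the usual argument against regex for HTML. The class and method names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

static class RegexAttributeDemo
{
    // Returns the value of attrName on the first start tag whose id equals the given id,
    // or null if nothing matches. Only double-quoted attributes are understood.
    public static string GetAttribute(string html, string id, string attrName)
    {
        // Step 1: locate the start tag carrying id="...".
        Match tag = Regex.Match(
            html,
            "<\\w+\\b[^>]*\\bid\\s*=\\s*\"" + Regex.Escape(id) + "\"[^>]*>",
            RegexOptions.IgnoreCase);
        if (!tag.Success)
            return null;

        // Step 2: extract the requested attribute from that start tag only.
        Match attr = Regex.Match(
            tag.Value,
            "\\b" + Regex.Escape(attrName) + "\\s*=\\s*\"(?<value>[^\"]*)\"",
            RegexOptions.IgnoreCase);
        return attr.Success ? attr.Groups["value"].Value : null;
    }

    static void Main()
    {
        string html = "<html><head></head><body><img id=\"myImage\" src=\"c:\"/>"
                    + "<a id=\"myLink\" href=\"myUrl\"/></body></html>";
        Console.WriteLine(GetAttribute(html, "myImage", "src"));  // c:
        Console.WriteLine(GetAttribute(html, "myLink", "href"));  // myUrl
    }
}
```

Reading is manageable this way; editing or creating attributes with regex quickly becomes much harder to get right.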
Jelly Ama
Updated on June 22, 2022
Comments
-
Jelly Ama almost 2 years
I can't use HtmlAgilityPack, only plain .NET. Say I have a string containing some HTML that I need to parse and edit in the following ways:
- find specific controls in the hierarchy by id or by tag
- modify (and ideally create) attributes of those found elements
Are there methods available in .NET to do so?
-
porges about 12 years
This requires you to load the document in a WinForms control.
-
Jelly Ama about 12 years
Correct me if I'm wrong, but this requires a WebBrowser control and doesn't allow direct parsing of an HTML string.
-
Alexei Levenkov about 12 years
@JellyAma, yes, but isn't that what you seem to want with "modify (and ideally create) attributes of those found elements"?
-
Jelly Ama about 12 years
@Alexei, most importantly, I need to parse strings of HTML.
-
L.B about 12 years
Try to parse this well-formed HTML:
<html><body>line1
<br>line2</body></html>
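The point of this example: `<br>` is a valid HTML void element, but XML requires every element to be closed, so an XML parser rejects the markup. A small sketch that checks this with XmlDocument.LoadXml (the class and method names are hypothetical):

```csharp
using System;
using System.Xml;

static class NotWellFormedDemo
{
    // Returns true if the markup loads as XML, false if the XML parser rejects it.
    public static bool LoadsAsXml(string markup)
    {
        try
        {
            new XmlDocument().LoadXml(markup);
            return true;
        }
        catch (XmlException ex)
        {
            Console.WriteLine("XmlException: " + ex.Message);
            return false;
        }
    }

    static void Main()
    {
        // Unclosed <br> (fine in HTML) versus self-closed <br/> (fine in XML).
        Console.WriteLine(LoadsAsXml("<html><body>line1\n<br>line2</body></html>"));  // False
        Console.WriteLine(LoadsAsXml("<html><body>line1\n<br/>line2</body></html>")); // True
    }
}
```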
-
L.B about 12 years