Why is HTML Agility Pack's HtmlDocument.DocumentNode null?
This works for me.
// Requires: using System.IO; using System.Linq; using System.Net; using System.Web;
using (WebClient client = new WebClient())
{
    client.Encoding = System.Text.Encoding.UTF8;
    var doc = new HtmlAgilityPack.HtmlDocument();
    doc.LoadHtml(client.DownloadString("http://www.google.com?q=stackoverflow"));

    // Descendants("a") yields every <a> element; an <a> without an href gives a null attribute.
    foreach (var href in doc.DocumentNode.Descendants("a").Select(x => x.Attributes["href"]))
    {
        if (href == null) continue;
        href.Value = "http://ahmadalli.somee.com/default.aspx?url=" + HttpUtility.UrlEncode(href.Value);
    }

    // Save into a StringWriter rather than back into the downloaded content.
    StringWriter writer = new StringWriter();
    doc.Save(writer);
    var finalHtml = writer.ToString();
}
Also note the HttpUtility.UrlEncode call, which makes it possible to get the original URL back correctly. Otherwise, some parameters in the original URL may cause problems. Use HttpUtility.UrlDecode to decode it.
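To illustrate the encoding note above, here is a minimal sketch of the round trip. The sample link and the class and constant names are made up for illustration; only the HttpUtility.UrlEncode and HttpUtility.UrlDecode calls and the redirect prefix come from the answer. It assumes System.Web (HttpUtility) is available to the project.

```csharp
using System;
using System.Web;

class UrlRoundTripSketch
{
    // Redirect prefix taken from the answer's code.
    const string Prefix = "http://ahmadalli.somee.com/default.aspx?url=";

    static void Main()
    {
        // Hypothetical original link; its own query string contains '?', '=' and '&'.
        string original = "http://example.com/search?q=a&b=c";

        // Encoding keeps the inner '&b=c' from being read as a second
        // parameter of default.aspx.
        string rewritten = Prefix + HttpUtility.UrlEncode(original);

        // On the receiving side, take the encoded part and decode it.
        string encodedPart = rewritten.Substring(Prefix.Length);
        string decoded = HttpUtility.UrlDecode(encodedPart);

        Console.WriteLine(rewritten);
        Console.WriteLine(decoded == original); // prints "True"
    }
}
```

Without the UrlEncode step, a page reading the url parameter would most likely see only the part of the link up to the first '&'.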
Author by
ahmadali shafiee
Updated on June 04, 2022
Comments
-
ahmadali shafiee about 2 years
I'm using this code to change the href attributes in an HTML stream.
First I download the full HTML page using this code (URL is the web page address):
HttpWebRequest myHttpWebRequest = (HttpWebRequest)WebRequest.Create(URL);
HttpWebResponse myHttpWebResponse = (HttpWebResponse)myHttpWebRequest.GetResponse();
Stream s = myHttpWebResponse.GetResponseStream();
Then I process it:
HtmlDocument doc = new HtmlDocument();
doc.Load(s);
foreach (HtmlNode link in doc.DocumentNode.SelectNodes("/a"))
{
    string att = link.Attributes["href"].Value;
    link.Attributes["href"].Value = "http://ahmadalli.somee.com/default.aspx?url=" + att;
}
doc.Save(s);
s is the HTML stream, but I get an exception that says doc.DocumentNode is null! I tried many sites, but doc.DocumentNode is null for all of them too.
-
Mike Park over 12 years ago: What does s look like?
-
-
ahmadali shafiee over 12 years ago: Anyway, doc.DocumentNode is null and the exception is thrown before the foreach.
-
Oded over 12 years ago: Why do you feel the need to use DocumentNode at all? Why not use doc.SelectNodes directly?
-
ahmadali shafiee over 12 years ago: @Oded: SelectNodes is a method of DocumentNode.
-
Cristian Lupascu over 12 years ago: @ahmadalishafiee Can you give an example of a URL for which this fails? Or just the basic structure of the HTML?
-
ahmadali shafiee over 12 years ago: @w0lf: I tried your suggestion, but after that I just got an empty Stream and my new Stream was null.
-
Cristian Lupascu over 12 years ago: @ahmadalishafiee Try using http://google.com as the URL instead of just google.com (if you didn't do that already).
-
Cristian Lupascu over 12 years ago: @ahmadalishafiee I've added a code sample that works (it enumerates all links on the page).
-
ahmadali shafiee over 12 years ago: @w0lf: First, I'm using http, but the Stack Overflow comment doesn't show it. Second, I tried your code but still got empty HTML, like on the last try.
-
Cristian Lupascu over 12 years ago: @ahmadalishafiee Try using a different URL, or try to diagnose your network connection (see if you need to configure a proxy).
-
ahmadali shafiee over 12 years ago: @w0lf: It's an ASP.NET app that runs in my browser, and I can connect to Stack Overflow. The problem isn't the network connection.
-
ahmadali shafiee over 12 years ago: @w0lf: There are no proxy or internet problems with my connection.
-
ahmadali shafiee over 12 years ago: I tried your code but I got an exception: Object reference not set to an instance of an object.
-
L.B over 12 years ago: @ahmadalishafiee I ran it on another machine. It worked without a problem.
-
ahmadali shafiee over 12 years
-
L.B over 12 years ago: I added if (href == null) continue; into the loop.
-
Sam over 11 years ago: There is no difference in the resulting strings between "/a" and @"/a".
-
Phill Healey over 10 years ago: @ahmadalishafiee It's probably worth noting that Google does put temporary blocks on IPs that make an above-normal number of calls. This might be why you get nothing back from Google if you've been running the code a lot in a short space of time. You should probably try some other URLs for now, to eliminate that as a factor.