Why is HTML Agility Pack's HtmlDocument.DocumentNode null?


This works for me.

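using System.IO;
using System.Linq;   // Select()
using System.Net;    // WebClient
using System.Web;    // HttpUtility

using (WebClient client = new WebClient())
{
    client.Encoding = System.Text.Encoding.UTF8;
    var doc = new HtmlAgilityPack.HtmlDocument();
    doc.LoadHtml(client.DownloadString("http://www.google.com?q=stackoverflow"));

    // Descendants("a") yields an empty sequence when the page has no links,
    // so unlike SelectNodes() it never hands null to the foreach.
    foreach (var href in doc.DocumentNode.Descendants("a").Select(x => x.Attributes["href"]))
    {
        if (href == null) continue; // skip <a> elements without an href (e.g. <a name="...">)
        href.Value = "http://ahmadalli.somee.com/default.aspx?url=" + HttpUtility.UrlEncode(href.Value);
    }

    // Save to a StringWriter to get the rewritten page back as a string.
    StringWriter writer = new StringWriter();
    doc.Save(writer);
    var finalHtml = writer.ToString();
}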

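Note the HttpUtility.UrlEncode call: it is what lets you get the original URL back correctly. Without it, characters in the original URL such as & and = would be read as part of the proxy page's own query string, so some of the original URL's parameters would be lost.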

Use HttpUtility.UrlDecode to decode it.
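A minimal round-trip sketch, assuming the receiving page simply strips its known prefix from the raw URL. The prefix is the one from the answer; ProxyUrl, Wrap and Unwrap are hypothetical names chosen for illustration. Main also shows what goes wrong without encoding: the query-string parser splits the unencoded link at its own &.

```csharp
using System;
using System.Web; // HttpUtility (System.Web.dll on .NET Framework; included in .NET Core 2.0+ and .NET 5+)

public static class ProxyUrl
{
    // Proxy prefix copied from the answer; substitute your own page.
    public const string Prefix = "http://ahmadalli.somee.com/default.aspx?url=";

    // Encodes the whole original link so it travels as one query-string value.
    public static string Wrap(string originalUrl) =>
        Prefix + HttpUtility.UrlEncode(originalUrl);

    // Strips the known prefix and decodes the rest back to the original link.
    public static string Unwrap(string proxyUrl)
    {
        if (!proxyUrl.StartsWith(Prefix, StringComparison.Ordinal))
            throw new ArgumentException("Not a proxied URL.", nameof(proxyUrl));
        return HttpUtility.UrlDecode(proxyUrl.Substring(Prefix.Length));
    }

    public static void Main()
    {
        string original = "http://www.google.com/search?q=stackoverflow&hl=en";
        string wrapped = Wrap(original);
        Console.WriteLine(wrapped);
        Console.WriteLine(Unwrap(wrapped) == original);

        // Without UrlEncode, the query-string parser splits the link at its own '&',
        // so the "url" value loses the "&hl=en" part.
        var naive = HttpUtility.ParseQueryString(new Uri(Prefix + original).Query);
        Console.WriteLine(naive["url"]);
    }
}
```

If the receiving page reads the value through Request.QueryString["url"] instead of slicing the raw URL, ASP.NET has typically already decoded it, so calling UrlDecode a second time can mangle links that contain %-escapes.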


Author: ahmadali shafiee


Updated on June 04, 2022

Comments

  • ahmadali shafiee (about 2 years ago)

    I'm using this code to change the href attributes of an HTML stream.

    First, I download a full HTML page with this code (URL is the web page address):

    HttpWebRequest myHttpWebRequest = (HttpWebRequest)WebRequest.Create(URL);
    HttpWebResponse myHttpWebResponse = 
                             (HttpWebResponse)myHttpWebRequest.GetResponse();
    
    Stream s = myHttpWebResponse.GetResponseStream();
    

    Then I process it:

    HtmlDocument doc = new HtmlDocument();
    
    doc.Load(s);
    foreach (HtmlNode link in doc.DocumentNode.SelectNodes("/a"))
    {
        string att = link.Attributes["href"].Value;
        link.Attributes["href"].Value = "http://ahmadalli.somee.com/default.aspx?url=" + att;
    }
    doc.Save(s);
    

    s is the HTML stream.

    But I get an exception saying that doc.DocumentNode is null!

    I tried many sites, but doc.DocumentNode is null for all of them.

    • Mike Park (over 12 years ago)
      What does s look like?
  • ahmadali shafiee (over 12 years ago)
    Anyway, doc.DocumentNode is null, and the exception is thrown before the foreach.
  • Oded (over 12 years ago)
    Why do you feel the need to use DocumentNode at all? Why not use doc.SelectNodes directly?
  • ahmadali shafiee (over 12 years ago)
    @Oded: SelectNodes is a method of DocumentNode
  • Cristian Lupascu (over 12 years ago)
    @ahmadalishafiee Can you give an example of URL for which this fails? Or just a basic structure of the HTML?
  • ahmadali shafiee (over 12 years ago)
    @w0lf: I tried your suggestion, but after that I just got an empty Stream and my new Stream was null.
  • Cristian Lupascu (over 12 years ago)
    @ahmadalishafiee Try using http://google.com as the URL instead of just google.com (if you didn't do that already)
  • Cristian Lupascu (over 12 years ago)
    @ahmadalishafiee I've added a code sample that works (enumerates all links on the page)
  • ahmadali shafiee (over 12 years ago)
    @w0lf: First, I am using http://, but the Stack Overflow comment doesn't show it. Second, I tried your code but still get empty HTML, like last time.
  • Cristian Lupascu (over 12 years ago)
    @ahmadalishafiee Try using a different URL or try to diagnose your network connection (see if you need to configure a proxy)
  • ahmadali shafiee (over 12 years ago)
    @w0lf: It's an ASP.NET app running in my browser, and I can connect to Stack Overflow. The problem isn't the network connection.
  • ahmadali shafiee (over 12 years ago)
    @w0lf: There are no proxy or Internet problems with my connection.
  • ahmadali shafiee (over 12 years ago)
    I tried your code, but I got an exception: Object reference not set to an instance of an object.
  • L.B (over 12 years ago)
    @ahmadalishafiee I ran it on another machine. It worked without a problem.
  • ahmadali shafiee (over 12 years ago)
    First, I tried it with this link and it worked fine! Then I tried it with this one and got a NullReferenceException: href in the foreach statement is null!
  • L.B (over 12 years ago)
    I added if (href == null) continue; to the loop.
  • Sam (over 11 years ago)
    There is no difference in resulting strings between "/a" and @"/a".
  • Phill Healey (over 10 years ago)
    @ahmadalishafiee It's probably worth noting that Google can temporarily block IPs that make an above-normal number of requests. This might be why you get nothing back from Google if you've been running the code a lot in a short space of time. Try some other URLs for now to rule that out as a factor.
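Reading the question's code, the null reference most likely comes from the XPath rather than from DocumentNode: "/a" only matches an <a> that is a direct child of the document node, which a full page (whose root element is <html>) does not have, and HtmlAgilityPack's SelectNodes returns null rather than an empty collection when nothing matches, so the foreach throws. That is also why "/a" versus @"/a" changes nothing; the difference that matters is "/a" versus "//a". Separately, doc.Save(s) tries to write back into the response stream, which is a read-only network stream. The sketch below is a hedged rework under these assumptions: the page is UTF-8, the proxy prefix is the one from the question, and LinkRewriter and Rewrite are hypothetical names. It accepts any readable Stream, so it can be exercised with a MemoryStream instead of a live HttpWebResponse.

```csharp
using System;
using System.IO;
using System.Text;
using System.Web;        // HttpUtility
using HtmlAgilityPack;   // HtmlAgilityPack NuGet package

public static class LinkRewriter
{
    // Proxy prefix copied from the question; substitute your own page.
    const string Prefix = "http://ahmadalli.somee.com/default.aspx?url=";

    // Reads HTML from a readable stream (for example the one returned by
    // HttpWebResponse.GetResponseStream()) and returns the rewritten markup.
    public static string Rewrite(Stream input)
    {
        var doc = new HtmlDocument();
        doc.Load(input, Encoding.UTF8); // assumption: the page is UTF-8

        // "//a[@href]" matches <a> elements with an href anywhere in the document.
        HtmlNodeCollection links = doc.DocumentNode.SelectNodes("//a[@href]");

        // SelectNodes returns null, not an empty collection, when nothing matches.
        if (links != null)
        {
            foreach (HtmlNode link in links)
            {
                string href = link.GetAttributeValue("href", "");
                link.SetAttributeValue("href", Prefix + HttpUtility.UrlEncode(href));
            }
        }

        // Write the result to a new writer instead of back into the input stream.
        using (var writer = new StringWriter())
        {
            doc.Save(writer);
            return writer.ToString();
        }
    }

    public static void Main()
    {
        string html = "<html><body><a href=\"http://example.com/?a=1&b=2\">x</a><a name=\"top\"></a></body></html>";
        using (var input = new MemoryStream(Encoding.UTF8.GetBytes(html)))
        {
            Console.WriteLine(Rewrite(input));
        }
    }
}
```

To use it with the question's download code, pass myHttpWebResponse.GetResponseStream() to Rewrite (ideally inside a using block) and send the returned string to the client.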