Html Agility Pack help

Solution 1

Use HtmlAgilityPack.HtmlDocument:

HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();

The compiler is getting confused because two of the namespaces you have imported with using contain classes called HtmlDocument - the HTML Agility Pack namespace, and the Windows Forms namespace. You can get around this by specifying which class you want to use explicitly.

Solution 2

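This is how I achieved it. Note that the main Html Agility Pack example has an error in its foreach line: in doc.DocumentElement.SelectNodes("//a[@href"]) the closing bracket and quote are swapped, so the XPath string is malformed. The working code below also uses doc.DocumentNode rather than doc.DocumentElement. The corrected and tested version is given below.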

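    // Requires: using System.Collections.Generic; using System.Text; using HtmlAgilityPack;
    HtmlWeb hw = new HtmlWeb();
    HtmlDocument doc = hw.Load(@"http://adityabajaj.com");

    // Collect each distinct href value once.
    List<string> lstHref = new List<string>();

    // SelectNodes returns null (not an empty collection) when nothing matches.
    HtmlNodeCollection links = doc.DocumentNode.SelectNodes("//a[@href]");
    if (links != null)
    {
        foreach (HtmlNode link in links)
        {
            string curHref = link.Attributes["href"].Value;

            if (!lstHref.Contains(curHref))
                lstHref.Add(curHref);
        }
    }

    // Build the output, one link per line.
    StringBuilder sb = new StringBuilder();
    foreach (string str in lstHref)
    {
        sb.Append(str + "<br />");
    }

    // Response is the HttpResponse of the ASP.NET page this code runs in.
    Response.Write(sb.ToString());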

Since it worked for me, I thought I should share.

Solution 3

The classes in the two namespaces System.Windows.Forms and HtmlAgilityPack are conflicting. Use fully-qualified type names or use namespace aliases.
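As a minimal sketch of the alias approach: the alias name HapDocument and the class name ScraperForm are made up for illustration and are not from the original question. It assumes a Windows Forms project that references the HtmlAgilityPack package.

```csharp
using System.Windows.Forms;
// "HapDocument" is a hypothetical alias name; any unused identifier works.
using HapDocument = HtmlAgilityPack.HtmlDocument;

// Hypothetical form class, standing in for the asker's Form1.
public class ScraperForm : Form
{
    // Parses an HTML string and returns its text content.
    public string ExtractText(string html)
    {
        // Unambiguous: the alias always refers to the Html Agility Pack class,
        // even though System.Windows.Forms also defines an HtmlDocument.
        HapDocument doc = new HapDocument();
        doc.LoadHtml(html);
        return doc.DocumentNode.InnerText;
    }
}
```

The fully-qualified form, HtmlAgilityPack.HtmlDocument, does the same job without an alias; the alias mostly pays off when the type is used many times in one file.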

Solution 4

I have written a couple of articles that explain how to use HtmlAgilityPack. You might find them useful to get started:

WARNING (2012-06-08): This link is a bit spammy - dodgy pop-under adverts, not much content.

I don't know if they have fixed it now, but that snippet on the site's homepage did not work; I think it was written for an earlier version of the library. The snippet also doesn't define FixLink(), so it wouldn't compile even if the rest were correct for the library.

I would recommend getting the latest beta version of the library, because it has extra extensions for running LINQ queries against the document, which can save you from confusing XPath queries later on.
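A rough sketch of what a LINQ-style query can look like, as an alternative to SelectNodes("//a[@href]"). The class and method names (LinkLister, GetLinks) are invented for this example, and it assumes a version of the library that exposes Descendants and GetAttributeValue on HtmlNode.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of Html Agility Pack.
public static class LinkLister
{
    // Returns the distinct, non-empty href values of all <a> elements.
    public static List<string> GetLinks(string html)
    {
        // Fully qualified to avoid the clash with System.Windows.Forms.HtmlDocument.
        var doc = new HtmlAgilityPack.HtmlDocument();
        doc.LoadHtml(html);

        return doc.DocumentNode
                  .Descendants("a")                              // every <a> element
                  .Select(a => a.GetAttributeValue("href", "")) // "" when href is missing
                  .Where(href => href.Length > 0)
                  .Distinct()
                  .ToList();
    }
}
```

Unlike SelectNodes, Descendants yields an empty sequence rather than null when nothing matches, so no null check is needed here.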

I haven't seen it used in a Windows Forms app before but it looks like you will have to use fully-qualified type names like:

HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();

As for the actual task you are trying to perform, it seems like you want to take a URL, inject a username and ID into it and then... not sure? It looks like you are trying both to save the file to disk and to set the HTML as the contents of a Form, which I don't think you can do.

Author by

Victor Bjelkholm

xkcd - Wisdom of the Ancients

Updated on July 18, 2022

Comments

  • Victor Bjelkholm almost 2 years

    I'm trying to scrape some information from a website but can't find a solution that works for me. Every code sample I find on the Internet generates at least one error for me.

    Even the example code on their homepage generates errors for me.

    My code:

             HtmlDocument doc = new HtmlDocument();
             doc.Load("https://www.flashback.org/u479804");
             foreach(HtmlNode link in doc.DocumentElement.SelectNodes("//a[@href"])
             {
                HtmlAttribute att = link["href"];
                att.Value = FixLink(att);
             }
             doc.Save("file.htm");
    

    Generates the following error:

    'HtmlDocument' is an ambiguous reference between 'System.Windows.Forms.HtmlDocument' and 'HtmlAgilityPack.HtmlDocument' C:*\Form1.cs

    Edit: My entire code is located here: http://beta.yapaste.com/55

    All help is much appreciated!