Select elements with attribute data-url using HTMLAgilityPack

The following should do what you want:

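// SelectNodes returns null when no node matches, so guard before iterating.
HtmlNodeCollection divNodes = htmlDocument.DocumentNode.SelectNodes("//div[@data-url]");
if (divNodes != null)
{
    foreach (HtmlNode divNode in divNodes)
    {
        HtmlAttribute attribute = divNode.Attributes["data-url"];
        links.Add(attribute.Value);
    }
}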

The XPath expression //div[@data-url] selects every div element that has a data-url attribute. We then read that attribute's value from each selected node.

If there are nodes other than divs with this attribute, then //*[@data-url] should do the trick.
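To collect both the href links from the question and the data-url values in one pass, the two queries can share a small helper. The following is only a sketch: LinkExtractor and GetAttributeValues are hypothetical names made up for illustration, and the null check assumes HtmlAgilityPack's default behaviour, where SelectNodes returns null rather than an empty collection when nothing matches.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

public static class LinkExtractor
{
    // Hypothetical helper: returns the value of the given attribute for
    // every element in the document that carries it.
    public static List<string> GetAttributeValues(HtmlDocument document, string attributeName)
    {
        var values = new List<string>();

        // "//*[@name]" matches any element with the attribute, not just div or a.
        HtmlNodeCollection nodes =
            document.DocumentNode.SelectNodes("//*[@" + attributeName + "]");

        // Assumption: SelectNodes yields null when there are no matches.
        if (nodes == null)
        {
            return values;
        }

        foreach (HtmlNode node in nodes)
        {
            values.Add(node.GetAttributeValue(attributeName, string.Empty));
        }

        return values;
    }

    public static void Main()
    {
        var document = new HtmlDocument();
        document.LoadHtml(
            "<a href=\"index.html\">Home</a>" +
            "<div class=\"cbreplay\" data-url=\"2012\\edmonton\\partien.pgn\"></div>");

        var links = new List<string>();
        links.AddRange(GetAttributeValues(document, "href"));
        links.AddRange(GetAttributeValues(document, "data-url"));

        foreach (string link in links)
        {
            Console.WriteLine(link);
        }
    }
}
```

With the sample markup above, links should end up holding the href value followed by the data-url value. Passing the attribute name as a parameter keeps the XPath in one place, so additional attributes (for example src) can be collected without another loop.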

Author: Joe Black

Updated on September 16, 2022

Comments

  • Joe Black over 1 year

    I'm writing a small download robot that searches for links on the lower levels of a site by itself.

    I need to find all links in an HTML page (links to .jpg files as well as links to .pgn, .pdf, .html, ... files).

    I'm using the HTML Agility Pack to find all a-href links.

    Sample code:

    foreach (HtmlNode link in htmlDocument.DocumentNode.SelectNodes("//a[@href]"))
    {
        HtmlAttribute attribute = link.Attributes["href"];
        links.Add(attribute.Value);
    }
    

    But I want to find the data-url values as well.

    What XPath syntax do I have to use to find them? Here is an example data-url in HTML code:

        <div class="cbreplay" data-url="2012\edmonton\partien.pgn"></div>
    

    I need the "2012\edmonton\partien.pgn" out of this example. How can I do this with XPath syntax?

    Best regards. If I made any bad mistakes, tell me. This is my first question ever.

  • user3791372 over 7 years
    It may be more flexible to use * instead of div: "//*[@data-url]". Those darn HTML authors keep changing their HTML!