Html Agility Pack, SelectNodes from a node


Solution 1

It's a bit confusing because you expect SelectNodes to search only inside the div with id "myTrips". However, an XPath expression that starts with "//" is anchored at the document root, so another SelectNodes("//li") performs a new search from the top of the document, no matter which node you call it on.

I fixed this by combining the two steps into a single query. This assumes the page has only one div with the id "myTrips"; if there are several, the query returns the li elements from all of them. The query looks like this:

doc.DocumentNode.SelectNodes("//div[@id='myTrips']//li");
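To see what the combined query returns when the id is not unique, here is a small sketch. It uses System.Xml's XmlDocument on a hypothetical, well-formed sample page rather than Html Agility Pack, on the assumption that HAP evaluates these XPath expressions the same way (as far as I know it relies on the .NET XPath engine). The Count helper and the sample markup are made up for illustration.

```csharp
using System;
using System.Xml;

// Hypothetical sample page: two divs share the id "myTrips" (invalid HTML, but it
// happens in the wild), plus one li outside both divs.
const string Sample =
    "<html><body>" +
    "<div id='myTrips'><ul><li>Paris</li><li>Rome</li></ul></div>" +
    "<div id='myTrips'><ul><li>Oslo</li></ul></div>" +
    "<ul><li>Home</li></ul>" +
    "</body></html>";

// Parses the sample and returns how many nodes the XPath query matches.
int Count(string xpath)
{
    var doc = new XmlDocument();
    doc.LoadXml(Sample);
    return doc.SelectNodes(xpath)?.Count ?? 0;
}

Console.WriteLine(Count("//div[@id='myTrips']//li"));      // 3: Paris, Rome, Oslo
Console.WriteLine(Count("(//div[@id='myTrips'])[1]//li")); // 2: Paris, Rome
Console.WriteLine(Count("//li"));                          // 4: also Home
```

Wrapping the first step in parentheses and indexing it, as in (//div[@id='myTrips'])[1]//li, is one way to restrict the query to the first matching div.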

Solution 2

var liOfTravels = doc.DocumentNode.SelectSingleNode("//div[@id='myTrips']")
                 .SelectNodes(".//li");

Note the dot in the second line: ".//li" searches from the current node, while "//li" searches from the document root. In this regard Html Agility Pack relies entirely on XPath syntax, but the result is unintuitive, because these two queries return the same nodes:

doc.DocumentNode.SelectNodes("//li");
some_deeper_node.SelectNodes("//li");
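The difference can be checked with a short sketch. It uses System.Xml's XmlDocument as a stand-in for Html Agility Pack, assuming both follow the standard XPath 1.0 rules for "//" and ".//"; the sample page and the variable names are hypothetical.

```csharp
using System;
using System.Xml;

// Hypothetical sample page: two li inside div#myTrips and one li elsewhere.
var doc = new XmlDocument();
doc.LoadXml(
    "<html><body>" +
    "<div id='myTrips'><ul><li>Paris</li><li>Rome</li></ul></div>" +
    "<ul><li>Home</li></ul>" +
    "</body></html>");

XmlNode myTrips = doc.SelectSingleNode("//div[@id='myTrips']")
    ?? throw new InvalidOperationException("div#myTrips not found");

// "//li" is anchored at the document root, so the context node makes no difference.
int fromDocument = doc.SelectNodes("//li")?.Count ?? 0;
int fromDiv = myTrips.SelectNodes("//li")?.Count ?? 0;
// ".//li" starts at the context node, so only li elements inside the div match.
int fromDivRelative = myTrips.SelectNodes(".//li")?.Count ?? 0;

Console.WriteLine($"{fromDocument} {fromDiv} {fromDivRelative}"); // 3 3 2
```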

Solution 3

Creating a new node can be useful in some situations: because the new node lives in its own document, root-anchored XPath expressions such as "//li" only see its contents. I've found this useful in a couple of places.

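var myTripsDiv = doc.DocumentNode.SelectSingleNode("//div[@id='myTrips']");
// CreateNode parses the HTML into a separate document. OuterHtml keeps the div as a single
// root element; recent versions of CreateNode may reject HTML with several top-level elements.
var myTripsNode = HtmlNode.CreateNode(myTripsDiv.OuterHtml);
// "//li" now searches only the new document, i.e. the copied div.
var liOfTravels = myTripsNode.SelectNodes("//li");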
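The same trick can be sketched with System.Xml instead of HtmlNode.CreateNode: the div's OuterXml is loaded into a fresh XmlDocument, so "//li" can only reach the copy. This is an analog under that substitution, not HAP's own API; the sample page and the names are hypothetical.

```csharp
using System;
using System.Xml;

// Hypothetical sample page: two li inside div#myTrips and one li elsewhere.
var doc = new XmlDocument();
doc.LoadXml(
    "<html><body>" +
    "<div id='myTrips'><ul><li>Paris</li><li>Rome</li></ul></div>" +
    "<ul><li>Home</li></ul>" +
    "</body></html>");

XmlNode myTripsDiv = doc.SelectSingleNode("//div[@id='myTrips']")
    ?? throw new InvalidOperationException("div#myTrips not found");

// Copy the div, including its own tag, into a separate document.
var tripsOnly = new XmlDocument();
tripsOnly.LoadXml(myTripsDiv.OuterXml);

// "//li" is still root-anchored, but the root is now the copied div,
// so the li outside the div ("Home") is not found.
XmlNodeList items = tripsOnly.SelectNodes("//li")
    ?? throw new InvalidOperationException("query failed");
foreach (XmlNode li in items)
{
    Console.WriteLine(li.InnerText); // prints Paris, then Rome
}
```

The trade-off is that the copy is detached from the original page, so edits made through it are not reflected in the original document. In HAP itself, loading myTripsDiv.OuterHtml into a new HtmlDocument with LoadHtml should behave similarly.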

Solution 4

You can do this with a LINQ query:

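using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

HtmlWeb web = new HtmlWeb();
HtmlDocument doc = web.Load(url); // url: the page to scrape

var travelList = new List<HtmlNode>();
// Every div with id "myTrips", anywhere in the document
foreach (var matchingDiv in doc.DocumentNode.DescendantNodes().Where(n => n.Name == "div" && n.Id == "myTrips"))
{
    // Collect the li elements at any depth below this div
    travelList.AddRange(matchingDiv.DescendantNodes().Where(n => n.Name == "li"));
}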

I hope it helps.
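For comparison, the same filter can be written without the explicit loop. This sketch uses LINQ to XML (System.Xml.Linq) on a hypothetical, well-formed sample instead of Html Agility Pack, so XElement stands in for HtmlNode; in HAP, HtmlNode.Descendants(name) plays a role similar to XContainer.Descendants(name) here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical sample page: two li inside div#myTrips and one li elsewhere.
XDocument doc = XDocument.Parse(
    "<html><body>" +
    "<div id='myTrips'><ul><li>Paris</li><li>Rome</li></ul></div>" +
    "<ul><li>Home</li></ul>" +
    "</body></html>");

// Every div whose id is "myTrips", then the li elements at any depth below each one.
// SelectMany flattens the per-div results, replacing the foreach/AddRange loop.
List<XElement> travelList = doc.Descendants("div")
    .Where(div => div.Attribute("id")?.Value == "myTrips")
    .SelectMany(div => div.Descendants("li"))
    .ToList();

Console.WriteLine(string.Join(", ", travelList.Select(li => li.Value))); // Paris, Rome
```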

Solution 5

This seems counterintuitive to me as well. If you run SelectNodes on a particular node, I expected it to search only underneath that node, not in the document in general.

Anyway, OP, if you change this line:

var liOfTravels = doc.DocumentNode.SelectSingleNode("//div[@id='myTrips']")
                     .SelectNodes("//li");

to:

var liOfTravels = doc.DocumentNode.SelectSingleNode("//div[@id='myTrips']")
                     .SelectNodes("li");

I think you'll be OK; I've just had the same issue and that fixed it for me. I'm not sure, though, whether the li would have to be a direct child of the node you have.
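On the open question: in XPath, a bare "li" is short for "child::li", so it matches only direct children of the node you call SelectNodes on, while ".//li" matches at any depth. The sketch below checks this with System.Xml's XmlDocument as a stand-in for Html Agility Pack; both sample snippets and the CountFromDiv helper are hypothetical.

```csharp
using System;
using System.Xml;

// Counts matches of an XPath evaluated with div#myTrips as the context node.
int CountFromDiv(string markup, string xpath)
{
    var doc = new XmlDocument();
    doc.LoadXml(markup);
    XmlNode div = doc.SelectSingleNode("//div[@id='myTrips']")
        ?? throw new InvalidOperationException("div#myTrips not found");
    return div.SelectNodes(xpath)?.Count ?? 0;
}

// Hypothetical markup where the li elements are direct children of the div.
const string Direct = "<div id='myTrips'><li>Paris</li><li>Rome</li></div>";
// Hypothetical markup where the li elements sit inside a ul, one level deeper.
const string Nested = "<div id='myTrips'><ul><li>Paris</li><li>Rome</li></ul></div>";

// "li" (child::li) only sees direct children.
Console.WriteLine(CountFromDiv(Direct, "li"));    // 2
Console.WriteLine(CountFromDiv(Nested, "li"));    // 0
// ".//li" sees li elements at any depth below the div.
Console.WriteLine(CountFromDiv(Nested, ".//li")); // 2
```

In typical markup the li elements sit inside a ul, so "li" finds nothing there. Also note that HAP's SelectNodes has, in many versions, returned null rather than an empty collection when nothing matches, so it is worth checking for null before iterating.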

Author: thatsIT


Updated on July 05, 2022

Comments

  • thatsIT
    thatsIT almost 2 years

    Why does this pick all of my <li> elements in my document?

    HtmlWeb web = new HtmlWeb();
    HtmlDocument doc = web.Load(url);
    
    var travelList = new List<Page>();
    var liOfTravels = doc.DocumentNode.SelectSingleNode("//div[@id='myTrips']")
                         .SelectNodes("//li");
    

    What I want is to get all <li> elements in the <div> with an id of "myTrips".

  • derloopkat
    derloopkat about 10 years
    I don't think the queries are the same. When he does the first select "//div[@id='myTrips']", the current node changes. That's why the second select should be ".//li" (anywhere below the current node) and not "//li" (anywhere from the root). Html Agility Pack does exactly what it is expected to do.
  • greenoldman
    greenoldman about 10 years
    @derloopkat, they are the same (there is no IMHO here; if they weren't, you could drop the dot in the solution query, but you cannot, can you?). Unfortunately HTMLAgilityPack searches from the root no matter which node you are at. The IMHO part is this: usually the point of focusing on a given node is that you continue the search from that node, not from the root again. The solution query without the added dot in the second sub-query would not make sense at all, hence my question: why support such queries?
  • derloopkat
    derloopkat about 10 years
    We are talking about different things. When I said the queries are not the same, I was talking about "//li" and ".//li". By "those queries" you mean the queries below.
  • Jroonk
    Jroonk over 9 years
    The ".//li" dot notation in XPath syntax just makes the path relative to the current node instead of the root, so it is completely intuitive to me. You should delete your other comment, though, because it confuses the issue.