Html Agility Pack loop through table rows and columns
Solution 1
I had to provide the full XPath. I got it using Firebug, following a suggestion by @Coda (https://stackoverflow.com/a/3104048/1238850), and ended up with this code:
foreach (HtmlNode row in doc.DocumentNode.SelectNodes("/html/body/table/tbody/tr/td/table[@id='table2']/tbody/tr"))
{
    HtmlNodeCollection cells = row.SelectNodes("td");
    for (int i = 0; i < cells.Count; ++i)
    {
        if (i == 0)
        {
            Response.Write("Person Name : " + cells[i].InnerText + "<br>");
        }
        else
        {
            Response.Write("Other attributes are: " + cells[i].InnerText + "<br>");
        }
    }
}
I am sure it can be written way better than this but it is working for me now.
Solution 2
Why don't you just select the tds directly?
foreach (HtmlNode col in doc.DocumentNode.SelectNodes("//table[@id='table2']//tr//td"))
    Response.Write(col.InnerText);
Alternatively, if you really need the trs separately for some other processing, drop the // from the inner XPath and do:
foreach (HtmlNode row in doc.DocumentNode.SelectNodes("//table[@id='table2']//tr"))
    foreach (HtmlNode col in row.SelectNodes("td"))
        Response.Write(col.InnerText);
Of course, that will only work if the tds are direct children of the trs, but they should be, right?
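To make the difference between the inner XPath forms concrete, here is a minimal sketch. It assumes a small, well-formed version of the table from the question (the inline HTML string and the class name are made up for illustration). It counts what "//td", "td" and ".//td" each match when called on a row node. Note that SelectNodes returns null rather than an empty collection when nothing matches, so the sketch guards against that.

```csharp
using System;
using HtmlAgilityPack;

class RelativeXPathDemo
{
    static void Main()
    {
        // Hypothetical, well-formed sample input (not the asker's real page).
        var doc = new HtmlDocument();
        doc.LoadHtml(
            "<table id='table2'>" +
            "<tr><td>Jane</td><td>Age: 67</td></tr>" +
            "<tr><td>Mario</td><td>Age: 78</td></tr>" +
            "</table>");

        var rows = doc.DocumentNode.SelectNodes("//table[@id='table2']//tr");
        if (rows == null) return; // SelectNodes returns null when nothing matches

        foreach (HtmlNode row in rows)
        {
            // "//td" starts again at the document root, even when called on a row.
            int fromRoot = row.SelectNodes("//td")?.Count ?? 0;
            // "td" matches only direct children of the current row.
            int direct = row.SelectNodes("td")?.Count ?? 0;
            // ".//td" matches descendants of the current row at any depth.
            int descendants = row.SelectNodes(".//td")?.Count ?? 0;

            Console.WriteLine($"//td: {fromRoot}, td: {direct}, .//td: {descendants}");
        }
    }
}
```

With this input, the "//td" count should be the total for the whole document on every iteration, while the other two forms count only the cells of the current row. That is why the nested loop in the question repeats every cell once per row.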
EDIT:
var cols = doc.DocumentNode.SelectNodes("//table[@id='table2']//tr//td");
// Cells alternate name, age; stop early if the last pair is incomplete
for (int ii = 0; ii + 1 < cols.Count; ii += 2)
{
    string name = cols[ii].InnerText.Trim();
    // "Age: 78" -> 78
    int age = int.Parse(cols[ii + 1].InnerText.Split(' ')[1]);
}
There's probably a more impressive way to do this with LINQ.
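One possible LINQ version is sketched below. It is not the answerer's code. It assumes a well-formed copy of the table (the inline HTML and the class name are hypothetical) and pulls the digits out of the age cell instead of splitting on a space. Rows without at least two td cells, such as the header row, are skipped.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

class LinqTableDemo
{
    static void Main()
    {
        // Hypothetical, well-formed sample input.
        var doc = new HtmlDocument();
        doc.LoadHtml(
            "<table id='table2'>" +
            "<tr><th>Name</th><th>Age</th></tr>" +
            "<tr><td>Mario</td><td>Age: 78</td></tr>" +
            "<tr><td>Jane</td><td>Age: 67</td></tr>" +
            "</table>");

        var rows = doc.DocumentNode.SelectNodes("//table[@id='table2']//tr");
        if (rows == null) return; // SelectNodes returns null when nothing matches

        var people = rows
            .Select(row => row.SelectNodes("td"))
            // Skip rows with no td cells (null) or fewer than two of them.
            .Where(cells => cells != null && cells.Count >= 2)
            .Select(cells => new
            {
                Name = cells[0].InnerText.Trim(),
                // Keep only the digits of the cell text, then try to parse them.
                Age = int.TryParse(
                    new string(cells[1].InnerText.Where(char.IsDigit).ToArray()),
                    out var age) ? age : (int?)null
            });

        foreach (var person in people)
        {
            Console.WriteLine($"{person.Name}: {person.Age}");
        }
    }
}
```

Working per row rather than over a flat list of cells means one malformed row does not shift the name/age pairing for the rows after it.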
Solution 3
I've run the code and it displays only the Names, which is correct, because the Ages are defined using invalid HTML: <th></td> (probably a typo).
By the way, the code can be simplified to only one loop:
foreach (var cell in doc.DocumentNode.SelectNodes("//table[@id='table2']/tr/td"))
{
    Response.Write(cell.InnerText);
}
Here's the code I used to test: http://pastebin.com/euzhUAAh
Solution 4
I did the same project with this:
private List<PhrasalVerb> ExtractVerbsFromMainPage(string content)
{
    var verbs = new List<PhrasalVerb>();
    HtmlDocument doc = new HtmlDocument();
    doc.LoadHtml(content);
    var rows = doc.DocumentNode.SelectNodes("//table[@class='idioms-table']//tr");
    rows.RemoveAt(0); // remove the header row
    foreach (var row in rows)
    {
        var cols = row.SelectNodes("td");
        verbs.Add(new PhrasalVerb
        {
            Uid = Guid.NewGuid(),
            Name = cols[0].InnerHtml,
            Definition = cols[1].InnerText,
            // Fall back to 0 when the third cell is not a valid integer
            Count = int.TryParse(cols[2].InnerText, out var count) ? count : 0
        });
    }
    return verbs;
}
mpora
Updated on July 15, 2022
Comments
-
mpora almost 2 years
I have a table like this
<table border="0" cellpadding="0" cellspacing="0" id="table2">
    <tr> <th>Name </th> <th>Age </th> </tr>
    <tr> <td>Mario </td> <th>Age: 78 </td> </tr>
    <tr> <td>Jane </td> <td>Age: 67 </td> </tr>
    <tr> <td>James </td> <th>Age: 92 </td> </tr>
</table>
I want to use HTML Agility Pack to parse it. I have tried this code to no avail:
foreach (HtmlNode row in doc.DocumentNode.SelectNodes("//table[@id='table2']//tr"))
{
    foreach (HtmlNode col in row.SelectNodes("//td"))
    {
        Response.Write(col.InnerText);
    }
}
What am I doing wrong?
-
mpora about 11 years
Yes, I want to use each column for processing. As you can see, the second column is a mixture of numbers and text, and I would like to extract the number. After I tried this code, the page just keeps spinning and shows no result.