Getting data from HTML table into a datatable

Solution 1

There's no such method out of the box in the HTML Agility Pack, but it shouldn't be too hard to create one. There are samples out there that convert XML to a DataTable with LINQ to XML; these can be reworked into what you need.

If needed I can help out creating the whole method, but not today :).
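
A minimal sketch of what such a method could look like, assuming a header row of th cells (if there is one) supplies the column names and every other row contributes its td cells in order. The names HtmlTableConverter and ToDataTable are made up for illustration and are not part of the HTML Agility Pack; only HtmlDocument, SelectSingleNode, SelectNodes and InnerText come from the library. Duplicate header texts and nested tables are not handled.

```csharp
using System.Data;
using System.Net;
using HtmlAgilityPack;

public static class HtmlTableConverter
{
    // Hypothetical helper: converts the first table matched by tableXPath into a DataTable.
    public static DataTable ToDataTable(string html, string tableXPath)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var result = new DataTable();
        HtmlNode table = doc.DocumentNode.SelectSingleNode(tableXPath);
        if (table == null)
        {
            return result; // no matching table: return an empty DataTable
        }

        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection rows = table.SelectNodes(".//tr");
        if (rows == null)
        {
            return result;
        }

        foreach (HtmlNode row in rows)
        {
            // Use the first row made of th cells as the column names.
            HtmlNodeCollection headers = row.SelectNodes("th");
            if (headers != null && result.Columns.Count == 0)
            {
                foreach (HtmlNode header in headers)
                {
                    result.Columns.Add(WebUtility.HtmlDecode(header.InnerText).Trim());
                }
                continue;
            }

            HtmlNodeCollection cells = row.SelectNodes("td");
            if (cells == null)
            {
                continue; // row without data cells
            }

            // Add generic columns when there is no header row or a row is wider than the header.
            while (result.Columns.Count < cells.Count)
            {
                result.Columns.Add("Column" + (result.Columns.Count + 1));
            }

            DataRow dataRow = result.NewRow();
            for (int i = 0; i < cells.Count; i++)
            {
                // InnerText keeps entities such as &nbsp; so decode them before storing.
                dataRow[i] = WebUtility.HtmlDecode(cells[i].InnerText).Trim();
            }
            result.Rows.Add(dataRow);
        }

        return result;
    }
}
```

With the markup from the question, a call such as HtmlTableConverter.ToDataTable(htmlCode, "//table[@summary='Table One']") should give five rows in two string columns named Column1 and Column2, since that table has no header row.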

Solution 2

Using some of Jack Eker's code above and some code from Marc Gravell (see post here), I managed to come up with a solution. This code snippet obtains the public holidays for 2012 in South Africa, as of this writing.

using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Linq;
using System.Text;
using System.Windows.Forms;
using System.Web;
using System.Net;
using HtmlAgilityPack;



namespace WindowsFormsApplication
{
    public partial class Form1 : Form
    {
        private DataTable dt;
        public Form1()
        {
            InitializeComponent();
        }

        private void button1_Click(object sender, EventArgs e)
        {

            string htmlCode = "";
            using (WebClient client = new WebClient())
            {
                client.Headers.Add(HttpRequestHeader.UserAgent, "AvoidError");
                htmlCode = client.DownloadString("http://www.info.gov.za/aboutsa/holidays.htm");
            }
            HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();

            doc.LoadHtml(htmlCode);

            dt = new DataTable();
            dt.Columns.Add("Name", typeof(string));
            dt.Columns.Add("Value", typeof(string));

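            // Select only the rows of the table we need. SelectNodes returns null
            // (not an empty collection) when nothing matches.
            HtmlNodeCollection rows = doc.DocumentNode.SelectNodes("//table[@id='table2']//tr");
            if (rows != null)
            {
                foreach (HtmlNode row in rows)
                {
                    HtmlNodeCollection cells = row.SelectNodes("td");

                    // Skip header rows and rows that do not hold a name/value pair.
                    if (cells == null || cells.Count < 2)
                    {
                        continue;
                    }

                    DataRow dr = dt.NewRow();
                    dr["Name"] = cells[0].InnerText.Replace("&nbsp;", " ");
                    dr["Value"] = cells[1].InnerText.Replace("&nbsp;", " ");
                    dt.Rows.Add(dr);
                }
            }

            // Bind once, after the table has been filled.
            dataGridView1.DataSource = dt;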
        }

    }
}

Solution 3

This is my solution. It may be a bit messy, but it is working perfectly at the moment :D

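string htmlCode = "";
using (WebClient client = new WebClient())
{
    // The User-Agent header is set because some servers reject requests without one.
    client.Headers.Add(HttpRequestHeader.UserAgent, "AvoidError");
    htmlCode = client.DownloadString("http://www.website.com");
}
HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();

doc.LoadHtml(htmlCode);

DataTable dt = new DataTable();
dt.Columns.Add("Name", typeof(string));
dt.Columns.Add("Value", typeof(decimal));

int count = 0;
decimal rowValue = 0;
bool isDecimal = false;
foreach (var row in doc.DocumentNode.SelectNodes("//table[@summary='Table Name']/tbody/tr"))
{
    DataRow dr = dt.NewRow();
    foreach (var cell in row.SelectNodes("td"))
    {
        if ((count % 2 == 0))
        {
            // Even cells hold the name.
            dr["Name"] = cell.InnerText.Replace("&nbsp;", " ");
        }
        else
        {
            // Odd cells hold the value. The code assumes values written like 1.234,56:
            // drop the thousands separators, turn the decimal comma into a point,
            // then parse with the invariant culture so the result does not depend
            // on the machine's regional settings.
            isDecimal = decimal.TryParse((cell.InnerText.Replace(".", "")).Replace(",", "."), System.Globalization.NumberStyles.Number, System.Globalization.CultureInfo.InvariantCulture, out rowValue);
            if (isDecimal)
            {
                dr["Value"] = rowValue;
            }
            dt.Rows.Add(dr);
        }
        count++;
    }
}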
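
The string replacements in the value branch imply figures that use a dot as the thousands separator and a comma as the decimal separator. As a rough alternative sketch, the same text can be parsed with an explicit culture instead of rewriting the string. The helper name ValueParser.ParseValue and the default "de-DE" culture are my assumptions for illustration, not something stated in the answer; pick the culture that matches the site.

```csharp
using System.Globalization;

public static class ValueParser
{
    // Hypothetical helper: parses text such as "1.234,56" with an explicit culture
    // and returns null when the text is not a valid number for that culture.
    public static decimal? ParseValue(string text, string cultureName = "de-DE")
    {
        if (text == null)
        {
            return null;
        }

        CultureInfo culture = CultureInfo.GetCultureInfo(cultureName);
        decimal value;
        if (decimal.TryParse(text.Trim(), NumberStyles.Number, culture, out value))
        {
            return value;
        }
        return null;
    }
}
```

A null result can then be left as an empty cell, for example by assigning dr["Value"] only when ParseValue(cell.InnerText) has a value.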

Solution 4

Simple logic to convert an HtmlTable (a Coded UI Test control) to a DataTable:

// Define your web table. "parent" is the containing control and must be
// defined elsewhere.
public static HtmlTable table
{
    get
    {
        HtmlTable htmlTable = new HtmlTable(parent);
        htmlTable.SearchProperties.Add("id", "searchId");
        return htmlTable;
    }
}

// Convert the web table to a DataTable
public static DataTable getTable
{
    get
    {
        DataTable dtTable = new DataTable("TableName");
        UITestControlCollection rows = table.Rows;

        // The first row supplies the column names.
        UITestControlCollection headers = rows[0].GetChildren();
        foreach (HtmlHeaderCell header in headers)
        {
            if (header.InnerText != null)
                dtTable.Columns.Add(header.InnerText);
        }

        // Every following row becomes one DataRow.
        for (int i = 1; i < rows.Count; i++)
        {
            UITestControlCollection cells = rows[i].GetChildren();
            string[] data = new string[cells.Count];
            int counter = 0;
            foreach (HtmlCell cell in cells)
            {
                if (cell.InnerText != null)
                    data[counter] = cell.InnerText;
                counter++;
            }
            dtTable.Rows.Add(data);
        }
        return dtTable;
    }
}
Author by

Jack Eker

Updated on July 09, 2022

Comments

  • Jack Eker almost 2 years

    Ok so I need to query a live website to get data from a table, put this HTML table into a DataTable and then use this data. I have so far managed to use Html Agility Pack and XPath to get to each row in the table I need but I know there must be a way to parse it into a DataTable. (C#) The code I am currently using is:

    string htmlCode = "";
    using (WebClient client = new WebClient())
    {
    htmlCode = client.DownloadString("http://www.website.com");
    }
    HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
    
    doc.LoadHtml(htmlCode);
    
    //My attempt at LINQ to solve the issue (not sure where to go from here)
    //GetAttributeValue avoids a NullReferenceException on tables without a summary attribute
    var myTable = doc.DocumentNode
    .Descendants("table")
    .Where(t => t.GetAttributeValue("summary", "") == "Table One")
    .FirstOrDefault();
    
    //Finds all the odd rows (which are the ones I actually need, but I would prefer a
    //DataTable containing all the rows!)
    foreach (HtmlNode cell in doc.DocumentNode.SelectNodes("//tr[@class='odd']/td"))
    {
    string test = cell.InnerText;
    //Have not gone further than this yet!
    }
    

    The HTML table on the website I am querying looks like this:

    <table summary="Table One">
    <tbody>
    <tr class="odd">
    <td>Some Text</td>
    <td>Some Value</td>
    </tr>
    <tr class="even">
    <td>Some Text1</td>
    <td>Some Value1</td>
    </tr>
    <tr class="odd">
    <td>Some Text2</td>
    <td>Some Value2</td>
    </tr>
    <tr class="even">
    <td>Some Text3</td>
    <td>Some Value3</td>
    </tr>
    <tr class="odd">
    <td>Some Text4</td>
    <td>Some Value4</td>
    </tr>
    </tbody>
    </table>
    

    I'm not sure whether it is better/easier to use LINQ + HAP or XPath + HAP to get the desired result; I tried both with limited success, as you can probably see. This is the first time I have ever made a program to query a website, or even interact with a website in any way, so I am very unsure at the moment! Thanks in advance for any help :)