Read data from HTML table with PHP


Solution 1

Using Tidy, DOMDocument and DOMXPath (make sure the corresponding PHP extensions are enabled), you can do something like this:

<?php
$url = "http://example.org/test.html";

function get_data_from_table($id, $url)
{
    // retrieve the content of that url
    $content = file_get_contents($url);

    // repair bad HTML
    $tidy = tidy_parse_string($content);
    $tidy->cleanRepair();
    $content = (string)$tidy;

    // load into DOM
    $dom = new DOMDocument();
    $dom->loadHTML($content);

    // make xpath-able
    $xpath = new DOMXPath($dom);

    // search for the first td of each tr, where its content is $id
    $query = "//tr/td[position()=1 and normalize-space(text())='$id']";
    $elements = $xpath->query($query);
    if ($elements->length != 1) {
        // not exactly 1 result as expected? return number of hits
        return $elements->length;
    }

    // our td was found
    $element = $elements->item(0);

    // get its parent element (tr)
    $tr = $element->parentNode;
    $data = array();

    // iterate over its td elements
    foreach ($tr->getElementsByTagName("td") as $td) {
        // retrieve the content as text
        $data[] = $td->textContent;
    }

    // return the array of <td> contents
    return $data;
}

echo '<pre>';
print_r(
    get_data_from_table(
        414,
        $url
    )
);
echo '</pre>';

Your HTML source (http://example.org/test.html):

<table><tr>
<td>413</td>
<td>Party Hat</td>
<td>0</td>
<td>No</td>
<td><a href="http://clubpenguincheatsnow.com/tools/swfviewer/items.swf?id=413">View SWF</a></td>
</tr><tr>
<td>414</td>
<td>Party Hat</td>
<td>0</td>
<td>No</td>
<td><a href="http://clubpenguincheatsnow.com/tools/swfviewer/items.swf?id=413">View SWF</a></td>
</tr>

(as you can see, this is not valid HTML, but that doesn't matter because Tidy repairs it before it is parsed)
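
If the Tidy extension is not available, DOMDocument's own parser is usually forgiving enough for markup like this. Here is a minimal sketch, assuming the HTML is already in a string. `get_row_by_id` is a hypothetical helper name, not part of the answer above, and the `int` type is an assumption that IDs are always numeric (it also keeps arbitrary input out of the XPath expression):

```php
<?php
// Hypothetical helper: returns the cell texts of the row whose first <td>
// equals $id, or null when there is not exactly one such row.
function get_row_by_id(string $html, int $id): ?array
{
    // Collect libxml parse warnings instead of emitting them for broken markup.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    // $id is an int, so interpolating it cannot break the expression.
    $rows = $xpath->query("//tr[td[1][normalize-space(.) = '$id']]");
    if ($rows->length !== 1) {
        return null;
    }

    $data = array();
    foreach ($rows->item(0)->getElementsByTagName('td') as $td) {
        $data[] = trim($td->textContent);
    }
    return $data;
}

// Deliberately unclosed table, like the sample source above.
$html = '<table><tr><td>413</td><td>Party Hat</td><td>0</td></tr>'
      . '<tr><td>414</td><td>Party Hat</td><td>0</td></tr>';

print_r(get_row_by_id($html, 414));
```

Returning null for "not exactly one match" is a design choice of this sketch; the function above returns the number of hits instead.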

Solution 2

This works, although it is a bit ugly; perhaps someone else can come up with a better XPath expression:

$html = <<<HTML
<html>
    <body>
        <table>
            <thead>
                <tr>
                    <td>id</td>
                    <td>name</td>
                    <td>a</td>
                    <td>b</td>
                    <td>c</td>
                </tr>
            </thead>
            <tbody>
                <tr>
                    <td>413</td>
                    <td>Party Hat</td>
                    <td>0</td>
                    <td>No</td>
                    <td>a link</td>
                </tr>
                <tr>
                    <td>414</td>
                    <td>Party Hat 2</td>
                    <td>0</td>
                    <td>No</td>
                    <td>a link</td>
                </tr>
            </tbody>
        </table>
    </body>
</html>
HTML;

$doc = new DOMDocument();
$doc->loadHTML($html);
$domxpath = new DOMXPath($doc);

$res = $domxpath->query("//*[local-name() = 'td'][text() = 'Party Hat']/../td[position() = '1']");

var_dump($res->length, $res->item(0)->textContent);

Outputs:

int(1)
string(3) "413"
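
The expression above can probably be tightened. Here is a sketch of one alternative, assuming the name always sits in the second column and the ID in the first, as in the sample table. `find_id_by_name` is a hypothetical helper, and the quote check is a simplification, since XPath 1.0 has no escape syntax for quotes inside string literals:

```php
<?php
// Hypothetical helper: returns the ID (first column) of the first row whose
// second column equals $name, or null when nothing matches.
function find_id_by_name(DOMXPath $xpath, string $name): ?string
{
    // XPath 1.0 cannot escape quotes inside a literal, so this sketch simply
    // rejects names containing a double quote instead of building concat().
    if (strpos($name, '"') !== false) {
        return null;
    }
    // Select the row by its second cell, then step to its first cell.
    $query = '//tr[td[2][normalize-space(.) = "' . $name . '"]]/td[1]';
    $cells = $xpath->query($query);
    return $cells->length > 0 ? trim($cells->item(0)->textContent) : null;
}

$doc = new DOMDocument();
$doc->loadHTML('<table><tbody>'
    . '<tr><td>413</td><td>Party Hat</td><td>0</td></tr>'
    . '<tr><td>414</td><td>Party Hat 2</td><td>0</td></tr>'
    . '</tbody></table>');

var_dump(find_id_by_name(new DOMXPath($doc), 'Party Hat 2'));
```

With these two sample rows, the call should print string(3) "414". Filtering on the row (`tr[td[2][...]]`) avoids the `..` step back up to the parent.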

Author by

S17514

Updated on June 04, 2022

Comments

  • S17514, almost 2 years ago

    I'm trying to read data from an HTML table and store part of it in a variable called $id. For example, I have this markup:

    <tr>
    <td>413</td>
    <td>Party Hat</td>
    <td>0</td>
    <td>No</td>
    <td><a href="http://clubpenguincheatsnow.com/tools/swfviewer/items.swf?id=413">View SWF</a></td>
    </tr>
    

    Another variable, $array[$i], holds a search query. I want my PHP code to search through the table until it finds the row containing that query; in this case it would be "Party Hat". Once it finds the query, it should look at the ID, which is the "td" element just above the name "Party Hat"; in this case the ID is 413. After that, the variable $id should hold the ID. How do I do this? Any help would be HIGHLY appreciated!