Regular Expression - get tables from html string in PHP

You need to perform a non-greedy match: /(<table[^>]*>(?:.|\n)*?<\/table>)/. Note the question mark after the *: it makes the quantifier stop at the first closing </table> tag instead of the last one.
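To make the difference concrete, here is a minimal sketch comparing the greedy pattern from the question with the non-greedy one. The sample string in $html is made up for illustration and is not from the original post.

```php
<?php
// Hypothetical sample input: two separate tables with text between them.
$html = "<p>intro</p><table class=\"a\">\n<tr><td>1</td></tr>\n</table>"
      . "<p>between</p><table><tr><td>2</td></tr></table>";

// Greedy: (?:.|\n)* runs on to the LAST </table>, so both tables end up in one match.
preg_match_all('/<table[^>]*>(?:.|\n)*<\/table>/', $html, $greedy);

// Non-greedy: (?:.|\n)*? stops at the FIRST </table> after each opening tag.
preg_match_all('/<table[^>]*>(?:.|\n)*?<\/table>/', $html, $lazy);

echo count($greedy[0]), "\n"; // 1
echo count($lazy[0]), "\n";   // 2
```

Note that the non-greedy version still assumes tables are not nested: with a table inside a table, the match would end at the inner closing tag.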

However, I would use a DOM parser for that:

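$doc = new DOMDocument();
// Collect libxml warnings about malformed markup instead of printing them
libxml_use_internal_errors(true);
$doc->loadHTML($html);

$contents = [];
$tables = $doc->getElementsByTagName('table');
foreach ($tables as $table) {
    // Serialize each <table> element, including its own tags, back to HTML
    $contents[] = $doc->saveHTML($table);
}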

A DOM parser is already more convenient for extracting data from HTML documents, and it is definitely the better solution if you want to modify the HTML, as you said you do.
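Since the goal in the question is to wrap every table in a div, the DOM approach could be extended roughly as follows. This is a sketch under assumptions: the input string and the class name table-wrapper are hypothetical, and the result is rebuilt from the children of the body element because loadHTML() adds html and body elements around a fragment.

```php
<?php
// Hypothetical input and wrapper class name, used for illustration only.
$html = '<p>intro</p><table><tr><td>1</td></tr></table>';

libxml_use_internal_errors(true); // keep libxml parse warnings out of the output
$doc = new DOMDocument();
$doc->loadHTML($html);

// Copy the live node list into an array before changing the tree.
$tables = iterator_to_array($doc->getElementsByTagName('table'));

foreach ($tables as $table) {
    $wrapper = $doc->createElement('div');
    $wrapper->setAttribute('class', 'table-wrapper');

    // Put the wrapper where the table was, then move the table inside it.
    $table->parentNode->replaceChild($wrapper, $table);
    $wrapper->appendChild($table);
}

// Serialize only the children of <body>, not the added <html>/<body> shell.
$result = '';
foreach ($doc->getElementsByTagName('body')->item(0)->childNodes as $node) {
    $result .= $doc->saveHTML($node);
}

echo $result;
```

Unlike the regex, this also copes with nested tables, because each table element is handled as a node rather than as a stretch of text.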

Author by

Jozze

Updated on June 04, 2022

Comments

  • Jozze
    Jozze about 2 years

    I am trying to wrap all tables inside my content in a special div container to make them usable on mobile. I can't wrap the tables before they are saved in the database of the custom CSS. I managed to get at the content before it's printed on the page, and I need to preg_replace all the tables there.

    I do this, to get all tables:

    preg_match_all('/(<table[^>]*>(?:.|\n)*<\/table>)/', $aFile['sContent'], $aMatches);
    

    The problem is to get the inner part (?:.|\n)* to match everything that is inside the tags, without matching the ending tag. Right now the expression matches everything, even the ending tag of the table...

    Is there a way to exclude the match for the ending tag?

  • Talisin
    Talisin almost 10 years
    +1 for avoiding regex to parse HTML, which is not a regular language and hence should not be parsed with regular expressions.
  • Jozze
    Jozze almost 10 years
    Thank you! The non-greedy match did the trick! My final regexp: /(?m)(<table[^>]*>(?:.|\n|\r)*?<\/table>)/ I'm not that familiar with the DOM parser, but I'll try to implement this version. If I get it right, I'll use this instead. Thanks a lot :)
  • hek2mgl
    hek2mgl almost 10 years
    You are welcome. Just copy the code I've posted. The example aims to be working code.
  • Jozze
    Jozze almost 10 years
    Doesn't work for me... at least for now. There seem to be some namespace errors. It can't find DOMDocument() ... maybe the PHP extension is not installed or something like that. But the regex works for now and I'll try to change it again when our senior developer comes back. I'll try to remember to post the result here when it's done. Thanks again!
  • hek2mgl
    hek2mgl almost 10 years
    @Jozze If you are working in a namespace, you need to use \DOMDocument. Note the leading `\`, which refers to the global PHP namespace.
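The namespace point from the last comment can be sketched as follows. The namespace name App\Content is hypothetical; the extension check reflects the other possibility raised in the thread, that the DOM extension is not installed.

```php
<?php
namespace App\Content; // hypothetical namespace, for illustration only

// If the DOM extension is missing, the class does not exist in any namespace.
if (!\extension_loaded('dom')) {
    exit("The PHP DOM extension is not loaded\n");
}

// Inside a namespace, a bare DOMDocument would resolve to App\Content\DOMDocument.
// The leading backslash points at the built-in class in the global namespace.
$doc = new \DOMDocument();

echo \get_class($doc), "\n"; // DOMDocument
```

An alternative with the same effect is a `use DOMDocument;` import below the namespace declaration, after which the bare class name works.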