PHP DOMDocument errors/warnings on html5-tags


Solution 1

No, there is no way of specifying a particular doctype to use, or to modify the requirements of the existing one.

Your best workable solution is to stop libxml from raising PHP warnings by switching to internal error handling with libxml_use_internal_errors:

$dom = new DOMDocument;
libxml_use_internal_errors(true);
$dom->loadHTML('...');
libxml_clear_errors();
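
As a minimal sketch of how this fits the goal in the question (setting attributes on HTML5 elements), the following loads some markup, restores the previous error-handling mode, and then modifies a <nav> element. The sample markup and the class name main-nav are made up for illustration, and the behaviour assumes a libxml build that keeps unknown tags in the tree, which some commenters report is not always the case.

```php
<?php
// Hypothetical input; any markup containing HTML5 elements would do.
$html = '<!DOCTYPE html><html><body><nav><ul><li>first</li></ul></nav></body></html>';

$dom = new DOMDocument();

// Buffer libxml errors internally instead of raising PHP warnings,
// remembering the previous setting so it can be restored afterwards.
$previous = libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();
libxml_use_internal_errors($previous);

// The "invalid" tags are still part of the tree and can be manipulated.
foreach ($dom->getElementsByTagName('nav') as $nav) {
    $nav->setAttribute('class', 'main-nav'); // illustrative attribute value
}

$output = $dom->saveHTML();
echo $output;
```

Restoring the previous value of libxml_use_internal_errors() keeps the change local, so other code that relies on the default behaviour is not affected.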

Solution 2

You could also suppress the warnings with the @ error-control operator:

@$dom->loadHTML($htmlString);

Solution 3

You can filter the errors you get from the parser. As per other answers here, turn off error reporting to the screen, and then iterate through the errors and only show the ones you want:

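libxml_use_internal_errors(true);
// Do your load here
$errors = libxml_get_errors();
libxml_clear_errors(); // empty the error buffer once the errors have been copied

foreach ($errors as $error)
{
    /* @var $error LibXMLError */
    // Inspect $error->code and $error->message here and report only the ones you want
}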

Here is a print_r() of a single error:

LibXMLError Object
(
    [level] => 2
    [code] => 801
    [column] => 17
    [message] => Tag section invalid

    [file] => 
    [line] => 39
)

By matching on the message and/or the code, these can be filtered out quite easily.
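
A rough sketch of such a filter, assuming that code 801 (the value in the print_r() output above) is what identifies the "Tag ... invalid" complaints. The sample markup and variable names are only illustrative.

```php
<?php
// Hypothetical input with an HTML5 element that older libxml versions complain about.
$html = '<!DOCTYPE html><html><body><section><p>Hi</p></section></body></html>';

// 801 is the code shown in the print_r() output ("Tag section invalid").
$unknownTagCode = 801;

$previous = libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
$errors = libxml_get_errors();
libxml_clear_errors();
libxml_use_internal_errors($previous);

// Keep everything except the unknown-tag complaints.
$relevant = array_filter($errors, function ($error) use ($unknownTagCode) {
    return $error->code !== $unknownTagCode;
});

// Report whatever is left, e.g. genuinely malformed markup.
foreach ($relevant as $error) {
    printf("Line %d, column %d: %s\n", $error->line, $error->column, trim($error->message));
}
```

Matching on the numeric code is likely to be more robust than matching on the message text, since the message contains the tag name and ends with a newline.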

Solution 4

There doesn't seem to be a way to kill warnings but not errors. PHP has constants that are supposed to do this, but they don't seem to work. Here is what SHOULD work, but doesn't (possibly a bug):

$doc = new DOMDocument();
$doc->loadHTML("<tagthatdoesnotexist><h1>Hi</h1></tagthatdoesnotexist>", LIBXML_NOWARNING);
echo $doc->saveHTML();

http://php.net/manual/en/libxml.constants.php
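
One possible explanation: the print_r() output in Solution 3 shows level 2 for the "Tag ... invalid" message, and 2 is the value of LIBXML_ERR_ERROR, so libxml reports it as an error rather than a warning, which LIBXML_NOWARNING alone would not cover. A hedged sketch that silences both levels through the options argument is below. It assumes a PHP/libxml combination that honours these flags in loadHTML(), and it gives up the distinction between warnings and errors.

```php
<?php
$doc = new DOMDocument();

// LIBXML_NOERROR suppresses error reports, LIBXML_NOWARNING suppresses warnings.
// Assumption: the installed PHP/libxml versions honour these flags for loadHTML().
$doc->loadHTML(
    '<tagthatdoesnotexist><h1>Hi</h1></tagthatdoesnotexist>',
    LIBXML_NOERROR | LIBXML_NOWARNING
);

$output = $doc->saveHTML();
echo $output;
```

If you still need to see some of the messages, the filtering approach with libxml_get_errors() is probably the better fit.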

Author by

Klaas S.

Updated on December 11, 2020

Comments

  • Klaas S.
    Klaas S. over 3 years

    I've been attempting to parse HTML5-code so I can set attributes/values within the code, but it seems DOMDocument(PHP5.3) doesn't support tags like <nav> and <section>.

    Is there any way to parse this as HTML in PHP and manipulate the code?


    Code to reproduce:

    <?php
    $dom = new DOMDocument();
    $dom->loadHTML("<!DOCTYPE HTML>
    <html><head><title>test</title></head>
    <body>
    <nav>
      <ul>
        <li>first
        <li>second
      </ul>
    </nav>
    <section>
      ...
    </section>
    </body>
    </html>");
    

    Error

    Warning: DOMDocument::loadHTML(): Tag nav invalid in Entity, line: 4 in /home/wbkrnl/public_html/new-mvc/1.php on line 17

    Warning: DOMDocument::loadHTML(): Tag section invalid in Entity, line: 10 in /home/wbkrnl/public_html/new-mvc/1.php on line 17

  • Peter Krauss
    Peter Krauss over 10 years
    Oops, for me loadHTML($HTML5) returns FALSE (failure)! I need to change the new tags to DIVs...
  • Klaas S.
    Klaas S. almost 10 years
    Error suppression is not a proper way of dealing with this issue.
  • Dan Lugg
    Dan Lugg almost 10 years
    @KlaasSangers Until we have a non-crippled DOM implementation, I'm afraid it is (either through @ or libxml_*)
  • hanshenrik
    hanshenrik over 9 years
    Yeah, in this specific case, error suppression is the best solution, in my opinion, unless you know that the HTML you will be loading is supposed to be 100% valid HTML per PHP's definition, which in my experience is never the case.
  • Nick Manning
    Nick Manning about 9 years
    @KlaasSangers...why not?
  • Super Cat
    Super Cat almost 7 years
    Any reason PHP 7's built-in DOM parser still can't handle HTML5? It's been 6 years since this answer was submitted.
  • lonesomeday
    lonesomeday almost 7 years
    @SuperCat It's all dependent on the underlying libxml library.
  • mmmmm
    mmmmm over 6 years
    According to this post stackoverflow.com/a/41845049/937477 that bug has been fixed
  • Kevin_Kinsey
    Kevin_Kinsey about 6 years
    --- not to mention HTML5 isn't XML, never was, has been, nor will be...
  • Admin
    Admin almost 5 years
    Update 2019: The warning is still fired; however, loadHTML now actually accepts HTML5 tags.
  • Greg
    Greg over 4 years
    Just to be pedantic, that is not valid HTML5. Custom elements have to have a hyphen in them according to the spec w3c.github.io/webcomponents/spec/custom/…
  • user2782001
    user2782001 over 4 years
    @Greg Good to know. It's just a test to demonstrate the xml parser will recognize the tag is not valid, but ignore it because of the flag.
  • marcus
    marcus over 4 years
    PHP 8: "The @ operator no longer silences fatal errors. It's possible that this change might reveal errors that again were hidden before PHP 8. Make sure to set display_errors=Off on your production servers!" stitcher.io/blog/new-in-php-8
  • Fabien Snauwaert
    Fabien Snauwaert over 3 years
    @user10351292 since which version? Can't find a changelog for libxml2. As of libxml2 version 2.9.10, still have to use libxml_use_internal_errors(true) to avoid having it throw an exception on HTML5 tags. (Found one more reason to hate PHP.)