PHP DOMDocument errors/warnings on html5-tags
Solution 1
No, there is no way of specifying a particular doctype to use, or to modify the requirements of the existing one.
Your best workable solution is to stop libxml from emitting warnings by switching to internal error handling with libxml_use_internal_errors:
$dom = new DOMDocument;
libxml_use_internal_errors(true);
$dom->loadHTML('...');
libxml_clear_errors();
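A slightly fuller sketch of the same idea, for the case in the question where the goal is to set attributes on HTML5 elements. The sample markup and the class name 'main-menu' are made up for illustration; the only assumption about the API is that libxml_use_internal_errors() returns the previous setting, which lets the earlier behaviour be restored afterwards.

```php
<?php
// Illustrative markup containing HTML5 elements (not taken from the question).
$html = '<!DOCTYPE html><html><body>'
      . '<nav><ul><li>first</li><li>second</li></ul></nav>'
      . '<section>text</section>'
      . '</body></html>';

// Collect libxml errors internally instead of emitting PHP warnings.
// The return value is the previous setting, kept so it can be restored.
$previous = libxml_use_internal_errors(true);

$dom = new DOMDocument();
$dom->loadHTML($html);

// Throw away whatever was collected and restore the earlier behaviour.
libxml_clear_errors();
libxml_use_internal_errors($previous);

// The HTML5 elements are part of the tree and can be edited as usual.
$nav = $dom->getElementsByTagName('nav')->item(0);
$nav->setAttribute('class', 'main-menu'); // arbitrary example value

echo $dom->saveHTML();
```

Restoring the previous setting keeps the change local to this one load, so other code that relies on libxml warnings is not affected.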
Solution 2
You could also do
@$dom->loadHTML($htmlString);
Solution 3
You can filter the errors you get from the parser. As per other answers here, turn off error reporting to the screen, and then iterate through the errors and only show the ones you want:
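libxml_use_internal_errors(true);
// Do your load here
$errors = libxml_get_errors();
foreach ($errors as $error) {
    /* @var $error LibXMLError */
    // Inspect $error->code and $error->message here and report only the ones you want
}
libxml_clear_errors();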
Here is a print_r() of a single error:
LibXMLError Object
(
[level] => 2
[code] => 801
[column] => 17
[message] => Tag section invalid
[file] =>
[line] => 39
)
By matching on the message and/or the code, these can be filtered out quite easily.
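As a rough sketch of that filtering, the helper below drops every error whose code is 801 (the code shown in the print_r() output for "Tag section invalid") and keeps the rest. The function name filterHtml5TagErrors and the constant HTML_UNKNOWN_TAG are hypothetical names introduced here, the sample markup is invented, and the arrow function assumes PHP 7.4 or later. Whether a given libxml version reports code 801 at all may vary, so the filter is written to work on whatever list it is given.

```php
<?php
// Hypothetical constant: the error code seen for "Tag ... invalid" in the dump.
const HTML_UNKNOWN_TAG = 801;

// Hypothetical helper: return only the errors that are not "unknown tag" errors.
function filterHtml5TagErrors(array $errors): array
{
    return array_values(array_filter(
        $errors,
        fn (LibXMLError $e): bool => $e->code !== HTML_UNKNOWN_TAG
    ));
}

$previous = libxml_use_internal_errors(true);

$dom = new DOMDocument();
$dom->loadHTML('<!DOCTYPE html><html><body><section><p>Hi</p></section></body></html>');

// Keep everything except the unknown-tag complaints, then reset libxml's buffer.
$remaining = filterHtml5TagErrors(libxml_get_errors());
libxml_clear_errors();
libxml_use_internal_errors($previous);

// Report what is left; for clean markup this loop may print nothing.
foreach ($remaining as $error) {
    printf("line %d, code %d: %s\n", $error->line, $error->code, trim($error->message));
}
```

Matching on the numeric code rather than the message text avoids depending on the exact wording of the message.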
Solution 4
There doesn't seem to be a way to suppress warnings but not errors. PHP has constants that are supposed to do this, but they don't seem to work. Here is what SHOULD work, but doesn't (possibly a bug):
$doc=new DOMDocument();
$doc->loadHTML("<tagthatdoesnotexist><h1>Hi</h1></tagthatdoesnotexist>", LIBXML_NOWARNING );
echo $doc->saveHTML();
Klaas S.
Updated on December 11, 2020
Comments
-
Klaas S. over 3 years
I've been attempting to parse HTML5 code so I can set attributes/values within the code, but it seems DOMDocument (PHP 5.3) doesn't support tags like <nav> and <section>.
Is there any way to parse this as HTML in PHP and manipulate the code?
Code to reproduce:
<?php
$dom = new DOMDocument();
$dom->loadHTML("<!DOCTYPE HTML>
<html><head><title>test</title></head>
<body>
<nav>
  <ul>
    <li>first
    <li>second
  </ul>
</nav>
<section>
  ...
</section>
</body>
</html>");
Error
Warning: DOMDocument::loadHTML(): Tag nav invalid in Entity, line: 4 in /home/wbkrnl/public_html/new-mvc/1.php on line 17
Warning: DOMDocument::loadHTML(): Tag section invalid in Entity, line: 10 in /home/wbkrnl/public_html/new-mvc/1.php on line 17
-
Peter Krauss over 10 years
Oops, for me loadHTML($HTML5) returns FALSE (failure)! I need to change the new tags to DIVs...
-
Klaas S. almost 10 years
Error suppression is not a proper way of dealing with this issue.
-
Dan Lugg almost 10 years
@KlaasSangers Until we have a non-crippled DOM implementation, I'm afraid it is (either through @ or libxml_*)
-
hanshenrik over 9 years
Yeah, in this specific case, error suppression is the best solution, in my opinion, unless you know that the HTML you will be loading is supposed to be 100% valid HTML per PHP's definition, which in my experience is never the case.
-
Nick Manning about 9 years
@KlaasSangers ...why not?
-
Super Cat almost 7 years
Any reason PHP 7's built-in DOM parser still can't handle HTML5? It's been 6 years since this answer was submitted.
-
lonesomeday almost 7 years
@SuperCat It's all dependent on the underlying libxml library.
-
mmmmm over 6 years
According to this post stackoverflow.com/a/41845049/937477 that bug has been fixed.
-
Kevin_Kinsey about 6 years
--- not to mention HTML5 isn't XML, never was, has been, nor will be...
-
Admin almost 5 years
Update 2019: The warning is still fired; however, loadHTML now actually accepts HTML5 tags.
-
Greg over 4 years
Just to be pedantic, that is not valid HTML5. Custom elements have to have a hyphen in them according to the spec w3c.github.io/webcomponents/spec/custom/…
-
user2782001 over 4 years
@Greg Good to know. It's just a test to demonstrate that the XML parser will recognize the tag is not valid, but ignore it because of the flag.
-
marcus over 4 years
PHP 8: "The @ operator no longer silences fatal errors. It's possible that this change might reveal errors that again were hidden before PHP 8. Make sure to set display_errors=Off on your production servers!" stitcher.io/blog/new-in-php-8
-
Fabien Snauwaert over 3 years
@user10351292 since which version? Can't find a changelog for libxml2. As of libxml2 version 2.9.10, I still have to use libxml_use_internal_errors(true) to avoid having it throw an exception on HTML5 tags. (Found one more reason to hate PHP.)