Disable warnings when loading non-well-formed HTML by DomDocument (PHP)

Solution 1

You can install a temporary error handler with set_error_handler() and restore the previous one once the call has finished:

class ErrorTrap {
  protected $callback;
  protected $errors = array();
  function __construct($callback) {
    $this->callback = $callback;
  }
  function call() {
    set_error_handler(array($this, 'onError'));
    try {
      // forward all arguments to the wrapped callback
      return call_user_func_array($this->callback, func_get_args());
    } finally {
      // always restore the previous handler, even if the callback throws
      // (this also covers Error subclasses, which catch (Exception) would miss)
      restore_error_handler();
    }
  }
  function onError($errno, $errstr, $errfile, $errline) {
    $this->errors[] = array($errno, $errstr, $errfile, $errline);
  }
  function ok() {
    return count($this->errors) === 0;
  }
  function errors() {
    return $this->errors;
  }
}

Usage:

// create a DOM document and load the HTML data
$xmlDoc = new DomDocument();
$caller = new ErrorTrap(array($xmlDoc, 'loadHTML'));
// this doesn't dump out any warnings
$caller->call($fetchResult);
if (!$caller->ok()) {
  var_dump($caller->errors());
}

Solution 2

Call

libxml_use_internal_errors(true);

prior to processing with $xmlDoc->loadHTML().

This tells libxml2 not to send errors and warnings through to PHP. Then, to check for errors and handle them yourself, you can consult libxml_get_last_error() and/or libxml_get_errors() when you're ready:

libxml_use_internal_errors(true);
$dom->loadHTML($html);
$errors = libxml_get_errors();
foreach ($errors as $error) {
    // handle the errors as you wish
}
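
As a rough sketch of what "handle the errors as you wish" might look like: each entry returned by libxml_get_errors() is a LibXMLError object exposing level, code, message, line and column. The sample markup in $html and the message format below are assumptions made for illustration, and the exact errors reported depend on the libxml2 version in use.

```php
<?php
// Hypothetical malformed input, for illustration only.
$html = '<html><body><p>Unclosed paragraph<div></span></body></html>';

$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML($html);

// Turn each collected LibXMLError into a readable log line.
$messages = array();
foreach (libxml_get_errors() as $error) {
    switch ($error->level) {
        case LIBXML_ERR_WARNING:
            $label = 'warning';
            break;
        case LIBXML_ERR_ERROR:
            $label = 'error';
            break;
        case LIBXML_ERR_FATAL:
            $label = 'fatal';
            break;
        default:
            $label = 'unknown';
    }
    $messages[] = sprintf('[%s] line %d: %s', $label, $error->line, trim($error->message));
}

// Empty the buffer so the entries do not pile up across later parses.
libxml_clear_errors();
```

$messages can then be logged or returned to the caller instead of being printed by PHP.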

Solution 3

To hide the warnings, you have to give special instructions to libxml, which is used internally to perform the parsing:

libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();

Calling libxml_use_internal_errors(true) indicates that you're going to handle the errors and warnings yourself and you don't want them to mess up the output of your script.

This is not the same as the @ operator. The warnings get collected behind the scenes and afterwards you can retrieve them by using libxml_get_errors() in case you wish to perform logging or return the list of issues to the caller.

Whether or not you're using the collected warnings, you should always clear the queue by calling libxml_clear_errors(); otherwise the collected entries stay in the buffer and keep accumulating.

Preserving the state

If you have other code that uses libxml it may be worthwhile to make sure your code doesn't alter the global state of the error handling; for this, you can use the return value of libxml_use_internal_errors() to save the previous state.

// modify state
$libxml_previous_state = libxml_use_internal_errors(true);
// parse
$dom->loadHTML($html);
// handle errors
libxml_clear_errors();
// restore
libxml_use_internal_errors($libxml_previous_state);
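
The same save/parse/restore sequence could be wrapped in a small helper so the previous state is restored even when loadHTML() throws. This is only a sketch: the function name loadHtmlQuietly and its by-reference $errors parameter are hypothetical, not part of PHP or libxml, and try/finally assumes PHP 5.5 or later.

```php
<?php
// Hypothetical helper; the name and signature are illustrative only.
function loadHtmlQuietly(DOMDocument $dom, $html, array &$errors = array())
{
    // Switch to internal error handling and remember the previous setting.
    $previous = libxml_use_internal_errors(true);
    try {
        $loaded = $dom->loadHTML($html);
        // Hand the collected LibXMLError objects back to the caller.
        $errors = libxml_get_errors();
        return $loaded;
    } finally {
        // Runs on both normal return and exception:
        // clear the buffer, then restore the global setting.
        libxml_clear_errors();
        libxml_use_internal_errors($previous);
    }
}
```

Note that this still clears the whole error buffer, so any libxml errors that earlier code had queued but not yet read are discarded as well.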

Solution 4

Passing the options LIBXML_NOWARNING and LIBXML_NOERROR to loadHTML() works too (the options parameter requires PHP 5.4 or later):

$dom->loadHTML($html, LIBXML_NOWARNING | LIBXML_NOERROR);

Author: Viet

Developer who is passionate about web, C++, design, classical music, art and tries mixing them together.

Updated on June 12, 2021

Comments

  • Viet
    Viet about 3 years

    I need to parse some HTML files; however, they are not well-formed and PHP prints out warnings. I want to avoid such debugging/warning behavior programmatically. Please advise. Thank you!

    Code:

    // create a DOM document and load the HTML data
    $xmlDoc = new DomDocument;
    // this dumps out the warnings
    $xmlDoc->loadHTML($fetchResult);
    

    This:

    @$xmlDoc->loadHTML($fetchResult)
    

    can suppress the warnings, but how can I capture those warnings programmatically?

    • Marcin
      Marcin over 11 years
      Try this solution - seems to be much easier - stackoverflow.com/questions/6090667/…
    • Wrikken
      Wrikken almost 11 years
      Converting lousy input to proper output is what pays the bills ;) The recover option is in the manual. It's just a boolean. You can just call $dom->saveHTML() to see what kind of document libxml is trying to make of your $html input; usually it's pretty close/ok.
  • thomasrutter
    thomasrutter about 14 years
    Seems like a lot of overkill for the situation. Note PHP's libxml2 functions.
  • troelskn
    troelskn about 14 years
    Good point, Thomas. I didn't know about these functions when I wrote this answer. If I'm not mistaken, it does the same thing internally btw.
  • thomasrutter
    thomasrutter about 14 years
    It has the same effect in this case, yes, though it's done at a different level: with the above solution, PHP errors are generated but suppressed, whereas with mine they never become PHP errors. I personally feel that if doing something involves suppressing PHP errors, either through @ or set_error_handler(), then it's the wrong way to do it. That's just my opinion though. Note that PHP errors and exceptions are different things entirely - using try {} catch() {} is fine.
  • troelskn
    troelskn about 14 years
    I think I've seen some bug reports that suggest that libxml_use_internal_errors hooks into PHP's error handler.
  • hakre
    hakre almost 11 years
    @Greeso: It is set to the previous value. The idea is that other code may have configured it globally to something other than FALSE, and setting it to FALSE afterwards would destroy that setting. Restoring the previous return value $libxml_previous_state prevents those side effects, because the original configuration is restored regardless of what this piece of code needs. The libxml_use_internal_errors() setting is global, so it's worth taking some care.
  • cHao
    cHao almost 8 years
    If there are already libxml errors pending, won't this eat them?
  • Ja͢ck
    Ja͢ck almost 8 years
    @cHao isn't it reasonable to assume that you're starting off with a blank slate? :)
  • cHao
    cHao almost 8 years
    @Ja͢ck: Nope. If something previously called libxml_use_internal_errors(true), then it may be waiting to handle whatever errors have arisen.
  • Brian Klug
    Brian Klug about 5 years
    So much easier than adding 20 lines of code as the accepted answer does. Thanks!