How to find all elements with PHP Simple HTML DOM Parser?


Solution 1

/**
 * Refine the input HTML (string) and keep what was specified
 *
 * @param string $string  Input HTML
 * @param array  $allowed Tag names to keep
 * @return bool|simple_html_dom
 */
function crl_parse_html($string, $allowed = array())
{
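    // String --> DOM elements (str_get_html() returns false for empty or oversized input)
    $string = str_get_html($string);
    if ($string === false) {
        return false;
    }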
    // Fetch child of the current element (one by one)
    foreach ($string->find('*') as $child) {
        if (
            // Current inner-text contains one or more elements
            preg_match('/<[^<]+?>/is', $child->innertext) and
            // Current element tag is in maintained elements array
            in_array($child->tag, $allowed)
        ) {
            // Assign current inner-text to current filtered inner-text
            $child->innertext = crl_parse_html($child->innertext, $allowed);
        } else if (
            // Current inner-text contains one or more elements
            preg_match('/<[^<]+?>/is', $child->innertext) and
            // Current element tag is NOT in maintained elements array
            !in_array($child->tag, $allowed)
        ) {
            // Reduce current inner-text to its inner elements (if any), dropping the plain text that precedes them
            $child->innertext = preg_replace('/(?<=^|>)[^><]+?(?=<|$)(<[^\/]+?>.+)/is', '$1', $child->innertext);
            // Assign current outer-text to current filtered inner-text
            $child->outertext = crl_parse_html($child->innertext, $allowed);
        } else if (
            (
                // Current inner-text is only plaintext
                preg_match('/(?<=^|>)[^><]+?(?=<|$)/is', $child->innertext) and
                // Current element tag is NOT in maintained elements array
                !in_array($child->tag, $allowed)
            ) or
            // Current plain-text is empty
            trim($child->plaintext) == ''
        ) {
            // Assign current outer-text to empty string
            $child->outertext = '';
        }
    }
    return $string;
}

This is my own solution. I am posting it here in case someone needs it, and to close this question.
Note: this function is recursive, and each level parses the inner HTML again, so very large or deeply nested input can be a serious problem. Consider this carefully before deciding to use it.
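For comparison, here is a minimal sketch of the same idea that parses the input only once. It uses PHP's bundled DOM extension (DOMDocument) instead of Simple HTML DOM, so it is a different technique, not a drop-in replacement. The function name keep_allowed_tags is hypothetical. Unlike the function above, it does not remove empty elements, and it assumes ASCII or entity-encoded input, because loadHTML() does not read the string as UTF-8 by default.

```php
<?php
/**
 * Hypothetical helper (not part of Simple HTML DOM).
 * Keeps elements whose tag is listed in $allowed. A disallowed element is
 * replaced by its element children; its own text nodes are dropped.
 */
function keep_allowed_tags(string $html, array $allowed): string
{
    $doc = new DOMDocument();
    // Collect libxml warnings internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // loadHTML() wraps a fragment in <html><body>...</body></html>.
    $body = $doc->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return '';
    }

    // Snapshot the live node list before changing the tree.
    $elements = iterator_to_array($body->getElementsByTagName('*'));
    foreach ($elements as $el) {
        if (in_array($el->tagName, $allowed, true)) {
            continue; // allowed tag: leave it in place
        }
        // Move element children in front of the disallowed element...
        foreach (iterator_to_array($el->childNodes) as $child) {
            if ($child instanceof DOMElement) {
                $el->parentNode->insertBefore($child, $el);
            }
        }
        // ...then remove the element together with its remaining text.
        $el->parentNode->removeChild($el);
    }

    // Serialize only the children of <body>, not the wrapper itself.
    $out = '';
    foreach ($body->childNodes as $node) {
        $out .= $doc->saveHTML($node);
    }
    return $out;
}
```

Called as keep_allowed_tags($input, array('div')) on the nested <div>/<span> markup from the question, this should leave only the <div> elements. The serializer may insert line breaks between block elements, so compare results after stripping newlines.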

Solution 2

Your example appears to work fine. Try the following, which outputs the innertext of every element that find('*') returns:

foreach($html->find('*') as $test)
  echo $test->innertext;

For example:

$html = str_get_html('<div id="hello">Hello</div><div id="world">World</div>');

Outputs:

HelloWorld
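For the nested markup raised in the comments, the following is a minimal sketch of visiting every element from parents down to children. It uses PHP's bundled DOM extension (DOMDocument) rather than Simple HTML DOM, so it illustrates the traversal and says nothing about how find('*') treats nested elements. The variable names are illustrative only.

```php
<?php
// Sketch only: PHP's bundled DOM extension, not Simple HTML DOM.
$doc = new DOMDocument();
// Collect libxml warnings internally instead of emitting PHP warnings.
$previous = libxml_use_internal_errors(true);
$doc->loadHTML('<div id="hello">Hello</div><div id="world">World<div>mama</div></div>');
libxml_clear_errors();
libxml_use_internal_errors($previous);

// loadHTML() wraps the fragment in <html><body>...</body></html>.
$body = $doc->getElementsByTagName('body')->item(0);

// getElementsByTagName('*') yields every descendant element in document
// order, so a parent is always visited before its children.
$lines = array();
foreach ($body->getElementsByTagName('*') as $element) {
    $lines[] = $element->tagName . ': ' . $element->textContent;
}
echo implode("\n", $lines), "\n";
// Prints:
// div: Hello
// div: Worldmama
// div: mama
```

Note that textContent of the outer element includes the text of its descendants, which is why the second line reads "Worldmama".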
Author by

Manhhailua


Updated on July 28, 2022

Comments

  • Manhhailua
    Manhhailua almost 2 years
    // Find all elements that have an id attribute
    $ret = $html->find('*[id]');
    

    This is an example of finding all elements that have an id attribute. Is there any way to find all elements? I tried the following, but it does not work:

    // Find all elements
    $ret = $html->find('*');
    

    Additional details:

    I want to iterate over all the elements in $html, so that every parent and child element is visited. Example:

    <div>
        <span>
            <div>World!</div>
            <div>
                <span>Hello!</span>
                <span>
                    <div>Hello World!</div>
                </span>
            </div>
        </span>
    </div>
    

    Now I want to remove every <span> together with the plain text directly inside it, and keep every <div>. Expected result:

    <div>
        <div>World!</div>
        <div>
            <div>Hello World!</div>
        </div>
    </div>
    
  • Manhhailua
    Manhhailua over 10 years
    What if $html is <div id="hello">Hello</div><div id="world">World<div>mama</div></div>? I mean that I want to iterate over all the elements of $html, from parents to children.
  • Pez Cuckow
    Pez Cuckow over 10 years
    That's not how accessing the DOM works; see my edit. Can you provide some HTML and your expected output? You'll need to access the DOM tree using methods such as $html->children().
  • Manhhailua
    Manhhailua over 10 years
    I've added some details to the main question; please take a look.
  • Elias
    Elias over 9 years
    Explaining what your function does step by step could help future S.O. members.