Simple HTML Dom: How to remove elements?

50,444

Solution 1

There is no dedicated method for removing elements. Find all the img elements and, for each one, set its outertext to an empty string:

$e->outertext = '';

Solution 2

When you only clear the outer text, you delete the HTML content itself, but if you perform another find on the same elements, they will still appear in the result. The reason is that the Simple HTML DOM object still holds its internal structure for the element, only without its actual content. To really delete the element, reload the HTML as a string into the same variable. The object is then rebuilt from the saved markup, without the deleted content.

Here is an example function:

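public function removeNode($selector)
{
    // Blank the markup of every node that matches the selector.
    foreach ($this->find($selector) as $node)
    {
        $node->outertext = '';
    }

    // Re-parse the saved markup so the blanked nodes leave the tree.
    $this->load($this->save());
}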

Put this method inside the simple_html_dom class and you're good.

Solution 3

I think you are having difficulties because you forgot to save (that is, dump the internal DOM tree back into a string).

Try this:

$html = file_get_html("http://example.com");

foreach ($html->find('img') as $item) {
    $item->outertext = '';
}

$html->save();

echo $html;
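
To finish the news-ticker snippet described in the question, the cleaned markup still has to be reduced to plain text and cut to a word count. The following is only a minimal sketch of that step: `limitWords` is a hypothetical helper, not part of Simple HTML DOM, and it relies on PHP's built-in `strip_tags` rather than on anything from the library.

```php
<?php
// Hypothetical helper (not part of Simple HTML DOM): reduce an HTML
// string to plain text and keep at most $limit words.
function limitWords(string $htmlString, int $limit): string
{
    // Drop any remaining tags and collapse runs of whitespace.
    $text = trim(preg_replace('/\s+/', ' ', strip_tags($htmlString)));
    if ($text === '') {
        return '';
    }

    $words = explode(' ', $text);
    if (count($words) <= $limit) {
        return $text;
    }

    // Keep the first $limit words and mark the cut with an ellipsis.
    return implode(' ', array_slice($words, 0, $limit)) . '...';
}

// Stand-alone check with a literal string.
echo limitWords('<p>One two <b>three</b> four five</p>', 3), "\n";
// prints: One two three...
```

With the loop above, and assuming `$html` is the simple_html_dom object whose img elements have already been blanked, the call would look like `echo limitWords($html->save(), 30);`, where 30 is an arbitrary word count.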

Solution 4

I could not figure out where to put the function so I just put the following directly in my code:

$html->load($html->save());

This commits the changes made in the foreach loop back into the parsed HTML, as described above.

Solution 5

The proposed solutions are quite expensive, because they re-parse the whole document, and are practically unusable in a big loop or other kind of repetition.

I prefer to use "soft deletes":

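// 'somecondition', $someCheck and $foo are placeholders.
foreach ($html->find('somecondition') as $item) {
    // Set a marker that later code can check.
    if ($someCheck) $item->setAttribute('softDelete', true);
    $item->outertext = '';
}

// Later code skips anything carrying the marker.
foreach ($foo as $bar) {
    if (!$bar->getAttribute('softDelete')) {
        // do something
    }
}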
Author: kasakka

Updated on February 19, 2022

Comments

  • kasakka over 2 years

    I would like to use Simple HTML DOM to remove all images in an article so I can easily create a small snippet of text for a news ticker, but I haven't figured out how to remove elements with it.

    Basically, I would:

    1. Get content as HTML string
    2. Remove all image tags from content
    3. Limit content to x words
    4. Output.

    Any help?