How to strip a tag and all of its inner html using the tag's id?

56,824

Solution 1

With native DOM

$dom = new DOMDocument;
$dom->loadHTML($htmlString);
$xPath = new DOMXPath($dom);
$nodes = $xPath->query('//*[@id="anotherDiv"]');
if($nodes->item(0)) {
    $nodes->item(0)->parentNode->removeChild($nodes->item(0));
}
echo $dom->saveHTML();
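
If you need this in more than one place, the same steps can be wrapped in a small function. This is only a sketch: the function name removeElementById and the sample string are made up for illustration, and it assumes the id contains no double quote characters. Unlike the snippet above, it removes every match rather than only the first.

```php
<?php
// Hypothetical helper (not part of the original answer): removes every element
// with the given id, including all of its children, and returns the HTML.
function removeElementById(string $html, string $id): string
{
    $dom = new DOMDocument;
    // Collect libxml warnings internally instead of printing them.
    libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();

    $xPath = new DOMXPath($dom);
    // Assumption: $id contains no double quotes, so it can be embedded directly.
    $nodes = $xPath->query('//*[@id="' . $id . '"]');
    if ($nodes !== false) {
        foreach ($nodes as $node) {
            $node->parentNode->removeChild($node);
        }
    }
    return $dom->saveHTML();
}

// Made-up sample input, modelled on the markup in the question.
$htmlString = '<html><body><div id="myDiv">more text'
    . '<div id="anotherDiv">And even more text</div></div></body></html>';
$result = removeElementById($htmlString, 'anotherDiv');
echo $result;
```

The outer div and its own text are kept; only the element with the matching id and everything inside it is dropped.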

Solution 2

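You can use preg_replace(), but note that this pattern strips only the opening tag (and only when further attributes follow the id); the inner HTML and the closing tag stay in place: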

$string = preg_replace('/<div id="someid"[^>]+\>/i', "", $string);
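
A quick way to see what this pattern really does is to run it against two small strings. The sample strings and variable names below are made up for illustration; the pattern is the one above with the id swapped for anotherDiv.

```php
<?php
// Same pattern as in Solution 2, with the id changed to "anotherDiv".
$pattern = '/<div id="anotherDiv"[^>]+\>/i';

// Case 1: another attribute follows the id, so the opening tag matches.
$withAttr = '<div id="myDiv">more text'
    . '<div id="anotherDiv" class="x">And even more text</div></div>';
$resultWithAttr = preg_replace($pattern, '', $withAttr);

// Case 2: the id is the last attribute, so [^>]+ has nothing to consume
// and the pattern does not match at all.
$plain = '<div id="myDiv">more text'
    . '<div id="anotherDiv">And even more text</div></div>';
$resultPlain = preg_replace($pattern, '', $plain);

echo $resultWithAttr, "\n";
echo $resultPlain, "\n";
```

In the first case only the opening tag disappears, leaving the inner text and a stray closing tag behind. In the second case the string comes back unchanged. Neither result matches what the question asks for, which is why the DOM-based solutions are the safer choice.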

Solution 3

Using the native XML Manipulation Library

Assuming that your html content is stored in the variable $html:

$html='<html>
 <body>
 bla bla bla bla
  <div id="myDiv"> 
         more text
      <div id="anotherDiv">
           And even more text
      </div>
  </div>

  bla bla bla
 </body>
</html>';

To delete the tag by ID use the following code:

    $dom=new DOMDocument;

    $dom->validateOnParse = false;

    $dom->loadHTML( $html );

    // get the tag

    $div = $dom->getElementById('anotherDiv');

   // delete the tag

    if( $div && $div->nodeType==XML_ELEMENT_NODE ){

        $div->parentNode->removeChild( $div );
    }

    echo $dom->saveHTML();

Note that certain versions of libxml require a doctype to be present in order to use the getElementById method.

In that case you can prepend $html with <!doctype>

$html = '<!doctype>' . $html;
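
If you would rather not touch the input string, another option is to fall back to an XPath lookup whenever getElementById finds nothing. The following is a sketch under that assumption, with a made-up sample string; it is not taken from the original answer.

```php
<?php
// Made-up sample input, modelled on the markup in the question.
$html = '<html><body><div id="myDiv">more text'
    . '<div id="anotherDiv">And even more text</div></div></body></html>';

$dom = new DOMDocument;
// Collect libxml warnings internally instead of printing them.
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();

// First try the direct lookup.
$div = $dom->getElementById('anotherDiv');

// If that finds nothing, look the element up by its id attribute instead.
if ($div === null) {
    $xp = new DOMXPath($dom);
    $div = $xp->query('//*[@id="anotherDiv"]')->item(0);
}

// Remove the element (and its children) if either lookup succeeded.
if ($div !== null) {
    $div->parentNode->removeChild($div);
}

$output = $dom->saveHTML();
echo $output;
```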

Alternatively, as suggested by Gordon's answer, you can use DOMXPath to find the element using the xpath:

$dom=new DOMDocument;

$dom->validateOnParse = false;

$dom->loadHTML( $html );

$xp=new DOMXPath( $dom );

$col = $xp->query( '//div[ @id="anotherDiv" ]' );

if( !empty( $col ) ){

    foreach( $col as $node ){

        $node->parentNode->removeChild( $node );

    }

}

echo $dom->saveHTML();

The first method works regardless of the tag. If you want to use the second method with the same id but a different tag, let's say form, simply replace //div in //div[ @id="anotherDiv" ] with //form.
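
To make the tag name a parameter of the second method, something along these lines should work. The function name removeTagById and the sample markup are hypothetical, and the sketch assumes that $tag is a plain tag name and that $id contains no double quotes.

```php
<?php
// Hypothetical helper: removes every <$tag> element whose id equals $id.
function removeTagById(string $html, string $tag, string $id): string
{
    $dom = new DOMDocument;
    // Collect libxml warnings internally instead of printing them.
    libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();

    $xp = new DOMXPath($dom);
    // Assumption: $tag is a plain tag name and $id has no double quotes.
    $col = $xp->query('//' . $tag . '[@id="' . $id . '"]');

    // query() returns false for an invalid expression, so check before looping.
    if ($col !== false) {
        foreach ($col as $node) {
            $node->parentNode->removeChild($node);
        }
    }
    return $dom->saveHTML();
}

// Made-up sample: the id sits on a form rather than a div.
$sample = '<html><body><div id="keep">kept</div>'
    . '<form id="anotherDiv"><input name="q"></form></body></html>';

$withoutForm = removeTagById($sample, 'form', 'anotherDiv'); // form is removed
$unchanged   = removeTagById($sample, 'div', 'anotherDiv');  // no div has that id
echo $withoutForm;
```

Restricting the query to one tag name means an element with the same id but a different tag is left alone, as the second call shows.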

Author by

rockstardev

Updated on November 29, 2020

Comments

  • rockstardev
    rockstardev over 3 years

    I have the following html:

    <html>
     <body>
     bla bla bla bla
      <div id="myDiv"> 
             more text
          <div id="anotherDiv">
               And even more text
          </div>
      </div>
    
      bla bla bla
     </body>
    </html>
    

    I want to remove everything starting from <div id="anotherDiv"> until its closing </div>. How do I do that?

  • Haim Evgi
    Haim Evgi almost 14 years
    strip_tags() doesn't work the way he wants it to. strip_tags() allows for certain exclusions, but why would you use that when you only want to exclude one tag and include all other tags?
  • ItsPronounced
    ItsPronounced almost 14 years
    From his question, I couldn't really tell what tags he was trying to remove. It seemed as if he wanted to remove everything. Thanks for the input.
  • ItsPronounced
    ItsPronounced almost 14 years
    Ahhh, using chrome. His inline markup didn't show up. I just checked it in firefox and I see his inline markup. You are correct :) Any reason why it didn't show up in chrome?
  • jigfox
    jigfox almost 14 years
    this will remove all divs and not only the specified one.
  • rockstardev
    rockstardev almost 14 years
    You don't specify anywhere that it must remove the div with the ID=myDiv?
  • Sisir
    Sisir over 12 years
    What do I have to modify if I want to remove all div tags in a DOM?
  • Gordon
    Gordon over 12 years
  • azzy81
    azzy81 about 12 years
    Yes, this works a treat. I've always wanted to be able to remove an HTML tag from a string of HTML, much like a jQuery $(selector#id).remove(). This is just brilliant!
  • Gordon
    Gordon about 12 years
    @SubstanceD if you want selectors check out phpQuery, Zend_Dom or QueryPath. Personally, I prefer XPath.
  • avolquez
    avolquez over 11 years
    @HaimEvgi Is there any way to remove the inner content as well? For example, with p tags the tags themselves are removed, but the content of the p tags remains.
  • Alex Spencer
    Alex Spencer over 11 years
    strip_tags() worked best for me. Thanks. The reason it worked best for me is because i had tags that had no spaces. It was the easiest by far. thanks.
  • hakazvaka
    hakazvaka about 11 years
    This rocks, but is there any way to remove the closing </div> tag?
  • Tedi Çela
    Tedi Çela over 7 years
    Here is a simple way to strip specific tags(both open & closing): gist.github.com/tedicela/0b06265eefb8df41cb8256bb3f442916
  • mickmackusa
    mickmackusa over 4 years
    This answer DEFINITELY doesn't do what the OP requires. 16 UVs means that lots of researchers have been misinformed and don't understand the question and/or what this answer does. This answer does far more harm than good. The overarching message should be that developers should use a dom parser to manipulate valid html.
  • mickmackusa
    mickmackusa over 4 years
    Question says: I want to remove everything starting from <div id="anotherDiv"> until its closing <div>. How do I do that? This answer is incorrect.
  • mickmackusa
    mickmackusa over 4 years
    Regex is DOM-ignorant and is prone to failure. Using a legitimate DOM parsing technique will be more robust, reliable, and scalable. Iterated preg_ calls are going to be inefficient. The m pattern modifier is of no use.
  • mickmackusa
    mickmackusa over 4 years
    Regex is DOM-ignorant and is prone to failure. Using a legitimate DOM parsing technique will be more robust, reliable, and scalable. There is no reason to declare $html (a single-use variable); just return preg_replace(...); This snippet will fail when a tag attribute value contains >. There is no need to use a capture group.
  • mickmackusa
    mickmackusa over 4 years
    This answer does not target the tag using the id as stated in the question. This answer is incorrect because it will remove elements that should not be removed.
  • user229044
    user229044 over 4 years
    This is incorrect and fails for <div id="someid" data-foo=">">. You cannot use a regex for this.
  • user229044
    user229044 over 4 years
    This is incorrect and fails for many kinds of input, for example strip_tags_blacklist('<script>let x = "<div>"></script><div>foo</div>', 'div') => <script>let x = ". You cannot use regular expressions to strip or otherwise interact with HTML.