How to strip a tag and all of its inner HTML using the tag's id?
Solution 1
With native DOM
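$dom = new DOMDocument;
$dom->loadHTML($htmlString);
$xPath = new DOMXPath($dom);
// Select any element (any tag name) whose id attribute is "anotherDiv"
$nodes = $xPath->query('//*[@id="anotherDiv"]');
if ($nodes->item(0)) {
    // Removing the element from its parent also removes everything inside it
    $nodes->item(0)->parentNode->removeChild($nodes->item(0));
}
echo $dom->saveHTML();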
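The XPath approach above can be wrapped in a small reusable function. This is a minimal sketch, assuming the dom extension is enabled and the input string is non-empty; the function name removeElementById and the sample markup are hypothetical. libxml_use_internal_errors() is only there to keep parser warnings (for example about HTML5 tags) from being printed.

```php
<?php
// Hypothetical helper: removes every element whose id attribute equals $id,
// together with all of its descendants, and returns the serialized HTML.
// Assumes a non-empty $html and an $id that contains no double quote.
function removeElementById(string $html, string $id): string
{
    $dom = new DOMDocument();

    // Collect libxml parse warnings instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xPath = new DOMXPath($dom);
    $nodes = $xPath->query(sprintf('//*[@id="%s"]', $id));

    // query() returns false for an invalid expression; an empty list is fine.
    if ($nodes !== false) {
        // The XPath result list is not live, so removing nodes here is safe.
        foreach ($nodes as $node) {
            $node->parentNode->removeChild($node);
        }
    }

    return $dom->saveHTML();
}

$html = '<html><body>bla bla <div id="myDiv">more text '
      . '<div id="anotherDiv">And even more text</div></div> bla</body></html>';

$out = removeElementById($html, 'anotherDiv');
echo $out;
```

Because removeChild() detaches the whole subtree, the inner text of the matched element goes with it, while its siblings and ancestors stay in place.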
Solution 2
You can use preg_replace() like this:
$string = preg_replace('/<div id="someid"[^>]+\>/i', "", $string);
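To make the behaviour of this pattern concrete, the short check below runs it on two hypothetical inputs. It is an illustration only: the pattern deletes at most the opening tag, so the inner text and the closing </div> stay in the string, and because [^>]+ needs at least one character after the id attribute, a bare <div id="someid"> is not matched at all.

```php
<?php
$pattern = '/<div id="someid"[^>]+\>/i'; // the pattern from the answer above

// Opening tag with an extra attribute: only the opening tag is removed.
$withAttr = preg_replace($pattern, '', '<div id="someid" class="box">inner</div><p>after</p>');
echo $withAttr, PHP_EOL; // inner</div><p>after</p>

// Bare opening tag: [^>]+ cannot match zero characters, so nothing changes.
$bare = preg_replace($pattern, '', '<div id="someid">inner</div><p>after</p>');
echo $bare, PHP_EOL; // <div id="someid">inner</div><p>after</p>
```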
Solution 3
Using the native XML manipulation library (PHP's DOM extension)
Assuming that your HTML content is stored in the variable $html:
$html='<html>
<body>
bla bla bla bla
<div id="myDiv">
more text
<div id="anotherDiv">
And even more text
</div>
</div>
bla bla bla
</body>
</html>';
To delete the tag by its id, use the following code:
$dom=new DOMDocument;
$dom->validateOnParse = false;
$dom->loadHTML( $html );
// get the tag
$div = $dom->getElementById('anotherDiv');
// delete the tag (and everything inside it)
if( $div && $div->nodeType==XML_ELEMENT_NODE ){
    $div->parentNode->removeChild( $div );
}
echo $dom->saveHTML();
Note that certain versions of libxml require a doctype to be present in order to use the getElementById method. In that case you can prepend a doctype such as <!DOCTYPE html> to $html:
$html = '<!DOCTYPE html>' . $html;
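Combining getElementById() with the doctype workaround from the note above might look like the sketch below. The function name removeByIdWithDoctype is hypothetical, and it assumes $html does not already carry a doctype; it prepends <!DOCTYPE html>, which is a complete doctype declaration.

```php
<?php
// Hypothetical helper: deletes the element with the given id via getElementById().
// A doctype is prepended because some libxml versions need one for that lookup;
// assumes $html does not already contain a doctype.
function removeByIdWithDoctype(string $html, string $id): string
{
    $dom = new DOMDocument();
    $dom->validateOnParse = false;

    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML('<!DOCTYPE html>' . $html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // getElementById() returns null when no element has that id.
    $div = $dom->getElementById($id);
    if ($div !== null) {
        $div->parentNode->removeChild($div);
    }

    return $dom->saveHTML();
}

$html = '<html><body>bla <div id="myDiv">more text '
      . '<div id="anotherDiv">And even more text</div></div> bla</body></html>';

$out = removeByIdWithDoctype($html, 'anotherDiv');
echo $out;
```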
Alternatively, as suggested in Gordon's answer, you can use DOMXPath to find the element with an XPath expression:
$dom=new DOMDocument;
$dom->validateOnParse = false;
$dom->loadHTML( $html );
$xp=new DOMXPath( $dom );
$col = $xp->query( '//div[ @id="anotherDiv" ]' );
// query() returns false only for an invalid expression; an empty result is simply skipped
if( $col !== false ){
    foreach( $col as $node ){
        $node->parentNode->removeChild( $node );
    }
}
echo $dom->saveHTML();
The first method works regardless of the tag. If you want to use the second method with the same id but a different tag, let's say form, simply replace //div in //div[ @id="anotherDiv" ] with //form.
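The tag choice can also be made a parameter, with * standing for any tag as in the //*[@id="anotherDiv"] query of the first method. This is a sketch assuming $tag is a plain tag name and $id contains no double quote; the function name removeByTagAndId and the sample markup are hypothetical.

```php
<?php
// Hypothetical helper: removes elements with the given id, optionally limited to
// one tag name. '*' matches any element. Assumes $tag is a plain tag name
// (e.g. 'div', 'form') and $id contains no double quote.
function removeByTagAndId(string $html, string $id, string $tag = '*'): string
{
    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xp = new DOMXPath($dom);
    $col = $xp->query(sprintf('//%s[@id="%s"]', $tag, $id));
    if ($col !== false) {
        foreach ($col as $node) {
            $node->parentNode->removeChild($node);
        }
    }

    return $dom->saveHTML();
}

$html = '<html><body><form id="anotherDiv">a form</form><p>kept</p></body></html>';

$onlyDivs  = removeByTagAndId($html, 'anotherDiv', 'div');  // no div has this id, so nothing is removed
$onlyForms = removeByTagAndId($html, 'anotherDiv', 'form'); // removes the form
$anyTag    = removeByTagAndId($html, 'anotherDiv');         // '*' also matches the form
```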
rockstardev
Updated on November 29, 2020
Comments
-
rockstardev over 3 years
I have the following HTML:
<html> <body> bla bla bla bla <div id="myDiv"> more text <div id="anotherDiv"> And even more text </div> </div> bla bla bla </body> </html>
I want to remove everything starting from <div id="anotherDiv"> until its closing </div>. How do I do that? -
Haim Evgi almost 14 years
strip_tags() doesn't work the way he wants it to. strip_tags() allows for certain exclusions, but why would you use that when you only want to exclude one tag and include all other tags?
-
ItsPronounced almost 14 years
From his question, I couldn't really tell what tags he was trying to remove. It seemed as if he wanted to remove everything. Thanks for the input.
-
ItsPronounced almost 14 years
Ahhh, I was using Chrome. His inline markup didn't show up. I just checked in Firefox and I can see his inline markup. You are correct :) Any reason why it didn't show up in Chrome?
-
jigfox almost 14 years
This will remove all divs, not only the specified one. -
rockstardev almost 14 years
You don't specify anywhere that it must remove the div with ID=myDiv?
-
Sisir over 12 years
What do I have to modify if I want to remove all div tags in a DOM?
-
Gordon over 12 years
@Sisir see stackoverflow.com/questions/4177376/…
-
azzy81 about 12 years
Yes, this works a treat. I've always wanted to be able to remove an HTML tag from a string of HTML, much like jQuery's $(selector#id).remove(). This is just brilliant!
-
Gordon about 12 years
@SubstanceD if you want selectors, check out phpQuery, Zend_Dom or QueryPath. Personally, I prefer XPath.
-
avolquez over 11 years
@HaimEvgi Is there any way to remove the inner content too? For example, with p tags the tags are removed, but the content of the p tags remains.
-
Alex Spencer over 11 years
strip_tags() worked best for me because I had tags that had no spaces. It was by far the easiest. Thanks.
-
hakazvaka about 11 years
This rocks, but is there any way to remove the closing </div> tag?
-
Tedi Çela over 7 years
Here is a simple way to strip specific tags (both opening and closing): gist.github.com/tedicela/0b06265eefb8df41cb8256bb3f442916
-
mickmackusa over 4 years
This answer DEFINITELY doesn't do what the OP requires. 16 upvotes means that lots of researchers have been misinformed and don't understand the question and/or what this answer does. This answer does far more harm than good. The overarching message should be that developers should use a DOM parser to manipulate valid HTML.
-
mickmackusa over 4 years
Question says: I want to remove everything starting from <div id="anotherDiv"> until its closing </div>. How do I do that? This answer is incorrect.
-
mickmackusa over 4 years
Regex is DOM-ignorant and is prone to failure. Using a legitimate DOM parsing technique will be more robust, reliable, and scalable. Iterated preg_ calls are going to be inefficient. The m pattern modifier is of no use. -
mickmackusa over 4 years
Regex is DOM-ignorant and is prone to failure. Using a legitimate DOM parsing technique will be more robust, reliable, and scalable. There is no reason to declare $html (a single-use variable); just return preg_replace(...) directly. This snippet will fail when a tag attribute value contains >. There is no need to use a capture group. -
mickmackusa over 4 years
This answer does not target the tag using the id as stated in the question. This answer is incorrect because it will remove elements that should not be removed. -
user229044 over 4 years
This is incorrect and fails for <div id="someid" data-foo=">">. You cannot use a regex for this. -
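The failure described in this comment can be reproduced with a small sketch that runs the preg_replace() pattern from Solution 2 and a DOM-based removal on the same input. The input string and variable names are made up for illustration.

```php
<?php
// Attribute value containing '>', as in the comment above.
$input = '<div id="someid" data-foo=">">SECRET</div><p>kept</p>';

// Regex: [^>]+ stops at the '>' inside the quoted attribute value.
$regexResult = preg_replace('/<div id="someid"[^>]+\>/i', '', $input);
echo $regexResult, PHP_EOL; // ">SECRET</div><p>kept</p>

// DOM: the parser reads the quoted value correctly, so the whole element goes.
$dom = new DOMDocument();
$previous = libxml_use_internal_errors(true);
$dom->loadHTML($input);
libxml_clear_errors();
libxml_use_internal_errors($previous);

$xPath = new DOMXPath($dom);
foreach ($xPath->query('//*[@id="someid"]') as $node) {
    $node->parentNode->removeChild($node);
}
$domResult = $dom->saveHTML();
echo $domResult;
```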
user229044 over 4 years
This is incorrect and fails for many kinds of input, for example strip_tags_blacklist('<script>let x = "<div>"></script><div>foo</div>', 'div') => <script>let x = ". You cannot use regular expressions to strip or otherwise interact with HTML.