Find everything between two XML tags with RegEx
Solution 1
It is not a good idea to use regex for HTML/XML parsing...
However, if you want to do it anyway, search for the regex pattern
<primaryAddress>[\s\S]*?<\/primaryAddress>
and replace it with an empty string.
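As a rough illustration of that search-and-replace, here is a minimal Python sketch. The function name strip_primary_address and the sample document are made up for this example, and the forward slash is left unescaped because Python's re module does not need it escaped.

```python
import re

# Pattern from the answer above; [\s\S] matches any character, including newlines,
# and *? stops at the first closing tag instead of the last one.
PATTERN = re.compile(r"<primaryAddress>[\s\S]*?</primaryAddress>")


def strip_primary_address(xml_text: str) -> str:
    """Remove every non-nested <primaryAddress> element (hypothetical helper)."""
    return PATTERN.sub("", xml_text)


# Invented sample with two separate elements, one of them spanning several lines.
sample = (
    "<site><name>A</name>"
    "<primaryAddress>\n <state>QLD</state>\n</primaryAddress>"
    "<phone>1</phone>"
    "<primaryAddress> <state>NSW</state> </primaryAddress>"
    "</site>"
)

result = strip_primary_address(sample)
print(result)
```

Because the quantifier is non-greedy, each element is removed on its own and the <phone> element between them survives.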
Solution 2
You should be able to match it with: /<primaryAddress>(.+?)<\/primaryAddress>/
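The content between the tags will be in the first capture group. If the content spans several lines, the dot must be allowed to match line breaks (for example with a dotall/single-line flag), otherwise nothing is matched.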
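A small Python sketch of the capture-group approach, assuming multi-line content like the example in the question. The sample text is invented, and re.DOTALL is Python's name for the flag that lets the dot match newlines; other tools name it differently.

```python
import re

# Invented multi-line sample.
text = "<primaryAddress>\n  <state>QLD</state>\n</primaryAddress>"

# Without DOTALL, '.' stops at line breaks, so multi-line content is not matched.
plain = re.search(r"<primaryAddress>(.+?)</primaryAddress>", text)

# With DOTALL, '.' also matches newlines and group(1) holds the inner content.
dotall = re.search(r"<primaryAddress>(.+?)</primaryAddress>", text, re.DOTALL)

inner = dotall.group(1) if dotall else None
print(plain)
print(repr(inner))
```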
Solution 3
It is not good to use this method, but if you really want to extract the content with regex:
<primaryAddress[^>]*>((.|\n)*?)<\/primaryAddress>
The accepted answer returns the tags as well; this one returns only the value between the tags, in the first capture group.
Solution 4
This can capture the outermost pair of tags, even with attributes inside or without end tags (self-closing). It relies on recursion and named groups, so it needs a flavor such as PCRE:
(<!--((?!-->).)*-->|<\w*((?!\/<).)*\/>|<(?<tag>\w+)[^>]*>(?>[^<]|(?R))*<\/\k<tag>\s*>)
Edit: as mentioned in the comment above, regex is never enough to parse XML. Modifying the regex to fit more situations only makes it longer, not correct.
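For comparison, here is a minimal sketch of the parser-based route, using Python's standard xml.etree.ElementTree instead of a regex. This is not part of any answer above; the helper name remove_elements and the sample document are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def remove_elements(xml_text: str, tag: str) -> str:
    """Drop every descendant element named `tag` (hypothetical helper).

    The root element itself is never removed, and any text that directly
    follows a removed element (its tail) is dropped along with it.
    """
    root = ET.fromstring(xml_text)
    # Snapshot the elements first so the tree is not modified while iterating.
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag == tag:
                parent.remove(child)
    return ET.tostring(root, encoding="unicode")


# Invented sample with an attribute and a nested element of the same name.
doc = (
    "<site><name>A</name>"
    '<primaryAddress kind="x"><primaryAddress><state>QLD</state>'
    "</primaryAddress></primaryAddress>"
    "<phone>1</phone></site>"
)

cleaned = remove_elements(doc, "primaryAddress")
print(cleaned)
```

Nesting and attributes need no special handling here, which is the main argument for a parser when the task is done in code rather than in an editor.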
Doz
Updated on July 22, 2022
Comments
-
Doz almost 2 years
In RegEx, I want to find the tag and everything between two XML tags, like the following:
<primaryAddress> <addressLine>280 Flinders Mall</addressLine> <geoCodeGranularity>PROPERTY</geoCodeGranularity> <latitude>-19.261365</latitude> <longitude>146.815585</longitude> <postcode>4810</postcode> <state>QLD</state> <suburb>Townsville</suburb> <type>PHYSICAL</type> </primaryAddress>
I want to find the primaryAddress tag and everything inside it, and erase that. Everything between the primaryAddress tags is variable, but I want to remove the entire tag and its sub-tags whenever I get primaryAddress. Does anyone have any idea how to do that?
-
Gianluca Ghettini over 11 years
Just for curiosity's sake: why is it not a good idea to use regex for HTML/XML parsing?
-
Ωmega over 11 years
-
Doz over 11 years
Yeah, I just want to find it using TextMate; I'm not doing this in code or anything. But the example you gave me doesn't work. There is a space after <primaryAddress> and before </primaryAddress>
-
Ωmega over 11 years
@Doz - I don't know what syntax TextMate uses. Your question does not mention any specific tool and is tagged with regex, so I have posted a general regex solution that works with the majority of regex tools and programming languages. If you need further help, I suggest you post a new question where you are more specific about your requirements...
-
Doz over 11 years
Omega, I just wanted to get generic information on regex. I only said I use TextMate in response to people marking down my question because it's a bad idea to use RegEx. I know it is a bad idea, but I am using it within a different context.
-
Ωmega over 11 years
@Doz - So then you got the general information in my answer... Good luck!
-
Seth almost 9 years
Just in case you don't recognize it, *? means match everything up to the first occurrence of </primaryAddress> (non-greedy match). This is important if your file has multiple <primaryAddress> elements in it. Thanks, @Ωmega.
-
JMM over 8 years
This worked great for me, but anyone using this needs to be aware that it can't handle nested tags, i.e. a primaryAddress node that is a descendant of another primaryAddress node. So make sure that's not a possibility in your XML document.
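A short Python sketch of that nesting limitation, using the pattern from the accepted answer on an invented nested sample:

```python
import re

PATTERN = re.compile(r"<primaryAddress>[\s\S]*?</primaryAddress>")

# Invented sample: a primaryAddress element nested inside another one.
nested = (
    "<a>"
    "<primaryAddress><primaryAddress>inner</primaryAddress>outer</primaryAddress>"
    "</a>"
)

# The non-greedy match ends at the FIRST closing tag, which belongs to the
# inner element, so the outer element's tail is left behind.
leftover = PATTERN.sub("", nested)
print(leftover)
```

The result still contains an unmatched closing tag, so the document is no longer well-formed.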
-
Magnilex over 8 years
@Ωmega Agreed that regex and XML are not best friends. However, I just replaced 40-50 tags with an empty line through my IDE (IntelliJ IDEA) in about 5 seconds with help from your answer. In cases like this, regex on XML can be useful.
-
Dima Naychuk almost 7 years
Great, this also works when there are newline characters inside the tag body. To also catch tags with attributes, e.g. <primaryAddress isValid=True>, I would suggest a small update:
<primaryAddress.*?>[\s\S]*?</primaryAddress>
-
Ωmega almost 7 years
@DimaNaychuk - In that case use
<primaryAddress[^>]*>[\s\S]*?<\/primaryAddress>
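A minimal Python sketch of the attribute-tolerant pattern. The sample document is invented, and the attribute value is quoted here so that it is valid XML.

```python
import re

# [^>]* allows anything up to the end of the opening tag, including attributes.
# Caveat: it also matches longer tag names that merely start with
# "primaryAddress", and it is confused by a '>' inside an attribute value.
PATTERN = re.compile(r"<primaryAddress[^>]*>[\s\S]*?</primaryAddress>")

sample = (
    '<site><primaryAddress isValid="True">\n<state>QLD</state>\n</primaryAddress>'
    "<primaryAddress><state>NSW</state></primaryAddress>"
    "<phone>1</phone></site>"
)

result = PATTERN.sub("", sample)
print(result)
```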
-
Andrii Karaivanskyi over 6 years
Apparently it won't work even for the example in the question: .+ does not match line-break characters.
-
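doublesharp over 6 years
You would use the single-line (dotall) flag, which lets the dot match line breaks; the multi-line flag only changes how ^ and $ behave.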
-
Crashalot almost 4 years
@Seth thanks for the non-greedy match tip! Why use [\s\S]*? instead of .*? ?
-
Seth almost 4 years
@Crashalot the dot might not match a newline character. See the regex docs for your platform / language.
-
Crashalot almost 4 years
@Seth thanks for the reply! Yes, just discovered this. :)