Find everything between two XML tags with RegEx


Solution 1

It is not a good idea to use regex for HTML/XML parsing...

However, if you want to do it anyway, search for the regex pattern

    <primaryAddress>[\s\S]*?<\/primaryAddress>

and replace it with an empty string.
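
As a rough illustration, here is how that search-and-replace might look in Python. This is only a sketch: it assumes the pattern `<primaryAddress>[\s\S]*?<\/primaryAddress>` that the comments on this answer refer to, and the sample document (including the `<other>` element) is made up for the example.

```python
import re

# Assumed pattern, taken from the comments on this answer.
# [\s\S]*? lazily matches any character, including line breaks.
PATTERN = re.compile(r"<primaryAddress>[\s\S]*?</primaryAddress>")

# Hypothetical sample document; only <addressLine> comes from the question.
xml = """<address>
  <primaryAddress>
    <addressLine>280 Flinders Mall</addressLine>
  </primaryAddress>
  <other>kept</other>
</address>"""

# Replace every match with an empty string.
cleaned = PATTERN.sub("", xml)
print(cleaned)
```

The whitespace that surrounded the removed element stays behind, so the result may contain a blank line where the element used to be.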

Solution 2

You should be able to match it with: /<primaryAddress>(.+?)<\/primaryAddress>/

The content between the tags will be in the first capture group.
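
A minimal Python sketch of reading that group, assuming the whole element is available as a string. It also shows the limitation raised in the comments: without a flag that lets the dot match line breaks (`re.DOTALL` in Python), the pattern finds nothing when the element spans several lines.

```python
import re

PATTERN = r"<primaryAddress>(.+?)</primaryAddress>"

# Single-line input: group 1 holds everything between the tags.
one_line = "<primaryAddress> <addressLine>280 Flinders Mall</addressLine> </primaryAddress>"
match = re.search(PATTERN, one_line)
inner = match.group(1) if match else None

# Multi-line input: the dot does not match "\n" by default.
multi_line = "<primaryAddress>\n<addressLine>280 Flinders Mall</addressLine>\n</primaryAddress>"
no_flag = re.search(PATTERN, multi_line)
with_flag = re.search(PATTERN, multi_line, re.DOTALL)

print(repr(inner))
print(no_flag)
print(repr(with_flag.group(1)))
```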

Solution 3

It is not good to use this method, but if you really want to split it with regex:


The accepted answer returns the tags as well, whereas this returns only the value between the tags.
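
The pattern from this answer is not preserved here. One common way to get only the value between the tags is lookbehind and lookahead; the Python sketch below uses that technique as an assumption, not as the answer's original regex.

```python
import re

# Assumed technique: lookbehind/lookahead, so the tags themselves are
# required around the match but are not part of it.
VALUE = re.compile(r"(?<=<primaryAddress>)[\s\S]*?(?=</primaryAddress>)")

xml = "<primaryAddress><addressLine>280 Flinders Mall</addressLine></primaryAddress>"

m = VALUE.search(xml)
value = m.group(0) if m else None
print(value)
```

Python only accepts fixed-width lookbehind, which works here because the opening tag is a literal string; other regex engines may differ.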

Solution 4

This can capture the outermost pair of tags, even with attributes inside or without end tags.


Edit: as mentioned in the comment above, regex is never enough to parse XML; modifying the regex to fit more situations only makes it longer, but still useless.
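
For comparison, when the removal is done in code rather than in an editor, an XML parser avoids these limits. The sketch below swaps in Python's standard `xml.etree.ElementTree` instead of a regex; the helper name `remove_elements` and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET


def remove_elements(xml_text, tag):
    """Return xml_text with every <tag> element (and its subtree) removed."""
    root = ET.fromstring(xml_text)
    # Copy both lists first so removing children does not disturb iteration.
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag == tag:
                parent.remove(child)
    return ET.tostring(root, encoding="unicode")


# Hypothetical input with an attribute and a nested primaryAddress.
sample = (
    '<address><primaryAddress id="1"><primaryAddress>'
    "<addressLine>280 Flinders Mall</addressLine>"
    "</primaryAddress></primaryAddress><other>kept</other></address>"
)
result = remove_elements(sample, "primaryAddress")
print(result)
```

This handles attributes and nesting, but it cannot remove the root element itself, and text that directly follows a removed element is dropped along with it.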


Updated on July 22, 2022


  • Doz
    Doz almost 2 years

    In RegEx, I want to find the tag and everything between two XML tags, like the following:

        <addressLine>280 Flinders Mall</addressLine>

    I want to find the tag and everything between primaryAddress, and erase that.

    Everything between the primaryAddress tags is variable, but I want to remove the entire tag and its sub-tags whenever I find primaryAddress.

    Anyone have any idea how to do that?

  • Gianluca Ghettini
    Gianluca Ghettini over 11 years
    Just for curiosity's sake: why is it not a good idea to use regex for HTML/XML parsing?
  • Ωmega
    Ωmega over 11 years
  • Doz
    Doz over 11 years
    Yeah, I just want to find it using TextMate; I'm not doing this in code or anything. But the example you gave me doesn't work. There is a space after <primaryAddress> and before </primaryAddress>
  • Ωmega
    Ωmega over 11 years
    @Doz - I don't know what syntax TextMate uses. Your question does not mention any specific tool and is tagged with regex, so I have posted a general regex solution that works with the majority of regex tools and programming languages. If you need further help, I suggest you post a new question where you are more specific about your requirements...
  • Doz
    Doz over 11 years
    Omega, I just wanted to get generic information on regex. I only said I use TextMate in response to people marking down my question because it's a bad idea to use RegEx. I know it is a bad idea, but I am using it within a different context.
  • Ωmega
    Ωmega over 11 years
    @Doz - So then you got the general information in my answer... Good luck!
  • Seth
    Seth almost 9 years
    Just in case you don't recognize it, *? means match everything up to the first occurrence of </primaryAddress> (non-greedy match). This is important if your file has multiple <primaryAddress> elements in it. Thanks, @Ωmega.
  • JMM
    JMM over 8 years
    This worked great for me, but in particular, anyone using this needs to be aware that it can't handle nested tags, i.e., if there was a primaryAddress node as one of the descendants of another primaryAddress node. So make sure that's not a possibility in your XML document.
  • Magnilex
    Magnilex over 8 years
    @Ωmega Agreed that regex and XML are not best friends. However, I just replaced 40-50 tags with an empty line through my IDE (IntelliJ IDEA) in about 5 seconds with help from your answer. In cases like this, regex on XML can be useful.
  • Dima Naychuk
    Dima Naychuk almost 7 years
    Great, this also works when there are newline characters inside the tag body. To also catch tags with attributes, e.g. <primaryAddress isValid=True>, I would suggest a small update: <primaryAddress.*?>[\\s\\S]*?</primaryAddress>
  • Ωmega
    Ωmega almost 7 years
    @DimaNaychuk - In such a case, use <primaryAddress[^>]*>[\s\S]*?<\/primaryAddress>
  • Andrii Karaivanskyi
    Andrii Karaivanskyi over 6 years
    Apparently it won't work even for the example in the question. .+ does not match carriage return symbols.
  • doublesharp
    doublesharp over 6 years
    You would use the single-line (dot-all) flag, so that . also matches line breaks.
  • Crashalot
    Crashalot almost 4 years
    @Seth thanks for the non-greedy match tip! Why use [\s\S]*? rather than .*? here?
  • Seth
    Seth almost 4 years
    @Crashalot the dot might not match a newline character. See the regex docs for your platform / language.
  • Crashalot
    Crashalot almost 4 years
    @Seth thanks for the reply! yes just discovered this. :)
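
Two points from the comments above can be checked with a short Python sketch: the attribute-tolerant pattern `<primaryAddress[^>]*>[\s\S]*?<\/primaryAddress>` removes a tag that carries attributes and spans several lines, and the same pattern leaves debris behind when primaryAddress elements are nested. The sample strings are made up for the illustration.

```python
import re

# Pattern suggested in the comments: [^>]* allows attributes in the opening tag.
PATTERN = re.compile(r"<primaryAddress[^>]*>[\s\S]*?</primaryAddress>")

# Opening tag with an attribute, body spanning several lines.
with_attr = (
    '<a><primaryAddress isValid="True">\n'
    "<addressLine>280 Flinders Mall</addressLine>\n"
    "</primaryAddress><b/></a>"
)
cleaned = PATTERN.sub("", with_attr)

# Nested elements: the lazy match stops at the first closing tag,
# so the outer element's tail is left behind.
nested = "<primaryAddress><primaryAddress>x</primaryAddress>y</primaryAddress>"
leftover = PATTERN.sub("", nested)

print(cleaned)
print(leftover)
```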