Java: String.replaceFirst(regex, replacement) to remove content from XML


OK, apart from the obvious answer (don't parse XML with regex), maybe we can fix this:

String newString = oldString.replaceFirst("(?s)<tagName[^>]*>.*?</tagName>",
                                          "Content Removed");

Explanation:

(?s)             # turn single-line mode on (otherwise '.' won't match '\n')
<tagName         # remove unnecessary (and perhaps erroneous) escapes
[^>]*            # allow optional attributes
>.*?</tagName>   # close the opening tag, then match reluctantly up to the first </tagName>
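A minimal runnable sketch of the pattern above. It assumes the placeholder element name `tagName` from the question; the class name, method name and sample inputs are hypothetical. It shows the pieces working together: an attribute in the opening tag, content spanning several lines, and a second element left untouched because `.*?` stops at the first closing tag.

```java
public class RemoveTagContent {

    // Replaces the first <tagName ...>...</tagName> element (tags included) with a marker.
    // "tagName" is the placeholder name used in the question, not a real element.
    static String removeFirstTag(String xml) {
        // (?s)  : DOTALL, so '.' also matches '\n'
        // [^>]* : skips any attributes inside the opening tag
        // .*?   : reluctant, so the match stops at the FIRST </tagName>
        return xml.replaceFirst("(?s)<tagName[^>]*>.*?</tagName>", "Content Removed");
    }

    public static void main(String[] args) {
        // Hypothetical input: the element has an attribute and spans several lines
        String oldString = "<root>\n"
                + "  <tagName id=\"1\">\n"
                + "    secret\n"
                + "  </tagName>\n"
                + "</root>";
        System.out.println(removeFirstTag(oldString));
        // prints:
        // <root>
        //   Content Removed
        // </root>

        // Only the first element is replaced; .*? keeps the match from running on to a later </tagName>
        System.out.println(removeFirstTag("<a><tagName>x</tagName><tagName>y</tagName></a>"));
        // prints: <a>Content Removed<tagName>y</tagName></a>
    }
}
```

One caveat of `[^>]*`: it would also accept a longer element name such as `<tagNameX>` as the opening tag, so this remains a regex sketch rather than an XML parser.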

Are you sure you're matching the tag's case correctly? Perhaps you also want to add the i (case-insensitive) flag to the pattern: (?si)
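To illustrate that suggestion, here is a sketch (hypothetical class and method names, `tagName` still a placeholder) of two equivalent ways to add case-insensitivity: the inline `(?si)` prefix, and the `Pattern.DOTALL | Pattern.CASE_INSENSITIVE` flags on a precompiled `Pattern`. `String.replaceFirst(regex, replacement)` is documented as equivalent to `Pattern.compile(regex).matcher(str).replaceFirst(replacement)`, so both should give the same result; the precompiled form just avoids recompiling the pattern on every call.

```java
import java.util.regex.Pattern;

public class CaseInsensitiveRemove {

    // Inline flags: (?s) = DOTALL, (?i) = CASE_INSENSITIVE, combined as (?si)
    static String removeInline(String xml) {
        return xml.replaceFirst("(?si)<tagName[^>]*>.*?</tagName>", "Content Removed");
    }

    // Same pattern with flag constants, compiled once and reused across calls
    private static final Pattern TAG = Pattern.compile(
            "<tagName[^>]*>.*?</tagName>", Pattern.DOTALL | Pattern.CASE_INSENSITIVE);

    static String removeCompiled(String xml) {
        return TAG.matcher(xml).replaceFirst("Content Removed");
    }

    public static void main(String[] args) {
        // Hypothetical input with mixed-case tags and a multi-line body
        String xml = "<root><TAGNAME>\nsecret\n</TagName></root>";
        System.out.println(removeInline(xml));   // <root>Content Removed</root>
        System.out.println(removeCompiled(xml)); // <root>Content Removed</root>
    }
}
```

Note that `CASE_INSENSITIVE` on its own only folds US-ASCII letters, which is usually enough for tag names.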

Author: TookTheRook


Updated on June 16, 2022

Comments

  • TookTheRook
    TookTheRook almost 2 years

    Let's say I have an XML document in the form of a string. I wish to remove the content between two tags within the XML string, say <tagName> and </tagName>. I have tried:

    String newString = oldString.replaceFirst("\\<tagName>.*?\\<//tagName>",
                                                                  "Content Removed");
    

    but it does not work. Any pointers as to what I am doing wrong?

  • Sean Patrick Floyd
    Sean Patrick Floyd almost 13 years
    In Java, </tagName> will do nicely without any escapes.
  • Sean Patrick Floyd
    Sean Patrick Floyd almost 13 years
    @Pable yes, but that doesn't use a Java Regex engine, it's flex / flash
  • Sean Patrick Floyd
    Sean Patrick Floyd almost 13 years
    @Pable no, it works, it's just not necessary: "A backslash may be used prior to a non-alphabetic character regardless of whether that character is part of an unescaped construct." (source: java.util.regex.Pattern Javadoc)
  • Alex Jones
    Alex Jones almost 13 years
    All right so no harm done then. Thanks for the info (and BTW it's Pablo not Pable :) )
  • Sean Patrick Floyd
    Sean Patrick Floyd almost 13 years
    @Pablo Grrr, the same typo twice. I knew it was Pablo all along, but somehow my fingers didn't agree. Sorry!!!
  • TookTheRook
    TookTheRook almost 13 years
    In the end, simply using string.replaceFirst("<tagName>.*</tagName>", "Content Removed"); worked fine, I don't know why I was making it so complicated. Thanks for explaining the regex attributes in Java though, pretty helpful!
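The escaping discussion in the comments can be checked directly. This sketch (hypothetical class and constant names) confirms that `\\<` and `<` match the same character, and points to what appears to be the real problem with the question's pattern: `\\<//tagName>` asks for two slashes, so an ordinary `</tagName>` never matches and `replaceFirst` returns the string unchanged.

```java
public class EscapeCheck {

    // The question's pattern: "\\<" is only a redundant escape of '<',
    // but "<//tagName>" requires TWO slashes, so a normal closing tag never matches
    static final String QUESTION_REGEX = "\\<tagName>.*?\\<//tagName>";

    // Same idea with the redundant escapes dropped and the closing tag corrected
    static final String FIXED_REGEX = "<tagName>.*?</tagName>";

    public static void main(String[] args) {
        // A backslash before a non-alphabetic character is allowed, so both match '<'
        System.out.println("<".matches("\\<")); // true
        System.out.println("<".matches("<"));   // true

        String xml = "<a><tagName>x</tagName></a>";
        // No match, so the input comes back unchanged: <a><tagName>x</tagName></a>
        System.out.println(xml.replaceFirst(QUESTION_REGEX, "Content Removed"));
        // <a>Content Removed</a>
        System.out.println(xml.replaceFirst(FIXED_REGEX, "Content Removed"));
    }
}
```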
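The pattern in the last comment uses a greedy `.*` and no `(?s)`. That is fine for a single element on one line, but a sketch with hypothetical inputs (hypothetical class and method names) shows two cases where it differs from the answer's reluctant `(?s)` pattern: with two `tagName` elements, greedy `.*` runs to the last closing tag and removes everything in between; with content spanning several lines, it finds no match at all.

```java
public class GreedyVsReluctant {

    // Pattern from the last comment: greedy, and '.' does not match '\n'
    static String greedy(String xml) {
        return xml.replaceFirst("<tagName>.*</tagName>", "Content Removed");
    }

    // Answer-style pattern: DOTALL plus a reluctant quantifier
    static String reluctant(String xml) {
        return xml.replaceFirst("(?s)<tagName>.*?</tagName>", "Content Removed");
    }

    public static void main(String[] args) {
        String twoTags = "<a><tagName>x</tagName><b/><tagName>y</tagName></a>";
        // Greedy .* runs to the LAST </tagName>, swallowing <b/> and the second element
        System.out.println(greedy(twoTags));    // <a>Content Removed</a>
        // Reluctant .*? stops at the FIRST </tagName>
        System.out.println(reluctant(twoTags)); // <a>Content Removed<b/><tagName>y</tagName></a>

        String multiLine = "<a><tagName>\nx\n</tagName></a>";
        // Without (?s), '.' cannot cross '\n', so the greedy version leaves the input unchanged
        System.out.println(greedy(multiLine).equals(multiLine)); // true
        System.out.println(reluctant(multiLine));                // <a>Content Removed</a>
    }
}
```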