Java: String.replace(regex, string) to remove content from XML
OK, apart from the obvious answer (don't parse XML with regex), maybe we can fix this:
String newString = oldString.replaceFirst("(?s)<tagName[^>]*>.*?</tagName>",
"Content Removed");
Explanation:
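(?s) # turn on DOTALL mode, called single-line mode in Perl (otherwise '.' won't match '\n')
<tagName # remove unnecessary (and perhaps erroneous) escapes
[^>]* # allow optional attributes
>.*?</tagName> # close the opening tag, then match as little as possible up to the first </tagName>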
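A minimal, self-contained sketch comparing the question's pattern with the one above. The class name, the helper removeContent, and the sample input are hypothetical; only String.replaceFirst and the two patterns come from this thread.

```java
public class RemoveTagContent {
    // The answer's pattern: (?s) lets '.' match newlines, [^>]* skips any attributes
    // on the opening tag, and the reluctant .*? stops at the first </tagName>.
    static final String FIXED = "(?s)<tagName[^>]*>.*?</tagName>";

    // The question's attempt, kept for comparison. The escaped '<' is harmless,
    // but "//" demands two literal slashes, so a real </tagName> can never match.
    static final String ATTEMPT = "\\<tagName>.*?\\<//tagName>";

    // Hypothetical helper: replace the first match of regex with a fixed marker.
    static String removeContent(String xml, String regex) {
        return xml.replaceFirst(regex, "Content Removed");
    }

    public static void main(String[] args) {
        // Hypothetical input: the element has an attribute and spans two lines.
        String oldString = "<root><tagName id=\"1\">first line\nsecond line</tagName><other/></root>";

        // replaceFirst returns the input unchanged when nothing matches.
        System.out.println(removeContent(oldString, ATTEMPT).equals(oldString)); // true
        System.out.println(removeContent(oldString, FIXED)); // <root>Content Removed<other/></root>
    }
}
```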
Are you sure you're matching the tag case correctly? Perhaps you also want to add the i flag to the pattern: (?si)
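A sketch of the case-insensitive variant, assuming the tag case in your input may differ from the pattern. It shows the inline (?si) flags next to the equivalent Pattern.DOTALL | Pattern.CASE_INSENSITIVE flags; the class, method names, and input are hypothetical.

```java
import java.util.regex.Pattern;

public class CaseInsensitiveRemove {
    // Inline flags: s = DOTALL ('.' matches newlines), i = CASE_INSENSITIVE.
    static final String INLINE = "(?si)<tagName[^>]*>.*?</tagName>";

    // The same pattern, with the flags passed to Pattern.compile instead.
    static final Pattern COMPILED =
            Pattern.compile("<tagName[^>]*>.*?</tagName>", Pattern.DOTALL | Pattern.CASE_INSENSITIVE);

    static String withInlineFlags(String xml) {
        return xml.replaceFirst(INLINE, "Content Removed");
    }

    static String withCompiledPattern(String xml) {
        return COMPILED.matcher(xml).replaceFirst("Content Removed");
    }

    public static void main(String[] args) {
        // Hypothetical input whose tags differ in case from the pattern.
        String xml = "<root><TAGNAME>a\nb</TagName></root>";

        // Without the i flag the case-sensitive pattern finds nothing, so xml is returned as-is.
        System.out.println(xml.replaceFirst("(?s)<tagName[^>]*>.*?</tagName>", "Content Removed"));
        System.out.println(withInlineFlags(xml));     // <root>Content Removed</root>
        System.out.println(withCompiledPattern(xml)); // <root>Content Removed</root>
    }
}
```

Keep in mind that XML element names are case-sensitive, so the i flag also matches tags that an XML parser would treat as different elements.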
Author: TookTheRook
Updated on June 16, 2022
Comments
-
TookTheRook almost 2 years
Let's say I have an XML document in the form of a string. I wish to remove the content between two tags within the XML string, say <tagName> and </tagName>. I have tried:
String newString = oldString.replaceFirst("\\<tagName>.*?\\<//tagName>", "Content Removed");
but it does not work. Any pointers as to what I am doing wrong?
-
Sean Patrick Floyd almost 13 years
In Java, </tagName> will do nicely without any escapes.
-
Sean Patrick Floyd almost 13 years
@Pable yes, but that doesn't use the Java regex engine; it's Flex / Flash.
-
Sean Patrick Floyd almost 13 years
@Pable no, it works, it's just not necessary: "A backslash may be used prior to a non-alphabetic character regardless of whether that character is part of an unescaped construct." (source: the java.util.regex.Pattern Javadoc)
-
Alex Jones almost 13 years
All right, so no harm done then. Thanks for the info (and BTW it's Pablo, not Pable :) )
-
Sean Patrick Floyd almost 13 years
@Pablo Grrr, the same typo twice. I knew it was Pablo all along, but somehow my fingers didn't agree. Sorry!!!
-
TookTheRook almost 13 years
In the end, simply using string.replaceFirst("<tagName>.*</tagName>", "Content Removed"); worked fine; I don't know why I was making it so complicated. Thanks for explaining the regex attributes in Java though, pretty helpful!
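The simpler pattern in the last comment works for that input, but it relies on two conditions worth checking: the content contains no newlines, and there is only one tagName element, because greedy .* runs to the last closing tag. A small sketch with hypothetical input (the class and helper names are also hypothetical) illustrates both points:

```java
public class GreedyVsReluctant {
    // Hypothetical helper: replace the first match of regex with a fixed marker.
    static String remove(String xml, String regex) {
        return xml.replaceFirst(regex, "Content Removed");
    }

    public static void main(String[] args) {
        // Hypothetical input with two tagName elements on one line.
        String twoTags = "<tagName>a</tagName><keep/><tagName>b</tagName>";

        // Greedy .* runs to the LAST </tagName>, swallowing <keep/> as well.
        System.out.println(remove(twoTags, "<tagName>.*</tagName>"));  // Content Removed
        // Reluctant .*? stops at the FIRST </tagName>.
        System.out.println(remove(twoTags, "<tagName>.*?</tagName>")); // Content Removed<keep/><tagName>b</tagName>

        // Without (?s), '.' does not match a newline, so multi-line content is left alone.
        String multiLine = "<tagName>a\nb</tagName>";
        System.out.println(remove(multiLine, "<tagName>.*</tagName>").equals(multiLine)); // true
        System.out.println(remove(multiLine, "(?s)<tagName>.*?</tagName>"));              // Content Removed
    }
}
```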