How to remove HTML tag in Java

java html regex

44,481

Solution 1

You should use an HTML parser instead. I like HtmlCleaner, because it gives me a pretty-printed version of the HTML.

With HtmlCleaner you can do:

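// Requires the HtmlCleaner library (org.htmlcleaner.HtmlCleaner, org.htmlcleaner.TagNode).
HtmlCleaner htmlCleaner = new HtmlCleaner();
TagNode root = htmlCleaner.clean( stream );
// In XPath, attributes are addressed with '@'.
Object[] found = root.evaluateXPath( "//div[@id='something']" );
// Check the first match, not the array itself.
if( found.length > 0 && found[0] instanceof TagNode ) {
    ((TagNode)found[0]).removeFromTree();
}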

Solution 2

There is jsoup, a Java library made for HTML manipulation. Look at the Jsoup.clean() method and the Whitelist class (called Safelist in recent jsoup releases). It is an easy-to-use solution.

Solution 3

If you just need to remove tags, you can use this regular expression:

content = content.replaceAll("<[^>]+>", "");

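This removes only the tags themselves, not other HTML constructs such as entities (for example &amp;) or the text inside script and style elements. For anything more complex you should use a parser.

EDIT: To avoid problems with HTML comments, strip them first. The (?s) flag lets . match line breaks, so comments that span several lines are covered:

content = content.replaceAll("(?s)<!--.*?-->", "").replaceAll("<[^>]+>", "");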
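The two-step replacement above can be wrapped in a small helper. This is only a minimal sketch using the standard java.util.regex package; the class name TagStripper and the method name stripTags are made up for illustration. Comments are removed first because a comment that contains markup would otherwise be cut at its first > and leak part of its text.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, not part of any library.
final class TagStripper {
    // (?s) lets '.' match newlines, so multi-line comments are removed too.
    private static final Pattern COMMENT = Pattern.compile("(?s)<!--.*?-->");
    // Anything from '<' up to the next '>' is treated as a tag.
    private static final Pattern TAG = Pattern.compile("<[^>]+>");

    static String stripTags(String html) {
        // Order matters: drop comments before tags.
        String withoutComments = COMMENT.matcher(html).replaceAll("");
        return TAG.matcher(withoutComments).replaceAll("");
    }

    public static void main(String[] args) {
        String html = "<p>Hello <!-- a\n<b>hidden</b> note --><b>world</b></p>";
        System.out.println(stripTags(html)); // prints: Hello world
    }
}
```

Precompiling the patterns avoids recompiling the regex on every call, which String.replaceAll would do.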

Solution 4

No. Regular expressions cannot, by definition, parse HTML.

You could use a regex like s/<[^>]*\>// or something similarly naive, but it is going to be insufficient, especially if you're interested in removing the contents of tags.

As another poster said, use an actual HTML parser.
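To illustrate where the naive pattern falls short, here is a small sketch with two inputs it mishandles. The class and method names (NaiveRegexLimits, naiveStrip) are hypothetical, and the inputs are just examples chosen to trigger the problem.

```java
// Hypothetical demo class showing two failure modes of the naive regex.
final class NaiveRegexLimits {
    static String naiveStrip(String html) {
        return html.replaceAll("<[^>]*>", "");
    }

    public static void main(String[] args) {
        // A '>' inside an attribute value ends the match too early,
        // so part of the tag is left behind.
        System.out.println(naiveStrip("<a title=\"a > b\">link</a>"));
        // prints:  b">link   (with a leading space)

        // The contents of a script element are not tags, so they survive.
        System.out.println(naiveStrip("<script>var x = 1;</script>Hi"));
        // prints: var x = 1;Hi
    }
}
```

A real HTML parser knows about quoted attribute values and element boundaries, which is why it handles both cases.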

Solution 5

You don't need an HTML parser for this. The code below removes all HTML comments:

htmlString = htmlString.replaceAll("(?s)<!--.*?-->", "");

Author by

Ashwin J

Updated on January 08, 2020

Comments

Ashwin J over 4 years

Is there a regular expression that can completely remove an HTML tag? By the way, I'm using Java.
