Remove HTML Tags from an NSString on the iPhone
Solution 1
A quick and "dirty" (removes everything between < and >) solution, works with iOS >= 3.2:
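- (NSString *)stringByStrippingHTML {
    NSRange r;
    // Manual reference counting; under ARC, use `NSString *s = [self copy];` instead.
    NSString *s = [[self copy] autorelease];
    // Repeatedly find the next tag-like span and delete it until none remain.
    while ((r = [s rangeOfString:@"<[^>]+>" options:NSRegularExpressionSearch]).location != NSNotFound)
        s = [s stringByReplacingCharactersInRange:r withString:@""];
    return s;
}
I have this declared in a category on NSString.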
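The loop in Solution 1 searches the string again from the beginning after every deletion. A minimal sketch of a single-pass alternative follows, assuming iOS 4.0 or later (where NSRegularExpression is available) and ARC. The function name StripTagsInOnePass is hypothetical, and the pattern has the same limits as the one above: it removes anything between < and >.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not part of the original answer): removes every
// "<...>" span in one pass instead of rescanning after each deletion.
static NSString *StripTagsInOnePass(NSString *html) {
    NSError *error = nil;
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:@"<[^>]+>"
                                                  options:0
                                                    error:&error];
    if (regex == nil) {
        // The pattern failed to compile; return the input unchanged.
        return html;
    }
    // Replace all matches with the empty string in a single call.
    return [regex stringByReplacingMatchesInString:html
                                           options:0
                                             range:NSMakeRange(0, html.length)
                                      withTemplate:@""];
}
```

Calling StripTagsInOnePass(@"<b>Hello</b> World!!") should give the same result as the category method for simple input, while compiling the pattern only once.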
Solution 2
This NSString category uses NSXMLParser to accurately remove any HTML tags from an NSString. It is a single .m and .h file that can be included in your project easily.
https://gist.github.com/leighmcculloch/1202238
You then strip HTML by doing the following:
Import the header:
#import "NSString_stripHtml.h"
And then call stripHtml:
NSString* mystring = @"<b>Hello</b> World!!";
NSString* stripped = [mystring stripHtml];
// stripped will be = Hello World!!
This also works with malformed HTML that technically isn't XML.
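For readers curious how an NSXMLParser-based stripper can work, here is a rough sketch. It is not the code from the linked gist; the class name TagStripper, the function StripWithXMLParser, and the wrapper element name are all hypothetical, and ARC is assumed. The idea is to wrap the input in a root element, let the parser walk it, and keep only the character data it reports. If the parser hits a fatal error on malformed input, this sketch simply returns whatever text was collected up to that point.

```objc
#import <Foundation/Foundation.h>

// Hypothetical delegate that accumulates only the text between tags.
@interface TagStripper : NSObject <NSXMLParserDelegate>
@property (nonatomic, strong) NSMutableString *text;
@end

@implementation TagStripper
- (void)parser:(NSXMLParser *)parser foundCharacters:(NSString *)string {
    // Character data may arrive in several chunks; append each one.
    [self.text appendString:string];
}
@end

// Hypothetical helper: returns the text content of `html`.
static NSString *StripWithXMLParser(NSString *html) {
    // Wrap in a single root element so fragments have a document root.
    NSString *wrapped = [NSString stringWithFormat:@"<x>%@</x>", html];
    NSData *data = [wrapped dataUsingEncoding:NSUTF8StringEncoding];

    TagStripper *stripper = [[TagStripper alloc] init];
    stripper.text = [NSMutableString string];

    NSXMLParser *parser = [[NSXMLParser alloc] initWithData:data];
    parser.delegate = stripper;
    // The return value is ignored on purpose: on a parse error we keep
    // whatever text was collected before the error.
    [parser parse];

    return [stripper.text copy];
}
```

Unlike the regular-expression approach, a parser-based version does not treat a > inside a quoted attribute value as the end of a tag.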
Solution 3
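UITextView *textview = [[UITextView alloc] initWithFrame:CGRectMake(10, 130, 250, 170)];
NSString *str = @"This is <font color='red'>simple</font>";
// contentToHTMLString does not appear in the public UITextView API; this renders the HTML in the view rather than stripping the tags.
[textview setValue:str forKey:@"contentToHTMLString"];
textview.textAlignment = NSTextAlignmentLeft;
textview.editable = NO;
textview.font = [UIFont fontWithName:@"Verdana" size:20.0];
// Assumes this runs inside a view controller; addSubview: is an instance method, not a class method.
[self.view addSubview:textview];
This works fine for me.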
Solution 4
You can use it as shown below:
- (void)myMethod
{
    NSString *htmlStr = @"<some>html</some>";
    NSString *strWithoutFormatting = [self stringByStrippingHTML:htmlStr];
}
- (NSString *)stringByStrippingHTML:(NSString *)str
{
    NSRange r;
    // Delete each "<...>" match until no more are found.
    while ((r = [str rangeOfString:@"<[^>]+>" options:NSRegularExpressionSearch]).location != NSNotFound)
    {
        str = [str stringByReplacingCharactersInRange:r withString:@""];
    }
    return str;
}
Solution 5
Use this:
NSString *myregex = @"<[^>]*>"; // regex to remove any HTML tag
NSString *htmlString = @"<html>bla bla</html>";
NSString *stringWithoutHTML = [htmlString stringByReplacingOccurrencesOfRegex:myregex withString:@""];
Don't forget to include this in your code: #import "RegexKitLite.h". Here is the link to download this API: http://regexkit.sourceforge.net/#Downloads
mamakurka
Updated on July 08, 2022
Comments
-
mamakurka almost 2 years
There are a couple of different ways to remove HTML tags from an NSString in Cocoa. One way is to render the string into an NSAttributedString and then grab the rendered text. Another way is to use NSXMLDocument's -objectByApplyingXSLTString method to apply an XSLT transform that does it.
Unfortunately, the iPhone doesn't support NSAttributedString or NSXMLDocument. There are too many edge cases and malformed HTML documents for me to feel comfortable using regex or NSScanner. Does anyone have a solution to this?
One suggestion has been to simply look for opening and closing tag characters; this method won't work except for very trivial cases.
For example these cases (from the Perl Cookbook chapter on the same subject) would break this method:
<IMG SRC = "foo.gif" ALT = "A > B">
<!-- <A comment> -->
<script>if (a<b && a>c)</script>
<![INCLUDE CDATA [ >>>>>>>>>>>> ]]>
-
Ben Gottlieb over 15 years
You could add a bit of logic to take quotes and apostrophes into account... CDATA would take a bit more work, but the whole point of HTML is that unknown tags can be ignored by the parser; if you treat ALL tags as unknown, then you should just get raw text.
-
Jake over 14 years
I'd like to comment that a good (but basic) regular expression will definitely not break on your examples. Certainly not if you can guarantee well-formed XHTML. I know that you said you can't, but I wonder why ;-)
-
vipintj almost 14 years
There is a good answer for this question: Flatten HTML using Objective-C.
-
steipete about 13 years
Unfortunately, using NSScanner is damn slow.
-
mamakurka over 11 years
Even more unfortunately, the linked NSScanner example only works for trivial HTML. It fails for every test case I mentioned in my post.
-
jasonjwwilliams about 9 years
Exactly why doesn't iOS support NSAttributedString for you? developer.apple.com/library/ios/documentation/Cocoa/Reference/…
-
mamakurka about 9 years
@jasonjwwilliams I wrote this question in 2008. Support for NSAttributedString wasn't added to iOS until 3.2 (aka the iPad release), which came out in April 2010.
-
jasonjwwilliams about 9 years
@ifalin Apologies, I lost track of the 2008 date of the original post while reading.
-
mamakurka about 9 years
@jasonjwwilliams No worries. This is a problem with SO. You have answers to questions which often only apply as a "best practice" within a certain timeframe or API version.
-
-
mamakurka over 15 years
This is the exact set of comments that I linked to in my question as an example of what would not work.
-
DonnaLea over 12 years
Whilst the regular expression (as said by m.kocikowski) is quick and dirty, this is more robust. Example string: @"My test <span font=\"font>name\">html string". This answer returns: My test html string. Regular expression returns: My test name">html string. Whilst this isn't that common, it's just more robust.
-
csaunders over 12 years
HTML isn't a regular language, so you shouldn't be trying to parse/strip it with a regular expression. stackoverflow.com/questions/1732348/…
-
kompozer over 12 years
For god's sake, don't use Three20 for anything. Most bloated and badly commented framework ever.
-
James about 12 years
I'm a complete newb at iPhone development, but can I ask how you use this?
-
Roberto about 12 years
@James To use the method posted in the solution, you have to create a category for NSString. Look up "Objective-C Category" in Google. Then you add that method in the .m file and the prototype in the .h file. When that is all set up, all you have to do is have a string object (example: NSString *myString = ...) and call that method on it (NSString *strippedString = [myString stringByStrippingHTML];).
-
matm almost 12 years
+1 Great use for regular expressions, but does not cover lots of cases unfortunately.
-
wod about 11 years
This method is useful, but if I need to keep some tags, such as the link tag <a>, how can I update this method to do that?
-
Aaron Brager about 11 years
This code would break on Perl Cookbook examples 1, 3, and 4 in the question.
-
Rick almost 11 years
I am getting "NSString may not respond to stringByStrippingHTML" with this.
-
Ashoor almost 11 years
@wod Then just change the regex to
<(?>/?)(?!a).+?>
This will remove all tags excluding the opening <a> and closing </a> tags.
-
Nishant over 10 years
<br> is being replaced by nothing, which is undesirable.
-
Joshua Gross over 10 years
Except if you have a string like "S&P 500", it will strip everything after the ampersand and just return the string "S".
-
EZFrag over 10 years
Quick and dirty indeed... This function causes a huge memory leak in my application. Well, in its defence, I am using large amounts of data.
-
Carmen over 10 years
In my app this solution caused performance problems. I switched to a solution with NSScanner instead of NSRegularExpressionSearch. Now the performance problems are gone.
-
ullstrm about 10 years
It is very, very, very memory and time consuming. Only use this with small amounts of HTML!
-
KIDdAe over 9 years
I got an encoding issue with this solution.
-
Pavan Sisode almost 9 years
When we have metadata containing HTML tags and want to apply those tags, we should use the above code to achieve the desired output.
-
Adlai Holler almost 9 years
Great idea, but insanely inefficient code. Instead use -[NSRegularExpression enumerateMatchesInString:] so that you only parse the regex once and don't rescan text you've already scanned.
-
Adlai Holler almost 9 years
This should be the accepted answer. The current one is ridiculously wasteful.
-
Zeb over 8 years
Probably the best solution, but it is useless for a UILabel :-(
-
Rahul over 8 years
When I try this I am getting this error: Terminating app due to uncaught exception 'NSRangeException', reason: '-[__NSCFString substringWithRange:]: Range {8587, 53} out of bounds; string length 8300'
-
Vyachaslav Gerchicov over 7 years
Man, the
stringByReplacingOccurrencesOfString
you use outside the loop is percent encoding and should be fixed in a correct way.
-
Krutarth Patel over 7 years
This method removes HTML tags, but I want to parse the HTML string. What should I do?
-
Krutarth Patel over 7 years
Saved my time. Nice solution.