Remove HTML Tags from an NSString on the iPhone

Solution 1

A quick and "dirty" (removes everything between < and >) solution, works with iOS >= 3.2:

-(NSString *) stringByStrippingHTML {
  NSRange r;
  // Under ARC, drop the autorelease call.
  NSString *s = [[self copy] autorelease];
  // Remove the first "<...>" match repeatedly until none is left.
  while ((r = [s rangeOfString:@"<[^>]+>" options:NSRegularExpressionSearch]).location != NSNotFound)
    s = [s stringByReplacingCharactersInRange:r withString:@""];
  return s;
}

I have this declared as a category on NSString.
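The loop above restarts the regex search from the beginning of the string after every replacement, so the cost grows with the number of tags. As a rough alternative, here is a single-pass sketch in plain C, which compiles unchanged inside an Objective-C .m file. The function name strip_tags is hypothetical, not part of any Apple API. Unlike the regex, it also drops an empty "<>" and everything after an unclosed "<".

```c
#include <stddef.h>

/* Hypothetical helper, not part of any Apple API.
 * Copies src into dst, dropping every byte from a '<' through the next '>'.
 * dst must have room for strlen(src) + 1 bytes.
 * Returns the number of bytes written, not counting the terminator. */
size_t strip_tags(const char *src, char *dst) {
    size_t out = 0;
    int in_tag = 0;                     /* 1 while between '<' and '>' */
    for (; *src != '\0'; ++src) {
        if (in_tag) {
            if (*src == '>') in_tag = 0;   /* tag ends, resume copying */
        } else if (*src == '<') {
            in_tag = 1;                    /* tag starts, stop copying */
        } else {
            dst[out++] = *src;             /* ordinary text */
        }
    }
    dst[out] = '\0';
    return out;
}
```

Because '<' and '>' are single ASCII bytes, this works byte by byte on UTF-8 text. From Objective-C, the input could come from -[NSString UTF8String] and the result could be wrapped again with +[NSString stringWithUTF8String:].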

Solution 2

This NSString category uses NSXMLParser to remove HTML tags from an NSString accurately. It is a single .h and .m file pair that can easily be added to your project.

https://gist.github.com/leighmcculloch/1202238

You then strip HTML as follows:

Import the header:

#import "NSString_stripHtml.h"

And then call stripHtml:

NSString* mystring = @"<b>Hello</b> World!!";
NSString* stripped = [mystring stripHtml];
// stripped is now @"Hello World!!"

This also works with malformed HTML that technically isn't XML.

Solution 3

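UITextView *textview = [[UITextView alloc] initWithFrame:CGRectMake(10, 130, 250, 170)];
NSString *str = @"This is <font color='red'>simple</font>";
// "contentToHTMLString" is not a documented UITextView property, so this may break between iOS versions.
[textview setValue:str forKey:@"contentToHTMLString"];
textview.textAlignment = NSTextAlignmentLeft;
textview.editable = NO;
textview.font = [UIFont fontWithName:@"Verdana" size:20.0];
// Assumes this runs inside a UIViewController; addSubview: is an instance method.
[self.view addSubview:textview];

Works fine for me.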

Solution 4

You can use it like this:

-(void)myMethod
{
    NSString *htmlStr = @"<some>html</string>";
    NSString *strWithoutFormatting = [self stringByStrippingHTML:htmlStr];
}

-(NSString *)stringByStrippingHTML:(NSString *)str
{
    NSRange r;
    // Remove the first "<...>" match repeatedly until none is left.
    while ((r = [str rangeOfString:@"<[^>]+>" options:NSRegularExpressionSearch]).location != NSNotFound)
    {
        str = [str stringByReplacingCharactersInRange:r withString:@""];
    }
    return str;
}
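A pattern like <[^>]+> ends the tag at the first '>', even when that character sits inside a quoted attribute value such as ALT = "A > B". One way to handle that case, sketched in plain C (which compiles unchanged in an Objective-C .m file), is to track quotes while inside a tag. The function name strip_tags_quoted is hypothetical. The sketch does not attempt to handle comments, script bodies, or CDATA sections, and an unbalanced quote inside a tag swallows the rest of the input.

```c
#include <stddef.h>

/* Hypothetical helper, not part of any Apple API.
 * Like a naive '<'...'>' stripper, except that a '>' inside a single- or
 * double-quoted attribute value does not end the tag.
 * dst must have room for strlen(src) + 1 bytes.
 * Returns the number of bytes written, not counting the terminator. */
size_t strip_tags_quoted(const char *src, char *dst) {
    size_t out = 0;
    int in_tag = 0;      /* 1 while between '<' and the closing '>' */
    char quote = '\0';   /* the open quote character while inside a quoted value */
    for (; *src != '\0'; ++src) {
        char c = *src;
        if (!in_tag) {
            if (c == '<') in_tag = 1;
            else dst[out++] = c;             /* ordinary text */
        } else if (quote != '\0') {
            if (c == quote) quote = '\0';    /* quoted value ends */
        } else if (c == '"' || c == '\'') {
            quote = c;                       /* quoted value starts */
        } else if (c == '>') {
            in_tag = 0;                      /* unquoted '>' ends the tag */
        }
    }
    dst[out] = '\0';
    return out;
}
```

Quotes are only tracked inside tags, so apostrophes in ordinary text are copied through untouched.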

Solution 5

Use this:

NSString *myregex = @"<[^>]*>"; // regex that matches any HTML tag

NSString *htmlString = @"<html>bla bla</html>";
NSString *stringWithoutHTML = [htmlString stringByReplacingOccurrencesOfRegex:myregex withString:@""];

Don't forget to add #import "RegexKitLite.h" to your code. The library can be downloaded here: http://regexkit.sourceforge.net/#Downloads

Author: mamakurka

Updated on July 08, 2022

Comments

  • mamakurka
    mamakurka almost 2 years

    There are a couple of different ways to remove HTML tags from an NSString in Cocoa.

    One way is to render the string into an NSAttributedString and then grab the rendered text.

    Another way is to use NSXMLDocument's -objectByApplyingXSLTString method to apply an XSLT transform that does it.

    Unfortunately, the iPhone doesn't support NSAttributedString or NSXMLDocument. There are too many edge cases and malformed HTML documents for me to feel comfortable using regex or NSScanner. Does anyone have a solution to this?

    One suggestion has been to simply look for opening and closing tag characters, but this method won't work except in very trivial cases.

    For example these cases (from the Perl Cookbook chapter on the same subject) would break this method:

    <IMG SRC = "foo.gif" ALT = "A > B">
    
    <!-- <A comment> -->
    
    <script>if (a<b && a>c)</script>
    
    <![INCLUDE CDATA [ >>>>>>>>>>>> ]]>
    
    • Ben Gottlieb
      Ben Gottlieb over 15 years
      You could add a bit of logic to take quotes and apostrophes into account... CDATA would take a bit more work, but the whole point of HTML is that unknown tags can be ignored by the parser; if you treat ALL tags as unknown, then you should just get raw text.
    • Jake
      Jake over 14 years
      I'd like to comment that a good (but basic) regular expression will definitely not break at your examples. Certainly not if you can guarantee well formed XHTML. I know that you said you can't, but I wonder why ;-)
    • vipintj
      vipintj almost 14 years
      There is a good answer to this question: Flatten HTML using Objective-C.
    • steipete
      steipete about 13 years
      Unfortunately, using NSScanner is damn slow.
    • mamakurka
      mamakurka over 11 years
      Even more unfortunately, the linked NSScanner example only works for trivial html. It fails for every test case I mentioned in my post.
    • jasonjwwilliams
      jasonjwwilliams about 9 years
      Exactly why doesn't iOS support NSAttributedString for you? developer.apple.com/library/ios/documentation/Cocoa/Reference/…
    • mamakurka
      mamakurka about 9 years
      @jasonjwwilliams I wrote this question in 2008. Support for NSAttributedString wasn't added to iOS until 3.2 (aka, the iPad release), which came out in April 2010.
    • jasonjwwilliams
      jasonjwwilliams about 9 years
      @ifalin Apologies, I lost track of the 2008 date of the original post while reading.
    • mamakurka
      mamakurka about 9 years
      @jasonjwwilliams No worries. This is a problem with SO. You have answers to questions which often only apply as a "best practice" within a certain timeframe or API version.
  • mamakurka
    mamakurka over 15 years
    This is the exact set of comments that I linked to in my question as an example of what would not work.
  • DonnaLea
    DonnaLea over 12 years
    Whilst the regular expression (as said by m.kocikowski) is quick and dirty, this is more robust. Example string: @"My test <span font=\"font>name\">html string". This answer returns: My test html string. Regular expression returns: My test name">html string. Whilst this isn't that common, it's just more robust.
  • csaunders
    csaunders over 12 years
    HTML isn't a regular language so you shouldn't be trying to parse/strip it with a regular expression. stackoverflow.com/questions/1732348/…
  • kompozer
    kompozer over 12 years
    For god's sake, don't use Three20 for anything. Most bloated and badly commented framework ever.
  • James
    James about 12 years
    I'm a complete newb at iPhone development, but can I ask how you use this?
  • Roberto
    Roberto about 12 years
    @James To use the method posted in the solution, you have to create a category on NSString. Look up "Objective-C Category" in Google. Then you add that method to the .m file and its prototype to the .h file. Once that is set up, all you have to do is take a string object (example: NSString *myString = ...) and call the method on it (NSString *strippedString = [myString stringByStrippingHTML];).
  • matm
    matm almost 12 years
    +1 Great use for regular expressions, but does not cover lots of cases unfortunately.
  • wod
    wod about 11 years
    This method is useful, but if I need to keep some tags, such as <a> links, how can I update it to do that?
  • Aaron Brager
    Aaron Brager about 11 years
    This code would break on Perl Cookbook examples 1, 3, and 4 in the question.
  • Rick
    Rick almost 11 years
    I am getting "NSString may not respond to stringByStrippingHTML" with this.
  • Ashoor
    Ashoor almost 11 years
    @wod then just change the regex to <(?>/?)(?!a).+?> this will remove all tags excluding the opening <a> and closing </a> tags.
  • Nishant
    Nishant over 10 years
    <br> is being replaced by nothing...which is undesirable.
  • Joshua Gross
    Joshua Gross over 10 years
    Except if you have a string like "S&P 500", it will strip everything after the ampersand and just return the string "S".
  • EZFrag
    EZFrag over 10 years
    Quick and dirty indeed.... This function causes a huge memory leak in my application... Well, in its defence, I am using large amounts of data....
  • Carmen
    Carmen over 10 years
    In my app this solution caused performance problems. I switched to a solution that uses NSScanner instead of NSRegularExpressionSearch. Now the performance problems are gone.
  • ullstrm
    ullstrm about 10 years
    It is very very very memory and time consuming. Only use this with small amounts of html!
  • KIDdAe
    KIDdAe over 9 years
    I got an encoding issue with this solution.
  • Pavan Sisode
    Pavan Sisode almost 9 years
    When we have metadata that contains HTML tags and want to apply those tags, we should use the above code to achieve the desired output.
  • Adlai Holler
    Adlai Holler almost 9 years
    Great idea, but insanely inefficient code. Instead use -[NSRegularExpression enumerateMatchesInString:] so that you only parse the regex once and don't rescan text you've already scanned.
  • Adlai Holler
    Adlai Holler almost 9 years
    This should be the accepted answer. The current one is ridiculously wasteful.
  • Zeb
    Zeb over 8 years
    Probably the best solution, but it is useless for a UILabel :-(
  • Rahul
    Rahul over 8 years
    When I try this I am getting this error : ` Terminating app due to uncaught exception 'NSRangeException', reason: '-[__NSCFString substringWithRange:]: Range {8587, 53} out of bounds; string length 8300'`
  • Vyachaslav Gerchicov
    Vyachaslav Gerchicov over 7 years
    Man, the stringByReplacingOccurrencesOfString you use outside the loop is percent encoding and should be done the correct way.
  • Krutarth Patel
    Krutarth Patel over 7 years
    This method removes HTML tags, but I want to parse the HTML string. What should I do?
  • Krutarth Patel
    Krutarth Patel over 7 years
    Saved me time. Nice solution.