How to convert NSString HTML markup to plain text NSString?

21,688

Solution 1

You can do it by scanning the HTML with the NSScanner class and removing each tag it finds:

- (NSString *)flattenHTML:(NSString *)html {

    NSScanner *theScanner = [NSScanner scannerWithString:html];
    NSString *text = nil;

    while (![theScanner isAtEnd]) {

        // Skip ahead to the next opening angle bracket.
        [theScanner scanUpToString:@"<" intoString:NULL];

        // Capture everything from "<" up to, but not including, ">".
        text = nil;
        if ([theScanner scanUpToString:@">" intoString:&text] && text != nil) {
            // Remove every occurrence of this tag from the string.
            html = [html stringByReplacingOccurrencesOfString:[NSString stringWithFormat:@"%@>", text] withString:@""];
        }
    }

    // Trim leading and trailing whitespace and newlines.
    html = [html stringByTrimmingCharactersInSet:[NSCharacterSet whitespaceAndNewlineCharacterSet]];

    return html;
}

Hope this helps.
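
The scanner approach only removes tags, so character entities such as &#39; (a single quote) are left in the text. A minimal sketch of one way to handle that, assuming only a handful of entities matter for your feed: DecodeCommonEntities is a hypothetical helper name, and the entity list is a hand-picked subset rather than the full HTML set.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: replaces a small, hand-picked set of HTML entities.
// This is not a complete entity decoder; extend the table as needed.
static NSString *DecodeCommonEntities(NSString *text) {
    // Order matters: "&amp;" is handled last so that "&amp;lt;" becomes
    // "&lt;" rather than being decoded twice into "<".
    NSArray *pairs = @[
        @[@"&#39;",  @"'"],
        @[@"&apos;", @"'"],
        @[@"&quot;", @"\""],
        @[@"&lt;",   @"<"],
        @[@"&gt;",   @">"],
        @[@"&nbsp;", @" "],
        @[@"&amp;",  @"&"],
    ];

    NSString *result = text;
    for (NSArray *pair in pairs) {
        result = [result stringByReplacingOccurrencesOfString:pair[0]
                                                   withString:pair[1]];
    }
    return result;
}
```

Call it on the output of flattenHTML: rather than on the raw markup, for example DecodeCommonEntities([self flattenHTML:html]), so that a decoded "<" is not mistaken for the start of a tag.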

Solution 2

If you are already loading the HTML into a UIWebView, it is easier to let the web view extract the text for you. Use whichever of these two calls fits your page:

fullArticle = [webView stringByEvaluatingJavaScriptFromString:@"document.body.getElementsByTagName('article')[0].innerText;"]; // extract the contents by tag

fullArticle = [webView stringByEvaluatingJavaScriptFromString:@"document.body.innerText"]; // extract text inside body part of HTML
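
UIWebView has since been deprecated in favour of WKWebView, which evaluates JavaScript asynchronously. A rough sketch of the same idea with WKWebView, assuming the page has already finished loading: FetchBodyText is a hypothetical helper name, not part of WebKit.

```objective-c
#import <WebKit/WebKit.h>

// Hypothetical helper: asks an already-loaded WKWebView for the visible
// text of the page body and hands it to the completion block.
static void FetchBodyText(WKWebView *webView, void (^completion)(NSString *text)) {
    [webView evaluateJavaScript:@"document.body.innerText"
              completionHandler:^(id result, NSError *error) {
        // innerText should come back as an NSString; anything else
        // (or an error) is reported as nil.
        NSString *text = nil;
        if (error == nil && [result isKindOfClass:[NSString class]]) {
            text = (NSString *)result;
        }
        if (completion) {
            completion(text);
        }
    }];
}
```

Because the result arrives in a completion handler, assign it to fullArticle inside the block rather than expecting a return value.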
Author by

Frames84

Updated on July 22, 2022

Comments

  • Frames84
    Frames84 almost 2 years

    I've been searching the net for an example of how to convert a string of HTML markup into plain text.

    I get my information from a feed that contains HTML, and I then display it in a text view. Does UITextView have a property to convert HTML, or do I have to do it in code? I tried:

    NSString *str = [NSString stringWithCString:self.fullText encoding:NSUTF8StringEncoding];

    but that doesn't seem to work. Has anyone got any ideas?

  • Frames84
    Frames84 about 14 years
    It doesn't deal with single quotes, but it works fine for everything else.
  • Frames84
    Frames84 about 14 years
    Would this method keep the formatting? What I want is to display the formatted HTML as plain text, keeping links, <h1>, <p>, etc. How do other apps do this?
  • Frames84
    Frames84 about 14 years
    UIWebView displays a web page inside an app? I need a control or method that keeps the HTML formatting without displaying the markup. My output contains the markup, whereas I want it to keep the styling but not show the HTML.
  • Madhup Singh Yadav
    Madhup Singh Yadav about 14 years
    If you have single quotes and don't want to show them, just replace their occurrences with an empty string.
  • chatur
    chatur over 12 years
    Hi @Madhup. Please have a look at the question stackoverflow.com/questions/8148291/… and advise.
  • Neil
    Neil over 11 years
    NSXMLParser will not parse normal HTML. It fails on HTML-only characters.