Using NSRegularExpression to extract URLs on the iPhone


Solution 1

The method matchesInString:options:range: returns an array of NSTextCheckingResult objects. You can use fast enumeration to iterate through the array, pull out the substring of each match from your original string, and add the substring to a new array.

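NSError *error = nil;

// "s?" makes the s optional, so the pattern matches both http:// and https:// URLs
NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:@"https?://([-\\w\\.]+)+(:\\d+)?(/([\\w/_\\.]*(\\?\\S+)?)?)?" options:NSRegularExpressionCaseInsensitive error:&error];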

NSArray *arrayOfAllMatches = [regex matchesInString:httpLine options:0 range:NSMakeRange(0, [httpLine length])];

NSMutableArray *arrayOfURLs = [[NSMutableArray alloc] init];

for (NSTextCheckingResult *match in arrayOfAllMatches) {    
    NSString* substringForMatch = [httpLine substringWithRange:match.range];
    NSLog(@"Extracted URL: %@",substringForMatch);

    [arrayOfURLs addObject:substringForMatch];
}

// return non-mutable version of the array
return [NSArray arrayWithArray:arrayOfURLs];
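
For readers working in Swift, the same idea (run the expression over the whole string, then turn each match range back into a substring) might look roughly like the sketch below. The helper name allMatches(of:in:) and the simplified URL pattern are assumptions made for illustration; they are not part of the original answer.

```swift
import Foundation

// Hypothetical helper (not from the original answer): returns every substring
// of `text` matched by `pattern`, or an empty array if the pattern is invalid.
func allMatches(of pattern: String, in text: String) -> [String] {
    guard let regex = try? NSRegularExpression(pattern: pattern, options: [.caseInsensitive]) else {
        return []
    }
    // NSRange counts UTF-16 code units, so derive it from the string's own bounds.
    let fullRange = NSRange(text.startIndex..<text.endIndex, in: text)
    return regex.matches(in: text, options: [], range: fullRange).compactMap { (match: NSTextCheckingResult) -> String? in
        // Convert the NSRange of each match back to a Swift range before slicing.
        guard let range = Range(match.range, in: text) else { return nil }
        return String(text[range])
    }
}

// Simplified pattern, assumed for this example only: scheme, then everything
// up to whitespace, a quote, or an angle bracket.
let html = "<a href=\"http://example.com/a\">A</a> <a href=\"https://example.org/b?x=1\">B</a>"
let urls = allMatches(of: "https?://[^\\s\"'<>]+", in: html)
print(urls)
```

Returning plain strings keeps the result easy to log or store, which is what the question asks for; if URL objects are needed, each string can be passed to URL(string:) afterwards.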

Solution 2

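Try NSDataDetector, which finds links without a hand-written pattern. Each NSTextCheckingResult it returns exposes the detected link through its URL property: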

NSDataDetector *linkDetector = [NSDataDetector dataDetectorWithTypes:NSTextCheckingTypeLink error:nil];
NSArray *matches = [linkDetector matchesInString:text options:0 range:NSMakeRange(0, [text length])];

Solution 3

With NSDataDetector in Swift 1.2:

let types: NSTextCheckingType = .Link
var error : NSError?

let detector = NSDataDetector(types: types.rawValue, error: &error)        
var matches = detector!.matchesInString(text, options: nil, range: NSMakeRange(0, (text as NSString).length))

for match in matches {
   println(match.URL!)
}

Using Swift 2.0:

let text = "http://www.google.com. http://www.bla.com"
let types: NSTextCheckingType = .Link

let detector = try? NSDataDetector(types: types.rawValue)

guard let detect = detector else {
   return
}

let matches = detect.matchesInString(text, options: [], range: NSMakeRange(0, text.utf16.count))

for match in matches {
   print(match.URL!)
}

Using Swift 3.0:

let text = "http://www.google.com. http://www.bla.com"
let types: NSTextCheckingResult.CheckingType = .link

// try? yields nil if the detector cannot be created
if let detector = try? NSDataDetector(types: types.rawValue) {
    // NSRange is measured in UTF-16 code units, so use utf16.count
    let matches = detector.matches(in: text, options: [], range: NSMakeRange(0, text.utf16.count))

    for match in matches {
        if let url = match.url {
            print(url)
        }
    }
}

Solution 4

To get all links from a given string:

NSRegularExpression *expression = [NSRegularExpression regularExpressionWithPattern:@"(?i)\\b((?:[a-z][\\w-]+:(?:/{1,3}|[a-z0-9%])|www\\d{0,3}[.]|[a-z0-9.\\-]+[.][a-z]{2,4}/)(?:[^\\s()<>]+|\\(([^\\s()<>]+|(\\([^\\s()<>]+\\)))*\\))+(?:\\(([^\\s()<>]+|(\\([^\\s()<>]+\\)))*\\)|[^\\s`!()\\[\\]{};:'\".,<>?«»“”‘’]))" options:NSRegularExpressionCaseInsensitive error:NULL];
NSString *someString = @"www.facebook.com/link/index.php This is a sample www.google.com of a http://abc.com/efg.php?EFAei687e3EsA sentence with a URL within it.";

NSArray *matches = [expression matchesInString:someString options:0 range:NSMakeRange(0, someString.length)];
for (NSTextCheckingResult *result in matches) {
        NSString *url = [someString substringWithRange:result.range];
        NSLog(@"found url:%@", url);
}

Solution 5

I found myself so nauseated by the complexity of this simple operation ("match ALL the substrings") that I made a little library I am humbly calling Unsuck which adds some sanity to NSRegularExpression in the form of from and allMatches methods. Here's how you'd use them:

NSRegularExpression *re = [NSRegularExpression from: @"(?i)\\b(https?://.*)\\b"]; // or whatever your favorite regex is; Hossam's seems pretty good
NSArray *matches = [re allMatches:httpLine];

Please check out the Unsuck source code on GitHub and tell me all the things I did wrong :-)

Note that (?i) makes it case insensitive so you don't need to specify NSRegularExpressionCaseInsensitive.
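
To illustrate the point about (?i), here is a small Swift sketch comparing the inline flag with the equivalent option. The helper name matches(_:in:options:) and the sample string are made up for this example.

```swift
import Foundation

// Hypothetical helper: true when `pattern`, compiled with `options`,
// matches anywhere in `text`. An invalid pattern counts as no match.
func matches(_ pattern: String, in text: String,
             options: NSRegularExpression.Options = []) -> Bool {
    guard let regex = try? NSRegularExpression(pattern: pattern, options: options) else {
        return false
    }
    let range = NSRange(text.startIndex..<text.endIndex, in: text)
    return regex.firstMatch(in: text, options: [], range: range) != nil
}

let sample = "Visit HTTP://EXAMPLE.COM today"

// Case-sensitive: the lowercase pattern does not match the uppercase scheme.
let plain = matches("https?://\\S+", in: sample)
// Case-insensitive via the inline flag inside the pattern.
let inlineFlag = matches("(?i)https?://\\S+", in: sample)
// Case-insensitive via the option passed at creation time.
let optionFlag = matches("https?://\\S+", in: sample, options: [.caseInsensitive])

print(plain, inlineFlag, optionFlag)
```

Either form works; the inline flag travels with the pattern string, which is convenient when patterns are stored or shared separately from the code that compiles them.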

Author by

neowinston

I'm a passionate iOS Developer.

Updated on June 05, 2022

Comments

  • neowinston almost 2 years

    I'm using the following code in my iPhone app, taken from here, to extract all URLs from stripped .html code.

    I can only extract the first URL, but I need an array containing all URLs. My NSArray doesn't return an NSString for each URL, only the objects' descriptions.

    How do I make my arrayOfAllMatches return all URLs, as NSStrings?

    -(NSArray *)stripOutHttp:(NSString *)httpLine {
    
    // Setup an NSError object to catch any failures
    NSError *error = NULL;  
    
    // create the NSRegularExpression object and initialize it with a pattern
    // the pattern will match any http or https url, with option case insensitive
    
    NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:@"http?://([-\\w\\.]+)+(:\\d+)?(/([\\w/_\\.]*(\\?\\S+)?)?)?" options:NSRegularExpressionCaseInsensitive error:&error];
    
    // create an NSRange object using our regex object for the first match in the string httpline
    NSRange rangeOfFirstMatch = [regex rangeOfFirstMatchInString:httpLine options:0 range:NSMakeRange(0, [httpLine length])];
    
    NSArray *arrayOfAllMatches = [regex matchesInString:httpLine options:0 range:NSMakeRange(0, [httpLine length])];
    
    // check that our NSRange object is not equal to range of NSNotFound
    if (!NSEqualRanges(rangeOfFirstMatch, NSMakeRange(NSNotFound, 0))) {
        // Since we know that we found a match, get the substring from the parent string by using our NSRange object
    
        NSString *substringForFirstMatch = [httpLine substringWithRange:rangeOfFirstMatch];
    
        NSLog(@"Extracted URL: %@",substringForFirstMatch);
        NSLog(@"All Extracted URLs: %@",arrayOfAllMatches);
    
        // return all matching url strings
        return arrayOfAllMatches;
    }
    
    return NULL;
    

    }

    Here is my NSLog output:

    Extracted URL: http://example.com/myplayer    
    All Extracted URLs: (
        "<NSExtendedRegularExpressionCheckingResult: 0x106ddb0>{728, 53}{<NSRegularExpression: 0x106bc30> http?://([-\\w\\.]+)+(:\\d+)?(/([\\w/_\\.]*(\\?\\S+)?)?)? 0x1}",
        "<NSExtendedRegularExpressionCheckingResult: 0x106ddf0>{956, 66}{<NSRegularExpression: 0x106bc30> http?://([-\\w\\.]+)+(:\\d+)?(/([\\w/_\\.]*(\\?\\S+)?)?)? 0x1}",
        "<NSExtendedRegularExpressionCheckingResult: 0x106de30>{1046, 63}{<NSRegularExpression: 0x106bc30> http?://([-\\w\\.]+)+(:\\d+)?(/([\\w/_\\.]*(\\?\\S+)?)?)? 0x1}",
        "<NSExtendedRegularExpressionCheckingResult: 0x106de70>{1129, 67}{<NSRegularExpression: 0x106bc30> http?://([-\\w\\.]+)+(:\\d+)?(/([\\w/_\\.]*(\\?\\S+)?)?)? 0x1}"
    )