Get URLs from a text


Solution 1

Try this regex; it also returns the query string:

(http|ftp|https)://([\w+?\.\w+])+([a-zA-Z0-9\~\!\@\#\$\%\^\&\*\(\)_\-\=\+\\\/\?\.\:\;\'\,]*)?

You can test it on gskinner's RegExr.
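A minimal sketch of calling the Solution 1 pattern from C#, assuming it is wrapped in a helper shaped like the one in Solution 2 (the names `Solution1Links` and `Extract` are hypothetical). Note that `([\w+?\.\w+])+` is a character class repeated one or more times, so it matches any run of word characters, `+`, `?` and `.`, rather than a literal "word.word" sequence.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class Solution1Links
{
    // The Solution 1 pattern, unchanged. [\w+?\.\w+] is a character class:
    // it matches one word character, '+', '?' or '.'.
    private static readonly Regex UrlRx = new Regex(
        @"(http|ftp|https)://([\w+?\.\w+])+([a-zA-Z0-9\~\!\@\#\$\%\^\&\*\(\)_\-\=\+\\\/\?\.\:\;\'\,]*)?");

    // Returns every match in the order it appears in the text.
    public static List<string> Extract(string text)
    {
        var result = new List<string>();
        foreach (Match m in UrlRx.Matches(text))
        {
            result.Add(m.Value);
        }
        return result;
    }
}
```

For the message in the question, `Extract` should return the single URL `http://www.test.com/test.aspx?id=53`, query string included, because the last character class admits `?`, `=` and digits.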

Solution 2

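using System.Collections.Generic;
using System.Text.RegularExpressions;

public List<string> GetLinks(string message)
{
    List<string> list = new List<string>();
    // Scheme (http, https, ftp, file) or a bare "www.", then the host name,
    // then any number of "/..." segments. '?', '&' and '=' are allowed in a
    // segment, so a query string that follows a '/' is kept.
    Regex urlRx = new Regex(@"((https?|ftp|file)\://|www\.)[A-Za-z0-9\.\-]+(/[A-Za-z0-9\?\&\=;\+!'\(\)\*\-\._~%]*)*", RegexOptions.IgnoreCase);

    MatchCollection matches = urlRx.Matches(message);
    foreach (Match match in matches)
    {
        list.Add(match.Value);
    }
    return list;
}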

var list = GetLinks("Hey yo check this: http://www.google.com/?q=stackoverflow and this: http://www.mysite.com/?id=10&author=me");

It finds the following types of links:

http:// ...
https:// ...
ftp:// ...
file:// ...
www. ...
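Matches found through the `www.` alternative have no scheme, so `Uri.TryCreate` with `UriKind.Absolute` rejects them. A minimal sketch that turns every match into an absolute `Uri`, assuming `http://` is an acceptable default scheme for scheme-less links; it reuses Solution 2's pattern with the dot after `www` escaped, and the names `LinkNormalizer` and `ExtractUris` are hypothetical:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LinkNormalizer
{
    // Solution 2's pattern, with the dot after "www" escaped.
    private static readonly Regex UrlRx = new Regex(
        @"((https?|ftp|file)\://|www\.)[A-Za-z0-9\.\-]+(/[A-Za-z0-9\?\&\=;\+!'\(\)\*\-\._~%]*)*",
        RegexOptions.IgnoreCase);

    // Returns each match as an absolute Uri. Matches starting with "www."
    // get "http://" prepended (an assumed default scheme); anything that
    // still fails to parse as an absolute URI is skipped.
    public static List<Uri> ExtractUris(string text)
    {
        var result = new List<Uri>();
        foreach (Match m in UrlRx.Matches(text))
        {
            string candidate = m.Value.StartsWith("www.", StringComparison.OrdinalIgnoreCase)
                ? "http://" + m.Value
                : m.Value;
            if (Uri.TryCreate(candidate, UriKind.Absolute, out var uri))
            {
                result.Add(uri);
            }
        }
        return result;
    }
}
```

Whether `http://` or `https://` is the better default depends on the links you expect; the choice here is only for illustration.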

Solution 3

If you are going to use these URLs later in your code (to extract a part, the query string, etc.), consider using the Uri class combined with the HttpUtility helper.

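using System;

Uri uri;
string strUrl = "http://www.test.com/test.aspx?id=53";
// UriKind.Absolute: PathAndQuery throws InvalidOperationException for a relative Uri,
// so RelativeOrAbsolute would let relative input through only to fail below.
bool isUri = Uri.TryCreate(strUrl, UriKind.Absolute, out uri);
if (isUri)
{
    Console.WriteLine(uri.PathAndQuery); // prints /test.aspx?id=53
}
else
{
    Console.WriteLine("invalid");
}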

It can help you with these operations.
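The snippet above only uses `Uri`. A minimal sketch of the `HttpUtility` half, reading one query-string parameter with `HttpUtility.ParseQueryString` from `System.Web` (in .NET Framework projects this needs a reference to System.Web); the names `UrlParts` and `GetQueryValue` are hypothetical:

```csharp
using System;
using System.Collections.Specialized;
using System.Web;

public static class UrlParts
{
    // Returns the value of the named query-string parameter, or null when the
    // input is not an absolute URI or the parameter is absent.
    public static string GetQueryValue(string url, string name)
    {
        if (!Uri.TryCreate(url, UriKind.Absolute, out var uri))
        {
            return null;
        }

        // Uri.Query includes the leading '?', which is trimmed before parsing.
        NameValueCollection query = HttpUtility.ParseQueryString(uri.Query.TrimStart('?'));
        return query[name];
    }
}
```

For example, `UrlParts.GetQueryValue("http://www.test.com/test.aspx?id=53", "id")` should return `"53"`.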

Author: PrateekSaluja


Updated on July 28, 2022

Comments

  • PrateekSaluja

    Possible Duplicate:
    regex for URL including query string

    I have a text or message.

    Hey! try this http://www.test.com/test.aspx?id=53

    Our requirement is to get links from a text. We are using the following code:

    List<string> list = new List<string>();
    Regex urlRx = new Regex(@"(?<url>(http:|https:[/][/]|www.)([a-z]|[A-Z]|[0-9]|[/.]|[~])*)",
        RegexOptions.IgnoreCase);
    
    MatchCollection matches = urlRx.Matches(message);
    foreach (Match match in matches)
    {
       list.Add(match.Value);
    }
    return list;
    

    It returns the URL, but not the complete one. The output of the code is:

    http://www.test.com/test.aspx

    But we need the complete URL, like:

    http://www.test.com/test.aspx?id=53

    Please suggest how to resolve this issue. Thanks in advance.

  • Sam Greenhalgh
    Seems a little overly explicit. Wouldn't (ftp|https?)://[^\s]+ work?
  • Amar Palsapure
    +1 @zapthedingbat This will also work
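The simpler pattern from the comment drops into the same loop. A minimal sketch (`SimpleLinks` and `Extract` are hypothetical names); because `[^\s]+` stops only at whitespace, it keeps the full query string but also keeps trailing punctuation, such as a sentence-ending period:

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class SimpleLinks
{
    // The pattern from the comment: ftp, http or https, then every
    // character up to the next whitespace character.
    private static readonly Regex UrlRx = new Regex(@"(ftp|https?)://[^\s]+");

    public static List<string> Extract(string text)
    {
        var result = new List<string>();
        foreach (Match m in UrlRx.Matches(text))
        {
            result.Add(m.Value);
        }
        return result;
    }
}
```

On Solution 2's sample message this returns both links with their query strings; on `"See http://www.test.com/test.aspx?id=53."` the returned URL ends with the period.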