Finding URLs in a text string with PHP and regex?

$pattern = '#(www\.|https?://)?[a-z0-9]+\.[a-z0-9]{2,4}\S*#i';
preg_match_all($pattern, $str, $matches, PREG_PATTERN_ORDER);
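As a rough check of the pattern above, here is a minimal sketch that runs it over the six URL forms from the question and prepends http:// where no scheme is present. The sample string and the `$urls` variable are made up for illustration; they are not from the original answer.

```php
<?php
// Pattern from the answer: optional "www." or "http(s)://" prefix,
// then name.tld (2 to 4 characters), then any trailing non-space characters.
$pattern = '#(www\.|https?://)?[a-z0-9]+\.[a-z0-9]{2,4}\S*#i';

// Hypothetical sample input covering the six forms listed in the question.
$str = 'See domain.com and www.domain.com or http://www.domain.com '
     . 'plus http://domain.com then https://www.domain.com and https://domain.com here';

preg_match_all($pattern, $str, $matches, PREG_PATTERN_ORDER);

// $matches[0] holds the full matches. Prepend http:// when no scheme is present.
$urls = array_map(function ($url) {
    return preg_match('#^https?://#i', $url) ? $url : 'http://' . $url;
}, $matches[0]);

print_r($urls);
```

Note that the trailing `\S*` also swallows punctuation that directly follows a URL (for example a comma), and tokens such as file.txt match as well, so some post-filtering may be needed depending on the input.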
Author: Sisir

Graduated at Electronics & Communication Engineering from KUET. Former Lead project developer for Kallzu, A cutting edge web application for Pay Per Call Marketers. Website https://sisir.me I also maintain a small group of developers and we together build from simple to advanced WordPress websites for clients all over the world.

Updated on June 04, 2022

Comments

  • Sisir
    Sisir almost 2 years

    I know the question title looks repetitive, but I did not find a solution for my case here.

    I need to find URLs in a text string:

    $pattern = '`.*?((http|https)://[\w#$&+,\/:;=?@.-]+)[^\w#$&+,\/:;=?@.-]*?`i';

    if (preg_match_all($pattern, $url_string, $matches)) {
        print_r($matches[1]);
    }
    

    Using this pattern I was able to find URLs with http:// and https://, which is okay. But I have user input where people add URLs like www.domain.com or even domain.com.

    So I need to validate the string first, so that I can prepend a common protocol (http://) to www.domain.com and domain.com. Or do I need to come up with a better pattern?

    I am not good with regex and don't know what to do.

    My idea is to first find the URLs with http:// and https://, put them in an array, replace those URLs with a space (" ") in the text string, and then run other patterns on what remains. But I am not sure what pattern to use.

    I am using $url_string = preg_replace($pattern, ' ', $url_string); but that also removes any www.domain.com or domain.com URL that sits between two valid URLs with http:// or https://.

    If you can help, that would be great.

    To make things clearer:

    I need a pattern or some other method to find all URLs in a text string. Examples of the URLs are:

    1. domain.com
    2. www.domain.com
    3. http://www.domain.com
    4. http://domain.com
    5. https://www.domain.com
    6. https://domain.com

    Thanks!

  • Sisir
    Sisir about 13 years
    Thanks! That almost worked! I still need to match the pattern domain.com.
  • Jonathan Kuhn
    Jonathan Kuhn about 13 years
    @Sisir replace the {1} with a ? to make the http:// or www optional.
  • Shane
    Shane almost 11 years
    This does not work for me; I get empty results. $pattern = '#(www\.|https?:\/\/){?}[a-zA-Z0-9]{2,254}\.[a-zA-Z0-9]{2,4}(\S*)#i'; $count = preg_match_all($pattern, 'http://www.Imaurl.com', $matches, PREG_PATTERN_ORDER); And there is no error from preg_last_error().
  • chmac
    chmac over 10 years
    Copying and pasting this into an interactive PHP shell I also get blank results. Also, the {2,254} limit doesn't support domains like t.co which are gaining popularity these days. Tried to edit the answer, but an edit must be >6 characters apparently :-( Oh, and I don't think this will match domains like me-too.com.
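The last two comments can be illustrated with a short sketch. One likely reason for the empty result is that `{?}` is not a quantifier in PCRE: it reads as an optional literal `{` followed by a required literal `}`, so an ordinary URL cannot match. The second pattern below is a hypothetical variant, not the answer's pattern, that allows hyphens and one-character labels so that t.co and me-too.com are found. The names `$broken`, `$relaxed`, and the sample string are assumptions made for illustration.

```php
<?php
// Pattern from the comment, with {?} where a plain ? was intended.
// PCRE treats "{?" as an optional literal "{" and "}" as a required literal "}".
$broken = '#(www\.|https?:\/\/){?}[a-zA-Z0-9]{2,254}\.[a-zA-Z0-9]{2,4}(\S*)#i';
$brokenCount = preg_match_all($broken, 'http://www.Imaurl.com', $m);
var_dump($brokenCount); // int(0): no match, and no PCRE error either

// Hypothetical relaxed variant: optional scheme, one or more dot-terminated
// labels (hyphens allowed inside a label, single-character labels allowed),
// an alphabetic TLD of at least two letters, then trailing non-space characters.
$relaxed = '#(?:https?://)?(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z]{2,}\S*#i';

$str = 'Links: t.co/abc and me-too.com and http://www.Imaurl.com';
preg_match_all($relaxed, $str, $matches);
print_r($matches[0]);
```

This variant is still a loose heuristic: it accepts any dotted token ending in letters and keeps trailing punctuation, so treat it as a starting point rather than a validator.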