How to remove email addresses and links from a string in PHP?


Solution 1

You can use preg_replace to do it.

For emails:

$pattern = "/[^@\s]*@[^@\s]*\.[^@\s]*/";
$replacement = "[removed]";
// preg_replace returns the new string, so assign the result.
$string = preg_replace($pattern, $replacement, $string);

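For URLs:

// Loose pattern: it matches any dotted token (for example "file.txt"), not only URLs with a scheme.
$pattern = "/[a-zA-Z]*[:\/\/]*[A-Za-z0-9\-_]+\.+[A-Za-z0-9\.\/%&=\?\-_]+/i";
$replacement = "[removed]";
$string = preg_replace($pattern, $replacement, $string);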
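
As a rough sketch of how the two steps could be combined, the snippet below wraps both patterns in one function. The function name removeEmailsAndUrls and the sample string are made up for illustration; the patterns are the ones from this answer, unchanged.

```php
<?php
// Hypothetical helper: applies the two patterns from this answer in sequence.
function removeEmailsAndUrls(string $text): string
{
    $emailPattern = '/[^@\s]*@[^@\s]*\.[^@\s]*/';
    $urlPattern = '/[a-zA-Z]*[:\/\/]*[A-Za-z0-9\-_]+\.+[A-Za-z0-9\.\/%&=\?\-_]+/i';

    // Emails first, so the URL pattern never sees the domain part of an address.
    $text = preg_replace($emailPattern, '[removed]', $text);
    return preg_replace($urlPattern, '[removed]', $text);
}

$sample = 'Mail bob@example.com or visit http://example.com/page today';
echo removeEmailsAndUrls($sample), PHP_EOL;
// prints: Mail [removed] or visit [removed] today
```

Because the URL pattern does not require a scheme, it also replaces ordinary dotted tokens such as file.txt, so some false positives are to be expected on free text.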

Resources

PHP manual entry: http://php.net/manual/en/function.preg-replace.php

Credit where credit is due: the email regex is taken from the preg_match manual page, and the URL regex from http://www.weberdev.com/get_example-4227.html

Solution 2

Try this:

$patterns = array('<[\w.]+@[\w.]+>', '<\w{3,6}:(?:(?://)|(?:\\\\))[^\s]+>');
$replacements = array('[email removed]', '[link removed]');
$newString = preg_replace($patterns, $replacements, $stringToBeMatched);

Note: preg_replace accepts an array of patterns and a matching array of replacements, so you do not need to call it twice.
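
To make the array form concrete, here is a small run on a made-up input string (the sample text and the variable names $input, $replacements and $output are illustrative only). preg_replace applies the patterns in order, each one to the result of the previous one.

```php
<?php
// Patterns from this answer; "<" and ">" act as the regex delimiters.
$patterns = array('<[\w.]+@[\w.]+>', '<\w{3,6}:(?:(?://)|(?:\\\\))[^\s]+>');
$replacements = array('[email removed]', '[link removed]');

$input = 'Write to jane.doe@example.org or see https://example.org/docs for details';
$output = preg_replace($patterns, $replacements, $input);
echo $output, PHP_EOL;
// prints: Write to [email removed] or see [link removed] for details
```

One limitation worth knowing: [\w.]+ does not cover hyphens or plus signs, so an address such as a-b@example.com is only partly replaced and becomes a-[email removed].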

Solution 3

My answer is a variation of Josiah's /[^@\s]*@[^@\s]*\.[^@\s]*/ for emails, which works fine but also matches any punctuation that directly follows the email address: demo 1

Adapting the regex to /[^@\s]*@[^@\s\.]*\.[^@\s\.,!?]*/ keeps a trailing . , ! or ? out of the match: demo 2
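
The difference between the two patterns can be seen on a short sample sentence (the sentence and the variable names are invented for this sketch):

```php
<?php
$text = 'Contact me at bob@example.com, thanks!';

$original = '/[^@\s]*@[^@\s]*\.[^@\s]*/';         // swallows the trailing comma
$adapted  = '/[^@\s]*@[^@\s\.]*\.[^@\s\.,!?]*/';  // stops before . , ! ?

echo preg_replace($original, '[removed]', $text), PHP_EOL;
// prints: Contact me at [removed] thanks!
echo preg_replace($adapted, '[removed]', $text), PHP_EOL;
// prints: Contact me at [removed], thanks!
```

A side effect of excluding the dot is that the adapted pattern matches only one dot after the @, so bob@mail.example.com is cut short and becomes [removed].com.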

Solution 4

The answer I was going to upvote was deleted. It linked to the Linux Journal article "Validate an E-Mail Address with PHP, the Right Way", which points out what is wrong with almost every email regex anyone proposes.

The range of valid forms of an email address is much broader than most people think.

Solution 5

Many characters are valid in the local part of an email address, the part before the @ (see What characters are allowed in an email address?), so these lines replace most valid email addresses:

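<?php
$c = 'a-zA-Z0-9\-_'; // characters allowed in the domain part
$la = preg_quote('!#$%&\'*+-/=?^_`{|}~', '/'); // additional characters allowed in the local part
// The local part must not start or end with a dot; the domain may have several dot-separated labels.
$email = "[$c$la](?:[$c$la.]*[$c$la])?@[$c]+(?:\.[$c]+)+";
$t = preg_replace("/\b($email)\b/", '[removed]', $t);
// or, to turn the addresses into links instead (use in place of the line above):
$t = preg_replace("/\b($email)\b/", '<a href="mailto:\1">\1</a>', $t);

// Replace URLs (http, https or ftp scheme required):
$a = 'A-Za-z0-9\-_';
$t = preg_replace("/(?:https?|ftp):\/\/[$a]+\.+[$a\.\/%&;+~=\?#]+/i", '[removed]', $t);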

This covers most valid email addresses. Be aware that matching every valid email address, and nothing else, is more complex (see How can I validate an email address using a regular expression?).
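
As a usage sketch, the same idea can be packaged in a function and run on a sample sentence. The function name stripContacts and the sample text are hypothetical, and the patterns are a variant written for this example (multi-label domains, explicit http/https/ftp schemes), not necessarily the exact ones the answer intends.

```php
<?php
// Hypothetical helper; the patterns are a variant of the ones in this answer.
function stripContacts(string $t): string
{
    $c  = 'a-zA-Z0-9\-_';                           // domain characters
    $la = preg_quote('!#$%&\'*+-/=?^_`{|}~', '/');  // extra local-part characters
    // Local part without leading/trailing dot, then one or more domain labels.
    $email = "[$c$la](?:[$c$la.]*[$c$la])?@[$c]+(?:\.[$c]+)+";
    // Assumption: only http, https and ftp links should be removed.
    $url   = "(?:https?|ftp):\/\/[$c]+\.+[$c\.\/%&;+~=\?#]+";

    $t = preg_replace("/\b($email)\b/", '[removed]', $t);
    return preg_replace("/$url/i", '[removed]', $t);
}

echo stripContacts('Ask john.smith+news@mail.example.com or see https://example.com/a?b=1#top today'), PHP_EOL;
// prints: Ask [removed] or see [removed] today
```

Note that the URL character class includes the dot, so a full stop written directly after a link is removed together with the link.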

Author: JEagle

Updated on June 27, 2022

Comments

  • JEagle
    JEagle almost 2 years

    How do I remove all email addresses and links from a string and replace them with "[removed]"?