PHP Preg-Replace more than one underscore


Solution 1

preg_replace('/[_]+/', '_', $your_string);

Solution 2

The + quantifier matches one or more instances of the preceding element (a character, character class, capture group, or back-reference).

$string = preg_replace('/_+/', '_', $string);

This would replace one or more underscores with a single underscore.
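As a small illustration of that behaviour (the sample input and the variable names $input and $result are made up for this sketch):

```php
<?php
// Illustrative input: runs of 1, 2 and 3 underscores.
$input = 'a_b__c___d';

// Every run of one or more underscores becomes a single underscore.
$result = preg_replace('/_+/', '_', $input);

echo $result, PHP_EOL; // a_b_c_d
```

Note that the lone underscore between a and b is also matched and replaced with itself, which is harmless but does no useful work.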


Strictly speaking, the title of the question asks to replace only runs of two or more underscores:

$string = preg_replace('/__+/', '_', $string);

Or writing the quantifier with braces:

$string = preg_replace('/_{2,}/', '_', $string);

Or, perhaps, capturing the underscore and using a back-reference:

$string = preg_replace('/(_)\1+/', '\1', $string);
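A quick way to convince yourself that the three "two or more" patterns above behave the same is to run them side by side. This is only a sketch; the sample input and the names $input, $patterns and $results are assumptions for illustration:

```php
<?php
// Illustrative input with runs of 1, 2 and 3 underscores.
$input = 'one_two__three___four';

// The three "two or more underscores" patterns from this answer.
$patterns = ['/__+/', '/_{2,}/', '/(_)\1+/'];

$results = [];
foreach ($patterns as $pattern) {
    // A literal '_' replacement is equivalent to '\1' for the third
    // pattern, because group 1 can only ever capture an underscore.
    $results[$pattern] = preg_replace($pattern, '_', $input);
}

print_r($results); // each entry should be 'one_two_three_four'
```

The back-reference form mainly pays off when the repeated character is not known in advance, for example with a character class inside the group.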

Solution 3

Running tests, I found this:

while (strpos($str, '__') !== false) {
    $str = str_replace('__', '_', $str);
}

to be consistently faster than this:

$str = preg_replace('/[_]+/', '_', $str);

I generated the test strings of varying lengths with this:

$chars = array_merge(array_fill(0, 50, '_'), range('a', 'z'));
$str = '';
for ($i = 0; $i < $len; $i++) {  // $len varied from 10 to 1000000
    $str .= $chars[array_rand($chars)];
}
file_put_contents('test_str.txt', $str);

and tested with these scripts (run separately, but on identical strings for each value of $len):

$str = file_get_contents('test_str.txt');
$start = microtime(true);
$str = preg_replace('/[_]+/', '_', $str);
echo microtime(true) - $start;

and:

$str = file_get_contents('test_str.txt');
$start = microtime(true);
while (strpos($str, '__') !== false) {
    $str = str_replace('__', '_', $str);
}
echo microtime(true) - $start;

For shorter strings the str_replace() method was as much as 25% faster than the preg_replace() method. The longer the string, the less the difference, but str_replace() was always faster.
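A timing comparison like this is only meaningful if both approaches return the same string. The following sketch checks that on one randomly generated input, reusing the generator from above; the fixed $len and the names $viaRegex and $viaLoop are assumptions made for illustration:

```php
<?php
// Build a test string the same way as the generator above.
// $len is fixed here purely for illustration.
$len = 1000;
$chars = array_merge(array_fill(0, 50, '_'), range('a', 'z'));
$str = '';
for ($i = 0; $i < $len; $i++) {
    $str .= $chars[array_rand($chars)];
}

// Approach 1: a single regex pass.
$viaRegex = preg_replace('/[_]+/', '_', $str);

// Approach 2: repeated str_replace() until no double underscore remains.
$viaLoop = $str;
while (strpos($viaLoop, '__') !== false) {
    $viaLoop = str_replace('__', '_', $viaLoop);
}

// Both approaches should agree on the collapsed string.
var_dump($viaRegex === $viaLoop);
```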

I know some would prefer one method over the other for reasons other than speed, and I'd be glad to read comments regarding the results, testing method, etc.

Solution 4

Actually using /__+/ or /_{2,}/ would be better than /_+/ since a single underscore does not need to be replaced. This will improve the speed of the preg variant.
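The difference in work can be made visible with the optional by-reference count argument of preg_replace(). This sketch only counts replacements and does not measure time; the sample input and the names $countPlus and $countTwoOrMore are assumptions:

```php
<?php
// Illustrative input: runs of 1, 2 and 3 underscores.
$input = 'a_b__c___d';

// The fifth argument receives the number of replacements performed.
$withPlus = preg_replace('/_+/', '_', $input, -1, $countPlus);
$withTwoOrMore = preg_replace('/_{2,}/', '_', $input, -1, $countTwoOrMore);

// /_+/ also "replaces" the lone underscore; /_{2,}/ skips it.
echo $countPlus, ' vs ', $countTwoOrMore, PHP_EOL; // 3 vs 2
```

Both calls return the same string; only the number of replacements differs.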

Solution 5

For anyone attracted to @GZipp's answer for benchmark/micro-optimization reasons, I think the following post-test loop should execute slightly better than the pre-test while() loop because the strpos() call has been removed.

str_replace() has a by-reference fourth parameter ($count) that receives the number of replacements made, so it can be used to break the loop without an extra, iterated function call. Granted, the loop will always attempt at least one replacement, and it won't stop until it has traversed the string once with no replacements.

Code:

$str = 'one_two__three___four____bye';
do {
    $str = str_replace('__', '_', $str, $count);
} while ($count);

var_export($str);
// 'one_two_three_four_bye'
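To see how many traversals the loop makes on that input, a pass counter can be added. This is a sketch only; the $passes variable is an addition for illustration and not part of the loop above:

```php
<?php
$str = 'one_two__three___four____bye';
$passes = 0; // hypothetical counter, added only to observe the loop

do {
    // Each call halves every run of underscores (rounding up),
    // so longer runs need more than one pass.
    $str = str_replace('__', '_', $str, $count);
    $passes++;
} while ($count);

var_export($str);       // 'one_two_three_four_bye'
echo PHP_EOL, $passes;  // 3 (two passes that replace, one that finds nothing)
```

The final pass is the one that finds nothing to replace and ends the loop, which is the cost mentioned above.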

As for preg_replace(), here are a couple of good options:

echo preg_replace('/_{2,}/', '_', $str);
echo preg_replace('/_\K_+/', '', $str);  // \K forgets the first, remembers the rest

I don't recommend using /_+/ because it makes needless replacements (_ to _):

echo preg_replace('/_+/', '_', $str);

There is definitely no benefit to using a character class /[_]+/ or /[_]{2,}/.

The benefit of using preg_replace() is that the string is never traversed more than once. This makes it a very direct and appropriate tool.
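As a sketch of how the \K pattern mentioned above behaves compared with /_{2,}/ (the sample input and variable names are assumptions for illustration):

```php
<?php
// Illustrative input: runs of 1, 2 and 3 underscores.
$input = 'a_b__c___d';

// Replace each run of two or more underscores with one underscore.
$viaBraces = preg_replace('/_{2,}/', '_', $input, -1, $countBraces);

// \K resets the start of the reported match, so the first underscore is
// kept and only the extra underscores are replaced with an empty string.
$viaK = preg_replace('/_\K_+/', '', $input, -1, $countK);

echo $viaBraces, PHP_EOL; // a_b_c_d
echo $viaK, PHP_EOL;      // a_b_c_d
```

Both calls make one pass over the string and perform the same number of replacements; neither touches the lone underscore.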

Author by

Samvel Kostanyan

Updated on June 04, 2022

Comments

  • Samvel Kostanyan almost 2 years

    How do I, using preg_replace, replace more than one underscore with just one underscore?

  • gnud over 14 years
    No need to define a character class.
  • Peter Lindqvist over 14 years
    So true, but I do it anyway. And it would seem I'm not alone.
  • mickmackusa over 2 years
    There is no need for the character class here. Some explanation would be a good addition to this answer.