Split camelCase word into words with PHP preg_match (Regular Expression)

Solution 1

You can also use preg_match_all as:

preg_match_all('/((?:^|[A-Z])[a-z]+)/',$str,$matches);

Explanation:

(        - Start of capturing parenthesis.
 (?:     - Start of non-capturing parenthesis.
  ^      - Start anchor.
  |      - Alternation.
  [A-Z]  - Any one capital letter.
 )       - End of non-capturing parenthesis.
 [a-z]+  - One or more lowercase letters.
)        - End of capturing parenthesis.
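A minimal sketch of how the result of this pattern can be read back, assuming the sample inputs below (they are illustrative, taken from the question and the comments). It also shows the limitation raised in the comments: a run of capitals has no lowercase tail, so the pattern skips it.

```php
<?php
// Sketch only: variable names and sample inputs are illustrative.
$pattern = '/((?:^|[A-Z])[a-z]+)/';

// $matches[0] holds the full matches; $matches[1] holds capture group 1,
// which is identical here because the group wraps the whole pattern.
preg_match_all($pattern, 'oneTwoThreeFour', $matches);
$words = $matches[0];

// "ID" contains no lowercase letters, so it never matches and is dropped.
preg_match_all($pattern, 'TestID', $matches);
$withCaps = $matches[0];

echo implode(' ', $words), "\n";    // one Two Three Four
echo implode(' ', $withCaps), "\n"; // Test
```

Because preg_match_all collects matches rather than splitting, any text the pattern does not match is lost silently instead of being returned.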

Solution 2

You can use preg_split as:

$arr = preg_split('/(?=[A-Z])/',$str);

I'm splitting the input string just before each uppercase letter. The regex (?=[A-Z]) is a zero-width lookahead: it matches the position just before an uppercase letter and consumes no characters.
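A short sketch of this split on three illustrative inputs. The PREG_SPLIT_NO_EMPTY flag is my addition, not part of the answer: it guards against an empty first piece when the input starts with a capital, since the lookahead then matches at offset 0.

```php
<?php
// Sketch only: sample inputs and variable names are illustrative.
$pattern = '/(?=[A-Z])/';

// Input starting with a lowercase letter: no extra handling needed.
$plain = preg_split($pattern, 'oneTwoThreeFour');

// Input starting with a capital: PREG_SPLIT_NO_EMPTY drops empty pieces.
$capitalised = preg_split($pattern, 'StartsWithCap', -1, PREG_SPLIT_NO_EMPTY);

// Every capital starts a new piece, so an acronym is split letter by letter.
$acronym = preg_split($pattern, 'hasConsecutiveCAPS');

echo implode(' ', $plain), "\n";       // one Two Three Four
echo implode(' ', $capitalised), "\n"; // Starts With Cap
echo implode(' ', $acronym), "\n";     // has Consecutive C A P S
```

The last line shows why this pattern suits only well-formed camelCase: consecutive capitals come back one letter at a time.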

Solution 3

I know that this is an old question with an accepted answer, but IMHO there is a better solution:

<?php // test.php Rev:20140412_0800
$ccWord = 'NewNASAModule';
$re = '/(?#! splitCamelCase Rev:20140412)
    # Split camelCase "words". Two global alternatives. Either g1of2:
      (?<=[a-z])      # Position is after a lowercase,
      (?=[A-Z])       # and before an uppercase letter.
    | (?<=[A-Z])      # Or g2of2; Position is after uppercase,
      (?=[A-Z][a-z])  # and before upper-then-lower case.
    /x';
$a = preg_split($re, $ccWord);
$count = count($a);
for ($i = 0; $i < $count; ++$i) {
    printf("Word %d of %d = \"%s\"\n",
        $i + 1, $count, $a[$i]);
}
?>

Note that this regex (like codaddict's '/(?=[A-Z])/' solution, which works like a charm for well-formed camelCase words) matches only a position within the string and consumes no text at all. This solution has the additional benefit that it also works correctly for not-so-well-formed pseudo-camelCase words such as StartsWithCap and hasConsecutiveCAPS.

Input:

oneTwoThreeFour
StartsWithCap
hasConsecutiveCAPS
NewNASAModule

Output:

Word 1 of 4 = "one"
Word 2 of 4 = "Two"
Word 3 of 4 = "Three"
Word 4 of 4 = "Four"

Word 1 of 3 = "Starts"
Word 2 of 3 = "With"
Word 3 of 3 = "Cap"

Word 1 of 3 = "has"
Word 2 of 3 = "Consecutive"
Word 3 of 3 = "CAPS"

Word 1 of 3 = "New"
Word 2 of 3 = "NASA"
Word 3 of 3 = "Module"

Edited: 2014-04-12: Modified regex, script and test data to correctly split: "NewNASAModule" case (in response to rr's comment).
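The script above processes a single word. As a sketch, the same pattern (written on one line, without the /x comments) can be run over all four inputs in a loop; the $inputs and $results names are mine.

```php
<?php
// Same two alternatives as the commented regex above, on one line.
$re = '/(?<=[a-z])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])/';

// Illustrative variable names; the inputs are the four listed above.
$inputs = ['oneTwoThreeFour', 'StartsWithCap', 'hasConsecutiveCAPS', 'NewNASAModule'];
$results = [];

foreach ($inputs as $ccWord) {
    $a = preg_split($re, $ccWord);
    $results[$ccWord] = $a;
    $count = count($a);
    foreach ($a as $i => $word) {
        printf("Word %d of %d = \"%s\"\n", $i + 1, $count, $word);
    }
    echo "\n";
}
```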

Solution 4

While ridgerunner's answer works great, it does not seem to work with all-caps substrings that appear in the middle of the string. I use the following, which seems to handle these just fine:

function splitCamelCase($input)
{
    return preg_split(
        '/(^[^A-Z]+|[A-Z][^A-Z]+)/',
        $input,
        -1, /* no limit on the number of pieces returned */
        PREG_SPLIT_NO_EMPTY /* drop empty pieces */
            | PREG_SPLIT_DELIM_CAPTURE /* also return the text matched by the capturing group */
    );
}

Some test cases:

assert(splitCamelCase('lowHigh') == ['low', 'High']);
assert(splitCamelCase('WarriorPrincess') == ['Warrior', 'Princess']);
assert(splitCamelCase('SupportSEELE') == ['Support', 'SEELE']);
assert(splitCamelCase('LaunchFLEIAModule') == ['Launch', 'FLEIA', 'Module']);
assert(splitCamelCase('anotherNASATrip') == ['another', 'NASA', 'Trip']);
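This approach relies on the two flags working together: the regex matches the ordinary words as "delimiters", and whatever sits between them (the all-caps runs) comes back as the split pieces. A small sketch, using one of the test inputs above, shows what each flag contributes; the variable names are mine.

```php
<?php
$re = '/(^[^A-Z]+|[A-Z][^A-Z]+)/';
$input = 'LaunchFLEIAModule';

// No flags: the matched words are treated as delimiters and thrown away.
$bare = preg_split($re, $input);

// DELIM_CAPTURE: the captured delimiters are returned alongside the pieces.
$captured = preg_split($re, $input, -1, PREG_SPLIT_DELIM_CAPTURE);

// Both flags: the empty pieces at the edges are dropped as well.
$clean = preg_split($re, $input, -1, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE);

echo json_encode($bare), "\n";     // ["","FLEIA",""]
echo json_encode($captured), "\n"; // ["","Launch","FLEIA","Module",""]
echo json_encode($clean), "\n";    // ["Launch","FLEIA","Module"]
```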

Solution 5

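A function wrapper around the first alternative of @ridgerunner's regex. Because the second alternative is left out, a run of capitals followed by a word (as in NewNASAModule) is not split.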

/**
 * Converts a camelCase string into words separated by spaces.
 * @param string $camelCaseString
 * @return string
 */
function fromCamelCase($camelCaseString) {
    $re = '/(?<=[a-z])(?=[A-Z])/x';
    $a = preg_split($re, $camelCaseString);
    // implode() takes the separator first; the (array, separator) order was removed in PHP 8.0
    return implode(' ', $a);
}
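
As a rough comparison, assuming the same sample words used in the other answers, here is what the single-alternative pattern from this function returns next to the two-alternative pattern from Solution 3. The variable names are illustrative.

```php
<?php
// Pattern used by the function above (lowercase-to-uppercase boundary only).
$simpleRe = '/(?<=[a-z])(?=[A-Z])/';
// Two-alternative pattern from Solution 3, written on one line.
$fullRe = '/(?<=[a-z])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])/';

$wellFormed = implode(' ', preg_split($simpleRe, 'oneTwoThreeFour'));
$simple     = implode(' ', preg_split($simpleRe, 'NewNASAModule'));
$full       = implode(' ', preg_split($fullRe, 'NewNASAModule'));

echo $wellFormed, "\n"; // one Two Three Four
echo $simple, "\n";     // New NASAModule
echo $full, "\n";       // New NASA Module
```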

Author: CodeChap

Updated on July 05, 2022

Comments

  • CodeChap
    CodeChap almost 2 years

    How would I go about splitting the word:

    oneTwoThreeFour
    

    into an array so that I can get:

    one Two Three Four
    

    with preg_match ?

    I tried this but it just gives the whole word

    $words = preg_match("/[a-zA-Z]*(?:[a-z][a-zA-Z]*[A-Z]|[A-Z][a-zA-Z]*[a-z])[a-zA-Z]*\b/", $string, $matches);
    
  • Anil
    Anil almost 11 years
    This is a much better solution, works first time (others added blank values to the array, this one is perfect)! Thanks! +1
  • Daniel Rhodes
    Daniel Rhodes almost 11 years
    Oops, this will probably fail on the CONSECUTIVE CAPS issue.
  • Aaron J Lang
    Aaron J Lang over 10 years
    Wouldn't the non-capturing group cause the result to be [one, wo, hree, our]?
  • Eli Gassert
    Eli Gassert about 10 years
    @AaronJLang no, because the outer parentheses capture the WHOLE group, including the sub-group. It's a sub-group that he doesn't want to clutter the $matches collection.
  • rr-
    rr- about 10 years
    There seems to be a problem with strings like NewNASAModule (outputs: [New, NASAModule]; I'd expect [New, NASA, Module])
  • ridgerunner
    ridgerunner about 10 years
    @rr - Yes, you are correct. See my other updated answer, "RegEx to split camelCase or TitleCase (advanced)", which splits NewNASAModule correctly.
  • Zack Morris
    Zack Morris over 8 years
    This failed for me with "TestID" using: "preg_match_all('/((?:^|[A-Z])[a-z]+)/', $key, $matches); die(implode(' ', $matches[0]));" because it doesn't like the CONSECUTIVE CAPS issue. I needed to split case changes with spaces and @blak3r's solution worked for me: stackoverflow.com/a/17122207/539149
  • Maciej Sz
    Maciej Sz over 7 years
    Better solution for strings like HTMLParser that will work: stackoverflow.com/a/6572999/1697320.
  • benjaminhull
    benjaminhull about 7 years
    Nice and lean - always prefer it this way.
  • cartbeforehorse
    cartbeforehorse about 6 years
    As stipulated by @TarranJones (although not articulated too clearly), you don't need the outer parentheses. A pattern of '/(?:^|[A-Z])[a-z]+/' would suffice to produce one array (instead of two). This is because preg_match_all() automatically captures all instances of the match, without you having to specifically stipulate it.
  • Kobi
    Kobi about 5 years
    @jbobbins - Thanks, updated. ideone expired old examples at some point, so many old examples are still broken.
  • jbobbins
    jbobbins about 5 years
    @Kobi thanks. just so you're aware, I pasted the assertion text from the post by rr- and the ones with multiple caps together don't work. regex101.com/r/kNZfEI/2
  • Onkeltem
    Onkeltem over 4 years
    It doesn't cover cases with digits. For some reason other repliers also ignore this basic fact. E.g. "Css3Transform" or the like.
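
On the digits point raised in the last comment: none of the answers above define how digits should behave, so the following is only a sketch under one assumption, namely that a digit belongs to the word before it. It is the Solution 3 pattern with 0-9 added to the first lookbehind, which is my variation and not part of any answer.

```php
<?php
// Solution 3 pattern, for comparison.
$original = '/(?<=[a-z])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])/';
// Assumed variant: a digit is treated like a lowercase letter before a capital.
$withDigits = '/(?<=[a-z0-9])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])/';

$before = preg_split($original, 'Css3Transform');
$after  = preg_split($withDigits, 'Css3Transform');
$acronymCheck = preg_split($withDigits, 'NewNASAModule');

echo implode(' ', $before), "\n";       // Css3Transform
echo implode(' ', $after), "\n";        // Css3 Transform
echo implode(' ', $acronymCheck), "\n"; // New NASA Module
```

The trade-off is that a name such as Vector3D would split into Vector3 and D under this assumption, so the right rule depends on the naming convention in use.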