R regex gsub separate letters and numbers

Solution 1

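You need capturing parentheses in the regular expression and group references in the replacement, so that the matched digit and letter are written back with a space between them. Only the two characters at the boundary need to be captured; the rest of a multi-digit number such as 22 is left untouched. For example: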

> gsub('([0-9])([[:alpha:]])', '\\1 \\2', 'This is a test 22mg')
[1] "This is a test 22 mg"

Capture groups and backreferences are not specific to R; they work the same way in most regex engines. The R help pages ?regex and ?gsub cover the details.
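Because gsub is vectorized over its input, the same pattern can be applied to a whole character vector at once. The sketch below wraps it in a small helper; the name split_number_unit and the sample strings are hypothetical, chosen only for illustration.

```r
# Hypothetical helper: insert a space wherever a digit is immediately
# followed by a letter. Only the digit/letter pair at the boundary is
# captured, so the rest of a multi-digit number is left as it is.
split_number_unit <- function(x) {
  gsub("([0-9])([[:alpha:]])", "\\1 \\2", x)
}

# Illustrative inputs (not from the original question, except the first).
samples <- c("The sample is 22mg", "Take 5ml twice", "Already spaced: 22 mg")
print(split_number_unit(samples))
```

Strings that already have a space between the number and the unit pass through unchanged, because the pattern only matches a digit directly adjacent to a letter.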

Solution 2

You need backreferences:

> test <- "The sample is 22mg"
> gsub("([0-9])([a-zA-Z])","\\1 \\2",test)
[1] "The sample is 22 mg"

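Anything matched inside parentheses is captured (remembered). The captures are referenced in the replacement as \1 (the first parenthesized group), \2 (the second), and so on. In an R string literal these are written "\\1" and "\\2": the first backslash escapes the second, so a single literal backslash is passed on to the regular expression engine.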
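Both points can be checked in the R console with a short sketch like the one below: the literal "\\1" holds just two characters (a backslash and a 1), and groups are numbered by the order of their opening parentheses, so swapping \\1 and \\2 in the replacement swaps the captured pieces. The input "22mg" is an illustrative value.

```r
# The R literal "\\1" is two characters: one backslash followed by "1".
# That single backslash is what the regex engine reads as a backreference.
ref <- "\\1"
print(nchar(ref))   # [1] 2
cat(ref, "\n")      # prints the two characters \1

# Groups are numbered left to right by their opening parenthesis:
# \\1 holds the digits and \\2 holds the letters.
x <- "22mg"
print(gsub("([0-9]+)([[:alpha:]]+)", "\\1 \\2", x))  # [1] "22 mg"
print(gsub("([0-9]+)([[:alpha:]]+)", "\\2 \\1", x))  # [1] "mg 22"
```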

Author: screechOwl (https://financenerd.blog/blog/)

Updated on July 11, 2022

Question (asked by screechOwl)

    I have a string that mixes letters and numbers:

    "The sample is 22mg"
    

    I'd like to split the string wherever a number is immediately followed by a letter, like this:

    "The sample is 22 mg"
    

    I've tried this:

    gsub('[0-9]+[[aA-zZ]]', '[0-9]+ [[aA-zZ]]', 'This is a test 22mg')
    

    but I'm not getting the desired result.

    Any suggestions?