Replace / translate characters in a string

Solution 1

You can create from and to vectors:

from <- c('a','b','c','d','e','f')
to <- c('h','i','j','k','l','m')

and then vectorize the gsub function:

gsub2 <- function(pattern, replacement, x, ...) {
  # Apply each pattern/replacement pair in turn; seq_along() is safe for empty input
  for (i in seq_along(pattern))
    x <- gsub(pattern[i], replacement[i], x, ...)
  x
}

Which gives:

> df <- data.frame(var1 = c("aabbcdefg", "aabbcdefg"))
> df$var1 <- gsub2(from, to, df$var1)
> df
       var1
1 hhiijklmg
2 hhiijklmg
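
Two caveats are worth checking before relying on this loop. The sketch below (with gsub2 repeated so it runs on its own) shows that the replacements are applied one after another, so overlapping pairs can rewrite each other's output, and that patterns are treated as regular expressions unless fixed = TRUE is passed through the dots. The variable names are only illustrative.

```r
# Same loop as gsub2 above, repeated so this block runs on its own
gsub2 <- function(pattern, replacement, x, ...) {
  for (i in seq_along(pattern))
    x <- gsub(pattern[i], replacement[i], x, ...)
  x
}

# Overlapping pairs: "a" -> "b" runs first, then "b" -> "a" rewrites both letters
swapped_loop <- gsub2(c("a", "b"), c("b", "a"), "ab")

# Regex metacharacters: "." matches any character unless fixed = TRUE
dots_regex <- gsub2(".", "-", "a.b")
dots_fixed <- gsub2(".", "-", "a.b", fixed = TRUE)
```

Here swapped_loop ends up as "aa" rather than "ba", dots_regex as "---", and dots_fixed as "a-b". With the non-overlapping letters in the question, neither issue arises.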

Solution 2

You want chartr:

df$var1 <- chartr("abcdef", "hijklm", df$var1)
df
#        var1
# 1 hhiijklmg
# 2 hhiijklmg
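
If the mapping already lives in from and to vectors like those in Solution 1, the two chartr arguments can be built with paste. This is a minimal sketch assuming every element is a single character; old, new and swapped are illustrative names.

```r
from <- c("a", "b", "c", "d", "e", "f")
to   <- c("h", "i", "j", "k", "l", "m")

# Collapse the single-character vectors into the two strings chartr expects
old <- paste(from, collapse = "")
new <- paste(to, collapse = "")

df <- data.frame(var1 = c("aabbcdefg", "aabbcdefg"), stringsAsFactors = FALSE)
df$var1 <- chartr(old, new, df$var1)

# chartr translates all characters in one pass, so swapping two letters works
swapped <- chartr("ab", "ba", "ab")
```

Both rows of df$var1 become "hhiijklmg", and swapped is "ba".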

Solution 3

If chartr does not fit because a replacement may be longer than one character, another option is gsubfn from the gsubfn package (not gsub itself, but an extension of it). Here is one example:

> library(gsubfn)
> tmp <- list(a='apple',b='banana',c='cherry')
> gsubfn('.', tmp, 'a.b.c.d')
[1] "apple.banana.cherry.d"

The replacement can also be a function that would take the match and return the replacement value for that match.
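
As a rough illustration of the function form, the sketch below upper-cases the letters a to c and leaves everything else alone. It assumes the gsubfn package is installed; the pattern and the anonymous function are made up for this example.

```r
library(gsubfn)

# The function receives each match and returns its replacement
res <- gsubfn("[a-c]", function(m) toupper(m), "a.b.c.d")
```

Here res should be "A.B.C.d", since "d" and the dots do not match the pattern.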

Author by

jrara

Updated on October 04, 2021

Comments

  • jrara
    jrara over 2 years

    I have a data frame with a character column:

    df <- data.frame(var1 = c("aabbcdefg", "aabbcdefg"))
    df
    #        var1
    # 1 aabbcdefg
    # 2 aabbcdefg
    

    I want to replace several different individual characters, e.g. from "a" to "h", from "b" to "i" and so on. Currently I use several calls to gsub:

    df$var1 <- gsub("a", "h", df$var1)
    df$var1 <- gsub("b", "i", df$var1)
    df$var1 <- gsub("c", "j", df$var1)
    df$var1 <- gsub("d", "k", df$var1)
    df$var1 <- gsub("e", "l", df$var1)
    df$var1 <- gsub("f", "m", df$var1)
    df
    #        var1
    # 1 hhiijklmg
    # 2 hhiijklmg
    

    However, I'm sure there are more elegant solutions. Any ideas how to proceed?

  • vatodorov
    vatodorov over 10 years
    @jrara How should I modify the code to make the replacement conditional? In the following example, I want to replace Mech, Oper and Eng only when they appear as abbreviations, not inside the complete words (i.e. not Mech in Mechanical, or Oper in Operations). Here is the example:

    from <- c("Mech", "Oper", "Eng")
    to <- c("Mechanical", "Operations", "Engineer")
    data.frame(var1 = c("Mech", "Mechanical Engineer", "Oper", "Operations"))

  • Huub Hoofs
    Huub Hoofs over 10 years
    Should be a standard function, Great!
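
One possible approach to the whole-word question raised in the comments is to wrap each pattern in \\b word-boundary anchors. This is only a sketch: replace_words is a hypothetical helper name, and perl = TRUE is an assumption made here so that \\b is interpreted as a word boundary.

```r
from <- c("Mech", "Oper", "Eng")
to   <- c("Mechanical", "Operations", "Engineer")

# replace_words is a hypothetical helper, not part of the original answers
replace_words <- function(from, to, x) {
  for (i in seq_along(from)) {
    # \\b on both sides means "Mech" inside "Mechanical" is not matched
    x <- gsub(paste0("\\b", from[i], "\\b"), to[i], x, perl = TRUE)
  }
  x
}

df <- data.frame(var1 = c("Mech", "Mechanical Engineer", "Oper", "Operations"),
                 stringsAsFactors = FALSE)
df$var1 <- replace_words(from, to, df$var1)
```

With this input, "Mech" and "Oper" are expanded while "Mechanical Engineer" and "Operations" stay as they are.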