Replace specific characters within strings

Solution 1

With a regular expression and the function gsub():

group <- c("12357e", "12575e", "197e18", "e18947")
group
[1] "12357e" "12575e" "197e18" "e18947"

gsub("e", "", group)
[1] "12357" "12575" "19718" "18947"

Here gsub() replaces every occurrence of "e" in each element with the empty string "".


See ?regexp or ?gsub for more help.
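
As a minimal sketch of two points raised in the comments on this page, the snippet below contrasts sub() with gsub() and shows the fixed = TRUE argument. The value x is a hypothetical string invented for illustration; everything else uses base R only.

```r
group <- c("12357e", "12575e", "197e18", "e18947")

# Hypothetical value containing several e's, for illustration only.
x <- "e18e947e"

# gsub() replaces every match; sub() replaces only the first match per element.
all_removed   <- gsub("e", "", x)
first_removed <- sub("e", "", x)
print(all_removed)    # "18947"
print(first_removed)  # "18e947e"

# fixed = TRUE treats the pattern as a literal string rather than a regex.
no_e <- gsub("e", "", group, fixed = TRUE)
print(no_e)
```

For a constant pattern such as "e", the regex and fixed = TRUE calls return the same result; the literal match simply skips regular expression processing.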

Solution 2

Regular expressions are your friends:

R> ## also adds missing ')' and sets column name
R> group <- data.frame(group=c("12357e", "12575e", "197e18", "e18947"))
R> group
   group
1 12357e
2 12575e
3 197e18
4 e18947

Now use gsub() with the simplest possible replacement, an empty string:

R> group$groupNoE <- gsub("e", "", group$group)
R> group
   group groupNoE
1 12357e    12357
2 12575e    12575
3 197e18    19718
4 e18947    18947
R> 

Solution 3

Summarizing 2 ways to replace strings:

group<-data.frame(group=c("12357e", "12575e", "197e18", "e18947"))

1) Use gsub

group$group.no.e <- gsub("e", "", group$group)

2) Use the stringr package

library(stringr)  # provides str_replace_all()
group$group.no.e <- str_replace_all(group$group, "e", "")

Both will produce the desired output:

   group group.no.e
1 12357e      12357
2 12575e      12575
3 197e18      19718
4 e18947      18947

Solution 4

You do not need to create a data frame from a vector of strings if you want to replace some characters in it. Regular expressions are a good choice for this, as already mentioned by @Andrie and @Dirk Eddelbuettel.

Note that in a regular expression some characters, such as the dot, have a special meaning (an unescaped dot matches any character). To replace a literal dot, match it explicitly, for example with a character class, as in the example below:

ctr_names <- c("Czech.Republic","New.Zealand","Great.Britain")
gsub("[.]", " ", ctr_names)

This will produce:

[1] "Czech Republic" "New Zealand"    "Great Britain" 
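
As a hedged illustration of why the dot needs special handling, the sketch below compares an unescaped dot with three ways of matching it literally: the character class used above, the double-backslash escape suggested in the comments, and fixed = TRUE. The variable names are chosen here for illustration.

```r
ctr_names <- c("Czech.Republic", "New.Zealand", "Great.Britain")

# Unescaped, "." matches any character, so every character becomes a space.
wrong <- gsub(".", " ", ctr_names)

# Three ways to match a literal dot:
by_class  <- gsub("[.]", " ", ctr_names)              # character class
by_escape <- gsub("\\.", " ", ctr_names)              # escaped dot
by_fixed  <- gsub(".", " ", ctr_names, fixed = TRUE)  # no regex at all

print(by_class)
```

The backslash is doubled because R first interprets the string literal, so "\\." reaches the regex engine as \. and matches a literal dot.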

Solution 5

Use the stringi package:

library(stringi)

group <- data.frame(c("12357e", "12575e", "197e18", "e18947"))
stri_replace_all(group[, 1], "", fixed = "e")
[1] "12357" "12575" "19718" "18947"
Author: Luke

Updated on May 29, 2021

Comments

  • Luke
    Luke almost 3 years

    I would like to remove specific characters from strings within a vector, similar to the Find and Replace feature in Excel.

    Here are the data I start with:

    group <- data.frame(c("12357e", "12575e", "197e18", "e18947")
    

    I start with just the first column; I want to produce the second column by removing the e's:

    group       group.no.e
    12357e      12357
    12575e      12575
    197e18      19718
    e18947      18947
    
  • dickoa
    dickoa almost 12 years
    Also...require(stringr);group$groupNoE <- str_replace(group$group, "e", "")
  • Dirk Eddelbuettel
    Dirk Eddelbuettel almost 12 years
    Well, I could snicker that "Those who do not understand base functions are doomed to replace them". Exactly what does stringr gain here, besides increasing the number of underscores in your source file?
  • dickoa
    dickoa almost 12 years
    "stringr is a set of simple wrappers that make R's string functions more consistent, simpler and easier to use" from the author of the package. So if what you say is true (many underscores to wrap base functions...) there is no reason for this package to exist (disclaimer : I mainly use base regex functions but I know that they can be difficult for new users...)
  • Joshua Ulrich
    Joshua Ulrich almost 12 years
    @dickoa: str_replace wraps sub, so it will only replace the first occurrence of the pattern. You would need to use str_replace_all if you wanted the same behavior as gsub.
  • Rich Scriven
    Rich Scriven about 8 years
    fixed = TRUE would make this faster.
  • glaed
    glaed over 7 years
    @RichScriven could you shortly elaborate why?
  • mm689
    mm689 over 7 years
    fixed=TRUE prevents R from using regular expressions, which allow more flexible pattern matching but take time to compute. If all that's needed is removing a single constant string "e", they aren't necessary.
  • Megatron
    Megatron over 7 years
    At the time you had to read the whole page including comments to learn the syntax for stringr, my preferred method, as it was mostly discussed in comments. This solution quickly presents both options, which is why I offered it. My hope was to help other users filter through much like I had to do when I was new to R. I struggled with gsub before finding stringr because it wasn't mentioned in a highly upvoted answer. Again, the objective is not to collect upvotes but try to help new R users out.
  • David Arenburg
    David Arenburg over 7 years
    If you find information in other answers/comments which you find useful and like to convert to an answer, you could at least provide some attribution to show where you got the information from / make the answer a Community Wiki instead of just making it as your own.
  • Megatron
    Megatron over 7 years
    Thanks - will keep in mind for next time. Have never made a community wiki before, so didn't know it was an option.
  • Phil_T
    Phil_T over 6 years
    Option 2 works great when applied to a column of data in a data frame, without specifying all the values in the column. Obviously option 1 is a repeat, but option 2 works very well, and deserves an up-vote for the added functionality.
  • Matheus Santana
    Matheus Santana about 6 years
    Would sub("e", "", group) hold the same result?
  • Kamil S Jaron
    Kamil S Jaron almost 6 years
    You can just escape them, but you have to escape as well the escape character because it's in quotes : gsub("\\.", " ", ctr_names)
  • sindri_baldur
    sindri_baldur almost 6 years
    would just replace the first e it finds in each element
  • Martin
    Martin over 2 years
    @Andrie can this approach also be used for item by item removal? The situation I have in mind is to remove the 1st string in vector B (specifies what is to be removed) from the 1st string in vector A (what is getting part of itself removed). And the 2nd string in vector B from the 2nd string in vector A and so on. The assumption is that the vectors are of same length. I was able to perform this only by means of hacky commands. Is there a clean way to do this?
  • Catalyst
    Catalyst almost 2 years
    but the e is still there if we call group again i.e. it's not removing the e from the group dataframe
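
The last two comments can be addressed with a short sketch in base R; this is one possible approach, not the only one. gsub() returns a new vector and leaves its input untouched, so the result has to be assigned back to change the data frame. For item-by-item removal, mapply() can pair each string with its own pattern. The vectors A and B are hypothetical examples, and fixed = TRUE is assumed so that each element of B is treated as a literal string.

```r
group <- data.frame(group = c("12357e", "12575e", "197e18", "e18947"),
                    stringsAsFactors = FALSE)

# gsub() does not modify its input; assign the result back to overwrite the column.
group$group <- gsub("e", "", group$group, fixed = TRUE)
print(group)

# Pairwise removal: remove B[i] from A[i]. A and B are hypothetical vectors
# of equal length, invented for illustration.
A <- c("apple pie", "banana split", "cherry tart")
B <- c("apple ", " split", "rr")
pairwise <- mapply(function(a, b) gsub(b, "", a, fixed = TRUE),
                   A, B, USE.NAMES = FALSE)
print(pairwise)
```

If only the first occurrence should be removed from each element, sub() can be used in place of gsub() inside the function.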