Replace specific characters within strings

r regex replace gsub string-substitution

692,901

Solution 1

With a regular expression and the function gsub():

group <- c("12357e", "12575e", "197e18", "e18947")
group
[1] "12357e" "12575e" "197e18" "e18947"

gsub("e", "", group)
[1] "12357" "12575" "19718" "18947"

Here gsub() replaces every occurrence of "e" in each element with the empty string "".

See ?regexp or ?gsub for more help.
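The sketch below shows how gsub() differs from sub(), which replaces only the first match in each element. It uses a made-up vector in which one element contains several e's. The input values are hypothetical.

```r
x <- c("e1e2e3", "12357e")  # hypothetical input: first element has three e's

# gsub() replaces every match in each element: gives "123" and "12357"
all_removed <- gsub("e", "", x)

# sub() replaces only the first match in each element: gives "1e2e3" and "12357"
first_removed <- sub("e", "", x)

all_removed
first_removed
```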

Solution 2

Regular expressions are your friends:

R> ## also adds missing ')' and sets column name
R> group<-data.frame(group=c("12357e", "12575e", "197e18", "e18947"))
R> group
   group
1 12357e
2 12575e
3 197e18
4 e18947

Now use gsub() with the simplest possible replacement, the empty string:

R> group$groupNoE <- gsub("e", "", group$group)
R> group
   group groupNoE
1 12357e    12357
2 12575e    12575
3 197e18    19718
4 e18947    18947
R>

Solution 3

Summarizing 2 ways to replace strings:

group<-data.frame(group=c("12357e", "12575e", "197e18", "e18947"))

1) Use gsub

group$group.no.e <- gsub("e", "", group$group)

2) Use the stringr package

library(stringr)
group$group.no.e <- str_replace_all(group$group, "e", "")

Both produce the desired output:

   group group.no.e
1 12357e      12357
2 12575e      12575
3 197e18      19718
4 e18947      18947

Solution 4

You do not need to create a data frame from a vector of strings if you only want to replace some characters in it. Regular expressions are a good choice for this, as @Andrie and @Dirk Eddelbuettel have already mentioned.

Pay attention to special characters such as dots. In a regular expression an unescaped dot matches any character, so to replace literal dots you need to escape the dot or put it in a character class, as in the example below:

ctr_names <- c("Czech.Republic","New.Zealand","Great.Britain")
gsub("[.]", " ", ctr_names)

This produces:

[1] "Czech Republic" "New Zealand"    "Great Britain"
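The sketch below makes the dot pitfall concrete. It compares the mistaken pattern "." with three equivalent ways of matching a literal dot: the character class used above, an escaped dot (the backslash is doubled because it sits inside an R string literal), and `fixed = TRUE`, which skips regular expressions entirely.

```r
ctr_names <- c("Czech.Republic", "New.Zealand", "Great.Britain")

# Pitfall: "." is a regex metacharacter matching any character,
# so every character is turned into a space
wrong <- gsub(".", " ", ctr_names)

# Three equivalent ways to match a literal dot
by_class  <- gsub("[.]", " ", ctr_names)              # character class
by_escape <- gsub("\\.", " ", ctr_names)              # escaped dot
by_fixed  <- gsub(".", " ", ctr_names, fixed = TRUE)  # literal match, no regex

by_class
```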

Solution 5

Use the stringi package:

library(stringi)

group<-data.frame(c("12357e", "12575e", "197e18", "e18947"))
stri_replace_all(group[,1], "", fixed="e")
[1] "12357" "12575" "19718" "18947"



Luke

Updated on May 29, 2021

Comments

Luke almost 3 years
I would like to remove specific characters from strings within a vector, similar to the Find and Replace feature in Excel.

Here are the data I start with:
```
group <- data.frame(c("12357e", "12575e", "197e18", "e18947")
```
I start with just the first column; I want to produce the second column by removing the e's:
```
group       group.no.e
12357e      12357
12575e      12575
197e18      19718
e18947      18947
```
dickoa almost 12 years

Also... require(stringr); group$groupNoE <- str_replace(group$group, "e", "")
Dirk Eddelbuettel almost 12 years

Well, I could snicker that "Those who do not understand base functions are doomed to replace them". Exactly what does stringr gain here, besides increasing the number of underscores in your source file?
dickoa almost 12 years

"stringr is a set of simple wrappers that make R's string functions more consistent, simpler and easier to use", according to the author of the package. So if what you say is true (many underscores to wrap base functions...), there is no reason for this package to exist. (Disclaimer: I mainly use the base regex functions, but I know that they can be difficult for new users...)
Joshua Ulrich almost 12 years

@dickoa: str_replace wraps sub, so it will only replace the first occurrence of the pattern. You would need to use str_replace_all if you wanted the same behavior as gsub.
Rich Scriven about 8 years

fixed = TRUE would make this faster.
glaed over 7 years

@RichScriven could you briefly elaborate on why?
mm689 over 7 years

fixed=TRUE prevents R from using regular expressions, which allow more flexible pattern matching but take time to compute. If all that's needed is removing a single constant string "e", they aren't necessary.
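A minimal sketch of that suggestion using the question's vector. With `fixed = TRUE` the pattern "e" is matched as a literal string instead of being compiled as a regular expression. The result is identical here. The timing lines are only a rough way to check the speed difference on your own machine. Any gain depends on input size, so no particular numbers are implied.

```r
group <- c("12357e", "12575e", "197e18", "e18947")

# Default regex matching vs literal matching
regex_result <- gsub("e", "", group)
fixed_result <- gsub("e", "", group, fixed = TRUE)
identical(regex_result, fixed_result)  # same output for a plain literal pattern

# Rough timing on a larger, repeated vector (results vary by machine)
big <- rep(group, 1e5)
system.time(gsub("e", "", big))
system.time(gsub("e", "", big, fixed = TRUE))
```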
Megatron over 7 years

At the time, you had to read the whole page, including comments, to learn the syntax for stringr (my preferred method), as it was mostly discussed in comments. This solution quickly presents both options, which is why I offered it. My hope was to help other users filter through the page the way I had to when I was new to R. I struggled with gsub before finding stringr because it wasn't mentioned in a highly upvoted answer. Again, the objective is not to collect upvotes but to help new R users out.
David Arenburg over 7 years

If you find information in other answers or comments useful and would like to turn it into an answer, you could at least provide some attribution to show where you got the information from, or make the answer a Community Wiki instead of presenting it as your own.
Megatron over 7 years

Thanks - will keep in mind for next time. Have never made a community wiki before, so didn't know it was an option.
Phil_T over 6 years

Option 2 works great when applied to a column of data in a data frame, without specifying all the values in the column. Obviously option 1 is a repeat, but option 2 works very well, and deserves an up-vote for the added functionality.
Matheus Santana about 6 years

Would sub("e", "", group) give the same result?
Kamil S Jaron almost 6 years

You can just escape them, but you also have to escape the escape character itself because it is inside a quoted string: gsub("\\.", " ", ctr_names)
sindri_baldur almost 6 years

No, it would only replace the first e it finds in each element.
Martin over 2 years

@Andrie can this approach also be used for item-by-item removal? The situation I have in mind is to remove the 1st string in vector B (which specifies what is to be removed) from the 1st string in vector A (which gets part of itself removed), then the 2nd string in vector B from the 2nd string in vector A, and so on. The assumption is that the vectors are of the same length. I was only able to do this with hacky commands. Is there a clean way to do this?
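gsub() is vectorised over its input strings but not over `pattern`. It uses only the first element of `pattern` and warns if there are more. One possible approach for the element-by-element case is to walk both vectors in parallel with mapply(). The sketch assumes A and B are character vectors of equal length. The names A and B and their sample values are hypothetical. `fixed = TRUE` makes each element of B be treated as a literal string rather than a regex.

```r
A <- c("apple pie", "banana split", "cherry tart")  # strings to clean (hypothetical)
B <- c("apple ", " split", " tart")                 # what to remove from each (hypothetical)

stopifnot(length(A) == length(B))

# Pair A[i] with B[i]; fixed = TRUE treats each B[i] literally
result <- mapply(function(a, b) gsub(b, "", a, fixed = TRUE),
                 A, B, USE.NAMES = FALSE)
result
```

Swap gsub() for sub() inside the function if only the first occurrence of each B[i] should be removed.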
Catalyst almost 2 years

But the e is still there if we print group again, i.e. it's not removing the e from the group data frame.
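This is expected: gsub() does not modify its input. It returns a new vector. To change the data frame itself, assign the result back to the column. A minimal sketch using the question's data, with the column name `group` taken from the second answer:

```r
group <- data.frame(group = c("12357e", "12575e", "197e18", "e18947"))

gsub("e", "", group$group)   # returns a cleaned copy; group itself is unchanged

# Overwrite the original column with the cleaned values
group$group <- gsub("e", "", group$group)
group
```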