Replace single backslash in R

Solution 1

Since there is no direct way to deal with single backslashes here, the closest solution to the problem is the one provided by David Arenburg in the comments section:

gsub("[^A-Za-z0-9]", "", str) # remove everything except letters and digits
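As the comments from Adrian and Peter Prevos point out, this pattern also strips spaces. A minimal sketch of the difference, using a made-up sample string x (an assumption for illustration, modelled on the text in the question):

```r
# Hypothetical sample; the backslashes are doubled in the source,
# so the string itself holds single backslashes.
x <- "all now hopefully sorted\\ud83d\\ude0a"

# Keeps only letters and digits: backslashes AND spaces are removed.
letters_only <- gsub("[^A-Za-z0-9]", "", x)

# Adding a space to the allowed set preserves word boundaries.
keep_spaces <- gsub("[^A-Za-z0-9 ]", "", x)

print(letters_only)
print(keep_spaces)
```

Which variant is appropriate depends on whether the cleaned text still needs to be split into words afterwards.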

Solution 2

One quite universal solution is

gsub("\\\\", "", str)

Thanks to the comment above.
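The four backslashes can look odd, so here is a small sketch of why they are needed. The R parser turns "\\\\" into a two-character string, and the regex engine then reads those two characters as one literal backslash. The variable names are only for illustration.

```r
s <- "a\\f\\r"   # the 5-character string a\f\r

# The pattern as the regex engine sees it: two characters, \\
pattern_chars <- nchar("\\\\")

# Regex version: "\\\\" matches one literal backslash.
res_regex <- gsub("\\\\", "", s)

# Fixed-string version: "\\" is a single backslash, matched literally.
res_fixed <- gsub("\\", "", s, fixed = TRUE)

print(res_regex)
print(res_fixed)
```

Both calls give the same result; fixed = TRUE simply skips the regex layer of escaping.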

Solution 3

When inputting backslashes from the keyboard, always escape them.

str <- "this\\is\\my\\string"      # doubled backslashes in the source -> the string this\is\my\string
gsub("\\", "", str, fixed=TRUE)    # "\\" is a single backslash; returns "thisismystring"

str2 <- "a\\f\\r"                  # likewise -> the string a\f\r
gsub("\\", "", str2, fixed=TRUE)   # returns "afr"

Note that if you do

str <- "a\f\r"

then str contains no backslashes. It consists of the 3 characters a, \f (which is not normally printable, except as \f), and \r (same).
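This can be checked directly. The sketch below compares the two spellings and, as an assumption about what might be wanted for such a string, also shows that the control characters themselves can be removed with the POSIX class [[:cntrl:]].

```r
s1 <- "a\f\r"     # a, form feed, carriage return
s2 <- "a\\f\\r"   # a, backslash, f, backslash, r

n1 <- nchar(s1)   # 3 characters
n2 <- nchar(s2)   # 5 characters

# Does the string contain a literal backslash?
has_bs1 <- grepl("\\", s1, fixed = TRUE)
has_bs2 <- grepl("\\", s2, fixed = TRUE)

# Code points of s1: 97 (a), 12 (form feed), 13 (carriage return)
codes <- utf8ToInt(s1)

# Removing the control characters rather than (nonexistent) backslashes
no_ctrl <- gsub("[[:cntrl:]]", "", s1)
```

So for "a\f\r" there is nothing for a backslash pattern to match, which is why none of the gsub variants appear to work on it.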

And just to head off a possible question: if your data was read from a file, the file doesn't have to have doubled backslashes. For example, if you have a file test.txt containing

a\b\c\d\e\f

and you do

str <- readLines("test.txt")

then str will contain the string a\b\c\d\e\f as you'd expect: 6 letters separated by 5 single backslashes. But you still have to type doubled backslashes if you want to work with it.

str <- gsub("\\", "", str, fixed=TRUE)  # now contains abcdef
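A self-contained way to try this without creating test.txt by hand is sketched below. It uses a temporary file, which is an assumption made only so the example runs anywhere.

```r
# Write a file whose single line is a\b\c\d\e\f (single backslashes on disk).
tmp <- tempfile(fileext = ".txt")
writeLines("a\\b\\c\\d\\e\\f", tmp)

# readLines does not interpret escapes: the backslashes arrive as-is.
from_file <- readLines(tmp)
n_chars <- nchar(from_file)   # 6 letters + 5 backslashes

# The pattern typed in R code still needs the doubled backslash.
cleaned <- gsub("\\", "", from_file, fixed = TRUE)

unlink(tmp)
print(cleaned)
```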

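From the dput, it looks like what you've got there is UTF-16 encoded text, which probably came from a Windows machine. Each pair such as \ud83d\ude21 is a surrogate pair that encodes a glyph in the Supplementary Multilingual Plane, which is pretty obscure. I'll guess that you need to declare the encoding when you read in the file, for example by passing readLines a connection opened with file("test.txt", encoding="UTF-16"). The encoding argument of readLines itself only marks strings as Latin-1 or UTF-8; it does not re-encode the input.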
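One way to read a UTF-16 file is to declare the encoding on the connection handed to readLines. The following is only a sketch: it assumes little-endian UTF-16 without a byte-order mark and uses a temporary file with ASCII content, neither of which comes from the question.

```r
tmp <- tempfile(fileext = ".txt")

# Write the line a\b\c encoded as UTF-16LE.
out <- file(tmp, open = "w", encoding = "UTF-16LE")
writeLines("a\\b\\c", out)
close(out)

# Read it back, letting the connection do the re-encoding.
inp <- file(tmp, open = "r", encoding = "UTF-16LE")
decoded <- readLines(inp)
close(inp)
unlink(tmp)

# Once decoded, the backslashes are ordinary characters again.
cleaned <- gsub("\\", "", decoded, fixed = TRUE)
print(cleaned)
```

Whether the real file is UTF-16 at all, and with which byte order, would need to be checked against the actual data.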

Solution 4

This might be helpful :)

require(stringi)
stri_escape_unicode("ala\\ma\\kota")
## [1] "ala\\\\ma\\\\kota"
stri_unescape_unicode("ala\\ ma\\ kota")
## [1] "ala ma kota"

Solution 5

As of R 4.0.0, you can use raw strings to avoid confusion with backslashes. Just use the following syntax: r"(your_raw_expression)" (parentheses included):

str<-r"(ud83d\ude21\ud83d\udd2b)" #Equivalent of "ud83d\\ude21\\ud83d\\udd2b"
gsub(r"(\\)", "", str)
# [1] "ud83dude21ud83dudd2b"
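To see that a raw string is just another spelling of the same value, the sketch below (R 4.0.0 or later assumed) compares it with the escaped form and inspects the pattern.

```r
raw_form     <- r"(ud83d\ude21\ud83d\udd2b)"
escaped_form <- "ud83d\\ude21\\ud83d\\udd2b"

# Same characters, two ways of typing them.
same_value <- identical(raw_form, escaped_form)

# r"(\\)" holds two backslash characters: the regex for one literal backslash.
pattern <- r"(\\)"
pattern_len <- nchar(pattern)

stripped <- gsub(pattern, "", raw_form)
print(stripped)
```

Raw strings only change how a literal is typed in source code. A string that already contains control characters such as \f is not affected by them.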
Author by

Tavi

Updated on July 09, 2022

Comments

  • Tavi
    Tavi almost 2 years

    I have a string that looks like:

    str<-"a\f\r"
    

    I'm trying to remove the backslashes but nothing works:

    gsub("\","",str, fixed=TRUE)
    gsub("\\","",str)
    gsub("(\)","",str)
    gsub("([\])","",str)
    

    ...basically all the variations you can imagine. I have even tried the string_replace_all function. ANY HELP??

    I'm using R version 3.1.1; Mac OSX 10.7; the dput for a single string in my vector of strings gives:

    dput(line)
    "ud83d\ude21\ud83d\udd2b"
    

    I imported the file using readLines from a standard .txt file. The content of the file looks something like: got an engineer booked for this afternoon \ud83d\udc4d all now hopefully sorted\ud83d\ude0a I m going to go insane ud83d\ude21\ud83d\udd2b in utf8towcs …

    Thanks.

    • TooTone
      TooTone over 9 years
    • Tavi
      Tavi over 9 years
      @TooTone it is not quite the same problem. I've worked out dealing with double slashes, but I couldn't find any solutions to single slashes at all.
    • David Arenburg
      David Arenburg over 9 years
      @maryam str<-"this\is\my\string" can't be read into R because \i, \m and \s are stored letters in regex, so your example isn't reproducible (and confusing). You could change it to something like str <- "a\f\r"
    • Tavi
      Tavi over 9 years
      @DavidArenburg okay i'll do that
    • Hong Ooi
      Hong Ooi over 9 years
      @DavidArenburg That doesn't contain any backslashes either. You need to escape them if you're inputting them from the keyboard: str <- "a\\f\\r"
    • TooTone
      TooTone over 9 years
      @DavidArenburg I've deleted my answer which addressed the original version of the question.
    • David Arenburg
      David Arenburg over 9 years
      @TooTone, seems like third exactly the same answer was just posted...
    • TooTone
      TooTone over 9 years
      @HongOoi I'm not sure about your edit to the post. I think we should leave it to the OP to clarify whether it's "this\\is\\my\\string" or "a\f\r" as David suggested.
    • Hong Ooi
      Hong Ooi over 9 years
      I've rolled it back to the original.
    • Tavi
      Tavi over 9 years
      @HongOoi its like David suggested "a\f\r" all I want to do is remove the darn single backslashes
    • Hong Ooi
      Hong Ooi over 9 years
      @maryam You have to be careful. If you want to have a string that contains a backslash, you have to type two backslashes, because \ is a special character to the R parser. For example str <- "a\\f\\r" contains the string 'a\f\r' (two single backslashes). Conversely, str <- "a\f\r" contains no backslashes: it has 3 characters, a, \f and \r.
    • Tavi
      Tavi over 9 years
      @HongOoi but I don't want a string that contains backslashes in the first place. I just got a messed up text dataset that I'm trying to clean up and it has all sorts of weird characters in it. I take it there is NO way to deal with single backslashes then?
    • Hong Ooi
      Hong Ooi over 9 years
      @maryam See my edited answer.
    • Hong Ooi
      Hong Ooi over 9 years
      It sounds like your problem is not actually backslashes, but mangled text encoding. If R encounters a character it doesn't know how to print, you'll see lots of backslashes over the place, representing hex codes. Post an actual sample of your code and data, using dput.
    • Tavi
      Tavi over 9 years
      @HongOoi alright, but what is dput? sorry
    • Hong Ooi
      Hong Ooi over 9 years
      See ?dput. It prints something that you can cut and paste into the edit window here on StackOverflow, that will show R's internal representation of your data.
    • Tavi
      Tavi over 9 years
      @HongOoi > dput(line) gives "ud83d\ude21\ud83d\udd2b" where line is the name of my string
    • Hong Ooi
      Hong Ooi over 9 years
      Put that into your question, along with the code you used to read the file and the OS you're using. Each \uXXXX represents a Unicode character which R doesn't know how to handle in your locale/encoding.
    • Tavi
      Tavi over 9 years
      @HongOoi thanks, done!
    • Hong Ooi
      Hong Ooi over 9 years
      Can you also post a fragment of the file you're reading, and the code you used to read it with?
    • David Arenburg
      David Arenburg over 9 years
      It seems like you have lots of gibberish there, try gsub("[^A-Za-z0-9]", "", "ud83d\ude21\ud83d\udd2b") maybe?
    • Tavi
      Tavi over 9 years
      @HongOoi there you go. thanks
    • Tavi
      Tavi over 9 years
      @DavidArenburg it works david but only the first part of the string is returned "ud83d"
    • Hong Ooi
      Hong Ooi over 9 years
      utf8towcs... you're not working with funky foreign languages are you?
    • Dason
      Dason over 9 years
      Yeah that looks like an encoding issue to me.
    • Hong Ooi
      Hong Ooi over 9 years
      Can you upload the file somewhere? It looks like whatever editor you're pasting it from has already turned the weird characters into \xxxx form.
    • Tavi
      Tavi over 9 years
      @HongOoi haha noooo its all english :) actually david's idea to try to substitute everything besides the standard alphabets has the potential to work but for now its not doing exactly what its supposed to do
    • Hong Ooi
      Hong Ooi over 9 years
      Right, that's because each \uXXXX is actually one character.
    • Tavi
      Tavi over 9 years
      @HongOoi one character? oh yeah, now I see… I guess the problem is solved then, I'll use gsub("[^A-Za-z0-9]", "", "ud83d\ude21\ud83d\udd2b") thanks a lot Hong do you know how to mark a comment as answer? so perhaps others shall find it useful?
    • Dason
      Dason over 9 years
      Like I said - it's an encoding issue ;)
    • Tavi
      Tavi over 9 years
      @Dason yes it is, glad its over now been on this for the past 6 hours. thanks to everyone that contributed
  • David Arenburg
    David Arenburg over 9 years
    Did you just adapt the question to your answer, or am I missing something? She has the string str<-"a\f\r", not str <-"this\\is\\my\\string"
  • Hong Ooi
    Hong Ooi over 9 years
    The original, before your edits, was str <- "this\is\my\string". This isn't something you can enter into R from the keyboard. If it was meant to represent the string "this\is\my\string", you have to escape the backslashes.
  • David Arenburg
    David Arenburg over 9 years
    Actually I didn't edit it to str <- "a\f\r", the OP did. The error was caused not by the backslashes but by the \i, \m and \s
  • Hong Ooi
    Hong Ooi over 9 years
    True, but it was on your suggestion.
  • David Arenburg
    David Arenburg over 9 years
    Because the error wasn't caused due to backslashes, but due to stored letters, so her example wasn't reproducible. You have 10K rep, you can see deleted posts, don't you find any similarity between your answer and the other two deleted?
  • Tavi
    Tavi over 9 years
    please clues anyone? I've wasted so much time trying to do this
  • Hong Ooi
    Hong Ooi over 9 years
    @DavidArenburg The deleted answers would have worked fine, there was no reason to delete them.
  • David Arenburg
    David Arenburg over 9 years
    If you adjust the question to the answer (like you did), they would work fine, yes.
  • Hong Ooi
    Hong Ooi over 9 years
    I didn't adjust the question, which is "how to replace a single backslash". All the answers do exactly this; or they did, until people tried to replace a nonexistent backslash.... I can't undelete an answer, but martin and tootone are free to undelete if they wish.
  • Adrian
    Adrian over 7 years
    This is not a good solution, because it removes spaces as well. Better would be gsub("\\\\", "", str)
  • jon
    jon over 4 years
    Has this ever been resolved? @HongOoi here is a reproducible example that your solution does not fix: tst <- "[179,\"3 min\"]" gsub("\\", "", tst, fixed=TRUE)
  • Hong Ooi
    Hong Ooi over 4 years
    @jon once you understand how many backslashes there are in your string, all will become clear
  • Peter Prevos
    Peter Prevos almost 4 years
    Add a space to the excluded characters: gsub("[^A-Za-z0-9 ]", "", str)
  • Carl Witthoft
    Carl Witthoft about 2 years
    Except that this doesn't in fact work: > str <- '\a\f\r' > str [1] "\a\f\r" > gsub("\\\\", "", str) [1] "\a\f\r"
  • Carl Witthoft
    Carl Witthoft about 2 years
    This does not work in R 4.X. There's still a mess with character strings getting interpreted, and "raw" data manipulations such as the new r"('this\string')"
  • Carl Witthoft
    Carl Witthoft about 2 years
    The only problem here is: what if I have a string mystring <- 'ud83d\ud21' . I can't insert that inside the r"( construct because nothing gets interpreted inside that.
  • Hong Ooi
    Hong Ooi about 2 years
    @CarlWitthoft once you understand how many backslashes there are in your string, all will become clear
  • Carl Witthoft
    Carl Witthoft about 2 years
    @HongOoi I do understand, e.g. if I have this \f has none I can search for \\\\f . But I cannot search purely for "\\\\"because R lumps \f as a single special character when interpreting a character string. What the R-lang needs is, maybe, a new class "literally_characters" so that nothing gets handled by the console interpreter.
  • Hong Ooi
    Hong Ooi about 2 years
    @CarlWitthoft once you understand how many backslashes there are in your string, all will become clear
  • Carl Witthoft
    Carl Witthoft about 2 years
    @HongOoi , Yes, I know that the \f is a single character and that the console does not "believe" there are any backslash characters in my character string. The point is that we shouldn't have to search for "\\\\f" and "\\\\g" and "\\\\h" ad infinitum. So, backing up: how do we handle a character string that we've been given from elsewhere which has single backslashes that we have to modify? There's no guarantee it was read from a text file (thus bypassing the console parser)
  • Hong Ooi
    Hong Ooi about 2 years
    @CarlWitthoft "read from a text file"? What does that have to do with the price of tea in China?
  • Carl Witthoft
    Carl Witthoft about 2 years
    @HongOoi if you try it, you'll see that R's text reader preserves the backslash as a separate character
  • Hong Ooi
    Hong Ooi about 2 years
    @CarlWitthoft and?