Replacing double backslashes with single backslash

Solution 1

Not sure if you're still looking for a solution to your problem (since you have an accepted answer) but I will still add my answer as a possible solution to the stated problem:

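import java.util.regex.Matcher;
import java.util.regex.Pattern;

String str = "\\u003c";  // a backslash followed by u003c
// Match a literal backslash, then u, then four hex digits (case-insensitive).
Matcher m = Pattern.compile("(?i)\\\\u([\\da-f]{4})").matcher(str);
if (m.find()) {
    // Parse the hex digits and cast the resulting code unit to a char.
    String a = String.valueOf((char) Integer.parseInt(m.group(1), 16));
    System.out.printf("Unicode String is: [%s]%n", a);
}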

OUTPUT:

Unicode String is: [<]

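
The snippet in this solution decodes only the first escape it finds. Below is a minimal sketch of applying the same regex to every escape in a string, assuming the input may contain several of them. The class name UnicodeEscapeDecoder and the helper decode are made up for this illustration and are not part of the original answer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UnicodeEscapeDecoder {
    // A literal backslash, then u (either case), then exactly four hex digits.
    private static final Pattern ESCAPE = Pattern.compile("(?i)\\\\u([\\da-f]{4})");

    // Hypothetical helper: replaces every escape sequence with the char it denotes.
    static String decode(String input) {
        Matcher m = ESCAPE.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            char decoded = (char) Integer.parseInt(m.group(1), 16);
            // quoteReplacement keeps a decoded backslash or dollar sign from
            // being interpreted as an escape or a group reference.
            m.appendReplacement(out, Matcher.quoteReplacement(String.valueOf(decoded)));
        }
        m.appendTail(out);  // copy whatever follows the last match
        return out.toString();
    }

    public static void main(String[] args) {
        String str = "\\u003cb\\u003e";   // escaped form of an opening HTML bold tag
        System.out.println(decode(str));  // <b>
    }
}
```

StringBuffer is used so that the sketch also compiles on Java 8; the StringBuilder overloads of appendReplacement and appendTail arrived in Java 9.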

Solution 2

You can use String#replaceAll:

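String str = "\\\\u003c";                   // two backslashes, then u003c
str = str.replaceAll("\\\\\\\\", "\\\\");   // regex: two literal backslashes; replacement: one
System.out.println(str);                    // one backslash, then u003c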

It looks weird because the first argument is a string defining a regular expression, and \ is a special character both in string literals and in regular expressions. To actually put a \ in our search string, we need to escape it (\\) in the literal. But to actually put a \ in the regular expression, we have to escape it at the regular expression level as well. So to get a literal \\ in a string, we need to write \\\\ in the string literal; and to get two literal backslashes to the regular expression engine, we need to escape those as well, so we end up with \\\\\\\\. That is:

String Literal        String                      Meaning to Regex
−−−−−−−−−−−−−−−−−−−−− −−−−−−−−−−−−−−−−−−−−−−−−−−− −−−−−−−−−−−−−−−−−
\                     Escape the next character   Would depend on next char
\\                    \                           Escape the next character
\\\\                  \\                          Literal \
\\\\\\\\              \\\\                        Literal \\

In the replacement parameter, even though it's not a regex, it still treats \ and $ specially — and so we have to escape them in the replacement as well. So to get one backslash in the replacement, we need four in that string literal.
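
If replaceAll has to be used, the regex-level escaping can be delegated to the standard library instead of being written by hand. The following is a minimal sketch, assuming both the search text and the replacement should be taken literally. Pattern.quote and Matcher.quoteReplacement are standard java.util.regex methods; the class name QuotedReplaceAll and the helper replaceLiterally are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuotedReplaceAll {
    // Hypothetical helper: replaceAll with both arguments treated as plain text.
    static String replaceLiterally(String input, String target, String replacement) {
        return input.replaceAll(Pattern.quote(target), Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        String str = "\\\\u003c";  // two backslashes, then u003c
        // Only string-literal escaping remains: "\\\\" is two backslashes, "\\" is one.
        System.out.println(replaceLiterally(str, "\\\\", "\\"));
    }
}
```

This prints a single backslash followed by u003c. With the quoting helpers, each backslash is doubled once for the string literal only, rather than four times.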

Solution 3

Regarding the problem of "replacing double backslashes with single backslashes" or, more generally, "replacing a simple string containing \ with a different simple string containing \" (which is not the OP's entire problem, but part of it):

Most of the answers in this thread mention replaceAll, which is the wrong tool for the job here. The easier tool is replace, but confusingly, the OP states that replace("\\\\", "\\") doesn't work for him, which is perhaps why all the answers focus on replaceAll.

Important note for people with a JavaScript background: replace(CharSequence, CharSequence) in Java replaces ALL occurrences of a substring, unlike in JavaScript, where it only replaces the first one!

From the Javadoc of replace: "Replaces each substring of this string that matches the literal target sequence with the specified literal replacement sequence."

On the other hand, replaceAll(String regex, String replacement) treats both parameters as more than regular strings:

Note that backslashes (\) and dollar signs ($) in the replacement string may cause the results to be different than if it were being treated as a literal replacement string.

(This is because $ refers to captured regex groups and \ escapes the next character in the replacement, so if you want to use them literally, you need to escape them.)

In other words, replace and replaceAll interpret their two parameters differently. For replace you need to double the \ in both parameters (standard escaping of a backslash in a string literal), whereas for replaceAll you need to quadruple it (string-literal escaping plus regex or replacement escaping).

To sum up, for simple replacements, stick to replace("\\\\", "\\"); it needs only one level of escaping, not two.

https://ideone.com/ANeMpw

System.out.println("a\\\\b\\\\c");                                 // "a\\b\\c"
System.out.println("a\\\\b\\\\c".replaceAll("\\\\\\\\", "\\\\"));  // "a\b\c"
//System.out.println("a\\\\b\\\\c".replaceAll("\\\\\\\\", "\\"));  // runtime error
System.out.println("a\\\\b\\\\c".replace("\\\\", "\\"));           // "a\b\c"

https://www.ideone.com/Fj4RCO

String str = "\\\\u003c";
System.out.println(str);                                // "\\u003c"
System.out.println(str.replaceAll("\\\\\\\\", "\\\\")); // "\u003c"
System.out.println(str.replace("\\\\", "\\"));          // "\u003c"
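
The same difference shows up with other special characters, not only backslashes. Here is a small sketch, assuming a made-up input "a.b.c", of how a dot and a dollar sign are handled by each method. The class and helper names are hypothetical and exist only for this illustration.

```java
public class ReplaceVsReplaceAll {
    // replace: target and replacement are both plain text.
    static String withReplace(String s) {
        return s.replace(".", "$");
    }

    // replaceAll: the dot is escaped for the regex, the dollar sign for the replacement.
    static String withReplaceAll(String s) {
        return s.replaceAll("\\.", "\\$");
    }

    public static void main(String[] args) {
        String s = "a.b.c";
        System.out.println(withReplace(s));                    // a$b$c
        System.out.println(withReplaceAll(s));                 // a$b$c
        // An unescaped dot is a regex that matches any character.
        System.out.println(s.replaceAll(".", "-"));            // -----
        // $1 in the replacement refers to capture group 1.
        System.out.println(s.replaceAll("(\\w)\\.", "$1$1"));  // aabbc
    }
}
```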

Solution 4

Another option: capture one of the two backslashes and replace both with the captured group:

public static void main(String[] args)
{
    String str = "C:\\\\";
    str = str.replaceAll("(\\\\)\\\\", "$1");

    System.out.println(str);
}

Solution 5

Try the following (strings are immutable, so the result has to be assigned back):

myString = myString.replaceAll("[\\\\]{2}", "\\\\");

Author by

Vinay thallam

Updated on July 11, 2022

Comments

  • Vinay thallam almost 2 years

    I have a string "\\u003c", which is in the UTF-8 charset. I am unable to decode it to Unicode because of the double backslash. How do I get "\u003c" from "\\u003c"? I am using Java.

    I tried:

    myString.replace("\\\\", "\\");
    

    but could not achieve what I wanted.

    This is my code:

    String myString = FileUtils.readFileToString(file);
    String a = myString.replace("\\\\", "\\");
    byte[] utf8 = a.getBytes();
    
    // Convert from UTF-8 to Unicode
    a = new String(utf8, "UTF-8");
    System.out.println("Converted string is:"+a);
    

    and the content of the file is

    \u003c