regular expression to remove all non-printable characters

24,901

Solution 1

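The following regex matches a run of text that contains no ASCII control characters other than tab (\x09), line feed (\x0A) and carriage return (\x0D). It also excludes DEL (\x7F):

[^\x00-\x08\x0B\x0C\x0E-\x1F\x7F]*

The following regex finds a single one of those non-printable characters:

[\x00-\x08\x0B\x0C\x0E-\x1F\x7F]

Java code: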

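import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

boolean foundMatch = false;
try {
    // Control characters except tab, LF and CR, plus DEL
    Pattern regex = Pattern.compile("[\\x00-\\x08\\x0B\\x0C\\x0E-\\x1F\\x7F]");
    Matcher regexMatcher = regex.matcher(subjectString);
    foundMatch = regexMatcher.find();
    // Replace the found text with whatever you want, e.g. regexMatcher.replaceAll("")
} catch (PatternSyntaxException ex) {
    // Syntax error in the regular expression
}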
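A self-contained sketch of the removal step, using `Matcher.replaceAll` with the class [\x00-\x08\x0B\x0C\x0E-\x1F\x7F]. The names `StripControlChars` and `strip` are made up for illustration. Note that this class only targets ASCII control characters, so a U+FFFD replacement character (�) would pass through it untouched.

```java
import java.util.regex.Pattern;

public class StripControlChars {
    // Control characters 0x00-0x1F except tab (0x09), LF (0x0A) and CR (0x0D), plus DEL (0x7F).
    private static final Pattern CONTROL =
            Pattern.compile("[\\x00-\\x08\\x0B\\x0C\\x0E-\\x1F\\x7F]");

    // Returns the input with every matched control character removed.
    public static String strip(String s) {
        return CONTROL.matcher(s).replaceAll("");
    }

    public static void main(String[] args) {
        // Build the input with casts so no raw control characters appear in the source.
        String input = "abc" + (char) 0 + (char) 7 + "def\r\n\tghi" + (char) 0x1B + (char) 0x7F;
        System.out.println(strip(input).equals("abcdef\r\n\tghi")); // prints true
    }
}
```

Compiling the pattern once into a static field avoids recompiling it on every call, which `String.replaceAll` would do.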

Solution 2

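I would prefer a simpler solution that works on the bytes directly instead of round-tripping through a String. Note also that your code ignores offset and count. The solution below compacts the kept bytes in place, so it overwrites the original array.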

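public void write(byte[] bytes, int offset, int count) {
    // Compact the kept bytes to the front of [offset, offset + count).
    int writtenI = offset;
    for (int readI = offset; readI < offset + count; ++readI) {
        byte b = bytes[readI];
        // Keep printable ASCII (0x20..0x7E) plus tab, LF and CR.
        // Java bytes are signed, so 0x80..0xFF are negative and are dropped here.
        if ((32 <= b && b < 127) || b == '\t' || b == '\n' || b == '\r') {
            bytes[writtenI] = b; // writtenI <= readI
            ++writtenI;
        }
    }
    byte[] bytes2 = new byte[writtenI - offset];
    System.arraycopy(bytes, offset, bytes2, 0, writtenI - offset);
    // Alternative (needs UnsupportedEncodingException handling):
    //String str = new String(bytes, offset, writtenI - offset, "ASCII");
    //bytes2 = str.getBytes("ASCII");
    GraphicsTerminalActivity.sendOverSerial(bytes2);
}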
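To make the byte filter testable on its own, here is a sketch that returns the filtered copy instead of calling `GraphicsTerminalActivity.sendOverSerial`. The class and method names (`PrintableBytes`, `keepPrintable`) are hypothetical, and keeping tab, LF and CR follows the question's request to retain \r and \n.

```java
import java.util.Arrays;

public class PrintableBytes {
    // Keeps printable ASCII (0x20..0x7E) plus tab, LF and CR from bytes[offset, offset + count).
    // The kept bytes are compacted in place (the input array is overwritten) and a copy is returned.
    public static byte[] keepPrintable(byte[] bytes, int offset, int count) {
        int writtenI = offset;
        for (int readI = offset; readI < offset + count; ++readI) {
            byte b = bytes[readI];
            // Java bytes are signed, so 0x80..0xFF are negative and fail 32 <= b.
            if ((32 <= b && b < 127) || b == '\t' || b == '\n' || b == '\r') {
                bytes[writtenI++] = b; // writtenI <= readI, so no unread byte is clobbered
            }
        }
        return Arrays.copyOfRange(bytes, offset, writtenI);
    }

    public static void main(String[] args) {
        byte[] data = {'x', 'h', 'i', (byte) 0xC3, '\r', '\n', 0x07, '!', 'y'};
        // Only indices 1..7 are examined: "hi", 0xC3, CR, LF, BEL, '!'.
        byte[] out = keepPrintable(data, 1, 7);
        System.out.println(Arrays.toString(out)); // prints [104, 105, 13, 10, 33]
    }
}
```

`Arrays.copyOfRange` does the same job as the `new byte[]` plus `System.arraycopy` pair in the answer.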
Author: Paul

Updated on December 09, 2020

Comments

  • Paul, over 3 years ago

    I want to remove all non-printable ASCII characters from a string while keeping invisible ones. I thought this would work because whitespace, \n and \r are invisible but not non-printable, aren't they? Basically, I am receiving a byte array with � characters in it and I don't want them there, so I am converting it to a string, removing the � characters, and then turning it back into a byte array.

    Space works fine in my code now, but \r and \n are being stripped. What would be the correct regex to retain these as well? Or is there a better way than what I am doing?

    public void write(byte[] bytes, int offset, int count) {
        try {
            String str = new String(bytes, "ASCII");
            String str2 = str.replaceAll("[^\\p{Print}\\t\\n]", "");
            GraphicsTerminalActivity.sendOverSerial(str2.getBytes("ASCII"));
        } catch (UnsupportedEncodingException e) {
            e.printStackTrace();
        }
    }


    EDIT: I tried [^\x00-\x7F], which covers the ASCII range, but the � symbols still get through, which is odd.
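A minimal sketch of the question's own approach with \r added to the negated class, assuming the � comes from bytes 0x80..0xFF being decoded as ASCII (Java's US-ASCII decoder turns each such byte into U+FFFD). It swaps the "ASCII" charset name for `StandardCharsets.US_ASCII` (the same charset) and honors offset and count. In Java, \p{Print} is ASCII-only unless `Pattern.UNICODE_CHARACTER_CLASS` is set, so U+FFFD is removed by this class. The names `AskerRegexFixed` and `clean` are hypothetical.

```java
import java.nio.charset.StandardCharsets;

public class AskerRegexFixed {
    // Removes everything that is not printable ASCII (0x20..0x7E), tab, LF or CR.
    public static String clean(byte[] bytes, int offset, int count) {
        // Decoding as US-ASCII maps every byte >= 0x80 to the replacement char U+FFFD.
        String str = new String(bytes, offset, count, StandardCharsets.US_ASCII);
        return str.replaceAll("[^\\p{Print}\\t\\n\\r]", "");
    }

    public static void main(String[] args) {
        byte[] bytes = {'o', 'k', (byte) 0x9F, '\r', '\n'};
        String decoded = new String(bytes, StandardCharsets.US_ASCII);
        System.out.println(decoded.indexOf('\uFFFD')); // prints 2
        String cleaned = clean(bytes, 0, bytes.length);
        System.out.println(cleaned.equals("ok\r\n")); // prints true
        byte[] out = cleaned.getBytes(StandardCharsets.US_ASCII);
        System.out.println(out.length); // prints 4
    }
}
```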