Remove all special characters and case from string in bash

103,765

Solution 1

cat yourfile.txt | tr -dc '[:alnum:]\n\r' | tr '[:upper:]' '[:lower:]'

The first tr deletes special characters: -d means delete and -c means complement (invert the character set), so -dc deletes every character except those specified. The \n and \r are included to preserve Linux- or Windows-style newlines, which I assume you want.

The second one translates uppercase characters to lowercase.
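As a quick check, here is a minimal sketch that runs the same two tr stages on the sample names from the question, reading from standard input rather than a file. The function name normalize_stream is made up for illustration.

```shell
# Hypothetical wrapper around the two tr stages described above.
normalize_stream() {
    # Stage 1: delete everything except letters, digits, \n and \r.
    # Stage 2: fold uppercase letters to lowercase.
    tr -dc '[:alnum:]\n\r' | tr '[:upper:]' '[:lower:]'
}

# Feed the question's sample names through the pipeline.
printf 'Some_randoM data1-A\nMore Data0\n' | normalize_stream
# somerandomdata1a
# moredata0
```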

Solution 2

Pure Bash 4+ solution (no external commands):

$ filename='Some_randoM data1-A'
$ f=${filename//[^[:alnum:]]/}
$ echo "$f"
SomerandoMdata1A
$ echo "${f,,}"
somerandomdata1a

A function for this:

clean() {
    local a=${1//[^[:alnum:]]/}
    echo "${a,,}"
}

Try it:

$ clean "More Data0"
moredata0
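The ,, expansion is not available before Bash 4 (macOS, for example, ships Bash 3.2). For those shells, one possible fallback is to do the work with tr instead. This is only a sketch, and clean_portable is a hypothetical name, not part of the answer above.

```shell
# Hypothetical fallback for shells without the ${var,,} expansion.
clean_portable() {
    # Print the argument, keep only letters, digits and the trailing
    # newline, then fold uppercase letters to lowercase.
    printf '%s\n' "$1" | tr -dc '[:alnum:]\n' | tr '[:upper:]' '[:lower:]'
}

clean_portable "Some_randoM data1-A"
# somerandomdata1a
```

The trade-off is two extra processes per call, which can matter when cleaning many filenames in a loop.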

Solution 3

As an alternative to mklement0's and Dan Bliss's approaches, you can also use sed with a POSIX regular expression:

cat yourfile.txt | sed 's/[^a-zA-Z0-9]//g'

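sed matches every character that is not a letter or a digit (the ^ at the start of the bracket expression negates the set) and removes it. Note that this only strips characters; it does not convert the result to lowercase.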
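To get the lowercase output the question asks for, the sed command can be combined with the tr lowercase step from Solution 1. A minimal sketch, with strip_and_lower as a hypothetical function name:

```shell
# Hypothetical helper: sed strips non-alphanumerics, tr lowercases.
strip_and_lower() {
    sed 's/[^a-zA-Z0-9]//g' | tr '[:upper:]' '[:lower:]'
}

printf 'Some_randoM data1-A\nMore Data0\n' | strip_and_lower
# somerandomdata1a
# moredata0
```

Because sed works line by line, newlines are kept without having to list them in the bracket expression. In some locales the a-z and A-Z ranges may match more than the ASCII letters, so [^[:alnum:]] is probably the safer spelling if that matters.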

Solution 4

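I've used tr to remove any characters that are not part of the [:print:] class. Note that this also strips newlines, tabs, and carriage returns, since they are control characters, and that it keeps spaces and punctuation: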

cat file.txt | tr -dc '[:print:]'

or

echo "..." | tr -dc '[:print:]'

Additionally, you might want to pipe (|) the output to od -c to confirm the result:

cat file.txt | tr -dc '[:print:]' | od -c
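A small sketch of what this filter does to control characters, using a made-up input string and a hypothetical keep_printable wrapper:

```shell
# Hypothetical wrapper around the tr command above.
keep_printable() {
    tr -dc '[:print:]'
}

# Tab, carriage return and newline are all dropped; the space is kept.
printf 'tab\there \r\n' | keep_printable
# prints "tabhere " with no trailing newline

# Inspect the surviving bytes one by one.
printf 'tab\there \r\n' | keep_printable | od -c
```

Since the newline is removed too, multi-line input comes out as a single line. Adding \n to the set, as in tr -dc '[:print:]\n', should preserve line breaks.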

Author: Questionmark

Updated on October 31, 2021

Comments

  • Questionmark
    Questionmark over 2 years

    I am writing a bash script that needs to parse filenames.

    It will need to remove all special characters (including space): "!?.-_ and change all uppercase letters to lowercase. Something like this:

    Some_randoM data1-A
    More Data0
    

    to:

    somerandomdata1a
    moredata0
    

    I have seen lots of questions to do this in many different programming languages, but not in bash. Is there a good way to do this?

  • mklement0
    mklement0 almost 10 years
    +1; the ,, operator (convert entire string to lowercase) requires bash 4+.
  • Arsen
    Arsen about 8 years
    If you (like me) came here looking for a way to get rid of characters like \r, \n, or ^C in a variable, var2=$(echo "$var" | tr -d '[:cntrl:]') is the solution. Thanks to this answer, I figured this out.
  • Johnny
    Johnny over 4 years
    Nice answer! You saved my life!
  • Sohail Si
    Sohail Si over 3 years
    od was useful for me