In a bash script, how do I sanitize user input?


Solution 1

As dj_segfault points out, the shell can do most of this for you. You'll have to fall back on something external to lowercase the string, though (at least before Bash 4). There are many options for that, such as the perl one-liners in the other answers, but I think tr is probably the simplest.

# first, strip underscores
CLEAN=${STRING//_/}
# next, replace spaces with underscores
CLEAN=${CLEAN// /_}
# now, clean out anything that's not alphanumeric or an underscore
CLEAN=${CLEAN//[^a-zA-Z0-9_]/}
# finally, lowercase with tr
CLEAN=$(printf '%s' "$CLEAN" | tr 'A-Z' 'a-z')

The order here is somewhat important. We want to get rid of underscores, plus replace spaces with underscores, so we have to be sure to strip underscores first. By waiting to pass things to tr until the end, we know we have only alphanumeric and underscores, and we can be sure we have no spaces, so we don't have to worry about special characters being interpreted by the shell.
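
The steps above can be wrapped in a function so the order is fixed in one place. This is only a minimal sketch: the name sanitize and the sample input are made up for illustration and are not part of the original answer.

```shell
#!/usr/bin/env bash
# sanitize: hypothetical helper that applies the four steps in order.
sanitize() {
    local clean="$1"
    clean="${clean//_/}"                # 1. strip existing underscores
    clean="${clean// /_}"               # 2. turn spaces into underscores
    clean="${clean//[^a-zA-Z0-9_]/}"    # 3. drop anything else
    # 4. lowercase with tr; printf avoids echo's option parsing
    clean=$(printf '%s' "$clean" | tr 'A-Z' 'a-z')
    printf '%s\n' "$clean"
}

sanitize 'My _Cool_ File (v2).txt'   # prints my_cool_file_v2txt
```

In a script, the value from read would be passed in quoted, for example CLEAN=$(sanitize "$STRING").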

Solution 2

Bash can do this all on its own, thank you very much. If you look at the Parameter Expansion section of the man page, you'll see that bash has built-in substitution, substring, and prefix/suffix trimming operations.

To eliminate all non-alphanumeric characters, do

CLEANSTRING=${STRING//[^a-zA-Z0-9]/}

That's Occam's razor. No need to launch another process.
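
For reference, here is a small sketch of the parameter expansion forms mentioned above (substitution, substring, and prefix/suffix trimming). The variable names and the sample string are assumptions chosen for illustration.

```shell
#!/usr/bin/env bash
STRING='  Hello, World! 123  '

# Pattern substitution: delete every character that is not alphanumeric.
alnum_only="${STRING//[^a-zA-Z0-9]/}"

# Substring expansion: 5 characters starting at offset 0.
first_five="${alnum_only:0:5}"

# Prefix removal: strip the leading run of whitespace.
trimmed="${STRING#"${STRING%%[![:space:]]*}"}"
# Suffix removal: strip the trailing run of whitespace.
trimmed="${trimmed%"${trimmed##*[![:space:]]}"}"

printf '[%s]\n' "$alnum_only" "$first_five" "$trimmed"
```

None of these expansions start another process, which is the point the answer makes.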

Solution 3

For Bash >= 4.0:

CLEAN="${STRING//_/}" && \
CLEAN="${CLEAN// /_}" && \
CLEAN="${CLEAN//[^a-zA-Z0-9_]/}" && \
CLEAN="${CLEAN,,}"

This is especially useful for creating container names programmatically using docker/podman. However, in this case you'll also want to remove the underscores:

# Sanitize $STRING for a container name
CLEAN="${STRING//[^a-zA-Z0-9]/}" && \
CLEAN="${CLEAN,,}"
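
As a rough sketch, the container-name variant could be packaged like this. The function name container_name and the unnamed fallback for an empty result are assumptions, not something the answer specifies, and the ${var,,} expansion needs Bash 4.0 or later.

```shell
#!/usr/bin/env bash
# container_name: hypothetical helper; requires Bash >= 4.0 for ${var,,}.
container_name() {
    local clean="${1//[^a-zA-Z0-9]/}"   # keep only ASCII letters and digits
    clean="${clean,,}"                  # lowercase without an external process
    # Assumed fallback: use a placeholder if nothing survived sanitizing.
    printf '%s\n' "${clean:-unnamed}"
}

name=$(container_name 'My App: Build #42')
echo "$name"   # prints myappbuild42
```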

Solution 4

You could run it through perl.

# STRING must be exported so that perl can see it in %ENV
export CLEANSTRING=$(perl -e 'print join( q//, map { s/\s+/_/g; lc } split /[^\s\w]+/, $ENV{STRING} )')

I'm using the ksh-style $(...) command substitution here; bash supports it as well.

That's the nice thing about the shell: you can use perl, awk, sed, grep...

Solution 5

Quick and dirty:

STRING=$(echo 'dit /ZOU/ een test123' | perl -pe 's/ //g; tr/A-Z/a-z/; s/[^a-zA-Z0-9]//g')


Author: Devin Reams

Duce of all trades. Dabbles in code.

Updated on January 27, 2020

Comments

  • Devin Reams
    Devin Reams over 4 years

    I'm looking for the best way to take a simple input:

    echo -n "Enter a string here: "
    read -e STRING
    

    and clean it up by removing non-alphanumeric characters, lowercasing it, and replacing spaces with underscores.

    Does order matter? Is tr the best / only way to go about this?

  • Devin Reams
    Devin Reams over 15 years
    Well put, great answer. I was using parameter expansion without even realizing it.
  • Axeman
    Axeman over 15 years
    It is a good answer for a subset of the specifications, but it doesn't change spaces to underscores.
  • JD.
    JD. over 11 years
    Note to reader: If you are having trouble making this work, check your shebang to see if you're calling bash or sh, and how your system interprets 'sh'.
  • toxalot
    toxalot about 10 years
    As of Bash 4, it can do case modification also. lowercase=${CLEAN,,} Bash Hackers Wiki explains parameter expansions in a more human-readable way than man pages.
  • Jon Carter
    Jon Carter almost 9 years
    Nice work. I wasn't previously aware of these shell features. Thanks! I just discovered that zsh allows you to actually nest all of these, so you can do it in one line: echo -n ${${${str//_/}// /_}//[^a-zA-Z0-9_]/} | tr A-Z a-z ..not that I would recommend putting something that incomprehensible in a script. :) (edit: formatting)
  • higuita
    higuita almost 8 years
    if you set the STRING=$(rm /tmp/*), if you echo the $STRING before cleaning, it will execute the sub-shell and remove your /tmp/ content... so you need to sanitize it BEFORE any echo is done
  • Olivier Dulac
    Olivier Dulac almost 3 years
    Very nice. It may also need LC_ALL=C set before all the a-z / A-Z expansions, to be sure it doesn't leave any weird characters behind (depending on your locale, or someone else's, a-z, A-Z, and maybe even 0-9 can mean a lot of different things...)
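
The locale concern in the last comment could be handled along these lines. This is a sketch under assumptions: the function name strip_non_ascii_alnum is hypothetical, and it relies on bash applying a locale change when LC_ALL is assigned inside a function.

```shell
#!/usr/bin/env bash
# strip_non_ascii_alnum: hypothetical helper that pins the C locale.
strip_non_ascii_alnum() {
    # Scope the C locale to this function so that ranges such as a-z
    # are interpreted as plain ASCII ranges.
    local LC_ALL=C
    printf '%s\n' "${1//[^a-zA-Z0-9_]/}"
}

# Accented characters are intended to be dropped rather than kept.
strip_non_ascii_alnum 'naïve_café 42'
```

Using local keeps the locale change from leaking into the rest of the script.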