shortest way to replace characters in a variable


Solution 1

Let's see. The shortest I can come up with is a tweak of your tr solution:

OUTPUT="$(tr -d "\"\`'" <<<$OUTPUT)"

Other alternatives include the already mentioned variable substitution which can be shorter than shown so far:

OUTPUT="${OUTPUT//[\'\"\`]}"

And sed of course though this is longer in terms of characters:

OUTPUT="$(sed s/[\'\"\`]//g <<<$OUTPUT)"
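As a quick sanity check, the three one-liners above can be run side by side on a sample value. This is only a sketch: the sample string and the names via_tr, via_pe and via_sed are made up for illustration, and I have quoted the here-strings and the sed expression so that neither word splitting nor globbing can interfere.

```shell
#!/usr/bin/env bash
# Hypothetical sample value containing all three quote characters.
OUTPUT=a\'b\"c\`d

# 1. tr with a quoted here-string.
via_tr="$(tr -d "\"\`'" <<<"$OUTPUT")"

# 2. Parameter expansion: no external process is started.
via_pe=${OUTPUT//[\'\"\`]}

# 3. sed, with the expression quoted so the shell cannot glob it.
via_sed="$(sed "s/[\"\`']//g" <<<"$OUTPUT")"

printf '%s\n' "$via_tr" "$via_pe" "$via_sed"
```

All three variables should end up holding abcd.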

I'm not sure if you mean shortest in length or in terms of time taken. In terms of length these two are as short as it gets (or as I can get it anyway) when it comes to removing those specific characters. So, which is fastest? I tested by setting the OUTPUT variable to what you had in your example but repeated several dozen times:

$ echo ${#OUTPUT} 
4900

$ time tr -d "\"\`'" <<<$OUTPUT
real    0m0.002s
user    0m0.004s
sys     0m0.000s
$ time sed s/[\'\"\`]//g <<<$OUTPUT
real    0m0.005s
user    0m0.000s
sys     0m0.000s
$ time echo ${OUTPUT//[\'\"\`]}
real    0m0.027s
user    0m0.028s
sys     0m0.000s

As you can see, the tr is clearly the fastest, followed closely by sed. Also, it seems like using echo is actually slightly faster than using <<<:

$ for i in {1..10}; do 
    ( time echo $OUTPUT | tr -d "\"\`'" > /dev/null ) 2>&1
done | grep -oP 'real.*m\K[\d.]+' | awk '{k+=$1;} END{print k/NR}'; 
0.0025
$ for i in {1..10}; do 
    ( time tr -d "\"\`'" <<<$OUTPUT > /dev/null ) 2>&1 
  done | grep -oP 'real.*m\K[\d.]+' | awk '{k+=$1;} END{print k/NR}'; 
0.0029

Since the difference is tiny, I ran the above tests 10 times for each of the two and it turns out that the fastest is indeed the one you had to begin with:

echo $OUTPUT | tr -d "\"\`'" 

However, this changes when you take into account the overhead of assigning to a variable. Here, using tr is slightly slower than the simple replacement:

$ for i in {1..10}; do
    ( time OUTPUT=${OUTPUT//[\'\"\`]} ) 2>&1
  done | grep -oP 'real.*m\K[\d.]+' | awk '{k+=$1;} END{print k/NR}'; 
0.0032

$ for i in {1..10}; do
    ( time OUTPUT=$(echo $OUTPUT | tr -d "\"\`'")) 2>&1
  done | grep -oP 'real.*m\K[\d.]+' | awk '{k+=$1;} END{print k/NR}'; 
0.0044

So, in conclusion: when you simply want to view the results, use tr. If you want to reassign to a variable, the shell's string manipulation features are faster, since they avoid the overhead of running a separate subshell.
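If you want to repeat the comparison on your own machine, a rough harness along these lines may help. It is not the exact benchmark used above: the input is a made-up 4900-character string, the loop count N and the names r1 and r2 are arbitrary, and the timings will differ between systems and bash versions.

```shell
#!/usr/bin/env bash
# Hypothetical input: the 7-character sample repeated 700 times (4900 characters).
OUTPUT=$(printf 'a'\''b"c`d%.0s' {1..700})
N=20   # arbitrary repeat count; raise it for steadier numbers

# Reassignment via parameter expansion (no subshell, no external program).
time for ((i = 0; i < N; i++)); do
    r1=${OUTPUT//[\'\"\`]}
done

# Reassignment via command substitution and tr.
time for ((i = 0; i < N; i++)); do
    r2=$(printf '%s\n' "$OUTPUT" | tr -d "\"\`'")
done

# Both methods should agree on the result.
[ "$r1" = "$r2" ] && echo "results match, length ${#r1}"
```

bash prints the timing of each loop on stderr, so redirect it with 2>&1 if you want to post-process the numbers.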

Solution 2

You could use variable substitution:

$ OUTPUT=a\'b\"c\`d
$ echo "$OUTPUT"
a'b"c`d

Use the syntax ${parameter//pattern/string} to replace all occurrences of the pattern with the string.

$ echo "${OUTPUT//\'/x}"
axb"c`d
$ echo "${OUTPUT//\"/x}"
a'bxc`d
$ echo "${OUTPUT//\`/x}"
a'b"cxd
$ echo "${OUTPUT//[\'\"\`]/x}"
axbxcxd
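For comparison, here is a small sketch of how the single-slash and double-slash forms differ, and what happens when the replacement string is left out. The sample value and the variable names are only for illustration; this needs bash (or another shell that supports this expansion).

```shell
#!/usr/bin/env bash
# Hypothetical sample value with two single quotes.
OUTPUT=a\'b\'c

first_only=${OUTPUT/\'/x}    # one slash: only the first match is replaced
all=${OUTPUT//\'/x}          # two slashes: every match is replaced
deleted=${OUTPUT//\'}        # no replacement string: every match is removed

printf '%s\n' "$first_only" "$all" "$deleted"
```

This prints axb'c, axbxc and abc, one per line.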

Solution 3

In bash or zsh it is:

OUTPUT="${OUTPUT//[\`\"\']/}"

Note that ${VAR//PATTERN/} removes all instances of the pattern. For more information, see the bash documentation on parameter expansion.

This solution should be fastest for short strings because it doesn't involve running any external programs. However, for very long strings the opposite is true: it is better to use a dedicated tool for text operations, for example:

$ OUTPUT="$(cat /usr/src/linux/.config)"

$ time (echo $OUTPUT | OUTPUT="${OUTPUT//set/abc}")
real    0m1.766s
user    0m1.681s
sys     0m0.002s

$ time (echo $OUTPUT | sed s/set/abc/g >/dev/null)
real    0m0.094s
user    0m0.078s
sys     0m0.006s

Solution 4

If, on the off-chance, you're just trying to handle quotes for reuse in the shell, then you can do this without removing them, and it's dead simple, too:

aq() { sh -c 'for a do
       alias "$((i=$i+1))=$a"
       done; alias' -- "$@"
}

That function shell-quotes every argument you hand it and numbers each argument in its output.

Here it is with a few args:

aq \
"here's an
ugly one" \
"this one is \$PATHpretty bad, too" \
'this one```****```; totally sucks'

OUTPUT

1='here'"'"'s an
ugly one'
2='this one is $PATHpretty bad, too'
3='this one```****```; totally sucks'

That output is from dash which typically safe-quotes single-quoted output like '"'"'. bash would do '\''.

Replacing a selection of single, non-whitespace, non-null bytes with another single byte can likely be done quickest in any POSIX shell with $IFS and $*.

set -f; IFS=\"\'\`; set -- $var; printf %s "$*"

OUTPUT

"some ""crazy """"""""string ""here

There I just printf it so you can see it, but of course, if I had done:

var="$*"

...rather than the printf command $var's value would be what you see in the output there.
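Put together as a reassignment, the idea might look like the sketch below. The sample value and the name old_ifs are my own for illustration, and it assumes $IFS is set to begin with. Note that set -- overwrites the positional parameters, so in a real script you would probably do this inside a function.

```shell
# Hypothetical sample value containing ", ` and '.
var='some"crazy`string'\''here'

old_ifs=$IFS        # remember the current separators (assumes IFS is set)
set -f              # no globbing while $var is expanded unquoted
IFS=\"\'\`          # split on the three quote characters
set -- $var         # the fields become the positional parameters
var="$*"            # rejoin them with the first byte of IFS, which is "
IFS=$old_ifs        # restore the original separators
set +f              # re-enable globbing

printf '%s\n' "$var"
```

Here every one of the three quote characters in the sample ends up as a double quote.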

When I set -f I instruct the shell not to glob, in case the string contains characters that could be construed as glob patterns. I do this because the shell's parser expands glob patterns after it performs field splitting on variables. Globbing can be re-enabled with set +f. In general, in scripts, I find it useful to set my shebang line like:

#!/usr/bin/sh -f

And then to explicitly enable globbing with set +f on whatever line I might want it.

Field splitting occurs based on the characters in $IFS.

There are two kinds of $IFS values - $IFS whitespace and $IFS non-whitespace. $IFS whitespace (space, tab, newline) delimited fields are specified to elide by sequence to a single field (or none at all if they do not precede something else) - so...

IFS=\ ; var='      '; printf '<%s>' $var
<>

But all others are specified to evaluate to a single field per occurrence - they are not truncated.

IFS=/; var='/////'; printf '<%s>' $var
<><><><><>

All variable expansions are, by default, $IFS delimited data arrays - they split out to separate fields according to $IFS. When you double-quote one you override that array property and evaluate it as a single string.
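One way to see the difference between the two kinds of separator is to count the fields an unquoted expansion produces. In this sketch count_fields is a hypothetical helper and the sample values are made up.

```shell
# count_fields is a hypothetical helper: it reports how many arguments it received.
count_fields() { printf '%s\n' "$#"; }

old_ifs=$IFS
set -f                            # no globbing while expanding unquoted

IFS=' '
var='      '
n_blank=$(count_fields $var)      # only whitespace: no fields at all

var='a   b'
n_ws=$(count_fields $var)         # runs of whitespace act as one separator

IFS=/
var='/////'
n_slash=$(count_fields $var)      # each non-whitespace separator ends a field

var='a//b'
n_mixed=$(count_fields $var)      # the empty field between the slashes is kept

IFS=$old_ifs
set +f
printf 'blank=%s ws=%s slash=%s mixed=%s\n' "$n_blank" "$n_ws" "$n_slash" "$n_mixed"
```

In bash and dash this reports 0, 2, 5 and 3 fields respectively.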

So when I do...

IFS=\"\'\`; set -- $var

I am setting the shell's argument array to the many $IFS delimited fields generated by $var's expansion. When it is expanded its constituent values for the characters contained in $IFS are lost - they are only field separators now - they are \0NUL.

"$*" - like other double-quoted variable-expansions - also overrides the field-splitting qualities of $IFS. But, in addition, it joins the fields in "$@" with the first byte in $IFS. So because " was the first value in $IFS all the delimiters become " in "$*". And the " needn't be in $IFS when you split it, either. You could alter $IFS after set -- $var to another value entirely and its new first byte would then show up for the field delimiters in "$*". What's more, you can remove all traces of them entirely like:

set -- $var; IFS=; printf %s "$*"

OUTPUT

some crazy string here
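To make the split-then-rejoin idea concrete, here is a sketch that splits once and then joins the same fields twice with different $IFS values. The sample value and the names joined, removed and old_ifs are assumptions for illustration only.

```shell
# Hypothetical sample value containing ", ` and '.
var='some"crazy`string'\''here'

old_ifs=$IFS                 # assumes IFS is set
set -f                       # no globbing during the unquoted expansion

IFS=\"\'\`; set -- $var      # split on the three quote characters
IFS=_;  joined="$*"          # first byte of IFS is now _, so fields join with _
IFS=;   removed="$*"         # empty IFS: fields are joined with nothing between

IFS=$old_ifs
set +f
printf '%s\n' "$joined" "$removed"
```

The first line printed uses underscores where the quotes were; the second has them removed altogether.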

Author: rubo77

Updated on September 18, 2022

Comments

  • rubo77
    rubo77 over 1 year

    There are many ways to replace characters in a variable.

    The shortest way I found out is tr so far:

    OUTPUT=a\'b\"c\`d_123and_a_lot_more
    OUTPUT=$(echo "$OUTPUT"|tr -d "'\`\"")
    echo $OUTPUT
    

    Is there a faster way? And is this quoting-safe for quotes like ', " and ` itself?

    • Marek Zakrzewski
      Marek Zakrzewski over 9 years
      I think you can keep using tr. BASH's PE is good but tr is much faster in this case. e.g. echo "$OUTPUT" | tr -dc '[[:alpha:]]' since you want to have alphanumerics only
    • musiphil
      musiphil over 9 years
      Since you're interested in being quoting-savvy: Always quote your variables! echo "$OUTPUT". Or better: printf "%s\n" "$OUTPUT". (What happens when OUTPUT="-n"?)
    • Alex Shroyer
      Alex Shroyer over 9 years
      You might also consider browsing codegolf, especially the bash tips.
  • chaos
    chaos over 9 years
    @rubo77 echo ${OUTPUT//[\`\"\']/x} gives axbxcxd
  • terdon
    terdon over 9 years
    In fact, tr is faster. Regexes and globs are expensive, and while there is no external program here, bash will always be slower than something like tr.
  • Mathias Begert
    Mathias Begert over 9 years
    Since the OP is interested in setting the modified value back into OUTPUT, you will have to account for command substitution sub-shell overhead involved in tr and sed solutions
  • terdon
    terdon over 9 years
    @1_CR yes but since that will be the case whichever method he uses, I figured it is irrelevant.
  • Mathias Begert
    Mathias Begert over 9 years
    Not quite, OUTPUT="${OUTPUT//[`\"\']/}" does not involve command substitution
  • terdon
    terdon over 9 years
    @1_CR ah, I see, yes, you're quite right and that does change the result. Thanks, answer edited.
  • gena2x
    gena2x over 9 years
    That highly depends on input data and on the regexp implementation. In your answer you took some specific large data set - but data set may be small. Or different. Also, you measure not time of regexp but time of echo, so i can't be sure if your comparison really fair.
  • terdon
    terdon over 9 years
    Good points. However, you can't make claims about speed without testing. In fact, when assigning to a variable this seems faster but when printing to screen tr wins (see my answer). I agree that it will depend on many factors but that's exactly why you can't tell which one wins without actually testing it.
  • terdon
    terdon over 9 years
    Very nice, +1. I wonder if it is indeed faster. Could you add some timing tests comparing it to the approaches in my answer? I expect yours will be faster but would like to see.
  • mikeserv
    mikeserv over 9 years
    @terdon - that depends on the shell. It is almost definitely faster than tr in any shell, but the difference is iffy in bash for the ${var//$c/$newc/} case. I expect even in that case it will be faster by some margin, but I don't usually worry about that because for this stuff I always use dash - which is faster by orders of magnitude in generally every respect. And so it is hard to compare.
  • mikeserv
    mikeserv over 9 years
    @terdon - I tried. But - even in bash - doing time (IFS=\"\'`; set -- $var; printf %s "$*") and time (var=${var//\'`/\"/}) both result in 0.0000s results for all fields. Am I doing something wrong, do you think? There's supposed to be a backslash before the backquote up there but I dunno how to put a backquote in a comment code field.
  • Gilles 'SO- stop being evil'
    Gilles 'SO- stop being evil' over 9 years
    The methods that involve a command substitution have the downside of somewhat mangling the string. (You can avoid it but at the expense of making the command significantly more complex.) In particular, command substitution removes trailing newlines.
  • gena2x
    gena2x over 9 years
    It's incorrect to name the expansion "variable expansion". It is called "parameter expansion".
  • slm
    slm over 9 years
    @gena2x - I don't understand what your comment means here?
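On the point raised in the comments about command substitution dropping trailing newlines, a small sketch shows the effect and one common workaround (appending a sentinel character and stripping it afterwards). The sample value and the names via_subst and guarded are made up for illustration.

```shell
# Hypothetical value that ends in two newlines.
nl='
'
OUTPUT="a'b${nl}${nl}"

# Plain command substitution: the trailing newlines are stripped.
via_subst=$(printf '%s' "$OUTPUT" | tr -d "'")

# Workaround: emit a sentinel after the data, then remove it again.
guarded=$(printf '%s' "$OUTPUT" | tr -d "'"; printf x)
guarded=${guarded%x}

printf 'plain: %s chars, guarded: %s chars\n' "${#via_subst}" "${#guarded}"
```

The plain version keeps only the two letters, while the guarded one also keeps both newlines. Parameter expansion on the variable itself avoids the issue entirely, since no command substitution is involved.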