shortest way to replace characters in a variable
Solution 1
Let's see. The shortest I can come up with is a tweak of your tr solution:
OUTPUT="$(tr -d "\"\`'" <<<"$OUTPUT")"
Other alternatives include the already-mentioned parameter expansion, which can be shorter than what has been shown so far:
OUTPUT="${OUTPUT//[\'\"\`]}"
And sed, of course, though this is longer in terms of characters:
OUTPUT="$(sed s/[\'\"\`]//g <<<"$OUTPUT")"
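A quick way to confirm that the three one-liners agree is to run them side by side. This is a minimal sketch, assuming bash (for <<< and ${var//...}); the function names strip_tr, strip_pe and strip_sed are hypothetical wrappers, and the sample is the a'b"c`d value used elsewhere on this page.

```bash
#!/usr/bin/env bash
# Hypothetical wrappers around the three one-liners; each prints its result.
strip_tr()  { tr -d "\"\`'" <<<"$1"; }                       # tr deletes " ` '
strip_pe()  { local r; r=${1//[\'\"\`]}; printf '%s\n' "$r"; } # parameter expansion
strip_sed() { sed "s/['\"\`]//g" <<<"$1"; }                   # sed bracket expression

sample=a\'b\"c\`d
for f in strip_tr strip_pe strip_sed; do
  printf '%-9s %s\n' "$f" "$("$f" "$sample")"   # each line should end in abcd
done
```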
I'm not sure if you mean shortest in length or in terms of time taken. In terms of length, the tr and parameter-expansion versions are as short as it gets (or as short as I can get them, anyway) when it comes to removing those specific characters. So, which is fastest? I tested by setting the OUTPUT variable to what you had in your example, repeated several dozen times:
$ echo ${#OUTPUT}
4900
$ time tr -d "\"\`'" <<<$OUTPUT
real 0m0.002s
user 0m0.004s
sys 0m0.000s
$ time sed s/[\'\"\`]//g <<<$OUTPUT
real 0m0.005s
user 0m0.000s
sys 0m0.000s
$ time echo ${OUTPUT//[\'\"\`]}
real 0m0.027s
user 0m0.028s
sys 0m0.000s
As you can see, tr is clearly the fastest, followed closely by sed. Also, it seems that using echo is actually slightly faster than using <<<:
$ for i in {1..10}; do
( time echo $OUTPUT | tr -d "\"\`'" > /dev/null ) 2>&1
done | grep -oP 'real.*m\K[\d.]+' | awk '{k+=$1;} END{print k/NR}';
0.0025
$ for i in {1..10}; do
( time tr -d "\"\`'" <<<$OUTPUT > /dev/null ) 2>&1
done | grep -oP 'real.*m\K[\d.]+' | awk '{k+=$1;} END{print k/NR}';
0.0029
Since the difference is tiny, I ran each of the two tests above 10 times and averaged the results, and it turns out that the fastest is indeed the one you had to begin with:
echo "$OUTPUT" | tr -d "\"\`'"
However, this changes when you take into account the overhead of assigning to a variable. Here, using tr is slightly slower than the simple replacement:
$ for i in {1..10}; do
( time OUTPUT=${OUTPUT//[\'\"\`]} ) 2>&1
done | grep -oP 'real.*m\K[\d.]+' | awk '{k+=$1;} END{print k/NR}';
0.0032
$ for i in {1..10}; do
( time OUTPUT=$(echo $OUTPUT | tr -d "\"\`'")) 2>&1
done | grep -oP 'real.*m\K[\d.]+' | awk '{k+=$1;} END{print k/NR}';
0.0044
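Each of the loops above times a single run, so the results sit close to the clock's resolution. One hedged alternative, assuming bash, is to time many iterations inside a single time and let TIMEFORMAT print just the elapsed seconds. The iteration count N, the sample data and the helper names below are arbitrary choices for illustration; no timings are claimed, since they vary by machine and bash version.

```bash
#!/usr/bin/env bash
# Time N iterations per method inside one `time` to reduce start-up noise.
TIMEFORMAT='%3R'                 # bash: report only real time, 3 decimals
N=200                            # arbitrary iteration count

OUTPUT=
for i in {1..100}; do OUTPUT+=a\'b\"c\`d; done   # hypothetical test data

# Each helper assigns the cleaned string to r, like the assignment tests above.
pe_assign()  { r=${OUTPUT//[\'\"\`]}; }
tr_assign()  { r=$(tr -d "\"\`'" <<<"$OUTPUT"); }
sed_assign() { r=$(sed "s/['\"\`]//g" <<<"$OUTPUT"); }

bench() {                        # bench FUNCTION: print its name and elapsed time
  local i
  printf '%-11s ' "$1"
  { time for ((i = 0; i < N; i++)); do "$1"; done; } 2>&1
}

for f in pe_assign tr_assign sed_assign; do bench "$f"; done
```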
So, in conclusion: when you simply want to view the results, use tr, but if you want to reassign to a variable, the shell's string manipulation features are faster, since they avoid the overhead of a command-substitution subshell and an external process.
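One more reason to prefer the in-place expansion when assigning, which Gilles also raises in the comments: command substitution strips every trailing newline from its output, while ${var//...} leaves them alone. A small bash sketch, using a made-up value that ends in two newlines:

```bash
#!/usr/bin/env bash
value=$'a\'b\n\n'                                 # a'b followed by two newlines
via_pe=${value//[\'\"\`]}                         # no subshell; newlines kept
via_tr=$(printf '%s' "$value" | tr -d "\"\`'")    # $(...) drops trailing newlines
printf 'pe: %d chars, tr: %d chars\n' "${#via_pe}" "${#via_tr}"
# prints: pe: 4 chars, tr: 2 chars
```

If the trailing newlines matter, a common workaround is to print a sentinel character at the end inside the substitution and strip it afterwards (for example r=$(...; printf x); r=${r%x}), at the cost of a longer command.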
Solution 2
You could use parameter expansion (pattern substitution):
$ OUTPUT=a\'b\"c\`d
$ echo "$OUTPUT"
a'b"c`d
Use the syntax ${parameter//pattern/string} to replace all occurrences of the pattern with the string:
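The same syntax has a single-slash form that replaces only the first match, and the replacement string can be left out to delete matches. A short bash sketch using the same OUTPUT value as the transcript; the variable names first, every and deleted are only for illustration:

```bash
#!/usr/bin/env bash
OUTPUT=a\'b\"c\`d
first=${OUTPUT/[\'\"\`]/x}      # one slash: replace the first match only
every=${OUTPUT//[\'\"\`]/x}     # two slashes: replace every match
deleted=${OUTPUT//[\'\"\`]}     # no replacement string: delete every match
printf '%s\n' "$first" "$every" "$deleted"
# prints:
# axb"c`d
# axbxcxd
# abcd
```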
$ echo "${OUTPUT//\'/x}"
axb"c`d
$ echo "${OUTPUT//\"/x}"
a'bxc`d
$ echo "${OUTPUT//\`/x}"
a'b"cxd
$ echo "${OUTPUT//[\'\"\`]/x}"
axbxcxd
Solution 3
In bash or zsh it is:
OUTPUT="${OUTPUT//[\`\"\']/}"
Note that ${VAR//PATTERN/} removes all instances of the pattern. For more information, see the "Shell Parameter Expansion" section of the bash manual.
This solution should be fastest for short strings because it doesn't involve running any external programs. However, for very long strings the opposite is true: it is better to use a dedicated tool for text operations, for example:
$ OUTPUT="$(cat /usr/src/linux/.config)"
$ time (echo $OUTPUT | OUTPUT="${OUTPUT//set/abc}")
real 0m1.766s
user 0m1.681s
sys 0m0.002s
$ time (echo $OUTPUT | sed s/set/abc/g >/dev/null)
real 0m0.094s
user 0m0.078s
sys 0m0.006s
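Where the crossover between the two approaches lies depends on the bash version and the machine, so it is worth measuring rather than assuming. A rough bash sketch that builds inputs of growing size (the sample line and the sizes are arbitrary assumptions, not the kernel config used above) and times ${var//set/abc} against sed on each, also checking that the two results match:

```bash
#!/usr/bin/env bash
TIMEFORMAT='%3R'                       # bash: print real seconds only

make_input() {                         # make_input LINES: LINES copies of a sample line
  local i
  for ((i = 0; i < $1; i++)); do printf '%s\n' '# CONFIG_FOO is not set'; done
}

for lines in 500 2000 8000; do
  input=$(make_input "$lines")
  printf '%5d lines  pe:  ' "$lines"
  { time r1=${input//set/abc}; } 2>&1                 # in-shell replacement
  printf '%5s        sed: ' ''
  { time r2=$(sed 's/set/abc/g' <<<"$input"); } 2>&1  # external tool
  [[ $r1 == "$r2" ]] || echo "results differ at $lines lines" >&2
done
```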
Solution 4
If, on the off chance, you're just trying to handle quotes for reuse in the shell, then you can do this without removing them, and it's dead simple, too:
# Print each argument as a numbered, shell-quoted alias definition (1=..., 2=...).
aq() { sh -c 'for a do
        alias "$((i=$i+1))=$a"
    done; alias' -- "$@"
}
That function shell-quotes every argument you hand it and numbers each one in its output.
Here it is with a few args:
aq \
"here's an
ugly one" \
"this one is \$PATHpretty bad, too" \
'this one```****```; totally sucks'
OUTPUT
1='here'"'"'s an
ugly one'
2='this one is $PATHpretty bad, too'
3='this one```****```; totally sucks'
That output is from dash, which typically safe-quotes embedded single quotes like '"'"'. bash would do '\''.
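If you only need bash, a different technique from the alias trick above is bash's own printf %q, which prints a shell-quoted form of its argument that eval can read back. The exact quoting style it chooses varies between bash versions, so this sketch checks a round trip instead of showing output; aq_bash is a hypothetical name.

```bash
#!/usr/bin/env bash
aq_bash() {                          # one shell-quoted argument per line
  local a
  for a in "$@"; do printf '%q\n' "$a"; done
}

aq_bash "here's an
ugly one" 'this one is $PATHpretty bad, too' 'this one```****```; totally sucks'

# Round trip: evaluating the quoted form reproduces the original string.
ugly=$'here\'s an\nugly one'
eval "copy=$(printf '%q' "$ugly")"
[[ $copy == "$ugly" ]] && echo 'round trip OK'
```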
Replacing a selection of single, non-whitespace, non-null bytes with another single byte can likely be done quickest in any POSIX shell with $IFS and $*.
set -f; IFS=\"\'\`; set -- $var; printf %s "$*"
OUTPUT
"some ""crazy """"""""string ""here
There I just printf it so you can see it, but of course, if I had done:
var="$*"
...rather than the printf command, $var's value would be what you see in the output there.
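The one-liner changes IFS and the shell options for the rest of the script. A hedged sketch for the deletion case (which is what the question asks for) wraps it in a function whose body is a ( ... ) subshell, so those changes are discarded when it returns. delete_bytes and its argument order are hypothetical, and it should run in bash or a POSIX sh. As with the tr version, capturing its output with $(...) strips trailing newlines.

```bash
#!/usr/bin/env bash
# delete_bytes SET STRING: print STRING with every byte found in SET removed.
# SET should hold non-whitespace bytes, as described above.
delete_bytes() (      # ( ) body: IFS and set -f changes stay in this subshell
  set -f              # split fields must not be globbed
  IFS=$1              # each byte of SET becomes a field delimiter
  set -- $2           # split STRING; the delimiters themselves vanish
  IFS=                # empty IFS: "$*" joins the fields with nothing between
  printf '%s' "$*"
)

var="\"some 'crazy \`string\" here"
delete_bytes "\"'\`" "$var"; echo     # some crazy string here
```

The same wrapper could set IFS to a replacement byte instead of the empty string, but test that case in your shell: in bash and dash, a delimiter at the very end of the string yields no trailing empty field, so a final quote character is dropped rather than replaced.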
When I set -f, I instruct the shell not to glob, in case the string contains characters that could be construed as glob patterns. I do this because the shell's parser expands glob patterns after it performs field splitting on variables. Globbing can be re-enabled with set +f. In general, in scripts, I find it useful to set my shebang like:
#!/usr/bin/sh -f
and then to explicitly enable globbing with set +f on whatever line I might want it.
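To see why set -f matters, here is a small bash sketch run in a throwaway directory (created with mktemp -d; the file names are arbitrary): a * in the data survives field splitting and is then expanded to file names unless globbing is off.

```bash
#!/usr/bin/env bash
demo=$(mktemp -d) || exit 1
cd "$demo" && touch file1 file2          # two files for * to match

var='*"x'
IFS='"'
set +f; set -- $var; on=$#;  printf ' <%s>' "$@"; echo    # <file1> <file2> <x>
set -f; set -- $var; off=$#; printf ' <%s>' "$@"; echo    # <*> <x>
set +f; unset IFS                        # restore default globbing and splitting

cd / && rm -r "$demo"
echo "globbing on: $on fields, globbing off: $off fields"
```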
Field splitting occurs based on the characters in $IFS.
There are two kinds of $IFS characters: $IFS whitespace and $IFS non-whitespace. Sequences of $IFS whitespace (space, tab, newline) are specified to collapse into a single delimiter, and leading or trailing $IFS whitespace produces no field at all, so...
IFS=\ ; var=' '; printf '<%s>' $var
<>
But all other $IFS characters are specified to delimit one field per occurrence; runs of them are not collapsed.
IFS=/; var='/////'; printf '<%s>' $var
<><><><><>
All variable expansions are, by default, $IFS-delimited data arrays: they split out into separate fields according to $IFS. When you "-quote one, you override that array property and it evaluates to a single string.
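Counting fields makes the difference between an unquoted and a "-quoted expansion visible. A bash sketch with a made-up value; the empty first field comes from the leading ", because non-whitespace $IFS characters are not collapsed:

```bash
#!/usr/bin/env bash
var=\"some\'crazy\`string            # the value is: "some'crazy`string
set -f
IFS=\"\'\`
set -- $var;   unquoted=$#; printf ' <%s>' "$@"; echo   # <> <some> <crazy> <string>
set -- "$var"; quoted=$#;   printf ' <%s>' "$@"; echo   # <"some'crazy`string>
unset IFS; set +f
echo "unquoted: $unquoted fields, quoted: $quoted field"
```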
So when I do...
IFS=\"\'\`; set -- $var
I am setting the shell's argument array to the many $IFS-delimited fields generated by $var's expansion. When it is expanded, any of its characters that are also in $IFS are lost: they are only field separators now, and they appear in none of the fields.
"$*", like other double-quoted expansions, also overrides the field-splitting qualities of $IFS. But, in addition, it joins the fields in "$@" with the first byte of $IFS between each pair of fields. So because " was the first character in $IFS, all of the delimiters become " in "$*". And the " needn't be in $IFS when you split, either. You could alter $IFS to another value entirely after set -- $var, and its new first byte would then show up between the fields in "$*". What's more, you can remove all traces of the delimiters entirely, like:
set -- $var; IFS=; printf %s "$*"
OUTPUT
some crazy string here
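Because "$*" reads $IFS at the moment it is expanded, one split can be rejoined in several ways. A bash sketch with a made-up value; the variable names dashed, joined and spaced are only for illustration:

```bash
#!/usr/bin/env bash
var=\"some\'crazy\`string
set -f
IFS=\"\'\`; set -- $var      # split while IFS holds the three quote characters
IFS=-;  dashed="$*"          # joined with the new first byte of IFS
IFS=;   joined="$*"          # empty IFS: fields concatenated
unset IFS; spaced="$*"       # unset IFS: joined with a space
set +f
printf '[%s]\n' "$dashed" "$joined" "$spaced"
# prints:
# [-some-crazy-string]
# [somecrazystring]
# [ some crazy string]
```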
Comments
rubo77 over 1 year
There are many ways to replace characters in a variable.
The shortest way I have found so far is tr:
OUTPUT=a\'b\"c\`d_123and_a_lot_more
OUTPUT=$(echo "$OUTPUT"|tr -d "'\`\"")
echo $OUTPUT
Is there a faster way? And is this quoting-safe for quotes like ', " and ` itself?
Marek Zakrzewski over 9 years
I think you can keep using tr. Bash's parameter expansion (PE) is good, but tr is much faster in this case, e.g. echo "$OUTPUT" | tr -dc '[:alnum:]' since you want to have alphanumerics only.
musiphil over 9 years
Since you're interested in being quoting-savvy: Always quote your variables! echo "$OUTPUT". Or better: printf "%s\n" "$OUTPUT". (What happens when OUTPUT="-n"?)
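To see what that last question is getting at, a quick bash check (the bracketed printf only makes an empty result visible): by default, bash's built-in echo takes -n as an option even when the variable is quoted, while printf '%s\n' prints it as data.

```bash
#!/usr/bin/env bash
OUTPUT=-n
from_echo=$(echo "$OUTPUT")              # echo treats -n as "no trailing newline"
from_printf=$(printf '%s\n' "$OUTPUT")   # printf prints it as data
printf 'echo: [%s]  printf: [%s]\n' "$from_echo" "$from_printf"
# prints: echo: []  printf: [-n]
```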
Alex Shroyer over 9 years
chaos over 9 years
@rubo77 echo ${OUTPUT//[\`\"\']/x} gives axbxcxa
terdon over 9 years
In fact, tr is faster. Regexes and globs are expensive, and while there is no external program here, bash will always be slower than something like tr.
Mathias Begert over 9 years
Since the OP is interested in setting the modified value back into OUTPUT, you will have to account for the command-substitution subshell overhead involved in the tr and sed solutions.
terdon over 9 years
@1_CR yes, but since that will be the case whichever method he uses, I figured it is irrelevant.
Mathias Begert over 9 years
Not quite, OUTPUT="${OUTPUT//[\`\"\']/}" does not involve command substitution.
terdon over 9 years
@1_CR ah, I see, yes, you're quite right and that does change the result. Thanks, answer edited.
gena2x over 9 years
That highly depends on the input data and on the regexp implementation. In your answer you took one specific large data set, but the data set may be small, or different. Also, you are measuring the time taken by echo rather than by the regexp, so I can't be sure your comparison is really fair.
terdon over 9 years
Good points. However, you can't make claims about speed without testing. In fact, when assigning to a variable this seems faster, but when printing to the screen tr wins (see my answer). I agree that it will depend on many factors, but that's exactly why you can't tell which one wins without actually testing it.
terdon over 9 years
Very nice, +1. I wonder if it is indeed faster. Could you add some timing tests comparing it to the approaches in my answer? I expect yours will be faster but would like to see.
mikeserv over 9 years
@terdon - that depends on the shell. It is almost definitely faster than tr in any shell, but the difference is iffy in bash for the ${var//$c/$newc/} case. I expect even in that case it will be faster by some margin, but I don't usually worry about that because for this stuff I always use dash, which is faster by orders of magnitude in generally every respect. And so it is hard to compare.
mikeserv over 9 years
@terdon - I tried. But, even in bash, doing time (IFS=\"\'`; set -- $var; printf %s "$*") and time (var=${var//\'`/\"/}) both result in 0.0000s results for all fields. Am I doing something wrong, do you think? There's supposed to be a backslash before the backquote up there but I dunno how to put a backquote in a comment code field.
Gilles 'SO- stop being evil' over 9 years
The methods that involve a command substitution have the downside of somewhat mangling the string. (You can avoid it but at the expense of making the command significantly more complex.) In particular, command substitution removes trailing newlines.
gena2x over 9 years
It's incorrect to name the expansion "variable expansion". It is called "parameter expansion".
slm over 9 years
@gena2x - I don't understand what your comment means here?