Using sed to replace a pattern with hash values
Solution 1
In your attempt, the command substitution ($(…)) is performed by the shell before sed is executed, and only the resulting string is passed to sed as a parameter.
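The same pitfall can be reproduced outside the shell. The following is a minimal Python sketch, not the original command: it computes the replacement once, before the substitution runs, much as the shell expands the command substitution before sed starts. The function name eager_replace is hypothetical, and the sketch leaves out the trailing "  -" that sha1sum prints after the hash.

```python
import hashlib
import re

PATTERN = re.compile(r"[A-Z][0-9]{2}[A-Z]")


def eager_replace(line):
    # Computed once, up front: this is the hash of "&\n", the text that
    # `echo \&` prints, not the hash of any match.
    replacement = hashlib.sha1(b"&\n").hexdigest()
    # Every match is replaced by that same fixed string.
    return PATTERN.sub(replacement, line)


if __name__ == "__main__":
    print(eager_replace("one S56G one two three"))
    print(eager_replace("four five V67X six"))
```

Both lines end up with the same hash, which is the behaviour described in the question.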
Use a scripting language whose regular expression substitution supports code execution:
perl -MDigest::SHA=sha1_hex -pe 's/[A-Z][0-9]{2}[A-Z]/sha1_hex$&/ge' inputfile
php -R 'echo preg_replace_callback("/[A-Z][0-9]{2}[A-Z]/",function($m){return sha1($m[0]);},$argn),"\n";' < inputfile
ruby -rdigest/sha1 -pe '$_.gsub!(/[A-Z][0-9]{2}[A-Z]/){Digest::SHA1.hexdigest$&}' inputfile
python -c 'import sys,fileinput,re,hashlib;[sys.stdout.write(re.sub("[A-Z][0-9]{2}[A-Z]",lambda s:hashlib.sha1(s.group(0).encode()).hexdigest(),l))for l in fileinput.input()]' inputfile
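For readability, the Python one-liner can be unpacked into a short script. This is a sketch assuming Python 3. The names replace_with_sha1 and replace_all are illustrative and not part of the original answer, and the sample lines are taken from the question.

```python
import hashlib
import re
import sys

PATTERN = re.compile(r"[A-Z][0-9]{2}[A-Z]")


def replace_with_sha1(line):
    # The callback runs once per match, so each match is hashed separately.
    return PATTERN.sub(
        lambda m: hashlib.sha1(m.group(0).encode()).hexdigest(), line
    )


def replace_all(lines):
    # Works on any iterable of lines, for example an open file object.
    for line in lines:
        yield replace_with_sha1(line)


if __name__ == "__main__":
    sample = ["one S56G one two three\n", "four five V67X six\n"]
    sys.stdout.writelines(replace_all(sample))
```

To process a real file, the sample list could be swapped for an open file object, since replace_all only needs something it can iterate over line by line.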
Solution 2
@manatwork has surely provided the answer. I am only adding this as a curiosity ...
A bash+sha1sum variant.
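fn=$1   # input file; assumed here to be passed as the first argument

function fail()
{
    printf "Failed on line \`%s'\n" "$line" >&2
    exit 2
}

declare -A sha_map   # cache: matched text -> SHA1
re='[A-Z][0-9]{2}[A-Z]'

# IFS= keeps leading and trailing whitespace of each line intact.
while IFS= read -r line; do
    # Keep replacing until no match is left on this line.
    while [[ $line =~ $re ]]; do
        m="${BASH_REMATCH[0]}"
        if ! [[ ${sha_map[$m]} ]]; then
            sha="$(printf "%s" "$m" | sha1sum)" || fail
            sha_map["$m"]=${sha%% *}   # strip the trailing "  -" from sha1sum's output
        fi
        line=${line//$m/${sha_map[$m]}}
    done
    printf "%s\n" "$line"
done <"$fn"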
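The caching idea behind sha_map carries over to other languages. Below is a rough Python sketch of the same approach, assuming Python 3. The function name replace_cached and the dictionary passed in as cache are hypothetical, chosen only for this illustration.

```python
import hashlib
import re

PATTERN = re.compile(r"[A-Z][0-9]{2}[A-Z]")


def replace_cached(lines, cache=None):
    """Yield lines with each match replaced by its SHA1 hex digest.

    Each distinct match is hashed only once; later occurrences reuse
    the value stored in `cache`.
    """
    if cache is None:
        cache = {}

    def lookup(m):
        text = m.group(0)
        if text not in cache:
            cache[text] = hashlib.sha1(text.encode()).hexdigest()
        return cache[text]

    for line in lines:
        yield PATTERN.sub(lookup, line)


if __name__ == "__main__":
    seen = {}
    for out in replace_cached(["a S56G b S56G c S56G d"], seen):
        print(out)
    print(len(seen), "distinct match(es) hashed")
```

Passing the dictionary in from outside makes it possible to inspect how many distinct fragments were actually hashed.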
bjschoenfeld
Updated on September 18, 2022
Comments
-
bjschoenfeld over 1 year
I want to search a file and replace every occurrence of a specific pattern with its SHA1 hash value.
For example, let file.txt have the following content:
one S56G one two three
four five V67X six
and I want to replace the pattern [A-Z][0-9]\{2\}[A-Z] with the SHA1 value of the match. In the example above, the matches are S56G and V67X.
Using sed, I tried:
sed "s/[A-Z][0-9]\{2\}[A-Z]/$(echo \& | sha1sum)/g"
without success, as the result is always the hash value of '&'.
I also tried the ge flag, with the command:
sed 's/[A-Z][0-9]\{2\}[A-Z]/echo & | sha1sum/ge'
which throws errors:
sh: 1: one: not found
sha1sum: one: No such file or directory
sha1sum: two: No such file or directory
sha1sum: three: No such file or directory
-
manatwork over 10 years
Note that the length of the python code is not necessarily due to the language's verbosity. My knowledge of it is just weaker.
-
manatwork over 10 years
I see a fast-handed member downvoted you without explanation. It could be because 1) of a possible openssl version difference (my openssl 1.0.1e produces messy output); 2) your checksums are calculated including the trailing newline, like “S56G\n”. Use echo -n $i|sha1sum instead.
-
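The effect of the trailing newline is easy to check. The sketch below (Python 3, with the hypothetical helper name sha1_hex) hashes the same fragment with and without the newline. In bash, echo appends the newline while echo -n and printf "%s" do not.

```python
import hashlib


def sha1_hex(data):
    # data must be bytes; returns the 40-character hex digest.
    return hashlib.sha1(data).hexdigest()


with_newline = sha1_hex(b"S56G\n")   # the bytes `echo S56G` sends to sha1sum
without_newline = sha1_hex(b"S56G")  # the bytes `echo -n S56G` sends to sha1sum

if __name__ == "__main__":
    print(with_newline)
    print(without_newline)
    print("equal:", with_newline == without_newline)
```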
Boris Brodski over 10 years
@manatwork Thank you! Nice catch :-) I have improved my answer by getting rid of openssl and using echo -n.
-
manatwork over 10 years
Your approach has the advantage that it can easily reduce the number of hash calculations when the replaceable fragments are repeated. (So in “a S56G b S56G c S56G d”, calculate the SHA1 once and replace all 3 occurrences.) Just pipe grep's output through sort -u to avoid performing the same calculation again and again. (sed will replace them all on the first call anyway.)
-
bjschoenfeld over 10 years
I still don't understand why my first try (using the 'g' flag) always returns the hash value of '&'. Without the pipe to 'sha1sum', the 'echo' command returns the correct match. Does 'sed' not support redirection?
-
bjschoenfeld over 10 years
I'm quite reluctant to write a script that reads the file line by line. It seems that 'sed' and 'perl', being well-designed stream editors, should be able to read the file much more efficiently than a user-written bash script.
-
Runium over 10 years
@user1734905: Indeed. As said, only as a curiosity. That said, the approach could be used in a script where the volume is small and the primary goal of the script is something else.
-
manatwork over 10 years
@user1734905, see if this is more clear: pastebin.com/fvcHTyKV
-
Boris Brodski over 10 years
I thought about adding sort -u, but then I decided against it. I didn't want to make it too complicated. The advantage of my approach is that it's an example of bash scripting: pipe everything together if performance doesn't matter.