using sed to replace a pattern with hash values

Solution 1

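In your attempt, the command substitution ($(…)) is performed by the shell before sed is executed: echo \& | sha1sum runs only once, hashing the literal string "&" plus a newline, and its output is passed to sed as part of the replacement text. That is why every match is replaced by the same value.

The ge attempt fails for a different reason: GNU sed's e flag executes the entire pattern space, after the substitution, as a shell command. For the first line that command is one echo S56G | sha1sum one two three, hence the "one: not found" and "sha1sum: one: No such file or directory" errors.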
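To make the evaluation order concrete, here is a small Python sketch that emulates the first attempt, sed "s/[A-Z][0-9]\{2\}[A-Z]/$(echo \& | sha1sum)/g". The function name emulate_first_attempt is hypothetical, and Python's hashlib stands in for sha1sum. The replacement string is built once, before any matching, so every match receives the same digest.

```python
import hashlib
import re

# The pattern from the question, in Python syntax.
PATTERN = re.compile(r"[A-Z][0-9]{2}[A-Z]")

def emulate_first_attempt(line: str) -> str:
    # Step 1 (shell): $(echo \& | sha1sum) runs once, before sed starts.
    # echo writes "&\n"; sha1sum prints "<40 hex digits>  -", and the
    # command substitution strips the trailing newline.
    replacement = hashlib.sha1(b"&\n").hexdigest() + "  -"
    # Step 2 (sed): that constant string replaces every match.
    # It contains no backslashes, so it is safe as a literal re.sub template.
    return PATTERN.sub(replacement, line)

for line in ["one S56G one two three", "four five V67X six"]:
    print(emulate_first_attempt(line))
```

Both S56G and V67X end up replaced by the identical string, which matches the behaviour described in the question.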

Use a scripting language whose regular expression substitution supports code execution:

perl -MDigest::SHA=sha1_hex -pe 's/[A-Z][0-9]{2}[A-Z]/sha1_hex($&)/ge' inputfile

php -R 'echo preg_replace_callback("/[A-Z][0-9]{2}[A-Z]/", function ($m) { return sha1($m[0]); }, $argn), "\n";' < inputfile

(The /e modifier of preg_replace was removed in PHP 7.0, so preg_replace_callback is used instead; php -R reads its input lines from stdin.)

ruby -rdigest/sha1 -pe '$_.gsub!(/[A-Z][0-9]{2}[A-Z]/){Digest::SHA1.hexdigest($&)}' inputfile

python3 -c 'import sys,fileinput,re,hashlib;[sys.stdout.write(re.sub("[A-Z][0-9]{2}[A-Z]",lambda m:hashlib.sha1(m.group(0).encode()).hexdigest(),l)) for l in fileinput.input()]' inputfile
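The Python one-liner is easier to read when spread over a few lines. This is a sketch of the same technique (re.sub with a function as the replacement, called once per match, similar to Perl's /e); the names sha1_hex, replace_with_sha1 and main are hypothetical helpers introduced for illustration.

```python
import fileinput
import hashlib
import re
import sys

PATTERN = re.compile(r"[A-Z][0-9]{2}[A-Z]")

def sha1_hex(match) -> str:
    # Hash only the matched text; hashlib needs bytes in Python 3, hence encode().
    return hashlib.sha1(match.group(0).encode()).hexdigest()

def replace_with_sha1(line: str) -> str:
    # re.sub calls sha1_hex once for every non-overlapping match.
    return PATTERN.sub(sha1_hex, line)

def main(paths):
    # fileinput reads the given files in order ("-" means stdin).
    with fileinput.input(files=paths) as lines:
        for line in lines:
            sys.stdout.write(replace_with_sha1(line))

# Only process files when file names are given on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1:])
```

Saved as, say, sha1_replace.py (a hypothetical file name), it would be run as python3 sha1_replace.py inputfile.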

Solution 2

@manatwork has already provided an answer. I'm only adding this as a curiosity...

A bash + sha1sum variant (requires bash 4 or later for associative arrays):

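#!/usr/bin/env bash

# Input file, taken from the first command-line argument.
fn=${1:?usage: $0 inputfile}

function fail()
{
    printf "Failed on line \`%s'\n" "$line" >&2
    exit 2
}

declare -A sha_map   # cache: matched text -> SHA1 hex digest
re='[A-Z][0-9]{2}[A-Z]'

# IFS= keeps surrounding whitespace; the || test handles a missing final newline.
while IFS= read -r line || [[ -n $line ]]; do
    while [[ $line =~ $re ]]; do
        m=${BASH_REMATCH[0]}
        if [[ -z ${sha_map[$m]} ]]; then
            # printf (not echo) so that no trailing newline gets hashed
            sha=$(printf "%s" "$m" | sha1sum) || fail
            sha_map[$m]=${sha%% *}   # keep only the hex digest
        fi
        # Replace every occurrence of this match. The lowercase hex digest
        # cannot match $re again, so the loop terminates.
        line=${line//"$m"/${sha_map[$m]}}
    done
    printf "%s\n" "$line"
done <"$fn"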
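The sha_map cache in the bash variant (like the sort -u suggestion in the comments) avoids hashing the same fragment more than once. A minimal Python sketch of the same memoization idea, assuming a plain dict as the cache; make_replacer is a hypothetical helper name:

```python
import hashlib
import re

PATTERN = re.compile(r"[A-Z][0-9]{2}[A-Z]")

def make_replacer():
    # Return a replace function together with its cache (mirrors sha_map).
    cache = {}  # matched text -> SHA1 hex digest

    def digest(match):
        text = match.group(0)
        if text not in cache:
            # Computed only the first time this exact text is seen.
            cache[text] = hashlib.sha1(text.encode()).hexdigest()
        return cache[text]

    def replace(line):
        return PATTERN.sub(digest, line)

    return replace, cache

replace, cache = make_replacer()
print(replace("a S56G b S56G c S56G d"))
print(len(cache))  # 1: three occurrences, but only one distinct match hashed
```

functools.lru_cache on a str to str helper would work equally well; the explicit dict just mirrors sha_map.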

Author: bjschoenfeld

Updated on September 18, 2022

Comments

  • bjschoenfeld, over 1 year ago

    I want to search a file and replace a specific pattern with its hash (SHA1) value.

    For example, let file.txt have the following content:

    one S56G one two three
    four five V67X six

    and I want to replace the pattern [A-Z][0-9]\{2\}[A-Z] with the SHA1 value of the match. In the example above, the matches are S56G and V67X.

    Using sed, I tried:

    sed "s/[A-Z][0-9]\{2\}[A-Z]/$(echo \& | sha1sum)/g"

    without success, as the result is always the hash value of '&'.

    I also tried the ge flags, with the command:

    sed 's/[A-Z][0-9]\{2\}[A-Z]/echo & | sha1sum/ge'

    which throws errors:

    sh: 1: one: not found
    sha1sum: one: No such file or directory
    sha1sum: two: No such file or directory
    sha1sum: three: No such file or directory

  • manatwork, over 10 years ago
    Note that the length of the Python code is not necessarily due to the language's verbosity; my knowledge of Python is just weaker.
  • manatwork, over 10 years ago
    I see a fast-handed member downvoted you without explanation. It could be because of 1) a possible openssl version difference (my openssl 1.0.1e produces messy output), or 2) your checksums being calculated including the trailing newline, like “S56G\n”; use echo -n $i|sha1sum instead.
  • Boris Brodski, over 10 years ago
    @manatwork Thank you! Nice catch :-) I have improved my answer by getting rid of openssl and using echo -n.
  • manatwork, over 10 years ago
    Your approach has the advantage of easily reducing the number of hash calculations when the replaceable fragments are repeated. (So in “a S56G b S56G c S56G d”, calculate the SHA1 once and replace all 3 occurrences.) Just pipe grep's output through sort -u to avoid performing the same calculation again and again. (sed will replace them all on the first call anyway.)
  • bjschoenfeld, over 10 years ago
    I still don't understand why my first try (using the 'g' flag) always returns the hash value of '&'. Without the pipe to 'sha1sum', the 'echo' command returns the correct match. Does 'sed' not support redirection?
  • bjschoenfeld, over 10 years ago
    I'm quite reluctant to write a script that reads the file line by line. It seems that 'sed' and 'perl', being well-designed stream editors, would be able to read the file much more efficiently than a user-written bash script.
  • Runium, over 10 years ago
    @user1734905: Indeed. As I said, it's only a curiosity. That said, the approach could be used in a script where the data volume is small and the script's primary goal is something else.
  • manatwork, over 10 years ago
    @user1734905, see if this is clearer: pastebin.com/fvcHTyKV
  • Boris Brodski, over 10 years ago
    I thought about adding sort -u, but decided against it; I didn't want to make it too complicated. The advantage of my approach is that it's an example of bash scripting: pipe everything together when performance doesn't matter.
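The trailing-newline issue raised in the comments (echo versus echo -n before sha1sum) can be checked directly. A short sketch in Python, where hashlib stands in for sha1sum as an illustrative assumption; the point is only which bytes get hashed:

```python
import hashlib

def sha1_hex(data: bytes) -> str:
    return hashlib.sha1(data).hexdigest()

# `echo S56G | sha1sum` hashes b"S56G\n" (echo appends a newline), while
# `echo -n S56G | sha1sum` or `printf %s S56G | sha1sum` hash b"S56G".
with_newline = sha1_hex(b"S56G\n")
without_newline = sha1_hex(b"S56G")

print("echo    :", with_newline)
print("echo -n :", without_newline)
print("same digest?", with_newline == without_newline)  # False
```

Only the digest without the newline matches what the perl, ruby and python solutions produce for the match S56G.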