Replace substring of characters with awk and sed
Solution 1
With GNU awk (gawk), you can do this:
gawk -v start=5 -v end=8 '{
mid = substr($0, start, end-start+1)
print substr($0, 1, start-1) gensub(/./, "N", "g", mid) substr($0, end+1)
}' file
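gensub() is a gawk extension, so the command above will not run in a POSIX awk such as mawk. The sketch below builds the run of Ns with a plain loop instead. The helper name mask_awk is hypothetical, and START and END are assumed to be 1-based and inclusive, as above.

```shell
# mask_awk: hypothetical helper name.
# Usage: mask_awk START END [FILE...]
# Replaces characters START..END (1-based, inclusive) of each line with N,
# using only POSIX awk features (no gensub).
mask_awk() {
    start=$1 end=$2
    shift 2
    awk -v start="$start" -v end="$end" '{
        # Number of characters in the range that actually exist on this line
        n = length(substr($0, start, end - start + 1))
        # Build a string of n copies of "N"
        mask = ""
        for (k = 1; k <= n; k++) mask = mask "N"
        print substr($0, 1, start - 1) mask substr($0, start + n)
    }' "$@"
}
```

For example, echo ABCDABCDABCD | mask_awk 5 8 prints ABCDNNNNABCD. A line shorter than END is masked only up to its last character, and a line shorter than START is printed unchanged.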
Or with Perl:
perl -spe 'substr($_, $start-1, $end-$start+1) =~ s/./N/g' -- -start=5 -end=8 file
In both solutions, the start and end values are passed to the program as command-line options, which makes them easy to change from within a shell script. If the replacement character N also needs to be dynamic, pass it the same way: as another -v variable for awk, or as another -s switch for Perl.
Solution 2
If you have GNU awk (gawk), you can set FIELDWIDTHS to split the line into fields based on character positions. This is particularly convenient for your case in gawk 4.2 and later, which supports a "wildcard" (*) trailing field width. You can then replace the characters in the second field using gsub:
echo ABCDABCDABCD | gawk -v i=5 -v n=4 '
BEGIN {FIELDWIDTHS = sprintf("%d %d *", i-1, n)}
{gsub(/./,"N",$2)} 1
' OFS=""
ABCDNNNNABCD
In older versions of gawk, you can simulate the * by choosing a suitably large maximum size for the trailing field:
echo ABCDABCDABCD | gawk -v i=5 -v n=4 '
BEGIN {FIELDWIDTHS = sprintf("%d %d 65536", i-1, n)}
{gsub(/./,"N",$2)} 1
' OFS=""
ABCDNNNNABCD
See "Capturing Optional Trailing Data" in the gawk manual.
Solution 3
Using sed
To replace characters 5 through 8 with N:
$ sed -E 's/(.{4}).{4}/\1NNNN/' test
ABCDNNNNABCD
How it works:
(.{4}) captures the first four characters in group 1.
.{4} matches the next four characters.
\1NNNN replaces the above with group 1 followed by four Ns.
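The sed command above hardcodes both the offset and the length. A sketch that builds the same expression from shell variables, taking a start position and a length as the question asks, is shown below. It assumes a sed that supports -E (GNU and BSD sed both do), and the helper name mask_sed is hypothetical.

```shell
# mask_sed: hypothetical helper name.
# Usage: mask_sed START LENGTH [FILE...]
# Replaces LENGTH characters starting at position START (1-based) with N.
mask_sed() {
    start=$1 len=$2
    shift 2
    # printf pads the empty string to LENGTH spaces; tr turns them into Ns.
    ns=$(printf "%${len}s" '' | tr ' ' N)
    # Capture the first START-1 characters, then replace the next LENGTH.
    sed -E "s/^(.{$((start - 1))}).{$len}/\1$ns/" "$@"
}
```

For example, echo ABCDABCDABCD | mask_sed 5 4 prints ABCDNNNNABCD. Lines shorter than START+LENGTH-1 do not match and are printed unchanged. POSIX only guarantees repetition counts up to 255 (RE_DUP_MAX); GNU sed accepts considerably larger counts, but for offsets in the thousands the awk or Perl versions are the safer choice.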
Using GNU awk
$ gawk -F "" '{for (i=5; i<=8; i++) $i="N"} 1' OFS="" test
ABCDNNNNABCD
How it works:
-F "" tells awk to treat each character as a separate field.
for (i=5; i<=8; i++) $i="N" loops over characters 5 through 8 and changes each one to N.
1 tells awk to print the line.
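The same approach can take the range as variables instead of hardcoding 5 and 8, so only the two numbers need to change. This is a sketch assuming GNU awk; mask_gawk is a hypothetical helper name, and the i <= NF test simply stops the loop at the end of a shorter line.

```shell
# mask_gawk: hypothetical helper name (assumes GNU awk for -F "").
# Usage: mask_gawk START END [FILE...]
mask_gawk() {
    start=$1 end=$2
    shift 2
    # Each character is a field; OFS="" joins them back without separators.
    gawk -F '' -v OFS='' -v start="$start" -v end="$end" '
        { for (i = start; i <= end && i <= NF; i++) $i = "N" } 1
    ' "$@"
}
```

For example, echo ABCDABCDABCD | mask_gawk 5 8 prints ABCDNNNNABCD.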
Paolo Lorenzini
Updated on September 18, 2022
Comments
-
Paolo Lorenzini over 1 year
I have a file which contains a very long string of characters and I would like to replace a substring of it with Ns. Example:
test
ABCDABCDABCD
Using awk and sed, I would like to replace all the characters from index 5 to 8 with the letter N, so that the run of Ns is 4 characters long.
Output
ABCDNNNNABCD
I tried something like this:
awk '{ v=substr($0,5,4); sed -i "s/$v/N/g";print substr($0,1,4)""v""substr($0,9,12)}' test
However, this command gives the following output:
ABCDABCDABC
and no substitution is made.
I would like the code to take the index where the substitution starts (here, 5) and the length of the substitution (here, 4) as parameters, so I can simply change these numbers to start at another position or to replace a different number of characters. In reality, my string has thousands of letters and I want to replace hundreds of characters, so substituting a pattern does not work in my case.
-
Angel Todorov almost 5 years
Awk is not like shell: you can't just put a sed call in there.
-