Replace substring of characters with awk and sed

linux text-processing awk sed


Solution 1

With GNU awk, you can do

gawk -v start=5 -v end=8 '{
    mid = substr($0, start, end-start+1)
    print substr($0, 1, start-1) gensub(/./, "N", "g", mid) substr($0, end+1)
}' file

Or with perl

perl -spe 'substr($_, $start-1, $end-$start+1) =~ s/./N/g' -- -start=5 -end=8 file

With both solutions, the start and end values are passed to the program as command-line options, which makes them easy to change from within a shell script. The replacement character N can be made dynamic the same way, by passing it in as one more variable.
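
As a rough sketch of that last point, the replacement character can be passed with another -v option. The function name mask_range and the variable name char are made up for illustration. This version uses plain gsub on the extracted slice instead of gensub, so it should not need gawk specifically. It assumes the character is an ordinary one (not & or a backslash, which are special in an awk replacement string).

```shell
# mask_range START END CHAR [FILE...]   (hypothetical helper, not part of the answer above)
mask_range() {
    start=$1 end=$2 char=$3
    shift 3
    awk -v start="$start" -v end="$end" -v char="$char" '{
        mid = substr($0, start, end - start + 1)   # the slice to overwrite
        gsub(/./, char, mid)                       # replace each of its characters with char
        print substr($0, 1, start - 1) mid substr($0, end + 1)
    }' "$@"
}

printf 'ABCDABCDABCD\n' | mask_range 5 8 X
# prints ABCDXXXXABCD
```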

Solution 2

If you have GNU awk (gawk), you can set FIELDWIDTHS to split the line into fields based on character positions. This is particularly convenient in gawk 4.2 and later, which supports a trailing "*" field width that captures the rest of the line. You can then replace the characters of the second field using gsub:

echo ABCDABCDABCD | gawk -v i=5 -v n=4 '
  BEGIN {FIELDWIDTHS = sprintf("%d %d *", i-1, n)}
  {gsub(/./,"N",$2)} 1
' OFS=""
ABCDNNNNABCD

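In older versions of gawk, you can simulate the * by choosing a suitably large maximum size for the trailing field. Pick a value larger than your longest line, since characters beyond the declared widths do not end up in any field: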

echo ABCDABCDABCD | gawk -v i=5 -v n=4 '
  BEGIN {FIELDWIDTHS = sprintf("%d %d 65536", i-1, n)} 
  {gsub(/./,"N",$2)} 1
' OFS=""
ABCDNNNNABCD

See the "Processing Fixed-Width Data" and "Capturing Optional Trailing Data" sections of the gawk manual.

Solution 3

Using sed

To replace characters 5 through 8 with N:

$ sed -E 's/(.{4}).{4}/\1NNNN/' test
ABCDNNNNABCD

How it works:

(.{4}) captures the first four characters in group 1.
.{4} matches the next four characters.
\1NNNN replaces the above with group 1 and four N.
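
The same substitution can be driven by shell variables instead of hard-coded counts. This is only a sketch: the function name sed_mask and the variables start, len and fill are invented for illustration. It also adds a ^ anchor to make it explicit that counting starts at the beginning of the line. Some sed implementations cap the repeat count allowed inside {…} (POSIX only guarantees 255), so very long ranges may need one of the awk approaches.

```shell
# sed_mask START LEN [FILE...]   (hypothetical helper)
sed_mask() {
    start=$1 len=$2
    shift 2
    # Build LEN copies of N: printf pads an empty string to LEN spaces, tr turns them into N.
    fill=$(printf "%${len}s" "" | tr ' ' 'N')
    # Keep the first START-1 characters (group 1), replace the next LEN with the fill.
    sed -E "s/^(.{$((start - 1))}).{$len}/\1$fill/" "$@"
}

printf 'ABCDABCDABCD\n' | sed_mask 5 4
# prints ABCDNNNNABCD
```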

Using GNU awk

$ gawk -F "" '{for (i=5; i<=8; i++) $i="N"} 1' OFS="" test
ABCDNNNNABCD

How it works:

-F "" tells awk to treat each character as a separate field.
for (i=5; i<=8; i++) $i="N" loops over each character from 5 through 8 and changes it to N.
1 tells awk to print the line.
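
The loop bounds can come from variables as well. In this sketch the names mask_chars, start and len are hypothetical. It relies on awk splitting the line into single characters when the field separator is empty, which gawk does but POSIX does not require, so substitute gawk for awk if yours behaves differently. If the line is shorter than the requested range, the assignment extends it with extra Ns.

```shell
# mask_chars START LEN [FILE...]   (hypothetical helper)
mask_chars() {
    start=$1 len=$2
    shift 2
    awk -F '' -v OFS='' -v start="$start" -v len="$len" '{
        # every character is its own field, so fields start .. start+len-1 are the target range
        for (i = start; i < start + len; i++) $i = "N"
    } 1' "$@"
}

printf 'ABCDABCDABCD\n' | mask_chars 5 4
# prints ABCDNNNNABCD
```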


Paolo Lorenzini


Updated on September 18, 2022

Comments

Paolo Lorenzini over 1 year
I have a file which contains a very long string of characters and I would like to replace a substring of it with Ns. Example:

test
```
ABCDABCDABCD
```
I would like to use awk and sed to replace every character from index 5 to 8 with the letter N, so that the run of Ns has a total length of 4.

Output
```
ABCDNNNNABCD
```
I tried something like this:
```
awk '{ v=substr($0,5,4); sed -i "s/$v/N/g";print substr($0,1,4)""v""substr($0,9,12)}' test
```
however, this command seems to give this output:
```
ABCDABCDABC
```
and no substitution was made.

I would like the code to take the index at which the substitution starts (here 5) and the length of the substitution (here 4) as numbers, so that I can change them to start at another position or replace a different number of characters. In reality, I have a string with thousands of letters and I want to replace hundreds of characters, so a pattern-based substitution does not work in my case.
- Angel Todorov almost 5 years
  
  Awk is not like shell: you can't just put a sed call in there.