Replace string with sequential index

shell text-processing sed awk perl


Solution 1

```
perl -pe 's/instant/$& . ++$n/ge'
```

or with GNU awk:

```
awk -vRS=instant '{$0=n$0;ORS=RT}++n'
```

To edit the files in-place, add the -i option to perl:

```
perl -pi -e 's/instant/$& . ++$n{$ARGV}/ge' ./*.vs
```

Or recursively:

```
find . -name '*.vs' -type f -exec perl -pi -e '
  s/instant/$& . ++$n{$ARGV}/ge' {} +
```

Explanations

```
perl -pe 's/instant/$& . ++$n/ge'
```

-p makes perl process the input line by line: for each line it evaluates the code passed to -e and then prints the line. On each line, we substitute (using the s/re/repl/flags operator) every occurrence of instant with itself ($&) followed by the incremented value of a variable (++$n). The g flag makes the substitution global (all occurrences, not just the first one), and the e flag (for evaluate) makes perl interpret the replacement as Perl code rather than as a fixed string.
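For comparison, POSIX awk has no counterpart of the e flag: gsub() takes a fixed replacement string, so it cannot append a different number to each match. A minimal portable sketch (the function name number_instants is made up for illustration) walks each line with match() and substr() and appends ++n after every occurrence, with one counter for the whole input:

```shell
# number_instants [FILE...]: append a running counter to every "instant",
# like perl -pe 's/instant/$& . ++$n/ge', using only POSIX awk features.
number_instants() {
  awk '{
    out = ""; rest = $0
    # match() sets RSTART and RLENGTH for the leftmost occurrence, if any
    while (match(rest, /instant/)) {
      # keep everything up to the end of the match, then the new count
      out = out substr(rest, 1, RSTART + RLENGTH - 1) (++n)
      rest = substr(rest, RSTART + RLENGTH)
    }
    print out rest
  }' "$@"
}

printf 'instant instant\n\ninstant\n' | number_instants
```

Unlike the perl one-liner this needs an explicit loop, but it runs with any POSIX awk (gawk, mawk, busybox awk).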

For in-place editing where one perl invocation processes more than one file, we want the count to restart for each file. So instead of a single $n, we use a hash element $n{$ARGV} ($ARGV holds the name of the file currently being processed), which gives each file its own counter.
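The same per-file reset can be sketched in POSIX awk, which has no -i option but does have FNR, the record number within the current file. Resetting the counter when FNR == 1 plays the role of keying it on $ARGV. This is an illustration only: the function name is hypothetical, output goes to stdout rather than back into the files, and sub() numbers just the first instant on each line:

```shell
# number_per_file FILE...: number "instant" (first one per line) with a counter
# that restarts for every input file. FNR is the line number within the current
# file, so resetting n when FNR == 1 plays the role of perl's $n{$ARGV}.
number_per_file() {
  awk '
    FNR == 1  { n = 0 }                      # first line of a new file
    /instant/ { sub(/instant/, "&" (++n)) }  # "&" is the matched text
    1                                        # print every line
  ' "$@"
}

# Demo on two hypothetical sample files in a scratch directory
tmp=$(mktemp -d)
printf 'test  instant  ()\n\ntest  instant  ()\n' > "$tmp/a.txt"
printf 'test  instant  ()\n' > "$tmp/b.txt"
number_per_file "$tmp/a.txt" "$tmp/b.txt"
rm -rf "$tmp"
```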

The awk one deserves a bit of explanation.

```
awk -vRS=instant '{$0=n$0;ORS=RT}++n'
```

We're using the ability of GNU awk to separate records on arbitrary strings (even regexps). With -vRS=instant, we set the record separator to instant. RT is the variable that holds what was matched by RS, so typically instant, except for the last record, where it is the empty string. In the question's sample input, the records ($0) and record terminators (RT) are, written as [$0|RT]:

```
[test  |instant][  ()
test  |instant][  ()
...
test  |instant][  ()    //total 1000 lines|]
```

So all we need to do is insert an incrementing number at the start of every record except the first one.

That is what the code does. For the first record, n is still empty, so $0 is left unchanged. We set ORS (the output record separator) to RT, so that what gets printed is n, the original record and RT. The printing is triggered by the second pattern, ++n: it is a condition that always evaluates to true (a non-zero number), so the default action (printing $0 followed by ORS) is performed for every record, and n is ready for the next one.
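The record/terminator view itself does not depend on gawk's RT. As a rough POSIX sh illustration, assuming the whole text fits in a shell variable, ${rest%%"$sep"*} is the current record and ${rest#*"$sep"} is everything after its terminator. The hypothetical number_after function inserts the count at the start of every record except the first, which is the same idea as the awk one-liner:

```shell
# number_after SEP TEXT: treat SEP as a record terminator and print TEXT with
# an incrementing number inserted after each terminator, i.e. at the start of
# every record except the first (the same idea as the gawk RS/RT one-liner).
number_after() {
  sep=$1 rest=$2 out= n=0
  [ -n "$sep" ] || return 1          # an empty separator would loop forever
  while :; do
    case $rest in
      *"$sep"*) ;;                   # at least one more terminator follows
      *) break ;;                    # last record: no terminator left
    esac
    n=$((n + 1))
    out=$out${rest%%"$sep"*}$sep$n   # record, its terminator, then the number
    rest=${rest#*"$sep"}             # continue after that terminator
  done
  printf '%s\n' "$out$rest"
}

number_after instant 'test  instant  () test  instant  ()'
```

This is only practical for small inputs, since the whole text is held in one variable; for real files the awk version is the better choice.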

Solution 2

sed is really not the best tool for the job; you want something with better scripting capabilities. Here are some choices:

perl
```
perl -00pe 's/instant/$& . $./e' file 
```
The -p means "print every line" after applying whatever script is given with -e. The -00 turns on "paragraph mode", so records are separated by one or more blank lines (two or more consecutive newline (\n) characters) rather than by single newlines; this lets it deal with double-spaced lines correctly. $& is the text matched by the last successful pattern match, and $. is the current record number of the input file (in paragraph mode, the paragraph number). The e in s///e lets me evaluate expressions in the replacement part of the substitution operator.
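awk has a comparable paragraph mode: with RS set to the empty string, records are separated by blank lines, so NR counts paragraphs much as $. does under perl -00. A sketch (the function name is hypothetical); note that it collapses runs of blank lines into one and always ends the output with a blank line:

```shell
# number_paragraphs [FILE...]: number the first "instant" in each
# blank-line-separated paragraph; NR counts paragraphs when RS is empty.
number_paragraphs() {
  awk '
    BEGIN { RS = ""; ORS = "\n\n" }   # paragraph mode in, blank line between records out
    { sub(/instant/, "&" NR); print }
  ' "$@"
}

printf 'test  instant  ()\n\ntest  instant  ()\n' | number_paragraphs
```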
awk (this assumes your data are exactly as shown, with three space separated fields)
```
awk '{if(/./) print $1,$2 ++k,$3; else print}' file 
```
Here, we increment the variable k only if the current line is not empty (/./ matches any character), in which case we also print the modified fields. Empty lines are printed as is.
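Printing $1,$2 ++k,$3 rebuilds each line with single spaces between the fields (and drops any fields after the third), so the original column alignment is lost. If that matters, a variant sketch (hypothetical function name) edits $0 directly with sub() and counts only the lines that actually contain instant, so it does not rely on there being exactly three fields:

```shell
# number_lines [FILE...]: number the first "instant" on each line that has
# one; blank lines and the original spacing are left untouched.
number_lines() {
  # k counts matching lines only; the trailing 1 prints every line
  awk '/instant/ { sub(/instant/, "&" (++k)) } 1' "$@"
}

printf 'test  instant  ()\n\ntest  instant  ()\n' | number_lines
```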
various shells
```
 n=0
 while read -r a b c; do
   if [ -n "$a" ]; then
      n=$((n + 1))    # count only non-blank lines (POSIX arithmetic)
      printf '%s %s%s %s\n' "$a" "$b" "$n" "$c"
   else
      printf '\n'     # keep blank lines blank
   fi
 done < file
```
Here, read automatically splits each input line on whitespace and saves the fields as $a, $b and $c (with $c receiving the rest of the line). Then, within the loop, $n is incremented by one for each line where $a is not empty, and its current value is printed right after the second field, $b.
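Because read splits on whitespace and printf puts back single spaces, that loop also normalizes the spacing. A POSIX sh sketch of a variant (the function name is hypothetical) reads each line verbatim with IFS= read -r and splices the number in after the first instant using parameter expansion, leaving every other line untouched:

```shell
# number_lines_sh < FILE: like the read loop above, but each line is read
# verbatim, so the spacing is preserved; only lines containing "instant"
# advance the counter.
number_lines_sh() {
  n=0
  # the extra -n test also handles a last line that lacks a trailing newline
  while IFS= read -r line || [ -n "$line" ]; do
    case $line in
      *instant*)
        n=$((n + 1))
        # text before the first instant, instant plus the count, then the rest
        line=${line%%instant*}instant$n${line#*instant}
        ;;
    esac
    printf '%s\n' "$line"
  done
}

printf 'test  instant  ()\n\ntest  instant  ()\n' | number_lines_sh
```

Usage: number_lines_sh < file.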

NOTE: all the above solutions assume that all lines in the file have the same format. If not, @Stephane's answer (Solution 1) is the way to go.

For dealing with many files, and assuming that you want to do this to all files in the current directory, you can use this:

```
for file in ./*; do perl -i -00pe 's/instant/$& . $./e' "$file"; done
```

CAREFUL: That only handles the entries directly in the current directory. If you need something more complex, such as recursing into subdirectories with arbitrary file names, go for (assuming ksh93, zsh or bash):

```
find . -type f -print0 | while IFS= read -r -d '' file; do
    perl -i -00pe 's/instant/$& . $./e' "$file"
done
```
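The question also asks to process only the *.txt files. One way to sketch that portably is find -name '*.txt' plus a temporary file, since POSIX awk cannot edit in place; each file gets its own awk process and therefore its own counter. The function name and the temp-file naming are assumptions for illustration, and only the first instant per line is numbered. Writing back with cat rather than mv keeps the original file's permissions:

```shell
# renumber_txt DIR: for every regular *.txt file under DIR, append a counter
# that restarts at 1 in each file to the first "instant" on each line.
renumber_txt() {
  find "$1" -type f -name '*.txt' -exec sh -c '
    for f in "$@"; do
      # hypothetical temp-file name next to the original
      tmp=$f.renumber.$$
      # write back with cat so the original file keeps its permissions
      awk "/instant/ { sub(/instant/, \"&\" (++n)) } 1" "$f" > "$tmp" &&
        cat "$tmp" > "$f" &&
        rm -f "$tmp"
    done
  ' sh {} +
}
```

Usage: renumber_txt . from the top of the tree; trying it on a copy of the files first is a good idea.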


user3342338

Updated on September 18, 2022

Comments

user3342338 over 1 year
Can someone suggest an elegant way to accomplish this?

Input:
```
test  instant  ()

test  instant  ()

...
test  instant  ()    //total 1000 lines
```
output should be:
```
test      instant1  ()

test      instant2  ()

test      instant1000()
```
The empty lines are in my input files and there are many files under the same directory that I need to process at once.

I tried this to process many files in the same directory, but it didn't work:
```
for file in ./*; do perl -i -000pe 's/instance$& . ++$n/ge' "$file"; done
```
errors:
```
Substitution replacement not terminated at -e line 1.
Substitution replacement not terminated at -e line 1.
```
and I also tried this:
```
perl -i -pe 's/instant/$& . ++$n/ge' *.vs
```
It worked, but the index just kept incrementing from one file to the next. I'd like it to reset to 1 for each new file. Any good suggestions?
```
find . -type f -exec perl -pi -e 's/instant/$& . ++$n{$ARGV}/ge' {} +
```
works, but it also modified other files that shouldn't be touched. I'd prefer to process only the *.txt files.
- terdon about 10 years
  
  And do they all consist exclusively of either blank lines or test instant ()?
- Timo about 10 years
  
  I put the double-spaced lines back in. Double spacing is often a sign of new users not knowing how to use this site's markup, which is why terdon removed the blank lines while properly indenting your file content block so it shows as file content. Hope it is OK now.
user3342338 about 10 years

The perl script works. However, there is one small issue if the lines are double-spaced.
terdon about 10 years

@user3342338 yes, that will increment the counter, since I am using the current line number. This is a very naive approach; as I said, Stephane's is more robust. None of these work if you have blank lines or if any of your lines deviate from what you show.
terdon about 10 years

@user3342338 see updated answer. They should all now work for double spaced files.
Gilles 'SO- stop being evil' about 10 years

This could use a bit of explanation.
Madivad over 8 years

Great answer and the option of alternative methods!! Thanks