How do I Copy and Paste lines between a start and end keyword?

9,502

Solution 1

To copy all lines between %packages and %end from file1 into file2:

awk '$1=="%end" {f=0;next} f{print;next} $1=="%packages" {f=1}' file1 >>file2

This solution excludes the marker lines %packages and %end themselves. (If you want those lines to be copied as well, there is an even simpler solution below.)

Since awk implicitly loops over all lines of its input, the commands above are applied to each line of file1. They use a flag, called f, to track whether we are inside the packages section of file1. Every line within the packages section is printed to stdout, and >> appends that output to file2, leaving its existing content intact.

Let us consider the awk commands, one by one:

  • $1=="%end" {f=0;next}

    This command checks whether the first whitespace-separated field of the line is %end. If it is, the flag f is set to zero and we skip to the next line.

  • f{print;next}

    This command checks to see if the flag f is nonzero. If it is nonzero, then the line is printed and we skip to the next line.

  • $1=="%packages" {f=1}

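    This command checks whether the first field of the line is %packages. If so, it sets the flag f to one so that the lines after it will be printed. The order of the three rules matters: because the %end rule runs before the print rule, the %end line is never printed, and because the %packages rule runs after the print rule, the %packages line is not printed either.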
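The same three rules, written out as a commented awk program. This is a minimal sketch that builds a small sample file1 in a temporary directory (the contents are illustrative only) and appends the selected lines to file2:

```shell
#!/bin/sh
# Work in a scratch directory so no real files are touched.
cd "$(mktemp -d)" || exit 1

# Sample input: the package list sits between %packages and %end.
printf '%s\n' 1 2 3 %packages 4 5 6 7 8 %end 9 10 > file1

awk '
  # Rule 1: the end marker switches printing off; "next" skips the marker line.
  $1 == "%end"      { f = 0; next }
  # Rule 2: while the flag is set, print the line and move on.
  f                 { print; next }
  # Rule 3: the start marker switches printing on for the following lines.
  $1 == "%packages" { f = 1 }
' file1 >> file2

cat file2
```

Only the lines 4 to 8 end up in file2; neither marker line is copied.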

Including the marker lines:

The above excludes the marker lines %packages and %end. If you want those included, then use:

awk '/^%packages/,/^%end/ {print}' file1 >>file2
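A pattern range with no action defaults to printing, so {print} can also be omitted. A sketch on sample input built in a temporary directory, showing that the marker lines are kept:

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1
printf '%s\n' 1 2 3 %packages 4 5 6 7 8 %end 9 10 > file1

# Range pattern without an action: awk prints every line from start to end marker.
awk '/^%packages/,/^%end/' file1 >> file2

cat file2
```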

Solution 2

In addition to awk, another solution to consider is sed:

sed -n '/%packages/,/%end/ w file2' file1

Breakdown, in order of appearance:
sed itself, followed by the -n option and an opening '. The single quotes are handled by the shell, not by sed: everything up to the closing ' is passed to sed unchanged, as one argument (the sed script). Everything after that is input (or output, if using redirection such as >file).

-n Suppress sed's automatic printing of each line. Without it, the entire content of file1 is also printed to stdout (and with the p command used further down, the matched lines would be printed twice).

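/pattern1/,/pattern2/ Defines the range to match: from a line matching pattern1 through the next line matching pattern2, inclusive. The patterns are unanchored, so they can appear anywhere in the line.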

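w file Write the selected lines to file. The file name runs to the end of the command, so w must come last, followed by the file name (or /path/to/file if it is not in the current directory). sed creates the file, truncating it if it already exists, before reading any input, so whatever file2 contained before is replaced.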

Finally, after the closing ' comes the input file.
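Because w creates its file afresh, this command replaces file2's contents, whereas the awk version in Solution 1 appends with >>. A minimal sketch on throwaway sample data in a temporary directory (file3 is a hypothetical second target) shows the difference, and one way to append instead: print with p and redirect with >>.

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1
printf '%s\n' 1 2 3 %packages 4 5 6 7 8 %end 9 10 > file1

# w file2: the old content of file2 is discarded.
printf '%s\n' 'already here' > file2
sed -n '/%packages/,/%end/ w file2' file1

# p with >>: the existing content of file3 is kept and the block is appended.
printf '%s\n' 'already here' > file3
sed -n '/%packages/,/%end/ p' file1 >> file3

cat file2
cat file3
```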

Two final notes:

1. Some prefer to use redirection for the input file, so the final command looks like:

sed -n '/%packages/,/%end/ w file2' <file1

The advantage is clarity: it is obvious where the input comes from. Likewise, instead of using w file, you can redirect the output with >file:

sed -n '/%packages/,/%end/ p' <file1 >file2

In that case we add p to print the matched lines explicitly, because -n suppresses the automatic printing.

However, sed can operate on multiple input files:

sed -n '/%packages/,/%end/ w file-final' file1 file2 file3

and using redirection tends to blind users to this feature.
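sed treats several input files as one continuous stream, so a range opened in one file keeps going into the next if that file has no %end (GNU sed's -s option treats the files separately). A sketch with three small hypothetical input files, each holding one section (the package names are made up):

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1
printf '%s\n' a %packages pkg-a1 pkg-a2 %end b > file1
printf '%s\n' %packages pkg-b1 %end > file2
printf '%s\n' c %packages pkg-c1 %end > file3

# One sed invocation collects the sections from all three files.
sed -n '/%packages/,/%end/ w file-final' file1 file2 file3

cat file-final
```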

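2. The commands above include the start and end lines, because an address range selects whole lines, including the two lines that match the boundary patterns (sed operates at the line level, not the word level). One solution is to pipe the output through a second sed: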

sed -n '/%packages/,/%end/ p' file1 | sed -e '1d' -e '$d' >file2

which introduces the following new features:
-e adds an expression to the script; repeating it lets you run several commands on the same input
1 is a line-number address: it selects the first line of the input
d deletes the selected line (here line 1, in the first expression)
$ is an address for the last line of the input. Since sed operates on whole lines rather than words, the entire last line (%end) is deleted
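The pipeline, run on a small sample file, looks like this. Note that the first sed prints the range to stdout with p; a w file2 command would send the lines to the file and leave the pipe empty. If file1 contains several %packages sections, only the first and last lines of the combined output are removed.

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1
printf '%s\n' 1 2 3 %packages 4 5 6 7 8 %end 9 10 > file1

# First sed: print the range, markers included. Second sed: drop first and last line.
sed -n '/%packages/,/%end/ p' file1 | sed -e '1d' -e '$d' > file2

cat file2
```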

But we can do this in a single sed invocation, using curly braces for grouping (shown here as a script for clarity):

#!/bin/bash
sed -n '
  /%packages/,/%end/ {
    /%packages/n
    /%end/ !{
      w file2
    }
  }
' file1

The only thing new here (aside from the grouping) is using ! to negate the match.
/%packages/n: n replaces the pattern space with the next input line. Because of -n, the %packages line itself is not printed, and the remaining commands operate on the line after it. /%end/ !{ ... } applies the commands inside the braces to every line NOT matched by the pattern (reverse matching), so the %end line is never written. The reason for using ! rather than a second /%end/n is that n would pull the line following %end into the pattern space, and w would then write that line to file2 as well.
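For comparison, a variant that is not from the original answer: inside the group, d discards the marker lines. This works because the range end is checked when the group's address /%packages/,/%end/ is evaluated, before d runs, so deleting the %end line does not keep the range open. A sketch on sample data:

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1
printf '%s\n' 1 2 3 %packages 4 5 6 7 8 %end 9 10 > file1

# Inside the range: d drops each marker line and ends the cycle; p prints the rest.
sed -n '
  /%packages/,/%end/ {
    /%packages/d
    /%end/d
    p
  }
' file1 > file2

cat file2
```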

Solution 3

Easiest to understand:

grep -A 1000 '%packages' xx | grep -B 1000 '%end'

The first part searches xx (the input file) for %packages and prints the matched line plus up to 1000 lines After it (-A).

The second part, after the pipe, searches for %end and prints the matched line plus up to 1000 lines Before it (-B).

If the %packages section can be longer than 1000 lines, change the 1000 to a greater number; the total line count of the file is always large enough.
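Rather than guessing a large enough number, one option (a sketch, not part of the original answer) is to use the file's own line count from wc -l as the context size. The arithmetic expansion strips the leading blanks that some wc implementations print. xx is the input file, as above:

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1
printf '%s\n' 1 2 3 %packages 4 5 6 7 8 %end 9 10 > xx

# The whole file's line count is always enough context for any section in it.
n=$(( $(wc -l < xx) ))
grep -A "$n" '%packages' xx | grep -B "$n" '%end' > file2

cat file2
```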

If you want to match only lines that contain nothing but the marker, anchor the pattern with ^ (start of line) and $ (end of line), i.e.:

grep -A 1000 '^%packages$' xx | grep -B 1000 '^%end$'

If you don’t want the matched lines included, add another pipe:

grep -A 1000 '^%packages$' xx | grep -B 1000 '^%end$' | grep -v -e '^%packages$' -e '^%end$'

where -e can be used to specify multiple search patterns and -v is used to invert the sense of matching, to select non-matching lines.
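To see why the anchors matter, here is a sketch on made-up input whose second line merely mentions %packages. Without anchors, that line matches too and leaks into the result; with anchors and the final grep -v, only the lines between the markers remain (loose and strict are hypothetical output file names):

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1
# The second line mentions the marker but is not a marker line.
printf '%s\n' 1 '# see %packages below' %packages 4 5 %end 9 > xx

# Unanchored: the comment line also matches and starts the output early.
grep -A 1000 '%packages' xx | grep -B 1000 '%end' > loose

# Anchored: only lines consisting solely of a marker match; -v then drops the markers.
grep -A 1000 '^%packages$' xx | grep -B 1000 '^%end$' | grep -v -e '^%packages$' -e '^%end$' > strict

cat loose
cat strict
```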


Author: Centimane

Updated on September 18, 2022

Comments

  • Centimane (over 1 year ago)

    I have two text files, and I want to copy a bunch of lines from one to the other. File one has a list of packages, and I want to copy it to file two. This list of packages isn't at the start of file one, but it has a tag %packages at the start of the list and a tag %end at the end. How can I copy all lines between %packages and %end from file 1 into file 2?

    • smw (almost 10 years ago)
      Do you want to copy the lines to a particular location within an existing file 2, or just create a new file 2 consisting of the lines from %packages to %end from file 1? Do you want to include the %packages and %end markers in the copied text, or exclude them?
  • erik (almost 10 years ago)
    @Dani_l: Did you try it? Here it works as expected. Try it: create a file with for i in 1 2 3 %packages 4 5 6 7 8 %end 9 10; do echo $i; done > file and use my command. You will get only the lines 4, 5, 6, 7, 8, which are between the tags.
  • Dani_l (almost 10 years ago)
    I misunderstood the meaning of ONLY in your sentence. Your regex is correct; the wording could be clearer. I thought you meant to print only lines with the pattern, nothing else.
  • erik (almost 10 years ago)
    Can you tell me how to reword it? I am not a native English speaker.