How to get the part of a file after the first line that matches a regular expression

bash shell scripting grep

217,399

Solution 1

The following prints from the first line matching TERMINATE through the end of the file:

sed -n -e '/TERMINATE/,$p'

Explained: -n disables sed's default behavior of printing each line after running the script on it; -e introduces a script; /TERMINATE/,$ is an address (line) range that runs from the first line matching the regular expression TERMINATE (as grep would match it) to the end of the file ($); and p is the print command, which prints the current line.
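As a quick check of the command above, here is a minimal sketch run on a throwaway file. The file contents and the variable names sample and result are made up for illustration.

```shell
# Build a small sample file (contents are hypothetical).
sample=$(mktemp)
printf '%s\n' 'alpha' 'beta' 'TERMINATE' 'gamma' 'delta' > "$sample"

# Print from the first line matching TERMINATE through the last line ($).
result=$(sed -n -e '/TERMINATE/,$p' "$sample")
printf '%s\n' "$result"
# Prints:
#   TERMINATE
#   gamma
#   delta

rm -f "$sample"
```

The matching line itself is part of the output, which is the difference from the next command.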

This prints from the line after the one matching TERMINATE through the end of the file (that is, from AFTER the matching line to EOF, NOT including the matching line):

sed -e '1,/TERMINATE/d'

Explained: 1,/TERMINATE/ is an address (line) range that runs from the first line of the input to the first line matching the regular expression TERMINATE, and d is the delete command, which deletes the current line and skips to the next one. Since sed prints lines by default, it prints the lines after TERMINATE through the end of the input.
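One edge case worth knowing: with 1,/TERMINATE/, sed starts looking for the end of the range on line 2, so if TERMINATE is on the very first line, the range runs on to the next match or to the end of the input. The sketch below shows this on made-up input and uses an awk flag as one portable alternative. GNU sed's 0,/TERMINATE/d address form is another option, but it is a GNU extension and is not run here.

```shell
# Hypothetical input where the marker is on the very first line.
input=$(printf '%s\n' 'TERMINATE' 'gamma' 'delta')

# The range starts on line 1 and sed looks for its end from line 2 onward.
# No later line matches, so everything up to the end of input is deleted.
sed_out=$(printf '%s\n' "$input" | sed -e '1,/TERMINATE/d')

# Portable alternative: print once the flag is set, then set the flag
# on the matching line, so the match itself is skipped.
awk_out=$(printf '%s\n' "$input" | awk 'found; /TERMINATE/ { found = 1 }')

printf 'sed: [%s]\n' "$sed_out"
printf 'awk: [%s]\n' "$awk_out"
```

Here sed prints nothing, while awk prints gamma and delta.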

If you want the lines before TERMINATE:

sed -e '/TERMINATE/,$d'

And if you want both lines before and after TERMINATE in two different files in a single pass:

sed -e '1,/TERMINATE/w before
/TERMINATE/,$w after' file

The before and after files will both contain the TERMINATE line, so to process each without it you need (note that head -n -1 requires GNU head):

head -n -1 before
tail -n +2 after

If you do not want to hard code the filenames in the sed script, you can:

before=before.txt
after=after.txt
sed -e "1,/TERMINATE/w $before
/TERMINATE/,\$w $after" file

But then you have to escape the $ that means the last line, so that the shell does not try to expand $w as a variable (note that the script is now in double quotes instead of single quotes).

Note that the newline after each filename in the script is important: it is how sed knows where the filename ends.
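To see the split end to end, here is a minimal sketch using a temporary directory. The file names and sample contents are hypothetical. It adds -n (not in the commands above) so that sed does not also echo the input to standard output, and it uses sed '$d' and sed '1d' to drop the marker line as a portable stand-in for head -n -1 and tail -n +2.

```shell
# Work in a scratch directory; all names here are made up for illustration.
dir=$(mktemp -d)
file=$dir/input.txt
before=$dir/before.txt
after=$dir/after.txt
printf '%s\n' 'alpha' 'beta' 'TERMINATE' 'gamma' > "$file"

# Each w command must end at a newline so sed knows where the filename stops.
# \$ keeps the shell from expanding $w inside the double quotes.
sed -n -e "1,/TERMINATE/w $before
/TERMINATE/,\$w $after" "$file"

# Both outputs contain the TERMINATE line; strip it from each.
before_lines=$(sed '$d' "$before")   # drop the last line
after_lines=$(sed '1d' "$after")     # drop the first line

printf 'before:\n%s\nafter:\n%s\n' "$before_lines" "$after_lines"
rm -rf "$dir"
```

This assumes the temporary paths contain no spaces or newlines, since the filenames are spliced directly into the sed script.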

How would you replace the hardcoded TERMINATE by a variable?

You would make a variable for the matching text and then do it the same way as the previous example:

matchtext=TERMINATE
before=before.txt
after=after.txt
sed -e "1,/$matchtext/w $before
/$matchtext/,\$w $after" file

To use a variable for the matching text in the earlier examples:

## Print the line containing the matching text, till the end of the file:
## (from the matching line to EOF, including the matching line)
matchtext=TERMINATE
sed -n -e "/$matchtext/,\$p"

## Print from the line that follows the line containing the
## matching text, till the end of the file:
## (from AFTER the matching line to EOF, NOT including the matching line)
matchtext=TERMINATE
sed -e "1,/$matchtext/d"

## Print all the lines before the line containing the matching text:
## (from line 1 to BEFORE the matching line, NOT including the matching line)
matchtext=TERMINATE
sed -e "/$matchtext/,\$d"

The important points about replacing text with variables in these cases are:

Variables ($variablename) enclosed in single quotes ['] won't "expand" but variables inside double quotes ["] will. So, you have to change all the single quotes to double quotes if they contain text you want to replace with a variable.
The sed ranges also contain a $ and are immediately followed by a letter like: $p, $d, $w. They will also look like variables to be expanded, so you have to escape those $ characters with a backslash [\] like: \$p, \$d, \$w.
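An alternative to escaping each $ is to keep the sed script in single quotes and step out of them only where the variable is inserted. The sketch below shows that quoting style next to the escaped form on made-up input. It assumes the value of matchtext contains no / or other characters that are special in a sed regular expression.

```shell
matchtext=TERMINATE
input=$(printf '%s\n' 'alpha' 'TERMINATE' 'gamma')

# '/'  "$matchtext"  '/,$p' are three adjacent quoted pieces that the shell
# joins into one argument; $p stays inside single quotes, so no backslash.
mixed=$(printf '%s\n' "$input" | sed -n -e '/'"$matchtext"'/,$p')

# Equivalent all-double-quoted form, with the $ escaped.
escaped=$(printf '%s\n' "$input" | sed -n -e "/$matchtext/,\$p")

printf '%s\n' "$mixed"
```

Both forms hand sed the same script, so which one to use is mostly a matter of readability.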

Solution 2

As a simple approximation you could use

grep -A100000 TERMINATE file

which prints each line matching TERMINATE and up to 100,000 lines following it.

From the man page:

-A NUM, --after-context=NUM

Print NUM lines of trailing context after matching lines. Places a line containing a group separator (--) between contiguous groups of matches. With the -o or --only-matching option, this has no effect and a warning is given.
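If picking a fixed NUM feels arbitrary, one option raised in the comments is to use the file's own line count, which is always large enough. A minimal sketch on a made-up file follows. Note that -A is not part of POSIX grep (GNU and BSD grep provide it), and that the matching line itself is included in the output.

```shell
file=$(mktemp)
printf '%s\n' 'alpha' 'TERMINATE' 'gamma' 'delta' > "$file"

# The total line count is an upper bound on the lines after the match.
# The arithmetic expansion also trims any padding wc may add.
total=$(( $(wc -l < "$file") ))
context=$(grep -A "$total" 'TERMINATE' "$file")

printf '%s\n' "$context"
rm -f "$file"
```

This reads the file twice (once for wc, once for grep), which the sed and awk approaches avoid.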

Solution 3

A tool to use here is AWK:

cat file | awk 'BEGIN{ found=0} /TERMINATE/{found=1}  {if (found) print }'

How does this work:

We set the variable 'found' to zero, which evaluates to false.
If a line matches the regular expression 'TERMINATE', we set it to one.
If our 'found' variable evaluates to true, print :)
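As pointed out in the comments, awk can read the file itself, so the cat is optional. Here is a minimal sketch of the same flag logic with the file passed as an argument and the print written as a bare pattern. The sample file is hypothetical.

```shell
file=$(mktemp)
printf '%s\n' 'alpha' 'TERMINATE' 'gamma' > "$file"

# Set the flag on the first match; the bare pattern `found` prints every
# line for which the flag is non-zero, including the matching line itself.
from_match=$(awk '/TERMINATE/ { found = 1 } found' "$file")

printf '%s\n' "$from_match"
rm -f "$file"
```

Because the flag is set before the print rule runs, the TERMINATE line is included in the output.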

Because awk processes the input line by line, memory use stays small even on very large files, unlike approaches that first read the whole file into a shell variable.

Solution 4

If I understand your question correctly, you want the lines after TERMINATE, not including the TERMINATE-line. AWK can do this in a simple way:

awk '{if(found) print} /TERMINATE/{found=1}' your_file

Explanation:

Although not best practice, you can rely on the fact that all variables default to 0 or the empty string if not defined. So the first expression (if(found) print) will not print anything to start off with.
After the printing is done, we check if this is the starter-line (that should not be included).

This will print all lines after the TERMINATE-line.
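To use a shell variable for the pattern with this awk approach, one option is awk's -v assignment, which avoids splicing the value into the program text. A minimal sketch, with matchtext, pat, and the sample input as illustrative assumptions:

```shell
matchtext=TERMINATE
input=$(printf '%s\n' 'alpha' 'TERMINATE' 'gamma' 'delta')

# -v copies the shell value into the awk variable pat before input is read.
# $0 ~ pat treats pat as a regular expression, like /TERMINATE/ does.
after_match=$(printf '%s\n' "$input" |
  awk -v pat="$matchtext" 'found; $0 ~ pat { found = 1 }')

printf '%s\n' "$after_match"
```

The print rule comes first, so the matching line sets the flag without being printed itself.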

Generalization:

You have a file with start- and end-lines and you want the lines between those lines excluding the start- and end-lines.
start- and end-lines could be defined by a regular expression matching the line.

Example:

$ cat ex_file.txt
not this line
second line
START
A good line to include
And this line
Yep
END
Nope more
...
never ever
$ awk '/END/{found=0} {if(found) print} /START/{found=1}' ex_file.txt
A good line to include
And this line
Yep
$

Explanation:

If the end-line is found no printing should be done. Note that this check is done before the actual printing to exclude the end-line from the result.
Print the current line if found is set.
If the start-line is found then set found=1 so that the following lines are printed. Note that this check is done after the actual printing to exclude the start-line from the result.

Notes:

The code relies on the fact that all AWK variables default to 0 or the empty string if not defined. This is valid, but it may not be best practice, so you could add a BEGIN{found=0} to the start of the AWK expression.
If multiple start-end-blocks are found, they are all printed.
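To illustrate the last note, here is a minimal sketch with two START/END blocks in made-up input, plus a variation that stops at the first END so that only the first block is printed. The exit variant is an addition for illustration, not part of the answer above.

```shell
input=$(printf '%s\n' 'x' 'START' 'a' 'END' 'y' 'START' 'b' 'END' 'z')

# Original form: every START..END block is printed.
all_blocks=$(printf '%s\n' "$input" |
  awk '/END/ { found = 0 } found; /START/ { found = 1 }')

# Variation: once inside a block, quit at the first END line.
first_block=$(printf '%s\n' "$input" |
  awk '/END/ && found { exit } found; /START/ { found = 1 }')

printf 'all blocks:\n%s\n' "$all_blocks"
printf 'first block only:\n%s\n' "$first_block"
```

The first command prints a and b; the second prints only a.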

Solution 5

grep -A 10000000 'TERMINATE' file

is much, much faster than sed, especially on a really big file. It works for up to 10M lines after the match (or whatever number you put in), so there is no harm in making the number big enough to handle about anything you hit.

Yugal Jindle

Updated on July 25, 2022

Comments

Yugal Jindle almost 2 years
I have a file with about 1000 lines. I want the part of my file after the line which matches my grep statement.

That is:
```
cat file | grep 'TERMINATE'     # It is found on line 534
```
So, I want the file from line 535 to line 1000 for further processing.

How can I do that?
- Jacob almost 13 years
  
  UUOC (Useless Use of cat): grep 'TERMINATE' file
- Yugal Jindle almost 13 years
  
  I know that, its like I use it that way. Lets come back to the question.
- aioobe almost 13 years
  
  This is a perfectly fine programming question, and well suited for stackoverflow.
- runeks almost 8 years
  
  @Jacob It's not useless use of cat at all. Its use is to print a file to standard output, which means we can use greps standard input interface to read data in, rather than having to learn what switch to apply to grep, and sed, and awk, and pandoc, and ffmpeg etc. when we want to read from a file. It saves time because we don't have to learn a new switch every time we want to do the same thing: read from a file.
- LOAS almost 7 years
  
  @runeks I agree with your sentiment - but you can achieve that without cat: grep 'TERMINATE' < file. Maybe it does make the reading a bit harder - but this is shell scripting, so that's always going to be a problem :)
- kvantour almost 5 years
  
  See this answer from Ed Morton
Yugal Jindle almost 13 years

Can you explain what are you doing ?
Yugal Jindle almost 13 years

That might work for this, but I need to code it into my script to process many files. So, show some generic solution.
Yugal Jindle almost 13 years

--after-context is fine but not in all cases.
Yugal Jindle almost 13 years

Can you suggest something else.. ??
Mu Qiao almost 13 years

I copied the content of "file" into the $content variable. Then I removed all the characters until "TERMINATE" was seen. It didn't use greedy matching, but you can use greedy matching by ${content##*TERMINATE}.
Mu Qiao almost 13 years

here is the link of the bash manual: gnu.org/software/bash/manual/…
Yugal Jindle almost 13 years

How can we get the lines before TERMINATE and delete all that follows ?
michelgotta about 11 years

I think this is one practical solution!
PiyusG about 10 years

similarly -B NUM, --before-context=NUM Print NUM lines of leading context before matching lines. Places a line containing a group separator (--) between contiguous groups of matches. With the -o or --only-matching option, this has no effect and a warning is given.
Znik over 9 years

what will happen if file is 100GB size ?
Znik over 9 years

file is scanned twice. what if it is 100GB size?
123 about 9 years

For the number you can also use more +7 file
Sébastien Clément over 8 years

How would you replace the hardcoded TERMINATE by a variable?
tripleee over 8 years

Extracting a line number with grep so you can feed it to tail is a wasteful antipattern. Finding the match and printing up through the end of the file (or, conversely, printing and stopping at the first match) is eminently done with the normal, essential regex tools themselves. The massive grep | tail | sed | awk is also in and of itself a massive useless use of grep and friends.
Jose Martinez over 8 years

this solution worked for me because i can easily use variables as my string to check for.
fbicknel almost 8 years

I think s*he was trying to give us something that would find the /last instance/ of 'TERMINATE' and give the lines from that instance on. Other implementations give you the first instance onward. The LINE_NUMBER should probably look like this, instead: LINE_NUMBER=$(grep -o -n 'TERMINATE' $OSCAM_LOG | tail -n 1| awk -F: '{print $1}') Maybe not the most elegant way, but it seems to get the job done. ^.^
fbicknel almost 8 years

... or all in one line, but ugly: tail -n +$(grep -o -n 'TERMINATE' $YOUR_FILE_NAME | tail -n 1| awk -F: '{print $1}') $YOUR_FILE_NAME
fbicknel almost 8 years

.... and I was going to go back and edit out $OSCAM_LOG in lieu of $YOUR_FILE_NAME... but can't for some reason. No idea where $OSCAM_LOG came from; I just mindlessly parroted it. o.O
mivk almost 8 years

This includes the matching line, which is not what is wanted in this question.
fedorqui almost 8 years

@mivk well, this is also the case of the accepted answer and the 2nd most upvoted, so the problem may be with a misleading title.
tripleee almost 8 years

Doing this in Awk alone is a common task in Awk 101. If you are already using a more capable tool just to get the line number, let go of tail and do the task in the more capable tool altogether. Anyway, the title clearly says "first match".
tripleee almost 8 years

Downvote: This is horrible (reading the file into a variable) and wrong (using the variable without quoting it; and you should properly use printf or make sure you know exactly what you are passing to echo.).
Mad Physicist almost 8 years

Downvoted because this is a crappy solution, but then upvoted because 90% of the answer is caveats.
mato over 7 years

One use case that's missing here is how to print lines after the last marker (if there can be multiple of them in the file .. think log files etc).
Karalga over 7 years

The example sed -e "1,/$matchtext/d" does not work when $matchtext occurs in the first line. I had to change it to sed -e "0,/$matchtext/d".
Samveen about 7 years

If the line number is known, then grep isn't even required; you can just use tail -n $NUM, so this isn't really an answer.
Lemming almost 7 years

Nice idea! If you are uncertain about the size of the context you may count the lines of file instead: grep -A$(cat file | wc -l) TERMINATE file
Aleksander Stelmaczonek almost 7 years

Simple, elegant and very generic. In my case it was printing everything until second occurrence of '###': cat file | awk 'BEGIN{ found=0} /###/{found=found+1} {if (found<2) print }'
Timothy Swan over 6 years

I need something that limits characters, not lines.
Ahmed almost 6 years

If you want exactly the remaining lines in your file after the pattern TERMINATE, you can use this: grep -A$(($(cat file | wc -l)-$(grep -n TERMINATE file | awk -F":" '{print $1}'))) TERMINATE file
aioobe almost 6 years

@Ahmed, how is that better than grep -A$(wc -l < file) TERMINATE file?
Ahmed almost 6 years

@aioobe because it returns only the lines that remain for the end of file $(($(cat file | wc -l)-$(grep -n TERMINATE file | awk -F":" '{print $1}')))
aioobe almost 6 years

@Ahmed, but so does grep -A$(wc -l < file) TERMINATE file, right?
tripleee almost 6 years

A tool not to use here is cat. awk is perfectly capable of taking one or more filenames as arguments. See also stackoverflow.com/questions/11710552/useless-use-of-cat
user1169420 over 5 years

Awesome Awesome example. Just spent 2 hours looking at csplit, sed, and all manner of over complicated awk commands. Not only did this do what I wanted but it was shown simply enough to infer how to modify it to do a few other related things I needed. Makes me remember awk is great and not just an indecipherable mess of crap. Thanks.
szmoore over 5 years

Using wc -l to make sure you don't accidentally truncate lines is nice, but you just need NUM > lines remaining not NUM == lines remaining. The calculation of the "exact" number of lines remaining is going to read the file many more times than is necessary and is more complicated than the sed or awk solutions (the main advantage of grep is it's the easiest to remember).
user000001 about 5 years

{if(found) print} is a bit of an anti-pattern in awk, it's more idiomatic to replace the block with just found or found; if you need another filter afterwards.
John_Smith about 5 years

@user000001 please explain. I do not understand what to replace and how. Anyway I think the way its written makes it very clear what is going on.
user000001 about 5 years

You would replace awk '{if(found) print} /TERMINATE/{found=1}' your_file with awk 'found; /TERMINATE/{found=1}' your_file, they should both do the same thing.
Znik over 3 years

unfortunately grep doesn't support INFINITE as NUM for -A and -B option :( then we must add very big numbers, but we don't know what is maximum int for them.
Pavan Kumar almost 3 years

One stop shop for my problem. Prefer to double upvote this answer, but I can't.
Peter Mortensen over 2 years

What do you mean by "handle about anything you hit" (seems incomprehensible)? Please respond by editing (changing) your answer, not here in comments (without "Edit:", "Update:", or similar - the answer should appear as if it was written today).
Mxt about 2 years

@Karalga had the same issue, except sed -e "0,/$matchtext/d" still displays $matchtext for me, so I did this: sed -e "0,/$matchtext/d" | tail -n +2. But sed -e '1i\\n' | sed -e "1,/$matchtext/d" should work universally.