How to get the part of a file after the first line that matches a regular expression
Solution 1
The following will print from the line matching TERMINATE till the end of the file:
sed -n -e '/TERMINATE/,$p' file
Explained:
- -n disables sed's default behavior of printing each line after executing its script on it.
- -e indicates a script to sed.
- /TERMINATE/,$ is an address (line) range selection, meaning from the first line matching the TERMINATE regular expression (like grep) to the end of the file ($).
- p is the print command, which prints the current line.
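To see the range in action, here is a quick check on a small throwaway file. The file name sample.txt and its contents are made up for illustration:

```shell
#!/bin/sh
# Hypothetical five-line test file (name and contents are illustrative).
printf '%s\n' one two TERMINATE three four > sample.txt

# Print from the first line matching TERMINATE through the last line ($),
# including the matching line itself.
sed -n -e '/TERMINATE/,$p' sample.txt
```

This should print TERMINATE, three and four, one per line. The matching line is included because it is the line that opens the range.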
This will print from the line that follows the line matching TERMINATE till the end of the file:
(from AFTER the matching line to EOF, NOT including the matching line)
sed -e '1,/TERMINATE/d' file
Explained: 1,/TERMINATE/ is an address (line) range selection, meaning from the first line of the input to the first line matching the TERMINATE regular expression, and d is the delete command, which deletes the current line and skips to the next line. Because sed's default behavior is to print the lines, it will print the lines after TERMINATE to the end of input.
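The comments raise one caveat that is worth testing. sed only starts looking for the end of a 1,/regex/ range on line 2, so a match on the very first line is not treated as the end of the range. The sketch below uses a hypothetical file first.txt to show this. It also shows the 0,/regex/ address, which is a GNU sed extension and is not available in POSIX or BSD sed:

```shell
#!/bin/sh
# Hypothetical test file where the match is on the very first line.
printf '%s\n' TERMINATE a b > first.txt

# 1,/re/ only checks the end regex from line 2 onward; with no later
# match the range runs to EOF, so every line is deleted.
sed -e '1,/TERMINATE/d' first.txt     # prints nothing

# GNU sed only: 0,/re/ lets the regex end the range on line 1 itself.
sed -e '0,/TERMINATE/d' first.txt     # prints a and b
```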
If you want the lines before TERMINATE:
sed -e '/TERMINATE/,$d' file
And if you want both the lines before and after TERMINATE in two different files in a single pass:
sed -e '1,/TERMINATE/w before
/TERMINATE/,$w after' file
The before and after files will both contain the TERMINATE line itself. To exclude it when processing each file, use the following (head -n -1, which drops the last line, is a GNU extension):
head -n -1 before
tail -n +2 after
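Here is a minimal end-to-end sketch of the single-pass split, assuming a throwaway input file named sample.txt (the names are illustrative). Unlike the command above, it adds -n so that sed does not also echo the whole input to standard output. The head -n -1 step assumes GNU head:

```shell
#!/bin/sh
# Hypothetical input file.
printf '%s\n' one two TERMINATE three four > sample.txt

# -n: don't echo the input; only the w commands produce output.
# The newline after "before" marks the end of that filename.
sed -n -e '1,/TERMINATE/w before
/TERMINATE/,$w after' sample.txt

# Drop the TERMINATE line from each file.
head -n -1 before   # GNU head: all but the last line -> one, two
tail -n +2 after    # from line 2 on -> three, four
```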
If you do not want to hard-code the filenames in the sed script, you can:
before=before.txt
after=after.txt
sed -e "1,/TERMINATE/w $before
/TERMINATE/,\$w $after" file
But then you have to escape the $ that means the last line, so the shell will not try to expand $w as a variable (note that we now use double quotes around the script instead of single quotes).
Note also that the newline after each filename in the script is important: it tells sed where the filename ends.
How would you replace the hardcoded TERMINATE with a variable?
You would make a variable for the matching text and then do it the same way as in the previous example:
matchtext=TERMINATE
before=before.txt
after=after.txt
sed -e "1,/$matchtext/w $before
/$matchtext/,\$w $after" file
To use a variable for the matching text with the previous examples:
## Print from the line containing the matching text till the end of the file:
## (from the matching line to EOF, including the matching line)
matchtext=TERMINATE
sed -n -e "/$matchtext/,\$p" file
## Print from the line that follows the line containing the
## matching text, till the end of the file:
## (from AFTER the matching line to EOF, NOT including the matching line)
matchtext=TERMINATE
sed -e "1,/$matchtext/d" file
## Print all the lines before the line containing the matching text:
## (from line 1 to BEFORE the matching line, NOT including the matching line)
matchtext=TERMINATE
sed -e "/$matchtext/,\$d" file
The important points about replacing text with variables in these cases are:
- Variables ($variablename) enclosed in single quotes (') won't "expand", but variables inside double quotes (") will. So you have to change all the single quotes to double quotes if they contain text you want to replace with a variable.
- The sed ranges also contain a $ that is immediately followed by a letter, like $p, $d and $w. These will also look like variables to be expanded, so you have to escape those $ characters with a backslash (\), like \$p, \$d and \$w.
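A quick way to see these quoting rules at work is to print the script the shell actually hands to sed. The sketch below assumes a hypothetical empty shell variable named p to show what happens to an unescaped $p:

```shell
#!/bin/sh
# Hypothetical input and match text.
printf '%s\n' one TERMINATE two > sample.txt
matchtext=TERMINATE
p=''   # an (empty) shell variable that happens to be named p

# Print the script the shell passes to sed in each case.
printf '%s\n' "/$matchtext/,\$p"   # /TERMINATE/,$p  (escaped: range intact)
printf '%s\n' "/$matchtext/,$p"    # /TERMINATE/,    ($p expanded to nothing)

# With the escaped form sed gets the intended range: prints TERMINATE, two
sed -n -e "/$matchtext/,\$p" sample.txt
```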
Solution 2
As a simple approximation you could use
grep -A100000 TERMINATE file
which greps for TERMINATE and outputs up to 100,000 lines following that line.
From the man page:
-A NUM, --after-context=NUM
Print NUM lines of trailing context after matching lines. Places a line containing a group separator (--) between contiguous groups of matches. With the -o or --only-matching option, this has no effect and a warning is given.
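Instead of guessing a large constant, you can use the file's own line count as the context size, an idea that also comes up in the comments. Here is a sketch using a made-up file name, sample.txt:

```shell
#!/bin/sh
# Hypothetical five-line input file.
printf '%s\n' one two TERMINATE three four > sample.txt

# The number of lines after the match can never exceed the total line count,
# so use that count as the context size. $(( )) turns wc's output into a
# plain number, dropping any padding spaces.
lines=$(( $(wc -l < sample.txt) ))
grep -A "$lines" TERMINATE sample.txt
```

Because no more lines can follow the match than the file contains, this never truncates the output. Some wc implementations pad their output with leading spaces, which is why the count goes through arithmetic expansion.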
Solution 3
A tool to use here is AWK:
cat file | awk 'BEGIN{ found=0} /TERMINATE/{found=1} {if (found) print }'
How it works:
- We set the variable 'found' to zero, which evaluates as false.
- If a match for 'TERMINATE' is found with the regular expression, we set it to one.
- If our 'found' variable evaluates to true, print :)
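Because awk processes its input one line at a time, this also works on very large files. Approaches that read the whole file into memory (for example, into a shell variable) might consume a lot of memory.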
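The same logic can be written more compactly by passing the file to awk directly instead of through cat. In awk, a pattern with no action prints the line whenever the pattern is true. Here is a sketch with a hypothetical sample.txt:

```shell
#!/bin/sh
# Hypothetical five-line input file.
printf '%s\n' one two TERMINATE three four > sample.txt

# Set the flag on the matching line, then print every line while it is set.
# Because the flag is set BEFORE it is tested, the TERMINATE line is printed too.
awk '/TERMINATE/{found=1} found' sample.txt
```

This should print TERMINATE, three and four. awk variables start out as 0/empty, so the BEGIN block is optional here.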
Solution 4
If I understand your question correctly, you want the lines after TERMINATE, not including the TERMINATE line. AWK can do this in a simple way:
awk '{if(found) print} /TERMINATE/{found=1}' your_file
Explanation:
- Although not best practice, you could rely on the fact that all variables default to 0 or the empty string if not defined. So the first expression (if(found) print) will not print anything to start off with.
- After the printing is done, we check whether this is the start line (which should not be included).
This will print all lines after the TERMINATE line.
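The only difference between this version and the one that keeps the matching line is the order of the two rules. The following sketch, using a hypothetical sample.txt, makes that visible:

```shell
#!/bin/sh
# Hypothetical five-line input file.
printf '%s\n' one two TERMINATE three four > sample.txt

# Flag tested before it is set: the TERMINATE line is excluded (three, four).
awk '{if(found) print} /TERMINATE/{found=1}' sample.txt

# Flag set before it is tested: the TERMINATE line is included
# (TERMINATE, three, four).
awk '/TERMINATE/{found=1} {if(found) print}' sample.txt
```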
Generalization:
- You have a file with start and end lines, and you want the lines between them, excluding the start and end lines.
- The start and end lines could be defined by a regular expression matching the line.
Example:
$ cat ex_file.txt
not this line
second line
START
A good line to include
And this line
Yep
END
Nope more
...
never ever
$ awk '/END/{found=0} {if(found) print} /START/{found=1}' ex_file.txt
A good line to include
And this line
Yep
$
Explanation:
- If the end line is found, no printing should be done. Note that this check is done before the actual printing, to exclude the end line from the result.
- Print the current line if found is set.
- If the start line is found, set found=1 so that the following lines are printed. Note that this check is done after the actual printing, to exclude the start line from the result.
Notes:
- The code relies on the fact that all AWK variables default to 0 or the empty string if not defined. This is valid, but it may not be best practice, so you could add BEGIN{found=0} to the start of the AWK expression.
- If multiple start/end blocks are found, they are all printed.
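To check the note about multiple blocks, here is a sketch with a made-up file, blocks.txt, that contains two START/END blocks. It uses the BEGIN{found=0} form suggested above:

```shell
#!/bin/sh
# Hypothetical input with two START/END blocks separated by other lines.
printf '%s\n' junk START a1 a2 END junk START b1 END junk > blocks.txt

# Reset on END, print while found is set, then set found on START.
awk 'BEGIN{found=0} /END/{found=0} {if(found) print} /START/{found=1}' blocks.txt
```

This should print a1, a2 and b1. Those are the lines of both blocks, with none of the START or END lines.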
Solution 5
grep -A 10000000 'TERMINATE' file
is much, much faster than sed, especially when working on a really big file. It handles up to 10M lines after the match (or whatever number you put in), so there isn't any harm in making the number big enough to cover just about any file you are likely to hit.
Updated on July 25, 2022
Comments
-
Yugal Jindle almost 2 years
I have a file with about 1000 lines. I want the part of my file after the line which matches my grep statement.
That is:
cat file | grep 'TERMINATE' # It is found on line 534
So, I want the file from line 535 to line 1000 for further processing.
How can I do that?
-
Jacob (almost 13 years ago): UUOC (Useless Use of cat):
grep 'TERMINATE' file
-
Yugal Jindle (almost 13 years ago): I know that, it's just how I use it. Let's come back to the question.
-
aioobe (almost 13 years ago): This is a perfectly fine programming question, and well suited for Stack Overflow.
-
runeks (almost 8 years ago): @Jacob It's not a useless use of cat at all. Its use is to print a file to standard output, which means we can use grep's standard input interface to read data in, rather than having to learn what switch to apply to grep, and sed, and awk, and pandoc, and ffmpeg, etc. when we want to read from a file. It saves time because we don't have to learn a new switch every time we want to do the same thing: read from a file.
-
LOAS (almost 7 years ago): @runeks I agree with your sentiment, but you can achieve that without cat: grep 'TERMINATE' < file. Maybe it does make the reading a bit harder, but this is shell scripting, so that's always going to be a problem :)
-
Yugal Jindle (almost 13 years ago): Can you explain what you are doing?
-
Yugal Jindle (almost 13 years ago): That might work for this, but I need to code it into my script to process many files. So, show some generic solution.
-
Yugal Jindle (almost 13 years ago): --after-context is fine, but not in all cases.
-
Yugal Jindle (almost 13 years ago): Can you suggest something else?
-
Mu Qiao (almost 13 years ago): I copied the content of "file" into the $content variable. Then I removed all the characters until "TERMINATE" was seen. It didn't use greedy matching, but you can use greedy matching with ${content##*TERMINATE}.
-
Mu Qiao (almost 13 years ago): Here is the link to the Bash manual: gnu.org/software/bash/manual/…
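The difference between the shortest-match (#) and longest-match (##) forms that Mu Qiao mentions can be seen on a small string. The variable contents below are made up, and the expansions are quoted, as tripleee's later comment recommends. Keep in mind that this approach holds the whole file in memory:

```shell
#!/bin/sh
# Hypothetical file contents with two markers; command substitution
# strips the trailing newline.
content=$(printf 'a\nTERMINATE\nb\nTERMINATE\nc\n')

# "#" removes the shortest prefix matching *TERMINATE,
# i.e. everything up to and including the FIRST marker.
printf '%s\n' "${content#*TERMINATE}"

# "##" removes the longest matching prefix,
# i.e. everything up to and including the LAST marker.
printf '%s\n' "${content##*TERMINATE}"
```

Both results start with an empty line, because the newline that ended the TERMINATE line is kept.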
-
Yugal Jindle (almost 13 years ago): How can we get the lines before TERMINATE and delete all that follows?
-
michelgotta (about 11 years ago): I think this is one practical solution!
-
PiyusG (about 10 years ago): Similarly: -B NUM, --before-context=NUM Print NUM lines of leading context before matching lines. Places a line containing a group separator (--) between contiguous groups of matches. With the -o or --only-matching option, this has no effect and a warning is given.
-
Znik (over 9 years ago): What will happen if the file is 100 GB in size?
-
Znik (over 9 years ago): The file is scanned twice. What if it is 100 GB in size?
-
123 (about 9 years ago): For the number, you can also use
more +7 file
-
Sébastien Clément (over 8 years ago): How would you replace the hardcoded TERMINATE with a variable?
-
tripleee (over 8 years ago): Extracting a line number with grep so you can feed it to tail is a wasteful antipattern. Finding the match and printing through to the end of the file (or, conversely, printing and stopping at the first match) is easily done with the normal, essential regex tools themselves. The massive grep | tail | sed | awk pipeline is also, in and of itself, a massive useless use of grep and friends.
-
Jose Martinez (over 8 years ago): This solution worked for me because I can easily use variables as my string to check for.
-
fbicknel (almost 8 years ago): I think s*he was trying to give us something that would find the /last instance/ of 'TERMINATE' and give the lines from that instance on. Other implementations give you the first instance onward. The LINE_NUMBER should probably look like this instead: LINE_NUMBER=$(grep -o -n 'TERMINATE' $OSCAM_LOG | tail -n 1| awk -F: '{print $1}') Maybe not the most elegant way, but it seems to get the job done. ^.^
-
fbicknel (almost 8 years ago): ... or all in one line, but ugly: tail -n +$(grep -o -n 'TERMINATE' $YOUR_FILE_NAME | tail -n 1| awk -F: '{print $1}') $YOUR_FILE_NAME
-
fbicknel (almost 8 years ago): ... and I was going to go back and replace $OSCAM_LOG with $YOUR_FILE_NAME... but can't for some reason. No idea where $OSCAM_LOG came from; I just mindlessly parroted it. o.O
-
mivk (almost 8 years ago): This includes the matching line, which is not what is wanted in this question.
-
fedorqui (almost 8 years ago): @mivk Well, this is also the case for the accepted answer and the second most upvoted one, so the problem may be a misleading title.
-
tripleee (almost 8 years ago): Doing this in Awk alone is a common task in Awk 101. If you are already using a more capable tool just to get the line number, let go of tail and do the task in the more capable tool altogether. Anyway, the title clearly says "first match".
-
tripleee (almost 8 years ago): Downvote: This is horrible (reading the file into a variable) and wrong (using the variable without quoting it; and you should properly use printf or make sure you know exactly what you are passing to echo).
-
Mad Physicist (almost 8 years ago): Downvoted because this is a crappy solution, but then upvoted because 90% of the answer is caveats.
-
mato (over 7 years ago): One use case that's missing here is how to print the lines after the last marker (if there can be multiple of them in the file; think log files etc.).
-
Karalga (over 7 years ago): The example
sed -e "1,/$matchtext/d"
does not work when $matchtext occurs in the first line. I had to change it to sed -e "0,/$matchtext/d".
-
Samveen (about 7 years ago): If the line number is known, then grep isn't even required; you can just use tail -n $NUM, so this isn't really an answer.
-
Lemming (almost 7 years ago): Nice idea! If you are uncertain about the size of the context, you may count the lines of file instead: grep -A$(cat file | wc -l) TERMINATE file
-
Aleksander Stelmaczonek (almost 7 years ago): Simple, elegant and very generic. In my case it was printing everything until the second occurrence of '###':
cat file | awk 'BEGIN{ found=0} /###/{found=found+1} {if (found<2) print }'
-
Timothy Swan (over 6 years ago): I need something that limits characters, not lines.
-
Ahmed (almost 6 years ago): If you want exactly the remaining lines in your file after the pattern TERMINATE, you can use this:
grep -A$(($(cat file | wc -l)-$(grep -n TERMINATE file | awk -F":" '{print $1}'))) TERMINATE file
-
aioobe (almost 6 years ago): @Ahmed, how is that better than grep -A$(wc -l < file) TERMINATE file?
-
Ahmed (almost 6 years ago): @aioobe Because it returns only the lines that remain until the end of the file: $(($(cat file | wc -l)-$(grep -n TERMINATE file | awk -F":" '{print $1}')))
-
aioobe (almost 6 years ago): @Ahmed, but so does grep -A$(wc -l < file) TERMINATE file, right?
-
tripleee (almost 6 years ago): A tool not to use here is cat. awk is perfectly capable of taking one or more filenames as arguments. See also stackoverflow.com/questions/11710552/useless-use-of-cat
-
user1169420 (over 5 years ago): Awesome, awesome example. I just spent 2 hours looking at csplit, sed, and all manner of overcomplicated awk commands. Not only did this do what I wanted, but it was shown simply enough that I could infer how to modify it to do a few other related things I needed. Makes me remember awk is great and not just an indecipherable mess of crap. Thanks.
-
szmoore (over 5 years ago): Using wc -l to make sure you don't accidentally truncate lines is nice, but you just need NUM > lines remaining, not NUM == lines remaining. Calculating the "exact" number of lines remaining reads the file many more times than necessary and is more complicated than the sed or awk solutions (the main advantage of grep is that it's the easiest to remember).
-
user000001 (about 5 years ago): {if(found) print} is a bit of an anti-pattern in awk; it's more idiomatic to replace the block with just found, or found; if you need another filter afterwards.
-
John_Smith (about 5 years ago): @user000001 Please explain. I do not understand what to replace and how. Anyway, I think the way it's written makes it very clear what is going on.
-
user000001 (about 5 years ago): You would replace
awk '{if(found) print} /TERMINATE/{found=1}' your_file
with
awk 'found; /TERMINATE/{found=1}' your_file
They should both do the same thing.
-
Znik (over 3 years ago): Unfortunately, grep doesn't support INFINITE as NUM for the -A and -B options :( so we must use very big numbers, but we don't know what the maximum int is for them.
-
Pavan Kumar (almost 3 years ago): One-stop shop for my problem. I would double upvote this answer, but I can't.
-
Peter Mortensen (over 2 years ago): What do you mean by "handle about anything you hit" (seems incomprehensible)? Please respond by editing (changing) your answer, not here in comments (without "Edit:", "Update:", or similar; the answer should appear as if it was written today).
-
Mxt (about 2 years ago): @Karalga had the same issue, except sed -e "0,/$matchtext/d" still displays $matchtext for me, so I did this: sed -e "0,/$matchtext/d" | tail -n +2. But sed -e '1i\\n' | sed -e "1,/$matchtext/d" should work universally.