Grab nth occurrence in between two patterns using awk or sed
Solution 1
This might work for you (GNU sed):
sed -n '/category/{:a;N;/done/!ba;x;s/^/x/;/^x\{3\}$/{x;p;q};x}' file
Turn off automatic printing by using the -n option. Gather up lines between category and done. Store a counter in the hold space and when it reaches 3 print the collection in the pattern space and quit.
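For readers who want to follow those steps one at a time, here is the same script spread over several lines with comments. It is a sketch of the command above rather than a different solution; the temporary file and the `sample` and `result` variable names are assumptions, and the data is the sample input from the question.

```shell
# Sample input from the question, one token per line, in a temporary file
# (the file and the variable names are assumptions for this sketch).
sample=$(mktemp)
printf '%s\n' category 1 s t done category 2 n d done \
              category 3 r d done category 4 t h done > "$sample"

result=$(sed -n '
/category/ {
    :a
    # append the next input line to the pattern space
    N
    # loop until the pattern space contains "done"
    /done/!ba
    # swap: the hold space keeps one "x" per completed block
    x
    s/^/x/
    # three marks mean this is the third block: swap back, print, quit
    /^x\{3\}$/ {
        x
        p
        q
    }
    # otherwise swap back and go on to the next input line
    x
}' "$sample")

printf '%s\n' "$result"
rm -f "$sample"
```

Changing the 3 in the `\{3\}` interval selects a different occurrence.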
Or if you prefer awk:
awk '/^category/,/^done/{if(++m==1)n++;if(n==3)print;if(/^done/)m=0}' file
Solution 2
Try this:
awk -v n=3 '/^category/{l++} (l==n){print}' file.txt
Or, more cryptically:
awk -v n=3 '/^category/{l++} l==n' file.txt
If your file is big, stop reading as soon as the next category line is reached:
awk -v n=3 '/^category/{l++} l>n{exit} l==n' file.txt
Solution 3
If your file doesn't contain any null characters, here's one way using GNU sed. This will find the third occurrence of a pattern range. However, you can easily modify this to get any occurrence you'd like.
sed -n '/^category/ { x; s/^/\x0/; /^\x0\{3\}$/ { x; :a; p; /done/q; n; ba }; x }' file.txt
Results:
category
3
r
d
done
Explanation:
Turn off default printing with the -n switch. Match the word 'category' at the start of a line. Swap the pattern space with the hold space and prepend a null character to the pattern space. In the example, if the pattern space then consists of exactly three null characters, swap the matched line back out of the hold space. Now create a loop and print the contents of the pattern space until the last pattern is matched. When this last pattern is found, sed will quit. If it's not found, sed will read the next line of input and continue in its loop. If the count has not yet reached three, the final x simply swaps the counter back into the hold space.
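The same sequence of steps (count the start lines, print from the nth start, stop at the end pattern) can also be written in awk. This is a sketch of the idea only, not a translation of the sed script; the `seen` counter, the `n` variable and the temporary sample file are assumptions.

```shell
# Sample input from the question in a temporary file (an assumption).
sample=$(mktemp)
printf '%s\n' category 1 s t done category 2 n d done \
              category 3 r d done category 4 t h done > "$sample"

result=$(awk -v n=3 '
/^category/         { seen++ }   # count each block start
seen == n           { print }    # print every line of the nth block
seen == n && /done/ { exit }     # stop once the end pattern is reached
' "$sample")

printf '%s\n' "$result"
rm -f "$sample"
```

Like the sed version, this counts every line that starts with category, so a start line without a matching done would still shift the count.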
Solution 4
awk -v tgt=3 '
    /^category$/ { fnd=1; rec="" }
    fnd {
        rec = rec $0 ORS
        if (/^done$/) {
            if (++cnt == tgt) {
                printf "%s",rec
                exit
            }
            fnd = 0
        }
    }
' file
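A quick way to see how this differs from counting start lines is the test suggested in the comments on this page: add an extra category line before the first done. The sketch below runs the "big file" one-liner from Solution 2 and the script above on such input; the temporary file and the `tricky`, `by_start` and `by_block` names are assumptions.

```shell
# Sample input with a stray "category" between the "s" and "t" lines.
tricky=$(mktemp)
printf '%s\n' category 1 s category t done category 2 n d done \
              category 3 r d done category 4 t h done > "$tricky"

# Solution 2 counts start lines only, so the stray line shifts the count.
by_start=$(awk -v n=3 '/^category/{l++} l>n{exit} l==n' "$tricky")

# Solution 4 counts completed category..done blocks.
by_block=$(awk -v tgt=3 '
/^category$/ { fnd = 1; rec = "" }
fnd {
    rec = rec $0 ORS
    if (/^done$/) {
        if (++cnt == tgt) { printf "%s", rec; exit }
        fnd = 0
    }
}' "$tricky")

printf 'Solution 2:\n%s\n\nSolution 4:\n%s\n' "$by_start" "$by_block"
rm -f "$tricky"
```

On this input the first command prints the block numbered 2 and the second prints the block numbered 3.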
Dan Lawless
Updated on June 20, 2022
Comments
-
Dan Lawless almost 2 years
I have an issue where I want to parse through the output from a file and grab the nth occurrence of text in between two patterns, preferably using awk or sed.
category
1
s
t
done
category
2
n
d
done
category
3
r
d
done
category
4
t
h
done
Let's just say for this example I want to grab the third occurrence of text in between category and done. Essentially the output would be:
category
3
r
d
done
-
Dan Lawless over 11 years
Sorry, let's say that the beginning and end are not the same word; I want the third occurrence of what comes in between category and done.
-
Gilles Quenot over 11 years
/^category/ means a string beginning with "category"; that's really different from a line containing category. So no modification is needed, the script still works AS IS.
-
Ed Morton over 11 years
That will print the text between occurrences of the word category, not between category and done. In the posted input it doesn't matter but in general it could, e.g. if there can be other text between done and category or occurrences of category without an associated done.
-
Ed Morton over 11 years
sed is an excellent tool for simple substitutions on a single line. For anything else just use awk or you'll find the tiniest requirements change (e.g. print the line numbers too) requires a total re-write of your script, possibly in a different language. Doing anything in sed that requires more than "s" and "g" commands is a waste of time.
-
Ed Morton over 11 years
I'd like it to print the 3rd occurrence but only if the 2nd occurrence contained the word "awk". How would I modify that sed command to do that? In awk I'd simply create a "prevRec" variable to store the previous record and add an if (prevRec ~ /awk/) before the print.
-
Ed Morton over 11 years
That will work with the posted sample input but won't work if there can be occurrences of category without done or text between done and category.
-
Ed Morton over 11 years
The awk script will keep printing after the "done" if there's text between done and the next category. It would also print the wrong block if category can exist without a done. I don't know what the sed scripts would do.
-
potong over 11 years
@EdMorton I believe printing is narrowed to between category and done. If there is no done this may be what the user requires.
-
Ed Morton over 11 years
Try it with a file with 2 "category" lines before the first "done". It'll print the 2nd category->done block instead of the 3rd.
-
Dan Lawless over 11 years
I used this awk method:
awk '/^category/,/^done/{if(/^category/)n++;if(n==3)print}' file
Thanks for the response.
-
Ed Morton over 11 years
Just curious: why? It's testing the same condition multiple times and won't work if your input file changes slightly. If you're happy with a solution that only works with exactly the posted input format, @sputnik's solution is much more concise.
-
Thor over 11 years
@EdMorton: True. One possible fix is to clean up the input first, see edit.
-
Ed Morton over 11 years
It would still fail and print the 2nd record instead of the 3rd one if you added a "category" line before the first "done" in your sample input, e.g. between the "s" and "t" lines.
-
Thor over 11 years
@EdMorton: Right, I see your point, the ending pattern is not searched for. I've added a getline alternative that does search for done.
-
Ed Morton over 11 years
IMHO the non-getline version I posted is simpler and it doesn't have all the getline caveats (see awk.info/?tip/getline). I expect you posted that just as a contrast to the other solutions but for the OP's benefit I think it's worth explicitly mentioning that it comes with some baggage.
-
Thor over 11 years
@EdMorton: Indeed, I meant it as a contrast. I hadn't realized getline had this many potential issues; depending on what the OP is doing this may or may not be a problem. I'll put a warning and reference on the answer. Nice post by the way.
-
Ed Morton over 11 years
1) Thanks. 2) Yeah, getline is a can of worms. It's very useful when used appropriately, though, much like a hand grenade.
-
ArigatoManga over 5 years
Can you please explain both the sed commands? That would be really informative and helpful!
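Ed Morton's comment above about a "prevRec" variable can be sketched by extending the Solution 4 script. This is one reading of that comment, not code posted in the thread; the sample data (with the word awk placed in the second block) and the `input` and `result` names are assumptions.

```shell
# Hypothetical input: the second block contains the word "awk".
input=$(mktemp)
printf '%s\n' category 1 s t done category 2 awk d done \
              category 3 r d done category 4 t h done > "$input"

result=$(awk -v tgt=3 '
/^category$/ { fnd = 1; rec = "" }
fnd {
    rec = rec $0 ORS
    if (/^done$/) {
        if (++cnt == tgt) {
            # print the target block only if the previous block mentions awk
            if (prevRec ~ /awk/) printf "%s", rec
            exit
        }
        prevRec = rec   # remember this block for the next comparison
        fnd = 0
    }
}' "$input")

printf '%s\n' "$result"
rm -f "$input"
```

If the second block did not contain awk, the script would reach the third block, skip the print and exit with no output.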