Grab nth occurrence in between two patterns using awk or sed

shell sed awk

13,894

Solution 1

This might work for you (GNU sed):

sed -n '/category/{:a;N;/done/!ba;x;s/^/x/;/^x\{3\}$/{x;p;q};x}' file

Turn off automatic printing by using the -n option. Gather up lines between category and done. Store a counter in the hold space and when it reaches 3 print the collection in the pattern space and quit.
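For readers who find the one-liner hard to follow, here is the same sed script spread over several lines with a comment per step. This is a sketch for illustration: the function name `third_block_sed` and the temporary sample file are assumptions, not part of the original answer.

```shell
# Hypothetical helper: print the 3rd category..done block of the file in $1.
third_block_sed() {
    sed -n '
    /category/ {
        # Loop: append the next input line until the collection contains "done".
        :a
        N
        /done/!ba
        # Swap: the collected block goes to hold space, the counter comes in.
        x
        # Add one "x" to the counter.
        s/^/x/
        # Counter is exactly "xxx": swap the block back, print it, stop.
        /^x\{3\}$/ {
            x
            p
            q
        }
        # Otherwise swap back; with -n the block is dropped at end of cycle.
        x
    }' "$1"
}

# Sample data matching the input shown in the question.
sample=$(mktemp)
printf '%s\n' category 1 s t done category 2 n d done \
              category 3 r d done category 4 t h done > "$sample"
third_block_sed "$sample"
rm -f "$sample"
```

Each sed comment sits on its own line because text following a label or branch command on the same line may be read as part of the label name.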

Or if you prefer awk:

awk  '/^category/,/^done/{if(++m==1)n++;if(n==3)print;if(/^done/)m=0}'  file

Solution 2

Try this:

awk -v n=3 '/^category/{l++} (l==n){print}' file.txt

Or, more cryptically (a true pattern with no action prints the line):

awk -v n=3 '/^category/{l++} l==n' file.txt

If your file is big, exit as soon as the next category line is reached:

awk -v n=3 '/^category/{l++} l>n{exit} l==n' file.txt
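Note what these commands select: printing starts at the nth `category` line and continues until the next `category` line (or end of file), and `done` is never tested. A small sketch with hypothetical input, where a `noise` line sits between blocks, makes that visible. The function name `from_nth_category` is made up for this example.

```shell
from_nth_category() {
    # $1 = occurrence number, $2 = input file
    awk -v n="$1" '/^category/{l++} l>n{exit} l==n' "$2"
}

# Hypothetical input with a stray line after every "done".
sample=$(mktemp)
printf '%s\n' category 1 done noise category 2 done noise \
              category 3 done noise category 4 done > "$sample"
from_nth_category 3 "$sample"
# Prints:
# category
# 3
# done
# noise
rm -f "$sample"
```

If the input always has `category` immediately after `done`, as in the question, the extra line never appears and the short form is sufficient.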

Solution 3

If your file doesn't contain any null characters, here's one way using GNU sed. It finds the third occurrence of a pattern range, but you can easily change the count to get any occurrence you'd like.

sed -n '/^category/ { x; s/^/\x0/; /^\x0\{3\}$/ { x; :a; p; /done/q; n; ba }; x }' file.txt

Results:

category
3
r
d
done

Explanation:

Turn off default printing with the -n switch. Match the word 'category' at the start of a line. Swap the pattern space with the hold space and prepend a null character to the pattern space, which now acts as a counter. If the pattern space then consists of exactly three null characters (the third occurrence in this example), swap the original line back out of the hold space. Now loop: print the pattern space, and if it matches the closing pattern 'done', quit. Otherwise read the next line of input and repeat.
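The same steps laid out one command per line may be easier to follow. This sketch makes one deliberate change, flagged here as an assumption: it counts with a plain `x` instead of a null character, so it does not rely on GNU sed's `\x0` escape. The function name `third_category_sed` is hypothetical.

```shell
# Hypothetical helper: print from the 3rd "category" line through "done".
third_category_sed() {
    sed -n '
    /^category/ {
        # Bring the counter in from hold space and add one mark to it.
        x
        s/^/x/
        /^x\{3\}$/ {
            # Third category line: swap the line back and start printing.
            x
            :a
            p
            # Stop right after printing the closing line.
            /done/q
            # Otherwise replace the pattern space with the next input line.
            n
            ba
        }
        # Not the third one yet: restore the line, keep the counter in hold.
        x
    }' "$1"
}

sample=$(mktemp)
printf '%s\n' category 1 s t done category 2 n d done \
              category 3 r d done category 4 t h done > "$sample"
third_category_sed "$sample"
rm -f "$sample"
```

Unlike Solution 1, this counts `category` lines rather than completed `category`..`done` blocks, so the two can disagree when a `category` line has no matching `done`.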

Solution 4

awk -v tgt=3 '
/^category$/ { fnd=1; rec="" }

fnd {
   rec = rec $0 ORS
   if (/^done$/) {
      if (++cnt == tgt) {
         printf "%s",rec
         exit
      }
      fnd = 0
   }
}
' file
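To try the script on input that is less tidy than the question's, it can be wrapped in a shell function. The sketch below is illustrative: `nth_block` and the sample data (a `category` with no matching `done`, plus a stray `junk` line) are assumptions chosen to exercise the cases raised in the comments.

```shell
nth_block() {
    # $1 = which complete block to print, $2 = input file
    awk -v tgt="$1" '
    /^category$/ { fnd = 1; rec = "" }   # (re)start a record at every category
    fnd {
        rec = rec $0 ORS
        if (/^done$/) {                  # record is complete
            if (++cnt == tgt) { printf "%s", rec; exit }
            fnd = 0                      # ignore lines until the next category
        }
    }' "$2"
}

# Hypothetical messy input: an unterminated category and a stray line.
sample=$(mktemp)
printf '%s\n' category 1 category 1b done junk \
              category 2 done category 3 done > "$sample"
nth_block 3 "$sample"
# Prints:
# category
# 3
# done
rm -f "$sample"
```

Because `rec` is reset on every `category` line and `cnt` only increases on `done`, the unterminated first `category` is discarded and only complete blocks are counted.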

Author by

Dan Lawless

Updated on June 20, 2022

Comments

Dan Lawless almost 2 years
I want to parse the contents of a file and grab the nth occurrence of the text between two patterns, preferably using awk or sed. Sample input:
```
category
1
s
t
done
category
2
n
d
done
category
3
r
d
done
category
4
t
h
done
```
For this example, say I want to grab the third occurrence of the text between category and done. The output would be:
```
category
3
r
d
done
```
Dan Lawless over 11 years

Sorry, let's say that the beginning and end are not the same word: I want the third occurrence of what comes in between category and done.
Gilles Quenot over 11 years

/^category/ means a line beginning with "category", which is quite different from a line merely containing category. So no modification is needed; the script still works as is.
Ed Morton over 11 years

That will print the text between occurrences of the word category, not between category and done. In the posted input it doesn't matter, but in general it could, e.g. if there can be other text between done and category, or occurrences of category without an associated done.
Ed Morton over 11 years

sed is an excellent tool for simple substitutions on a single line. For anything else just use awk or you'll find the tiniest requirements change (e.g. print the line numbers too) requires a total re-write of your script, possibly in a different language. Doing anything in sed that requires more than "s" and "g" commands is a waste of time.
Ed Morton over 11 years

I'd like it to print the 3rd occurrence but only if the 2nd occurrence contained the word "awk". How would I modify that sed command to do that? In awk I'd simply create a "prevRec" variable to store the previous record and add an if (prevRec ~ /awk/) before the print.
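A minimal sketch of the awk approach described here, built on the Solution 4 script rather than taken from the comment itself. `prevRec` follows the comment's naming; the function name `nth_if_prev_matches`, its arguments, and the sample data are hypothetical.

```shell
nth_if_prev_matches() {
    # $1 = target occurrence, $2 = regex the previous block must match, $3 = file
    awk -v tgt="$1" -v pat="$2" '
    /^category$/ { fnd = 1; rec = "" }
    fnd {
        rec = rec $0 ORS
        if (/^done$/) {
            if (++cnt == tgt) {
                # Print only when the block before this one matched pat.
                if (prevRec ~ pat) printf "%s", rec
                exit
            }
            prevRec = rec    # remember the block that just ended
            fnd = 0
        }
    }' "$3"
}

# Hypothetical input: the second block contains the word "awk".
sample=$(mktemp)
printf '%s\n' category 1 sed done category 2 awk done \
              category 3 r d done > "$sample"
nth_if_prev_matches 3 awk "$sample"     # prints the third block
nth_if_prev_matches 3 perl "$sample"    # prints nothing
rm -f "$sample"
```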
Ed Morton over 11 years

That will work with the posted sample input but won't work if there can be occurrences of category without done, or text between done and category.
Ed Morton over 11 years

The awk script will keep printing after the "done" if there's text between done and the next category. It would also print the wrong block if category can exist without a done. I don't know what the sed scripts would do.
potong over 11 years

@EdMorton I believe printing is narrowed to between category and done. If there is no done, this may be what the user requires.
Ed Morton over 11 years

Try it with a file with 2 "category" lines before the first "done". It'll print the 2nd category->done block instead of the 3rd.
Dan Lawless over 11 years

I used this awk method, thanks for the response: awk '/^category/,/^done/{if(/^category/)n++;if(n==3)print}' file
Ed Morton over 11 years

Just curious: why? It's testing the same condition multiple times and won't work if your input file changes slightly. If you're happy with a solution that only works with exactly the posted input format, @sputnik's solution is much more concise.
Thor over 11 years

@EdMorton: True. One possible fix is to clean up the input first, see edit.
Ed Morton over 11 years

It would still fail and print the 2nd record instead of the 3rd one if you added a "category" line before the first "done" in your sample input, e.g. between the "s" and "t" lines.
Thor over 11 years

@EdMorton: Right, I see your point, the ending pattern is not searched for. I've added a getline alternative that does search for done.
Ed Morton over 11 years

IMHO the non-getline version I posted is simpler and it doesn't have all the getline caveats (see awk.info/?tip/getline). I expect you posted that just as a contrast to the other solutions, but for the OP's benefit I think it's worth explicitly mentioning that it comes with some baggage.
Thor over 11 years

@EdMorton: Indeed, I meant it as a contrast. I hadn't realized getline had this many potential issues; depending on what the OP is doing, this may or may not be a problem. I'll put a warning and reference on the answer. Nice post by the way.
Ed Morton over 11 years

1) Thanks. 2) Yeah, getline is a can of worms. It's very useful when used appropriately, though, much like a hand grenade.
ArigatoManga over 5 years

Can you please explain both of the sed commands? That would be really informative and helpful!