Grab nth occurrence in between two patterns using awk or sed


Solution 1

This might work for you (GNU sed):

sed -n '/category/{:a;N;/done/!ba;x;s/^/x/;/^x\{3\}$/{x;p;q};x}' file

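Turn off automatic printing with the -n option. Starting at a category line, keep appending lines (N) until one contains done, so the whole block sits in the pattern space. A counter is kept in the hold space as a string of x's: swap it in, add one x, and when it reaches three (x\{3\}), swap the block back, print it and quit. Otherwise swap back so the counter returns to the hold space.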
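Spread over several lines with comments, Solution 1 might read as follows. This is a sketch rather than the answerer's exact script: the input comes from a made-up `sample` function that reproduces the question's data, and putting each command on its own line should avoid relying on GNU sed's semicolon-after-label extension, which is an assumption worth checking on your own sed.

```shell
# Hypothetical helper that prints the sample input from the question.
sample() {
    printf '%s\n' category 1 s t done category 2 n d done \
        category 3 r d done category 4 t h done
}

out=$(sample | sed -n '
/category/ {
    # Keep appending lines to the pattern space until one contains "done".
    :a
    N
    /done/!ba
    # Swap: the counter comes into the pattern space, the block goes to hold.
    x
    # Add one "x" per completed block.
    s/^/x/
    # Third block: swap it back in, print it, and quit.
    /^x\{3\}$/ {
        x
        p
        q
    }
    # Any other block: put the counter back into the hold space.
    x
}')
printf '%s\n' "$out"
```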

Or if you prefer awk:

awk  '/^category/,/^done/{if(++m==1)n++;if(n==3)print;if(/^done/)m=0}'  file
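The 3 in the awk version is hard-coded. A minimal sketch that passes the block number in with -v instead; the function names `nth_block` and `sample` are hypothetical, and the block counter is renamed from n to blk so that n can hold the requested block number.

```shell
# Hypothetical helper: print block number $1 using the range-based awk above.
nth_block() {
    awk -v n="$1" '/^category/,/^done/ {
        if (++m == 1) blk++      # first line of a new range: count the block
        if (blk == n) print      # print only the requested block
        if (/^done/) m = 0       # range ended: reset the per-block line counter
    }'
}

# Hypothetical helper that prints the sample input from the question.
sample() {
    printf '%s\n' category 1 s t done category 2 n d done \
        category 3 r d done category 4 t h done
}

out=$(sample | nth_block 3)
printf '%s\n' "$out"
```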

Solution 2

Try this:

awk -v n=3 '/^category/{l++} (l==n){print}' file.txt

Or, more tersely:

awk -v n=3 '/^category/{l++} l==n' file.txt

If your file is big, exit as soon as you are past the target block:

awk -v n=3 '/^category/{l++} l>n{exit} l==n' file.txt
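As pointed out in the comments, Solution 2 keeps printing any text that sits between the target block's done and the next category. One possible variant, sketched here with made-up input containing a stray "junk" line, stops right after the done line instead. It still counts every category line, so it does not handle a category without a matching done.

```shell
# Hypothetical sample input: a stray "junk" line follows the third block.
sample() {
    printf '%s\n' category 1 s t done category 2 n d done \
        category 3 r d done junk category 4 t h done
}

out=$(sample | awk -v n=3 '
/^category/       { l++ }      # count every category line
l == n            { print }    # print while inside the nth block
l == n && /^done/ { exit }     # stop right after that block ends
')
printf '%s\n' "$out"
```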

Solution 3

If your file doesn't contain any null characters, here's one way using GNU sed. It finds the third occurrence of the category ... done range; you can easily modify it to get any occurrence you'd like.

sed -n '/^category/ { x; s/^/\x0/; /^\x0\{3\}$/ { x; :a; p; /done/q; n; ba }; x }' file.txt

Results:

category
3
r
d
done

Explanation:

Turn off default printing with the -n switch. Match the word 'category' at the start of a line. Swap the pattern space with the hold space and prepend a null character, so the hold space acts as a counter holding one null character per category line seen. If it now contains exactly three null characters (the \{3\} in the regex), swap back so the pattern space holds the current category line again. Then loop: print the pattern space, and if it matches done, quit; otherwise read the next line of input with n and continue the loop. For any other category line, the final x swaps back so the counter returns to the hold space.
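The explanation above maps onto Solution 3 command by command. Below is a sketch of the same script in commented multi-line form. As an assumption on my part, it uses a plain x as the counter character instead of \x0, so it does not depend on GNU sed's \x escape; `sample` is a made-up helper printing the question's data.

```shell
# Hypothetical helper that prints the sample input from the question.
sample() {
    printf '%s\n' category 1 s t done category 2 n d done \
        category 3 r d done category 4 t h done
}

out=$(sample | sed -n '
/^category/ {
    # Swap: the counter comes into the pattern space, the line goes to hold.
    x
    # One "x" per category line seen so far.
    s/^/x/
    # Third category line: swap the line back and print through "done".
    /^x\{3\}$/ {
        x
        :a
        p
        /done/q
        n
        ba
    }
    # Any other category line: put the counter back into the hold space.
    x
}')
printf '%s\n' "$out"
```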

Solution 4

awk -v tgt=3 '
/^category$/ { fnd=1; rec="" }

fnd {
   rec = rec $0 ORS
   if (/^done$/) {
      if (++cnt == tgt) {
         printf "%s",rec
         exit
      }
      fnd = 0
   }
}
' file
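To see how Solution 4 copes with the edge cases raised in the comments, here is a sketch that feeds it made-up input with an extra category line inside the first block and a stray line after a done. Solution 4 resets its record on every category line and only counts done lines, so it should still print the third block, while Solution 2 picks the wrong one.

```shell
# Hypothetical sample input with the edge cases from the comments.
sample() {
    printf '%s\n' category 1 s category t done extra category 2 n d done \
        category 3 r d done category 4 t h done
}

# Solution 4, unchanged apart from reading from the pipe.
out=$(sample | awk -v tgt=3 '
/^category$/ { fnd=1; rec="" }

fnd {
   rec = rec $0 ORS
   if (/^done$/) {
      if (++cnt == tgt) {
         printf "%s",rec
         exit
      }
      fnd = 0
   }
}
')
printf '%s\n' "$out"
```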
Author: Dan Lawless

Updated on June 20, 2022

Comments

  • Dan Lawless
    Dan Lawless almost 2 years

    I have an issue where I want to parse the output stored in a file and grab the nth occurrence of the text between two patterns, preferably using awk or sed.

    category
    1
    s
    t
    done
    category
    2
    n
    d
    done
    category
    3
    r
    d
    done
    category
    4
    t
    h
    done
    

    For this example, say I want to grab the third occurrence of the text between category and done; the output would then be:

    category
    3
    r
    d
    done
    
  • Dan Lawless
    Dan Lawless over 11 years
    Sorry, let's say that the beginning and end patterns are not the same word: I want the third occurrence of what comes between category and done.
  • Gilles Quenot
    Gilles Quenot over 11 years
    /^category/ matches a line beginning with "category", which is quite different from a line merely containing category. So no modification is needed; the script still works as is.
  • Ed Morton
    Ed Morton over 11 years
    That will print the text between occurrences of the word category, not between category and done. In the posted input it doesn't matter, but in general it could, e.g. if there can be other text between done and category, or occurrences of category without an associated done.
  • Ed Morton
    Ed Morton over 11 years
    sed is an excellent tool for simple substitutions on a single line. For anything else just use awk or you'll find the tiniest requirements change (e.g. print the line numbers too) requires a total re-write of your script, possibly in a different language. Doing anything in sed that requires more than "s" and "g" commands is a waste of time.
  • Ed Morton
    Ed Morton over 11 years
    I'd like it to print the 3rd occurrence but only if the 2nd occurrence contained the word "awk". How would I modify that sed command to do that? In awk I'd simply create a "prevRec" variable to store the previous record and add an if (prevRec ~ /awk/) before the print.
  • Ed Morton
    Ed Morton over 11 years
    That will work with the posted sample input but won't work if there can be occurrences of category without done, or text between done and category.
  • Ed Morton
    Ed Morton over 11 years
    The awk script will keep printing after the "done" if there's text between done and the next category. It would also print the wrong block if category can exist without a done. I don't know what the sed scripts would do.
  • potong
    potong over 11 years
    @EdMorton I believe printing is restricted to the lines between category and done. If there is no done, this may be what the user requires.
  • Ed Morton
    Ed Morton over 11 years
    Try it with a file with 2 "category" lines before the first "done". It'll print the 2nd category->done block instead of the 3rd.
  • Dan Lawless
    Dan Lawless over 11 years
    I used this awk method: awk '/^category/,/^done/{if(/^category/)n++;if(n==3)print}' file. Thanks for the response.
  • Ed Morton
    Ed Morton over 11 years
    just curious: why? it's testing the same condition multiple times and won't work if your input file changes slightly. If you're happy with a solution that only works with exactly the posted input format, @sputnik's solution is much more concise.
  • Thor
    Thor over 11 years
    @EdMorton: True. One possible fix is to clean up the input first, see edit.
  • Ed Morton
    Ed Morton over 11 years
    It would still fail and print the 2nd record instead of the 3rd one if you added a "category" line before the first "done" in your sample input, e.g. between the "s" and "t" lines.
  • Thor
    Thor over 11 years
    @EdMorton: Right, I see your point, the ending pattern is not searched for. I've added a getline alternative that does search for done.
  • Ed Morton
    Ed Morton over 11 years
    IMHO the non-getline version I posted is simpler and it doesn't have all the getline caveats (see awk.info/?tip/getline). I expect you posted that just as a contrast to the other solutions, but for the OP's benefit I think it's worth explicitly mentioning that it comes with some baggage.
  • Thor
    Thor over 11 years
    @EdMorton: Indeed, I meant it as a contrast. I hadn't realized getline had this many potential issues; depending on what the OP is doing, this may or may not be a problem. I'll put a warning and reference on the answer. Nice post, by the way.
  • Ed Morton
    Ed Morton over 11 years
    1) Thanks. 2) Yeah, getline is a can of worms. It's very useful when used appropriately, though, much like a hand grenade.
  • ArigatoManga
    ArigatoManga over 5 years
    Can you please explain both of the sed commands? That would be really informative and helpful!
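Ed Morton's prevRec idea from the comments (print the 3rd block only if the 2nd one contains "awk") could look something like this when applied to Solution 4. It is a sketch, not code posted in the thread; `nth_if_prev_awk` and both sample inputs are hypothetical.

```shell
# Hypothetical helper: print block 3 only if block 2 contains "awk".
nth_if_prev_awk() {
    awk -v tgt=3 '
    /^category$/ { fnd = 1; rec = "" }
    fnd {
        rec = rec $0 ORS
        if (/^done$/) {
            if (++cnt == tgt) {
                if (prevRec ~ /awk/) printf "%s", rec   # the added condition
                exit
            }
            prevRec = rec    # remember this block for the next comparison
            fnd = 0
        }
    }'
}

with=$(printf '%s\n' category 1 s t done category 2 awk d done \
    category 3 r d done | nth_if_prev_awk)
without=$(printf '%s\n' category 1 s t done category 2 n d done \
    category 3 r d done | nth_if_prev_awk)
printf 'with:\n%s\nwithout:\n%s\n' "$with" "$without"
```

Only the extra `if` and the `prevRec` assignment differ from Solution 4, which is the kind of small change Ed Morton describes as easy in awk.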