grep: group capturing


Solution 1

This might work for you:

echo '{"_id":"scheme_version","_rev":"4-cad1842a7646b4497066e09c3788e724","scheme_version":1234}' |
sed -n 's/.*"scheme_version":\([^}]*\)}/\1/p'
1234

This uses sed rather than grep, so disregard it if you need a grep-only answer.

Or stick with grep and pipe its output through cut to keep only the part after the colon:

grep -Eo "\"scheme_version\":(\w*)"| cut -d: -f2
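As a minimal sketch, here are both approaches above run against the sample input. The variable names `json`, `via_sed`, and `via_cut` are illustrative, not from the original answer. `[0-9]+` is swapped in for `\w*` because `\w` is a GNU extension, and the sed call assumes a sed that accepts `-E` (GNU and BSD sed both do).

```shell
# Sample input from the question.
json='{"_id":"scheme_version","_rev":"4-cad1842a7646b4497066e09c3788e724","scheme_version":1234}'

# sed: -n suppresses default output; the p flag prints only lines
# where the substitution actually happened.
via_sed=$(printf '%s\n' "$json" | sed -nE 's/.*"scheme_version":([0-9]+).*/\1/p')

# grep + cut: grep -o prints only the matched text, then cut keeps
# the field after the colon.
via_cut=$(printf '%s\n' "$json" | grep -Eo '"scheme_version":[0-9]+' | cut -d: -f2)

echo "$via_sed"
echo "$via_cut"
```

Both commands print 1234 for this input. The cut variant relies on the matched text containing exactly one colon.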

Solution 2

You'll need to use a lookbehind assertion so that the prefix isn't included in the match (this requires a grep built with PCRE support, i.e. the -P option):

grep -Po '(?<=scheme_version":)[0-9]+'
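A sketch of the lookbehind in a full pipeline, capturing the result in a variable. The names `json` and `version` are illustrative. The probe on the first line is an assumption about how one might detect a grep built without `-P`, which is the failure reported in the comments.

```shell
# Sample input from the question.
json='{"_id":"scheme_version","_rev":"4-cad1842a7646b4497066e09c3788e724","scheme_version":1234}'

# Probe for PCRE support first; some grep builds reject -P entirely.
if printf 'x\n' | grep -qP 'x' 2>/dev/null; then
    # The lookbehind must be satisfied but is not part of the output,
    # so -o prints only the digits that follow it.
    version=$(printf '%s\n' "$json" | grep -Po '(?<="scheme_version":)[0-9]+')
    echo "$version"
else
    echo "this grep was built without -P support" >&2
fi
```

Including the quotes and colon in the lookbehind keeps the pattern from matching the `"_id":"scheme_version"` value earlier in the line.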

Solution 3

I would recommend that you use jq for the job. jq is a command-line JSON processor.

$ cat tmp
{"_id":"scheme_version","_rev":"4-cad1842a7646b4497066e09c3788e724","scheme_version":1234}

$ jq .scheme_version tmp
1234
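For use in a script, a minimal sketch that reads the JSON from a variable instead of a file. The names `json` and `version` are illustrative. The `-r` (raw output) flag makes no difference for a number, but it strips the quotes if the field is ever a string.

```shell
# Sample input from the question.
json='{"_id":"scheme_version","_rev":"4-cad1842a7646b4497066e09c3788e724","scheme_version":1234}'

# Only run the extraction when jq is actually installed.
if command -v jq >/dev/null 2>&1; then
    # jq parses the JSON properly, so key order and whitespace do not matter.
    version=$(printf '%s\n' "$json" | jq -r '.scheme_version')
    echo "$version"
else
    echo "jq is not installed" >&2
fi
```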

Solution 4

As an alternative to the positive lookbehind method suggested by SiegeX, you can reset the match starting point to directly after scheme_version": with the \K escape sequence. E.g.,

$ grep -Po 'scheme_version":\K[0-9]+'

This restarts the matching process after having matched scheme_version":, and tends to perform far better than the positive lookbehind. Comparing the two on regex101 shows the reset-match-start method taking 37 steps and 1ms, while the positive lookbehind method takes 194 steps and 21ms.

You can compare the performance yourself on regex101 and you can read more about resetting the match starting point in the PCRE documentation.
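One practical difference worth sketching: a PCRE lookbehind has to be fixed-length, while the text before `\K` can be variable-length. The example below assumes a hypothetical input with optional whitespace around the colon; the names `json_spaced` and `version` are illustrative.

```shell
# Hypothetical variant of the input with whitespace around the colon.
json_spaced='{"_id":"scheme_version", "scheme_version" :  1234}'

# Probe for PCRE support first; some grep builds reject -P entirely.
if printf 'x\n' | grep -qP 'x' 2>/dev/null; then
    # Everything matched before \K is dropped from the reported match,
    # including the variable-length \s* runs.
    version=$(printf '%s\n' "$json_spaced" | grep -Po '"scheme_version"\s*:\s*\K[0-9]+')
    echo "$version"
else
    echo "this grep was built without -P support" >&2
fi
```

The same pattern written as `(?<="scheme_version"\s*:\s*)` would be rejected, because the lookbehind is no longer fixed-length.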

Solution 5

To avoid relying on grep's PCRE feature, which is available in GNU grep but not in the BSD version, another option is ripgrep, e.g.

$ rg -o 'scheme_version.?:(\d+)' -r '$1' < file.json
1234

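The -r (--replace) option replaces each match with the given text; capture group indices (e.g., $5) and names (e.g., $foo) are supported in the replacement string.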

Another example uses Python's json.tool module, which validates and pretty-prints the JSON before ripgrep extracts the value:

$ python -mjson.tool file.json | rg -o 'scheme_version[^\d]+(\d+)' -r '$1'
1234
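If neither PCRE support nor ripgrep is available, a different technique (not from the answer above) is plain POSIX shell parameter expansion, which needs no external tool at all. This is a sketch under the assumption that the value is a bare integer; the names `json`, `key`, `rest`, and `version` are illustrative.

```shell
# Sample input from the question.
json='{"_id":"scheme_version","_rev":"4-cad1842a7646b4497066e09c3788e724","scheme_version":1234}'

# The key including its quotes and colon, so the "_id" value does not match.
key='"scheme_version":'

# Strip the longest prefix ending in the key; what remains starts with the value.
rest=${json##*"$key"}

# Strip everything from the first non-digit onward, leaving only the digits.
version=${rest%%[!0-9]*}

echo "$version"
```

This is not a JSON parser: it would break on string values, nested objects with the same key, or whitespace after the colon.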

Related: Can grep output only specified groupings that match?

Author: lstipakov

Updated on July 09, 2022

Comments

  • lstipakov
    lstipakov almost 2 years

    I have the following string:

    {"_id":"scheme_version","_rev":"4-cad1842a7646b4497066e09c3788e724","scheme_version":1234}
    

    and I need to get the value of "scheme_version", which is 1234 in this example.

    I have tried

    grep -Eo "\"scheme_version\":(\w*)"
    

    however it returns

    "scheme_version":1234
    

    How can I get just the value? I know I could add a sed call, but I would prefer to do it with a single grep.

  • lstipakov
    lstipakov over 12 years
    Hmm I got grep: Support for the -P option is not compiled into this --disable-perl-regexp binary
  • SiegeX
    SiegeX over 12 years
    @Stipa Without PCRE support you cannot do what you want with grep alone, as it cannot print only a captured group (i.e. \1)
  • greuze
    greuze over 7 years
    Exactly what was asked; that "positive lookbehind" worked like a charm
  • TommyAutoMagically
    TommyAutoMagically about 7 years
    Better than the accepted answer by a long shot for those of us lucky enough to have -P support already compiled in (or stubborn enough to rebuild grep...) :)
  • asgs
    asgs about 7 years
    When you have multiple named groups, each of them is output on a new line. Is there a way to print them on the same line? E.g. cat ~/mydoc | grep -Po '(?<=blah">)[^<]*|(?<=bleh"></span>)[^<]*' prints the captures on different lines.
  • Kristi Jorgji
    Kristi Jorgji about 2 years
    hi from 2022 this does not work on mac because grep -E flag is not valid in damn macs
  • Jean-François Fabre
    Jean-François Fabre about 2 years
    Strange, as the -E flag is documented here: ss64.com/osx/grep.html. You can also check whether "egrep" is available.