How to get group results using grep?
Assuming that the pattern in pattern.txt
is
(.*)(\d+)(.*)
then, using it with GNU grep
would be a matter of
grep -P -f pattern.txt line.txt
i.e., search in line.txt
for lines matching any of the Perl-compatible regular expressions listed in pattern.txt
, which, given the data in the question, produces
This order was placed for QT3000! OK?
The issue with your command was that you used -e -f
. The -e
option explicitly says "the next argument is the expression". This means that -e -f
is interpreted as "the regular expression to use is -f
". grep then searched for the string -f in both pattern.txt and line.txt, since both names were now treated as files to search.
A secondary issue was the \\d
in the pattern.txt
file, which matches a backslash followed by the character d
, i.e. the literal string \d
.
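Both issues can be reproduced in a scratch directory. This is a minimal sketch: notes.txt and its two lines are made up for illustration, and the second command uses -E rather than -P only to stay portable (a doubled backslash is a literal backslash in both flavors).

```shell
# Work in a scratch directory so no real files are overwritten.
cd "$(mktemp -d)" || exit 1
printf '%s\n' '(.*)(\\d+)(.*)' > pattern.txt
printf '%s\n' 'This order was placed for QT3000! OK?' > line.txt
# Hypothetical extra file: one line containing "-f", one containing a literal \d.
printf '%s\n' 'grep -f reads patterns from a file' 'a literal \d here' > notes.txt

# Issue 1: "-e -f" makes "-f" the pattern, so all three names are files to search.
dash_f=$(grep -e -f pattern.txt line.txt notes.txt)
printf '%s\n' "$dash_f"    # notes.txt:grep -f reads patterns from a file

# Issue 2: \\d is a backslash followed by d, so only the literal \d line matches.
literal=$(grep -E -f pattern.txt line.txt notes.txt)
printf '%s\n' "$literal"   # notes.txt:a literal \d here
```

In neither case is the line from the question printed, which is consistent with the empty output shown in the question.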
The pattern has a few other "issues". It first of all uses a non-standard expression to match a digit, \d
. This is better written as [[:digit:]]
or as the range [0-9]
(in the POSIX standard locale). Since regular expressions match substrings, whereas filename globbing patterns are always implicitly anchored, neither of the .*
bits of the pattern is needed. Likewise, the parentheses are not needed at all as they serve no function in the pattern. The +
isn't needed either as a single digit would be matched by the preceding expression (a single digit is "one or more digits").
This means that to extract all lines that contain (at least) one digit, you may instead use the pattern [[:digit:]]
or [0-9]
, or \d
if you want to keep using Perl-like expressions with GNU grep
, with no other decorations. For the difference between these, please see Difference between [0-9], [[:digit:]] and \d.
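As a quick check of that simplification, the sketch below compares the bare bracket expression with the fully decorated pattern. The two extra input lines are invented for the test; only the first comes from the question.

```shell
cd "$(mktemp -d)" || exit 1
# First line is the data from the question; the other two are made-up test lines.
printf '%s\n' 'This order was placed for QT3000! OK?' \
              'no digits on this line' \
              'room 7' > line.txt

# Bare bracket expression: matches any line with at least one digit.
simple=$(grep '[[:digit:]]' line.txt)

# Same selection with the groups, the .* parts and the + still in place.
decorated=$(grep -E '(.*)([[:digit:]]+)(.*)' line.txt)

printf '%s\n' "$simple"
[ "$simple" = "$decorated" ] && echo 'both patterns select the same lines'
```

Both commands select the first and third lines and skip the line without digits.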
To get the three different outputs that you show in the question, use sed
rather than grep
. You want to use sed
because grep
can only print matching lines (or words), but not really modify the data matched.
-
Insert
Found value:
in front of any line containing a digit, and print those lines:
$ sed -n '/[[:digit:]]/s/^/Found value: /p' line.txt
Found value: This order was placed for QT3000! OK?
-
Insert
Found value:
in front of any line containing a digit, and print each such line up to and including the third digit of its first run of digits (fewer digits are printed if that run is shorter than three):
$ sed -n '/[[:digit:]]/s/\([^[:digit:]]*[[:digit:]]\{1,3\}\).*/Found value: \1/p' line.txt
Found value: This order was placed for QT300
-
Insert
Found value:
in front of any line containing a digit, and print the last digit from the line:
$ sed -n '/[[:digit:]]/s/.*\([[:digit:]]\).*/Found value: \1/p' line.txt
Found value: 0
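For comparison with the sed commands above, the closest grep itself gets to the tutorial's "group 0" (the whole match) is the -o option, which prints only the matched text. This is a sketch that assumes a grep with -o; the option is not in POSIX, though GNU and BSD grep both provide it.

```shell
cd "$(mktemp -d)" || exit 1
printf '%s\n' 'This order was placed for QT3000! OK?' > line.txt

# -o prints each matched substring instead of the whole line.
# There are no numbered groups here, only the entire match.
whole_match=$(grep -o '[[:digit:]]\{1,\}' line.txt)
printf 'Found value: %s\n' "$whole_match"   # Found value: 3000
```

Anything beyond the whole match, such as picking out one group or adding text around it, still needs sed or a similar tool.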
Using a regular expression equivalent to the one you used, we can see which bits of the text it matches:
$ sed 's/\(.*\)\([[:digit:]]\{1,\}\)\(.*\)/(\1)(\2)(\3)/' line.txt
(This order was placed for QT300)(0)(! OK?)
Note that \2
only matches the last digit on the line as the preceding .*
is greedy.
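If the intent was for the second group to hold the whole number, one way to get there (a sketch, not the only option) is to stop the first group from consuming digits by replacing its .* with [^[:digit:]]*:

```shell
cd "$(mktemp -d)" || exit 1
printf '%s\n' 'This order was placed for QT3000! OK?' > line.txt

# The first group can no longer swallow digits, so the second group,
# which is itself greedy, takes the whole run of digits.
parts=$(sed 's/\([^[:digit:]]*\)\([[:digit:]]\{1,\}\)\(.*\)/(\1)(\2)(\3)/' line.txt)
printf '%s\n' "$parts"   # (This order was placed for QT)(3000)(! OK?)
```

The match still starts at the beginning of the line, so \1 holds everything before the first digit.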
Nicholas Saunders
Updated on September 18, 2022
Comments
-
Nicholas Saunders over 1 year
How would I get this output:
Found value: This order was placed for QT3000! OK?
or
Found value: This order was placed for QT300
or
Found value: 0
using
line.txt
and
pattern.txt
as below:
[nsaunders@rolly regex]$
[nsaunders@rolly regex]$ grep -e -f pattern.txt line.txt
[nsaunders@rolly regex]$
[nsaunders@rolly regex]$ cat pattern.txt
(.*)(\\d+)(.*)
[nsaunders@rolly regex]$
[nsaunders@rolly regex]$ cat line.txt
This order was placed for QT3000! OK?
[nsaunders@rolly regex]$
utilizing something similar to
m.group(0)
from a tutorial on regex. Perhaps
grep
doesn't have such notion as:
Groups and capturing
Group number
Capturing groups are numbered by counting their opening parentheses from left to right. In the expression ((A)(B(C))), for example, there are four such groups:
1 ((A)(B(C)))
2 (A)
3 (B(C))
4 (C)
Group zero always stands for the entire expression.
Capturing groups are so named because, during a match, each subsequence of the input sequence that matches such a group is saved. The captured subsequence may be used later in the expression, via a back reference, and may also be retrieved from the matcher once the match operation is complete.
-
Sundeep almost 4 years
-
Nicholas Saunders almost 4 years
I'm using the
-e
switch @Sundeep, but is that not sufficient? Perhaps you would elaborate a bit, and thanks for the link.
-
Sundeep almost 4 years
could you explain how one line
This order was placed for QT3000! OK?
translates to three lines of output? you need the
-E
switch for
()
to act as capture groups.
\d
is not supported by grep (unless you have GNU grep which has PCRE support)
-
Nicholas Saunders almost 4 years
oh, pardon, what I mean is generate each of those three lines using some notion of
group(x)
with
grep
. thanks, I updated the question. I'm not sure how
\d
factors in here. but, yes, I'm asking about capture groups. Hmm, I'm looking into
-e
versus
-E
now, thanks...
-
Sundeep almost 4 years
do you want to print all lines containing a digit character?
grep '[0-9]' line.txt
?
-
Sundeep almost 4 years
if you want only the digits,
grep -oE '[0-9]+'
(provided your grep supports the
-o
option)
-