Extract substring according to regexp with sed or grep
Solution 1
Try this,
sed -nE 's/^pass2:.*<(.*)>.*$/\1/p'
Or POSIXly (-E has not made it to the POSIX standard yet as of 2019):
sed -n 's/^pass2:.*<\(.*\)>.*$/\1/p'
Output:
$ printf '%s\n' 'pass2: <Marvell Console 1.01> Removable Processor SCSI device' | sed -nE 's/^pass2:.*<(.*)>.*$/\1/p'
Marvell Console 1.01
This will only print the last occurrence of <...> for each line.
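To illustrate the "last occurrence" behaviour, here is a small sketch. The sample line with two <...> pairs is made up for the demonstration, and the variable names are only illustrative. Replacing the greedy .* with bracket expressions that cannot cross < or > makes sed pick the first pair instead:

```shell
# Hypothetical input with two <...> pairs (not from the original dmesg output).
line='pass2: <Marvell Console 1.01> Removable <Processor> SCSI device'

# Greedy .* before < skips ahead to the last <...> pair on the line.
last=$(printf '%s\n' "$line" | sed -n 's/^pass2:.*<\(.*\)>.*$/\1/p')

# [^<]* cannot cross a <, and [^>]* cannot cross a >, so the first pair is captured.
first=$(printf '%s\n' "$line" | sed -n 's/^pass2:[^<]*<\([^>]*\)>.*$/\1/p')

printf 'last:  %s\nfirst: %s\n' "$last" "$first"
```

Both commands use basic regular expressions, so they do not depend on -E.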
Solution 2
How about the -o option of grep to print just the matching part? We still need to remove the < and >, but tr handles that.
dmesg | grep -Eo "<([a-zA-Z.0-9 ]+)>" | tr -d "<>"
Marvell Console 1.01
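Note that the pipeline above prints the bracketed text from every line that contains a <...> pair, not only from the pass2 line. A possible variant, sketched here with made-up sample input in place of real dmesg output, filters on the line prefix first. The -o option is not specified by POSIX, but GNU grep and FreeBSD grep both provide it:

```shell
# Two sample lines standing in for dmesg output; the pass1 line is invented.
sample='pass1: <Other Device 2.0> Fixed Direct Access SCSI device
pass2: <Marvell Console 1.01> Removable Processor SCSI device'

# Keep only the pass2 line, extract <...>, then strip the brackets.
result=$(printf '%s\n' "$sample" | grep '^pass2:' | grep -o '<[^>]*>' | tr -d '<>')

printf '%s\n' "$result"
```

If no line starts with pass2:, the first grep passes nothing on and the pipeline prints nothing.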
Solution 3
I tried the three methods below, using sed, awk, and Python.
sed command
echo "pass2: <Marvell Console 1.01> Removable Processor SCSI device" | sed "s/.*<//g"|sed "s/>.*//g"
output
Marvell Console 1.01
awk command
echo "pass2: <Marvell Console 1.01> Removable Processor SCSI device" | awk -F "[<>]" '{print $2}'
output
Marvell Console 1.01
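As written, the awk command prints a (possibly empty) second field for every input line. A hedged variant that only prints for lines starting with pass2: and containing both brackets might look like this; the sample input is invented for the demonstration:

```shell
# Sample input: one matching line, one pass2 line without brackets, one unrelated line.
sample='pass2: <Marvell Console 1.01> Removable Processor SCSI device
pass2: no brackets here
da0: <Some Disk 1.0> Fixed Direct Access'

# Splitting on < or > yields at least 3 fields only when both brackets are present.
result=$(printf '%s\n' "$sample" | awk -F '[<>]' '/^pass2:/ && NF >= 3 { print $2 }')

printf '%s\n' "$result"
```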
python
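#!/usr/bin/python
# Collect the 2nd to 4th space-separated words of every line in l.txt.
h = []
with open('l.txt') as k:
    for i in k:
        o = i.split(' ')
        h.extend(o[1:4])
# Join the collected words and strip the angle brackets.
print(" ".join(h).replace('>', '').replace('<', ''))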
output
Marvell Console 1.01
Steiner
Updated on September 18, 2022
Comments
-
Steiner over 1 year
In a (BSD) UNIX environment, I would like to capture a specific substring using a regular expression.
Assume that the dmesg command output would include the following line:
pass2: <Marvell Console 1.01> Removable Processor SCSI device
I would like to capture the text between the < and > characters, so that
dmesg | <sed command>
should output:
Marvell Console 1.01
However, it should not output anything if the regex does not match. Many solutions, including sed -e 's/$regex/\1/', will output the whole input if no match is found, which is not what I want.
The corresponding regexp could be:
regex="^pass2\: \<(.*)\>"
How would I properly do a regex match using sed or grep? Note that the grep -P option is unavailable in my BSD UNIX distribution. The sed -E option is available, however.
-
JdeBP about 5 years
It's possibly better to parse the output of camcontrol devlist than the output of dmesg.
-
Steiner about 5 years
This works for me, with both the -n parameter and the /p suffix inside the regex. Full command I used:
dmesg | sed -nE 's/^pass2: <(.*)>.*$/\1/p'
-
Rich about 5 years
Why not use <([^>]+)>? I.e. "not >", one or more times.
-
jwm about 5 years
I was thinking of the awk approach too. Should you constrain your print to lines beginning with "pass2:"? The OP didn't provide sufficient detail, but I can imagine that a naive pattern match would not be quite what was wanted.
-
D. Ben Knoble about 5 years
Python can read from standard input, though Perl specializes in this kind of text processing if you're moving into higher-level scripting languages.
-
AdminBee about 4 years
Welcome to the site, and thank you for your contribution. The reason that + doesn't seem to work is that by default, grep interprets the regular expression as a basic regular expression, which doesn't include the + operator. You will have to use the -E option to enable it (at least on GNU grep), or use egrep instead.
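The point of the last comment can be checked with a short sketch. The variable names are illustrative. In a POSIX basic regular expression the + is an ordinary character, so the first pattern looks for a literal + and finds nothing in the sample line:

```shell
line='pass2: <Marvell Console 1.01> Removable Processor SCSI device'

# BRE: + is a literal character here, so nothing matches (|| true hides grep's exit status 1).
bre=$(printf '%s\n' "$line" | grep -o '<[^>]+>' || true)

# ERE via -E: + means "one or more of the preceding item".
ere=$(printf '%s\n' "$line" | grep -Eo '<[^>]+>')

# BRE without +: repeat the bracket expression and follow it with *.
bre_star=$(printf '%s\n' "$line" | grep -o '<[^>][^>]*>')

printf 'BRE with +: [%s]\nERE with +: [%s]\nBRE with *: [%s]\n' "$bre" "$ere" "$bre_star"
```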