Extract substring according to regexp with sed or grep

sed grep regular-expression

12,364

Solution 1

Try this:

sed -nE 's/^pass2:.*<(.*)>.*$/\1/p'

Or, portably with a POSIX basic regular expression (as of 2019, -E had not yet made it into the POSIX standard):

sed -n 's/^pass2:.*<\(.*\)>.*$/\1/p'

Output:

$ printf '%s\n' 'pass2: <Marvell Console 1.01> Removable Processor SCSI device' | sed -nE 's/^pass2:.*<(.*)>.*$/\1/p'
Marvell Console 1.01

Because the first .* is greedy, this prints only the last <...> on each line. Lines that do not match print nothing.
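
The "last occurrence" behaviour comes from the greedy .* in front of the <. As a small sketch (the sample line and the function names last_bracket and first_bracket are invented for illustration and are not part of the answer), negated bracket expressions pick the first group instead:

```shell
# Hypothetical sample line with two <...> groups (not real dmesg output).
line='pass2: <first> and <second> tail'

# The greedy .* before the < skips ahead to the last <...> on the line.
last_bracket() {
    sed -n 's/^pass2:.*<\(.*\)>.*$/\1/p'
}

# [^<]* stops at the first <, and [^>]* stops at the first > after it.
first_bracket() {
    sed -n 's/^pass2:[^<]*<\([^>]*\)>.*$/\1/p'
}

printf '%s\n' "$line" | last_bracket    # prints: second
printf '%s\n' "$line" | first_bracket   # prints: first
```

On the single-group line from the question, both variants print the same text.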

Solution 2

How about using grep's -o option to print only the matching part? The < and > still have to be removed, but tr handles that.

dmesg | grep -Eo '<[a-zA-Z.0-9 ]+>' | tr -d '<>'
Marvell Console 1.01
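
Note that this pipeline matches any <...> in the dmesg output, not only the one on the pass2 line. One possible way to narrow it is sketched below, assuming a grep that supports -o as the answer already does. The function name pass2_name and the second sample line are made up for the example.

```shell
# Keep only lines starting with "pass2:", then print just the <...> part
# and strip the angle brackets.
pass2_name() {
    grep '^pass2:' | grep -o '<[^>]*>' | tr -d '<>'
}

# Sample input standing in for dmesg output; the second line is invented.
printf '%s\n' \
    'pass2: <Marvell Console 1.01> Removable Processor SCSI device' \
    'da0: <Some Other Disk 2.0> Fixed Direct Access device' |
    pass2_name    # prints: Marvell Console 1.01
```

In real use this would be called as dmesg | pass2_name. It prints nothing when no pass2 line is present.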

Solution 3

I tried the three methods below, using sed, awk, and Python.

sed command

echo "pass2: <Marvell Console 1.01> Removable Processor SCSI device" | sed -e 's/.*<//' -e 's/>.*//'

output

Marvell Console 1.01

awk command

echo "pass2: <Marvell Console 1.01> Removable Processor SCSI device" | awk -F "[<>]" '{print $2}'

output

Marvell Console 1.01
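
As written, the awk command prints the second field of every input line, including lines without any brackets. A hedged sketch of a stricter variant follows; the function name pass2_field is hypothetical, and the NF check is just one way to require both delimiters.

```shell
# Split on < or > and print field 2, but only for lines that start with
# "pass2:" and contain enough delimiters to yield at least 3 fields.
pass2_field() {
    awk -F '[<>]' '/^pass2:/ && NF >= 3 { print $2 }'
}

printf '%s\n' \
    'pass2: <Marvell Console 1.01> Removable Processor SCSI device' \
    'some line without brackets' |
    pass2_field    # prints: Marvell Console 1.01
```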

python

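#!/usr/bin/env python3
# Read the saved dmesg line(s) from l.txt and collect words 2-4 of each line,
# which hold "<Marvell Console 1.01>" in the sample input.
h = []
with open('l.txt') as k:
    for line in k:
        words = line.split(' ')
        h.extend(words[1:4])
# Join the words and strip the angle brackets.
print(" ".join(h).replace('>', '').replace('<', ''))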

output

Marvell Console 1.01

Steiner

Updated on September 18, 2022

Comments

Steiner over 1 year
In a (BSD) UNIX environment, I would like to capture a specific substring using a regular expression.

Assume that the dmesg command output would include the following line:
```
pass2: <Marvell Console 1.01> Removable Processor SCSI device
```
I would like to capture the text between the < and > characters, like

dmesg | <sed command>

should output:
```
Marvell Console 1.01
```
However, it should not output anything if the regex does not match. Many solutions, including sed -e 's/$regex/\1/', print the whole input line when no match is found, which is not what I want.
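
To make the difference concrete, here is a minimal sketch of the two behaviours. The function names plain_subst and quiet_subst and the non-matching sample line are invented for the demonstration. It uses sed -E, which the question says is available.

```shell
# A line that does not match the pattern (invented for this demo).
other='da0: some unrelated kernel message'

# Without -n, sed prints every line; a failed substitution leaves it unchanged.
plain_subst() {
    sed -E 's/^pass2: <(.*)>.*$/\1/'
}

# With -n and the p flag, a line is printed only if the substitution succeeded.
quiet_subst() {
    sed -nE 's/^pass2: <(.*)>.*$/\1/p'
}

printf '%s\n' "$other" | plain_subst    # prints the line unchanged
printf '%s\n' "$other" | quiet_subst    # prints nothing
```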

The corresponding regexp could be: regex="^pass2\: \<(.*)\>"

How would I properly do a regex match using sed or grep? Note that the grep -P option is unavailable in my BSD UNIX distribution. The sed -E option is available, however.

JdeBP about 5 years

It's possibly better to parse the output of camcontrol devlist than the output of dmesg.
Steiner about 5 years

This works for me, with both the -n option and the p flag on the substitution. Full command I used: dmesg | sed -nE 's/^pass2: <(.*)>.*$/\1/p'
Rich about 5 years

Why not use <([^>]+)>? That is, one or more characters that are not >.
jwm about 5 years

I was thinking of the awk approach too. Should you constrain your print to lines beginning with "pass2:"? The OP didn't provide sufficient detail, but I can imagine that a naive pattern match would not be quite what was wanted.
D. Ben Knoble about 5 years

Python can read from standard input, though Perl specializes in this kind of text processing if you're moving into higher-level scripting languages.
AdminBee about 4 years

Welcome to the site, and thank you for your contribution. The reason that + doesn't seem to work is that by default, grep interprets the pattern as a basic regular expression, which doesn't include the + operator. You will have to use the -E option to enable it (at least on GNU grep), or use egrep instead.