Extract substring using regular expression on a Unix file

sed grep awk regular-expression

22,276

Solution 1

GNU grep:

grep -oE '[[:alpha:]]+_[[:digit:]]+_[[:alpha:]]+_[[:digit:]]+' file

Use the Perl-regex flag (-P) with look-behind and look-ahead assertions to guarantee that the match is surrounded by /:

grep -oP '(?<=/)[[:alpha:]]+_[[:digit:]]+_[[:alpha:]]+_[[:digit:]]+(?=/)' file
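A self-contained run of both grep commands on the question's sample lines. This is a sketch that assumes GNU grep built with PCRE support (-P is a GNU extension and is missing from some builds); the variable name sample is illustrative only.

```shell
#!/bin/sh
# Sample input from the question (the variable name is illustrative).
sample='/ABC/RTE/AD_900_VOP_123/OPP
/ABC/RTE/TRE/AD_900_VOP_145/BBB
/ABC/RTE/AN_900_VFP_124/FBF
/ABC/RTE/HD_900_FOP_153/WEW
/ABD/RDV/AD_900_VOP_123/OPP
/ABC/RTE/WD_900_VOP_123/GRR/TRD
/ABC/RTE/RTD/AR_900_VOP_443/SDD'

# ERE form: -o prints only the matched text, one match per output line.
printf '%s\n' "$sample" |
  grep -oE '[[:alpha:]]+_[[:digit:]]+_[[:alpha:]]+_[[:digit:]]+'

# PCRE form (GNU grep -P): (?<=/) and (?=/) require a slash on each side
# of the match without including the slashes in the printed text.
printf '%s\n' "$sample" |
  grep -oP '(?<=/)[[:alpha:]]+_[[:digit:]]+_[[:alpha:]]+_[[:digit:]]+(?=/)'
```

Both commands print the seven expected lines for this input. The look-around version only behaves differently when a matching run is not bounded by slashes, for example at the very end of a line.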

Solution 2

IMHO Perl offers the easiest and the most flexible solution:

perl -nE 'say $1 if m{/(\w+\d+\w+\d+)/};' input_file

Note that input_file is optional: when no input file name is given, STDIN is filtered instead.

Solution 3

One way with awk:

awk -F/ '{for(i=1;i<=NF;i++)$0=($i~/_/)?$i:$0}1' file
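The same awk logic spread over several lines with comments, run on piped sample data instead of file. It relies on awk re-splitting the record whenever $0 is assigned: once a field containing _ is copied into $0, NF becomes 1 and the loop stops. Lines with no _ in any field are printed unchanged. The variable names sample and prog are illustrative.

```shell
#!/bin/sh
# Two lines from the question plus one made-up line with no "_" at all.
sample='/ABC/RTE/AD_900_VOP_123/OPP
/ABC/RTE/WD_900_VOP_123/GRR/TRD
/ABC/RTE/OPP'

# The one-liner program, expanded (the variable name is illustrative).
prog='
{
    # Visit each /-separated field. When field i contains "_", copy it
    # into $0; assigning $0 re-splits the record on "/", so NF drops to 1
    # and the loop ends. Otherwise $0 is reassigned to itself, unchanged.
    for (i = 1; i <= NF; i++)
        $0 = ($i ~ /_/) ? $i : $0
}
# A bare true pattern (1) prints the possibly rewritten record.
1'

printf '%s\n' "$sample" | awk -F/ "$prog"
```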


g4ur4v


Updated on September 18, 2022

Comments

g4ur4v over 1 year
I have a file with the following contents:
```
/ABC/RTE/AD_900_VOP_123/OPP
/ABC/RTE/TRE/AD_900_VOP_145/BBB
/ABC/RTE/AN_900_VFP_124/FBF
/ABC/RTE/HD_900_FOP_153/WEW
/ABD/RDV/AD_900_VOP_123/OPP
/ABC/RTE/WD_900_VOP_123/GRR/TRD
/ABC/RTE/RTD/AR_900_VOP_443/SDD
```
How can I use a regular expression on this file so that I get this output:
```
AD_900_VOP_123
AD_900_VOP_145
AN_900_VFP_124
HD_900_FOP_153
AD_900_VOP_123
WD_900_VOP_123
AR_900_VOP_443
```
- Admin almost 11 years
  
  What is the criterion for picking the field of interest?
- Admin almost 11 years
  
  The criterion is any pattern like <alphabets>_<digits>_<alphabets>_<digits> that falls between two /.
- Admin almost 11 years
  
  awk -F/ '{print $3}'
- Admin over 9 years
  
  awk -F/ '{print $(NF-1)}' to find last dir (if those are dirs)
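Because every sample line starts with /, awk -F/ sees an empty $1, so the component numbers are shifted by one. A quick sketch to check what the two positional one-liners above print on the question's data (the variable name sample is illustrative):

```shell
#!/bin/sh
sample='/ABC/RTE/AD_900_VOP_123/OPP
/ABC/RTE/TRE/AD_900_VOP_145/BBB
/ABC/RTE/AN_900_VFP_124/FBF
/ABC/RTE/HD_900_FOP_153/WEW
/ABD/RDV/AD_900_VOP_123/OPP
/ABC/RTE/WD_900_VOP_123/GRR/TRD
/ABC/RTE/RTD/AR_900_VOP_443/SDD'

# $1 is the empty string before the leading "/", so $3 is RTE or RDV here.
printf '%s\n' "$sample" | awk -F/ '{print $3}'

# $(NF-1) is the next-to-last component; on the WD_900 line it is GRR,
# because that line has two components after the wanted one.
printf '%s\n' "$sample" | awk -F/ '{print $(NF-1)}'
```

Positional selection only works when the wanted component always sits at the same position; the pattern-based answers do not depend on that.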
g4ur4v almost 11 years

Can you please explain it in one or two lines?
g4ur4v almost 11 years

Hi, I just ran it, but I get the entire input back as the result. Running sed 's|.*/\([0-9_A-Z]\+900[0-9_A-Z]\+\)/.*|\1|' tstfile.txt prints every line unchanged: /ABC/RTE/AD_900_VOP_123/OPP /ABC/RTE/TRE/AD_900_VOP_145/BBB /ABC/RTE/AN_900_VFP_124/FBF /ABC/RTE/HD_900_FOP_153/WEW /ABD/RDV/AD_900_VOP_123/OPP /ABC/RTE/WD_900_VOP_123/GRR/TRD /ABC/RTE/RTD/AR_900_VOP_443/SDD
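The sed command from the comment above, with its \( \) capture group restored, can be checked on GNU sed as below. \+ ("one or more") is a GNU extension to basic regular expressions, so this form assumes GNU sed. The greedy .*/ backs off until the group can match a run of capitals, digits and underscores containing 900 that is followed by /; the whole line is then replaced by that group (\1).

```shell
#!/bin/sh
sample='/ABC/RTE/AD_900_VOP_123/OPP
/ABC/RTE/WD_900_VOP_123/GRR/TRD
/ABC/RTE/RTD/AR_900_VOP_443/SDD'

# .*/                             everything up to a slash (greedy, backtracks)
# \([0-9_A-Z]\+900[0-9_A-Z]\+\)   capture: capitals/digits/_ around a literal 900
# /.*                             the slash after the capture and the rest
# \1                              replace the whole line with the captured text
printf '%s\n' "$sample" | sed 's|.*/\([0-9_A-Z]\+900[0-9_A-Z]\+\)/.*|\1|'
```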
g4ur4v almost 11 years

No, I am not :)
g4ur4v almost 11 years

Did you run it?
slm almost 11 years

@g4ur4v - Sorry, I had to ask 8-). What version of sed are you using? I just ran what you sent me and it worked just fine. You can check with this command: sed --version. Mine reports GNU sed version 4.2.1.
g4ur4v almost 11 years

I am using MobaXterm on Windows; maybe that's why I am not getting the desired result. $ sed --version prints: This is not GNU sed version 4.0
slm almost 11 years

@g4ur4v - Ah, that makes more sense. MobaXterm doesn't include a 4.x GNU version of sed. I've updated your question to include a new tag for MobaXterm so that others are aware that you're using it, and that the Q&A is specific to that.
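If the installed sed is not GNU sed, one thing worth trying is replacing the GNU-only \+ with the POSIX BRE interval \{1,\}, which means the same thing (one or more). This is a sketch: that MobaXterm's bundled sed accepts it is an assumption, not something confirmed in this thread.

```shell
#!/bin/sh
sample='/ABC/RTE/AD_900_VOP_123/OPP
/ABC/RTE/TRE/AD_900_VOP_145/BBB'

# Same substitution as the GNU form, but \{1,\} (POSIX BRE "one or more")
# replaces the GNU extension \+.
printf '%s\n' "$sample" | sed 's|.*/\([0-9_A-Z]\{1,\}900[0-9_A-Z]\{1,\}\)/.*|\1|'
```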
Johan over 9 years

A variation on this, slightly longer but 100 times easier to read (and write!), is sed 's|.*/\(.._..._..._...\)/.*|\1|' <input
mikeserv over 9 years

@Johan - it is also far less capable - your version strictly delimits each field, mine will work with fields of any length. And I don't consider it easier to read or write.
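To see the difference described here, run both patterns on a made-up line whose parts have other widths (ABC_9000_VOP_12, invented for illustration). The fixed-width pattern only matches exactly 14 characters shaped like ??_???_???_??? between two slashes, so sed finds no match and prints that line unchanged; the [0-9_A-Z] version (GNU \+ assumed) still extracts it.

```shell
#!/bin/sh
# One line in the sample's shape, one made-up line with different widths.
sample='/ABC/RTE/AD_900_VOP_123/OPP
/ABC/RTE/ABC_9000_VOP_12/OPP'

# Fixed width: each "." stands for exactly one character.
printf '%s\n' "$sample" | sed 's|.*/\(.._..._..._...\)/.*|\1|'

# Variable width (GNU \+): any run of [0-9_A-Z] around a literal 900.
printf '%s\n' "$sample" | sed 's|.*/\([0-9_A-Z]\+900[0-9_A-Z]\+\)/.*|\1|'
```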
mikeserv over 9 years

Using . like that in a global substitution is usually asking for trouble. What if one of the fields winds up being only a single char? That field (and one or two that follow) goes poof. sed 's|/[^/_]\{3\}||g' would at least ensure that you don't remove anything you shouldn't, though in some cases it might leave behind something you should have removed, which I usually consider the better alternative.
Johan over 9 years

@mikeserv It handles the sample data provided, not all possible types of data.