Extract substring using regular expression on a Unix file
Solution 1
GNU grep
grep -oE '[[:alpha:]]+_[[:digit:]]+_[[:alpha:]]+_[[:digit:]]+'
Use the Perl-regex flag (-P) with look-behind and look-ahead assertions to guarantee that the match is surrounded by / characters:
grep -oP '(?<=/)[[:alpha:]]+_[[:digit:]]+_[[:alpha:]]+_[[:digit:]]+(?=/)'
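Where grep lacks -P (it is a GNU extension and is not always compiled in), the same "surrounded by /" constraint can be approximated with -E by matching the slashes and stripping them afterwards. This is a sketch under that assumption, not the answerer's command; extract_delimited is a hypothetical helper name. One limitation: because the slashes are consumed by the match, two qualifying tokens that share a single slash on the same line would yield only the first.

```shell
# Hypothetical helper: emulate the slash-delimited match without -P.
extract_delimited() {
    # Match the token together with its surrounding slashes, then strip them.
    grep -oE '/[[:alpha:]]+_[[:digit:]]+_[[:alpha:]]+_[[:digit:]]+/' | tr -d '/'
}

# The second line has extra letters glued to the token, so it should not match.
printf '%s\n' \
    '/ABC/RTE/AD_900_VOP_123/OPP' \
    '/ABC/RTE/XAD_900_VOP_123x/OPP' |
    extract_delimited
```

Only the first sample line produces output here, since the token in the second line is not bounded by a slash on its right.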
Solution 2
IMHO Perl offers the easiest and the most flexible solution:
perl -nE 'say $1 if m{/(\w+\d+\w+\d+)/};' input_file
Please note that input_file is optional: STDIN will be filtered if/when the input file name is not given.
Solution 3
One way with awk:
awk -F/ '{for(i=1;i<=NF;i++)$0=($i~/_/)?$i:$0}1' file
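The one-liner above splits each line on / and replaces the record with the first field that contains an underscore. A more explicit variant is sketched below; it is not the answerer's exact code, and extract_with_awk is a hypothetical name. It differs in one respect: a line with no underscore field prints nothing, whereas the original prints such a line unchanged.

```shell
# Hypothetical helper: print the first /-separated field containing "_".
extract_with_awk() {
    awk -F/ '{
        for (i = 1; i <= NF; i++) {
            if ($i ~ /_/) {   # field contains an underscore
                print $i
                break         # stop at the first such field
            }
        }
    }' "$@"
}

printf '%s\n' \
    '/ABC/RTE/WD_900_VOP_123/GRR/TRD' \
    '/ABC/RTE' |
    extract_with_awk
```

Like the original, this keys only on the underscore, so it is a weaker test than the full <alphabets>_<digits>_<alphabets>_<digits> pattern.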
Comments
-
g4ur4v over 1 year
I have a file with the contents below.
/ABC/RTE/AD_900_VOP_123/OPP
/ABC/RTE/TRE/AD_900_VOP_145/BBB
/ABC/RTE/AN_900_VFP_124/FBF
/ABC/RTE/HD_900_FOP_153/WEW
/ABD/RDV/AD_900_VOP_123/OPP
/ABC/RTE/WD_900_VOP_123/GRR/TRD
/ABC/RTE/RTD/AR_900_VOP_443/SDD
How can I use a regular expression on this file to get output such as the following?
AD_900_VOP_123
AD_900_VOP_145
AN_900_VFP_124
HD_900_FOP_153
AD_900_VOP_123
WD_900_VOP_123
AR_900_VOP_443
-
Admin almost 11 years
What is the criterion for picking the field of interest?
-
Admin almost 11 years
The criterion is any pattern like
<alphabets>_<digits>_<alphabets>_<digits>
that falls between two / characters.
-
Admin almost 11 years
awk -F/ '{print $3}'
-
Admin over 9 years
awk -F/ '{print $(NF-1)}'
to find last dir (if those are dirs)
-
g4ur4v almost 11 years
Can you please explain it in one or two lines?
-
g4ur4v almost 11 years
Hi, I just ran it, but I get the entire input as the result:
$ sed 's|.*/\([0-9_A-Z]\+900[0-9_A-Z]\+\)/.*|\1|' tstfile.txt
/ABC/RTE/AD_900_VOP_123/OPP
/ABC/RTE/TRE/AD_900_VOP_145/BBB
/ABC/RTE/AN_900_VFP_124/FBF
/ABC/RTE/HD_900_FOP_153/WEW
/ABD/RDV/AD_900_VOP_123/OPP
/ABC/RTE/WD_900_VOP_123/GRR/TRD
/ABC/RTE/RTD/AR_900_VOP_443/SDD
-
g4ur4v almost 11 years
No, I am not :)
-
g4ur4v almost 11 years
Did you run it?
-
slm almost 11 years
@g4ur4v - Sorry, I had to ask 8-). What version of sed are you using? I just ran what you sent me and it worked just fine. You can use this command to check:
sed --version
GNU sed version 4.2.1.
-
g4ur4v almost 11 years
I am using MobaXterm on Windows; maybe that's why I am not getting the desired result.
$ sed --version
This is not GNU sed version 4.0
-
slm almost 11 years
@g4ur4v - Ah, that makes more sense. MobaXterm doesn't include a 4.x GNU version of sed. I've updated your question to include a new tag for MobaXterm so that others are aware that you're using it, and that the Q&A are specific to that.
-
Johan over 9 years
A variation on this which is slightly longer but 100 times easier to read (and write!) is:
sed 's|.*/\(.._..._..._...\)/.*|\1|' <input
-
mikeserv over 9 years
@Johan - it is also far less capable: your version strictly delimits each field, while mine will work with fields of any length. And I don't consider it easier to read or write.
-
mikeserv over 9 years
Using . like that in a global is usually looking for trouble. What if one of the fields winds up being only a single char? That field (and one or two that follow) goes poof.
sed 's|/[^/_]\{3\}||g'
would at least serve to ensure that you don't remove anything you shouldn't, though in some cases it might result in your not removing something you should, which is usually the better alternative, as I consider it.
-
Johan over 9 years
@mikeserv It handles the sample data provided, not all possible types of data.
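
Regarding the sed command discussed in the comments above: \+ is a GNU extension to basic regular expressions, while the interval \{1,\} is the POSIX spelling of "one or more". The sketch below rewrites the command from the thread with that interval. Whether \+ was actually the cause of the failure under MobaXterm is an assumption and is not confirmed anywhere in the thread; extract_with_sed is a hypothetical name, and the literal 900 is kept from the original command.

```shell
# Hypothetical helper: same substitution as in the thread, with POSIX intervals
# (\{1,\}) in place of the GNU-only \+ operator.
extract_with_sed() {
    sed 's|.*/\([0-9_A-Z]\{1,\}900[0-9_A-Z]\{1,\}\)/.*|\1|' "$@"
}

printf '%s\n' \
    '/ABC/RTE/AD_900_VOP_123/OPP' \
    '/ABC/RTE/WD_900_VOP_123/GRR/TRD' |
    extract_with_sed
```

As with the original command, a line that does not match the pattern is printed unchanged, which is the symptom reported above.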