Extracting a part of String using grep/sed
Solution 1
Using awk
awk -F"=|," '{print $2}' file
HP_NetworkSupport
Review users
or
awk -F[=,] '{print $2}' file
HP_NetworkSupport
Review users
Set the field delimiter to either , or = and then print the second field.
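To see why the second field is the CN value, it can help to list every field awk produces for one line. This is only an illustrative sketch: show_fields is a hypothetical helper name, and the sample line is copied from the question.

```shell
#!/bin/sh
# Sample line in the format shown in the question.
line='dn: CN=HP_NetworkSupport,OU=groups,DC=HDFCSLDM,DC=COM'

# show_fields (hypothetical helper): number each field awk produces
# when the field separator is the bracket expression [=,].
show_fields() {
    printf '%s\n' "$1" | awk -F'[=,]' '{ for (i = 1; i <= NF; i++) print i ": " $i }'
}

show_fields "$line"
```

Field 1 is "dn: CN" and field 2 is HP_NetworkSupport, which is why printing $2 gives the CN value for this input.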
To handle a field that itself contains a comma, you should really use an LDAP parser, but this should work:
cat file
dn: CN=HP_NetworkSupport,OU=groups,DC=HDFCSLDM,DC=COM
dn: CN="Review, users",OU=groups,DC=HDFCSLDM,DC=COM
awk -F"CN=|,OU" '{print $2}' file
HP_NetworkSupport
"Review, users"
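If the CN value is wrapped in double quotes, as in the second sample line, the quotes are part of the field that awk prints. A possible variant that strips one pair of surrounding quotes is sketched below. extract_cn is a hypothetical helper name, and the sketch assumes CN is always followed directly by ,OU= as in the sample data.

```shell
#!/bin/sh
# extract_cn (hypothetical helper): print the CN value of each line,
# dropping one pair of surrounding double quotes if present.
# Assumption: CN is immediately followed by ",OU=" as in the sample data.
extract_cn() {
    awk -F'CN=|,OU=' '{ v = $2; gsub(/^"|"$/, "", v); print v }' "$@"
}

printf '%s\n' \
    'dn: CN=HP_NetworkSupport,OU=groups,DC=HDFCSLDM,DC=COM' \
    'dn: CN="Review, users",OU=groups,DC=HDFCSLDM,DC=COM' |
    extract_cn
```

Lines without CN= produce an empty output line here, and an escaped quote inside the value is not handled; an LDAP parser remains the robust choice.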
Solution 2
This is one way, using a lookbehind:
grep -Po '(?<=CN=)[^,]*' file > new_file
It gets all text after CN= (not included) up to the first comma. The idea of [^,]* is to match any run of characters that are not a comma.
Test
$ grep -Po '(?<=CN=)[^,]*' file
HP_NetworkSupport
Review users
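The -P option needs a grep built with PCRE support (GNU grep). Where that is not available, something along these lines gives the same result on the sample data. It is a sketch, not a drop-in replacement: cn_values is a hypothetical helper name, and it assumes grep supports -o, which is common but not required by POSIX.

```shell
#!/bin/sh
# cn_values (hypothetical helper): print the text after "CN=" up to the
# next comma, without relying on PCRE lookbehind.
# Assumption: grep supports -o (print only the matched part).
cn_values() {
    grep -o 'CN=[^,]*' "$@" | sed 's/^CN=//'
}

printf '%s\n' \
    'dn: CN=HP_NetworkSupport,OU=groups,DC=HDFCSLDM,DC=COM' \
    'dn: CN=Review users,OU=groups,DC=HDFCSLDM,DC=COM' |
    cn_values
```

grep -o keeps the CN= prefix in its output, so the sed step removes it afterwards.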
Solution 3
Using sed:
$ sed -r 's/.*CN=([^,]*),.*/\1/' inputfile
HP_NetworkSupport
Review users
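One caveat with a plain s/// command: sed prints lines that do not match the pattern unchanged. Assuming such lines should be skipped, a sketch using -n together with the p flag could look like this. cn_only is a hypothetical helper name, and the objectClass line is made-up input added only to show a non-matching line.

```shell
#!/bin/sh
# cn_only (hypothetical helper): print the CN value for lines that match
# and nothing for lines that do not contain "CN=...,".
# Written as a basic regular expression, so no -r/-E option is needed.
cn_only() {
    sed -n 's/.*CN=\([^,]*\),.*/\1/p' "$@"
}

printf '%s\n' \
    'dn: CN=Review users,OU=groups,DC=HDFCSLDM,DC=COM' \
    'objectClass: group' |
    cn_only
```

Only the first input line yields output; the second is dropped because the substitution does not apply to it.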
Solution 4
perl -lne 'print $1 if(/CN=([^\,]*),/)' your_file
Tested Below:
> cat temp
dn: CN=HP_NetworkSupport,OU=groups,DC=HDFCSLDM,DC=COM
dn: CN=Review users,OU=groups,DC=HDFCSLDM,DC=COM
> perl -lne 'print $1 if(/CN=([^\,]*),/)' temp
HP_NetworkSupport
Review users
Authored by bukubapi
Updated on December 03, 2020
Comments
-
bukubapi over 3 years
I have a file in Linux with entries similar to those below:
dn: CN=HP_NetworkSupport,OU=groups,DC=HDFCSLDM,DC=COM
dn: CN=Review users,OU=groups,DC=HDFCSLDM,DC=COM
I would like to extract only the CN information, up to the first comma, for example:
HP_NetworkSupport
Review users
in the above case, and write it to another file.
What would be the command for doing this?
-
Birei over 10 years
Does this work for you? I think that sed can't use non-greedy quantifiers and parentheses must be escaped to do grouping.
-
mvp over 10 years
Unfortunately, it will fail for cases like CN="Smith, John",OU="My Organization"
-
fedorqui over 10 years
Yes, but the input is not surrounded by double quotes.
-
mvp over 10 years
Well, this is part of the software engineering job: predicting what customers will use. And they most likely will.
-
fedorqui over 10 years
I know, but the input and the explanation do not show that. What if the parameters are not in the CN, OU, DC order? Your answer would fail then.
-
mvp over 10 years
Granted, for many regex questions the correct answer is NOT a regex but a proper parser.
-
fedorqui over 10 years
What do you mean? I don't understand it.
-
mvp over 10 years
I mean that this is a nice example of a question that can only be answered 100% correctly with an LDAP parser. Technically, an LDAP CN or OU could include the text CN=blah (probably quoted) inside it. How's that? This is similar to the premise that you cannot use regex to parse XML.
-
fedorqui over 10 years
Yes, I agree :) In fact I wasn't aware that this was a piece of an LDAP config file. Let's see if our attempts are enough for the requirements, cheers!
-
Chris Seymour over 10 years
Good solution if GNU grep is available, +1. To address @mvp's criticism you could also include a positive lookahead:
grep -Po '(?<=CN=).*(?=,OU=)' file
-
fedorqui over 10 years
Thanks @sudo_O! I will not update my answer, as the accepted one is not using it, so the simple comma check may be enough for the OP. Anyway, thanks for teaching again! :)
-
Admin about 10 years
Thank you for providing the regular-expressions.info/lookaround.html link.
-
Big McLargeHuge about 5 years
I'm curious, if you're searching for HTML or XML tags, would you need to escape the characters like this?
grep -Po '(?<=\<string name\=")[^"]*'
-
fedorqui about 5 years
@DavidKennedy give it a try ;-) I tried and my GNU grep says it is not necessary:
grep -Po '(?<=<hola>).*(?=</hola>)' <<< "<hola>adeu</hola>"
returns "adeu" without a problem. Of course, you can escape every single character, but here it is not necessary.