Extracting a part of String using grep/sed

regex sed grep

33,872

Solution 1

Using awk

awk -F"=|," '{print $2}' file
HP_NetworkSupport
Review users

awk -F[=,] '{print $2}' file
HP_NetworkSupport
Review users

Set the field delimiter to either , or =, then print the second field.
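To see why the second field is the one wanted, it can help to print every field awk produces for one of the sample lines. This is only an illustrative sketch; `show_fields` is a made-up helper name, not part of the answer.

```shell
# show_fields is a hypothetical helper used only for this illustration.
# It splits each input line on "=" or "," and prints every field with its index.
show_fields() {
  awk -F'[=,]' '{ for (i = 1; i <= NF; i++) printf "%d: %s\n", i, $i }'
}

printf '%s\n' 'dn: CN=HP_NetworkSupport,OU=groups,DC=HDFCSLDM,DC=COM' | show_fields
```

Field 1 is `dn: CN` and field 2 is `HP_NetworkSupport`, which is why `print $2` returns the CN value.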

To handle a field that contains a comma, you should really use an LDAP parser, but this should work:

cat file
dn: CN=HP_NetworkSupport,OU=groups,DC=HDFCSLDM,DC=COM
dn: CN="Review, users",OU=groups,DC=HDFCSLDM,DC=COM

awk -F"CN=|,OU" '{print $2}' file
HP_NetworkSupport
"Review, users"
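Because the separator here is the literal text CN= or ,OU, awk leaves any surrounding double quotes in the second field. If the quotes are unwanted, one possible sketch strips them before printing. It assumes, like the command above, that OU immediately follows CN; `extract_cn` is a hypothetical helper name.

```shell
# Hypothetical helper: print the CN value without surrounding double quotes.
# Assumes the OU attribute directly follows CN on every line.
extract_cn() {
  awk -F'CN=|,OU' '{ v = $2; gsub(/^"|"$/, "", v); print v }' "$@"
}

printf '%s\n' \
  'dn: CN=HP_NetworkSupport,OU=groups,DC=HDFCSLDM,DC=COM' \
  'dn: CN="Review, users",OU=groups,DC=HDFCSLDM,DC=COM' | extract_cn
```

Called with a file name (`extract_cn file`), the helper reads that file instead of standard input.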

Solution 2

This is one way, using a lookbehind:

grep -Po '(?<=CN=)[^,]*' file > new_file

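The -P option enables Perl-compatible regular expressions (needed for the lookbehind) and -o prints only the matched text. The pattern matches everything after CN= (which is not included in the output) up to the next comma: [^,]* matches any run of characters that are not commas.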

Test

$ grep -Po '(?<=CN=)[^,]*' file
HP_NetworkSupport
Review users
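The comments raise quoted values such as CN="Smith, John", where stopping at the first comma cuts the value short, and suggest bounding the match with a lookahead for ,OU= instead. The sketch below contrasts the two patterns on that input. The helper names are made up for illustration, both need a grep built with -P support, and the second pattern assumes OU directly follows CN.

```shell
# Hypothetical helper names; both require grep with -P (PCRE) support.
cn_to_comma() { grep -Po '(?<=CN=)[^,]*'; }
cn_to_ou()    { grep -Po '(?<=CN=).*(?=,OU=)'; }

line='dn: CN="Smith, John",OU="My Organization",DC=COM'

# Stops at the first comma, so the quoted value is truncated.
printf '%s\n' "$line" | cn_to_comma

# Runs up to ,OU= and keeps the surrounding quotes.
printf '%s\n' "$line" | cn_to_ou
```

Neither pattern understands LDAP quoting or escaping, so a real parser remains the safer choice for arbitrary input.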

Solution 3

Using sed:

$ sed -r 's/.*CN=([^,]*),.*/\1/' inputfile
HP_NetworkSupport
Review users
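One caveat with the substitution above is that sed prints lines that do not match unchanged. A possible variant, sketched here under the assumption that the input may contain other lines, uses -n with the p flag so only lines where the substitution succeeded are printed, and -E in place of -r. `extract_cn_sed` is a hypothetical helper name.

```shell
# Hypothetical helper: print the CN value only for lines that contain one.
# -n suppresses default output; the trailing p prints lines where s/// matched.
extract_cn_sed() {
  sed -n -E 's/.*CN=([^,]*),.*/\1/p' "$@"
}

printf '%s\n' \
  'dn: CN=HP_NetworkSupport,OU=groups,DC=HDFCSLDM,DC=COM' \
  'objectClass: group' \
  'dn: CN=Review users,OU=groups,DC=HDFCSLDM,DC=COM' | extract_cn_sed
```

The `objectClass: group` line is dropped rather than echoed, so the output can be redirected straight to another file.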

Solution 4

perl -lne 'print $1 if(/CN=([^\,]*),/)' your_file

Tested Below:

> cat temp
dn: CN=HP_NetworkSupport,OU=groups,DC=HDFCSLDM,DC=COM
dn: CN=Review users,OU=groups,DC=HDFCSLDM,DC=COM
> perl -lne 'print $1 if(/CN=([^\,]*),/)' temp
HP_NetworkSupport
Review users
>


Author by

bukubapi

Updated on December 03, 2020

Comments

bukubapi over 3 years
I have a file on Linux with entries similar to those below:
```
dn: CN=HP_NetworkSupport,OU=groups,DC=HDFCSLDM,DC=COM
dn: CN=Review users,OU=groups,DC=HDFCSLDM,DC=COM
```
I would like to extract only the CN value, up to the first comma, and write it to another file. For the entries above, the output would be:
```
HP_NetworkSupport
Review users
```

What command would do this?
Birei over 10 years

Does this work for you? I think that sed can't use non-greedy quantifiers and parentheses must be escaped to do grouping.
mvp over 10 years

Unfortunately, it will fail for cases like CN="Smith, John",OU="My Organization"
fedorqui over 10 years

Yes, but the input is not double quotes surrounded.
mvp over 10 years

Well, this is part of software engineering job - predict what customers would use. And they most likely will
fedorqui over 10 years

I know, but the input and the explanation do not show that. What if the parameters are not in the CN, OU, DC order? It would fail on your answer.
mvp over 10 years

Granted, for many regex questions correct answer is NOT regex, but proper parser
fedorqui over 10 years

What do you mean? I don't understand it.
mvp over 10 years

I mean that this is one nice example that to answer this question 100% correctly one must use LDAP parser. Technically, LDAP CN or OU could include text CN=blah (probably quoted) inside of it. How's that? This is similar to premise that you cannot use regex to parse XML.
fedorqui over 10 years

Yes, I agree :) In fact I wasn't aware that this was a piece of LDAP config file. Let's see if our attempts are enough for the requirements, cheers!
Chris Seymour over 10 years

Good solution if GNU grep is available, +1. To address @mvp's criticism you could add a positive lookahead: grep -Po '(?<=CN=).*(?=,OU=)' file
fedorqui over 10 years

Thanks @sudo_O! I will not update my answer, as the accepted one is not using it, so the OP may have enough with the simple comma checking. Anyway, thanks for teaching again! :)
Admin about 10 years

Thank you for providing regular-expressions.info/lookaround.html link.
Big McLargeHuge about 5 years

I'm curious, if you're searching for HTML or XML tags, would you need to escape the characters like this? grep -Po '(?<=\<string name\=")[^"]*'
fedorqui about 5 years

@DavidKennedy give it a try ;-) I tried and my GNU grep says it is not necessary: grep -Po '(?<=<hola>).*(?=</hola>)' <<< "<hola>adeu</hola>" returns "adeu" without a problem. Of course, you can escape every single character, but this one is not necessary.