Extracting part of a string using grep/sed


Solution 1

Using awk

awk -F"=|," '{print $2}' file
HP_NetworkSupport
Review users

or

awk -F'[=,]' '{print $2}' file
HP_NetworkSupport
Review users

Set the delimiter to , or =, then print second field.
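
To see why the second field is the one wanted, it can help to print how awk splits a line. A minimal sketch, assuming the first sample line from the question; the variable names line and fields are only for illustration:

```shell
# Sample line taken from the question.
line='dn: CN=HP_NetworkSupport,OU=groups,DC=HDFCSLDM,DC=COM'

# Split on "=" or "," and print the first three fields with their index.
fields=$(printf '%s\n' "$line" | awk -F'[=,]' '{for (i = 1; i <= 3; i++) print i ": " $i}')
printf '%s\n' "$fields"
```

Field 1 is everything before the first = (here "dn: CN"), so the CN value lands in field 2.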


To handle a field that contains a comma, you should really use an LDAP parser, but this should work:

cat file
dn: CN=HP_NetworkSupport,OU=groups,DC=HDFCSLDM,DC=COM
dn: CN="Review, users",OU=groups,DC=HDFCSLDM,DC=COM

awk -F"CN=|,OU" '{print $2}' file
HP_NetworkSupport
"Review, users"
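
With this delimiter, awk leaves any surrounding double quotes in the field. If they are unwanted, one option is to strip them with gsub before printing. A sketch, assuming the two sample lines above and a throwaway temporary file (tmp and names are illustrative names):

```shell
# Throwaway input file with the two sample lines (one value is quoted).
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
dn: CN=HP_NetworkSupport,OU=groups,DC=HDFCSLDM,DC=COM
dn: CN="Review, users",OU=groups,DC=HDFCSLDM,DC=COM
EOF

# Split on "CN=" or ",OU", remove double quotes from the second field, print it.
names=$(awk -F'CN=|,OU' '{gsub(/"/, "", $2); print $2}' "$tmp")
printf '%s\n' "$names"
rm -f "$tmp"
```

This still assumes that OU directly follows CN and that the value itself contains no ",OU"; it is not a substitute for a real LDAP parser.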

Solution 2

This is one way, using a lookbehind:

grep -Po '(?<=CN=)[^,]*' file > new_file

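It prints the text that follows CN= (the CN= itself is not included) up to, but not including, the next comma. [^,]* matches any run of characters that are not commas. -o prints only the matched part, and -P enables Perl-compatible regular expressions, which the lookbehind (?<=CN=) requires.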

Test

$ grep -Po '(?<=CN=)[^,]*' file
HP_NetworkSupport
Review users
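
Not every grep supports -P. A possible fallback, sketched here under the assumption that the CN value contains no comma, is to match with -o (widely supported, though not required by POSIX) and then cut off the CN= prefix. The temporary file and variable names are illustrative only:

```shell
# Throwaway input file with the two sample lines from the question.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
dn: CN=HP_NetworkSupport,OU=groups,DC=HDFCSLDM,DC=COM
dn: CN=Review users,OU=groups,DC=HDFCSLDM,DC=COM
EOF

# -o prints only the match ("CN=" plus everything up to the first comma);
# cut then keeps what follows the first "=".
names=$(grep -o 'CN=[^,]*' "$tmp" | cut -d= -f2-)
printf '%s\n' "$names"
rm -f "$tmp"
```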

Solution 3

Using sed:

$ sed -r 's/.*CN=([^,]*),.*/\1/' inputfile
HP_NetworkSupport
Review users
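
Lines without a CN= entry pass through this command unchanged, because the substitution does not apply to them. A sketch that prints only the lines where the substitution succeeded and writes them to another file, as the question asks. The file names are placeholders, and -E is assumed to be accepted by the local sed (it is the more portable spelling of -r):

```shell
# Throwaway input and output files; the first line has no CN= entry.
tmp=$(mktemp)
out=$(mktemp)
cat > "$tmp" <<'EOF'
version: 1
dn: CN=HP_NetworkSupport,OU=groups,DC=HDFCSLDM,DC=COM
dn: CN=Review users,OU=groups,DC=HDFCSLDM,DC=COM
EOF

# -n suppresses automatic printing; the trailing p prints only lines
# on which the substitution was made.
sed -n -E 's/.*CN=([^,]*),.*/\1/p' "$tmp" > "$out"
result=$(cat "$out")
printf '%s\n' "$result"
rm -f "$tmp" "$out"
```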

Solution 4

perl -lne 'print $1 if(/CN=([^\,]*),/)' your_file

Tested Below:

> cat temp
dn: CN=HP_NetworkSupport,OU=groups,DC=HDFCSLDM,DC=COM
dn: CN=Review users,OU=groups,DC=HDFCSLDM,DC=COM
> perl -lne 'print $1 if(/CN=([^\,]*),/)' temp
HP_NetworkSupport
Review users
>
Author: bukubapi

Updated on December 03, 2020

Comments

  • bukubapi
    bukubapi over 3 years

    I have a file on Linux with entries similar to the ones below:

    dn: CN=HP_NetworkSupport,OU=groups,DC=HDFCSLDM,DC=COM
    dn: CN=Review users,OU=groups,DC=HDFCSLDM,DC=COM
    

    I would like to extract only the CN information, up to the first comma, for example:

    > HP_NetworkSupport
    > Review users
    

    in the above case to another file.

    What would be the command for doing this?

  • Birei
    Birei over 10 years
    Does this work for you? I think that sed can't use non-greedy quantifiers and parentheses must be escaped to do grouping.
  • mvp
    mvp over 10 years
    Unfortunately, it will fail for cases like CN="Smith, John",OU="My Organization"
  • fedorqui
    fedorqui over 10 years
    Yes, but the input is not double quotes surrounded.
  • mvp
    mvp over 10 years
    Well, this is part of a software engineer's job: predicting what customers will use. And they most likely will use it.
  • fedorqui
    fedorqui over 10 years
    I know, but the input and the explanation do not show that. What if the parameters are not in the CN, OU, DC order? Your answer would fail on that.
  • mvp
    mvp over 10 years
    Granted, for many regex questions correct answer is NOT regex, but proper parser
  • fedorqui
    fedorqui over 10 years
    What do you mean? I don't understand it.
  • mvp
    mvp over 10 years
    I mean that this is a nice example of a question that can only be answered 100% correctly with an LDAP parser. Technically, an LDAP CN or OU could include the text CN=blah (probably quoted) inside of it. How's that? This is similar to the premise that you cannot use regex to parse XML.
  • fedorqui
    fedorqui over 10 years
    Yes, I agree :) In fact I wasn't aware that this was a piece of LDAP config file. Let's see if our attempts are enough for the requirements, cheers!
  • Chris Seymour
    Chris Seymour over 10 years
    Good solution if GNU grep is available, +1. To address @mvp's criticism you could add a positive lookahead: grep -Po '(?<=CN=).*(?=,OU=)' file
  • fedorqui
    fedorqui over 10 years
    Thanks @sudo_O! I will not update my answer, as the accepted one is not using it, so the OP may have enough with the simple comma checking. Anyway, thanks for teaching again! :)
  • Admin
    Admin about 10 years
    Thank you for providing regular-expressions.info/lookaround.html link.
  • Big McLargeHuge
    Big McLargeHuge about 5 years
    I'm curious, if you're searching for HTML or XML tags, would you need to escape the characters like this? grep -Po '(?<=\<string name\=")[^"]*'
  • fedorqui
    fedorqui about 5 years
    @DavidKennedy give it a try ;-) I tried and my GNU grep says it is not necessary: grep -Po '(?<=<hola>).*(?=</hola>)' <<< "<hola>adeu</hola>" returns "adeu" without a problem. Of course, you can escape every single character, but this one is not necessary.