Using a regular expression to extract content from a file

Solution 1

$ echo "www.blablabla.com" | grep -oP '(?<=\.)[a-zA-Z0-9\.-]*(?=\.)' 
blablabla

-o -- print only matched parts of matching line

-P -- Use Perl regex

(?<=\.) -- preceded by a literal . (a "positive look-behind"); the . itself is not part of the match ...

[a-zA-Z0-9\.-]* -- match zero or more letters (lower or upper case), digits 0-9, literal . and hyphen ...

(?=\.) -- followed by a literal . (a "positive look-ahead"), which is likewise not part of the match

Look-behind and look-ahead are collectively known as lookarounds. Tools like https://regex101.com/ can help you break down your regular expressions.
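
As a rough sketch of the same lookaround idea, the pattern below anchors on www. and .com explicitly instead of on any dot. It assumes a grep that supports -P (GNU grep built with PCRE) and a single label between the two markers; extract_label is a hypothetical helper name, not something from the answer above.

```shell
# extract_label: hypothetical helper, prints each label found between "www." and ".com".
# Assumes GNU grep with -P (PCRE) support.
extract_label() {
    # (?<=www\.)    the match must be preceded by "www."
    # [a-zA-Z0-9-]+ one or more letters, digits or hyphens (no dots, so one label only)
    # (?=\.com)     the match must be followed by ".com"
    printf '%s\n' "$1" | grep -oP '(?<=www\.)[a-zA-Z0-9-]+(?=\.com)'
}

extract_label 'www.blablabla.com'   # prints: blablabla
```

Leaving the dot out of the character class is a deliberate narrowing: with the dot included, a greedy match could run across several host names on the same line.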

Solution 2

sed solution (the \n in the replacement text relies on GNU sed):

$ str='Hellowww.hello.comMywww.world.comWorld'

$ echo "$str" | sed -e 's/com/com\n/g' | sed -ne '/.*www\.\(.*\)\.com.*/{ s//\1/p }'
hello
world
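
One possible variation, offered only as a sketch: let grep -o pick out each www.<label>.com occurrence first, then strip the fixed prefix and suffix with sed. This avoids splitting the line on every "com", which could misfire on a host name that itself contains "com". It assumes a grep with -o (GNU and BSD grep both have it) and a label without dots; extract_all is a hypothetical name.

```shell
str='Hellowww.hello.comMywww.world.comWorld'

# extract_all: hypothetical helper, prints every label found between "www." and ".com".
extract_all() {
    printf '%s\n' "$1" |
        grep -o 'www\.[^.]*\.com' |             # one "www.<label>.com" per output line
        sed -e 's/^www\.//' -e 's/\.com$//'     # drop the fixed prefix and suffix
}

extract_all "$str"
# prints:
# hello
# world
```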

Author: pnom

Updated on September 18, 2022

Comments

  • pnom over 1 year

    I have a link and I would like to return only the content between www. and .com

    e.g. www.blablabla.com would return only blablabla

    How can I do that? When I use grep '\.[a-zA-Z0-9\.-]*\.' it returns .blablabla. with the surrounding dots included.

    • Admin about 8 years
      awk -F. '{print $2}'
    • Admin about 8 years
      Homework problem?
  • pnom about 8 years
    Thanks, that's what I wanted, but what does it do? Could you explain it a bit more, please? Also, -P uses Perl regular expressions; is there any way to do it with just grep's regular expressions?
  • KM. about 8 years
    Not that I know of.
  • GMaster about 8 years
    If you don't want to use -P, you can't do this with grep's regular expressions alone. If you want to stick with grep, consider using tr to drop the dots, like this: echo 'www.blablabla.com' | grep -o '\.[a-zA-Z0-9\.-]*\.' | tr -d .
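
The two -P-free suggestions from the comments can be compared side by side. This is only a sketch: with_awk and with_tr are hypothetical wrapper names, and www.mail.example.com is a made-up input chosen to show where the approaches differ.

```shell
# with_awk: split on dots and print the second field.
with_awk() { printf '%s\n' "$1" | awk -F. '{print $2}'; }

# with_tr: grep -o keeps the surrounding dots, then tr deletes every dot.
with_tr() { printf '%s\n' "$1" | grep -o '\.[a-zA-Z0-9.-]*\.' | tr -d .; }

with_awk 'www.blablabla.com'      # prints: blablabla
with_tr  'www.blablabla.com'      # prints: blablabla

# With more than one label between www. and .com the results diverge:
with_awk 'www.mail.example.com'   # prints: mail
with_tr  'www.mail.example.com'   # prints: mailexample
```

Both agree on the input from the question; which behaviour is wanted for multi-label host names depends on the use case.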