Using a regular expression to extract content from a file

bash shell scripting grep

33,800

Solution 1

$ echo "www.blablabla.com" | grep -oP '(?<=\.)[a-zA-Z0-9\.-]*(?=\.)' 
blablabla

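-o -- print only the matched part of each matching line

-P -- interpret the pattern as a Perl-compatible regular expression (PCRE)

(?<=\.) -- a "positive look-behind": the match must be immediately preceded by a literal ., which is not itself part of the match

[a-zA-Z0-9\.-]* -- match zero or more letters (upper or lower case), digits 0-9, literal dots, and hyphens

(?=\.) -- a "positive look-ahead": the match must be immediately followed by a literal ., which is also not part of the match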

Look-behind and look-ahead are collectively known as lookarounds. Tools like https://regex101.com/ can help you break down your regular expressions.
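Since the question asks specifically for the text between www. and .com, the lookarounds can be anchored on those strings instead of on a bare dot. The following is only a sketch: the function name between_www_and_com is made up for illustration, and it assumes a GNU grep built with PCRE support (-P).

```shell
# Hypothetical helper (not from the original answer): print whatever sits
# between "www." and ".com" on each input line.
# Assumes GNU grep with PCRE support (-P).
between_www_and_com() {
  # (?<=www\.)  look-behind: the match must follow "www."
  # +?          lazy quantifier: stop at the first ".com" that follows
  # (?=\.com)   look-ahead: the match must be followed by ".com"
  grep -oP '(?<=www\.)[a-zA-Z0-9.-]+?(?=\.com)'
}

echo "www.blablabla.com" | between_www_and_com
# blablabla
```

Because the match is lazy and anchored on both sides, the same helper should also pick out several names from a single line, one per output line.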

sed solution:

$ str='Hellowww.hello.comMywww.world.comWorld'

$ echo "$str" | sed -e 's/com/com\n/g' | sed -ne '/.*www\.\(.*\)\.com.*/{ s//\1/p }'
hello
world
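The pipeline above relies on \n in the replacement text, which as far as I know is a GNU sed extension. If each address already sits on its own line, a single substitution should be enough. This is a minimal sketch under that assumption; the function name strip_www_com and the sample inputs are made up for illustration.

```shell
# Hypothetical helper: for lines of the form "www.<name>.com", print <name>.
# Lines that do not have that exact shape are skipped (-n plus the p flag).
strip_www_com() {
  sed -n 's/^www\.\(.*\)\.com$/\1/p'
}

printf '%s\n' 'www.hello.com' 'example.org' 'www.world.com' | strip_www_com
# hello
# world
```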

Updated on September 18, 2022

pnom almost 2 years

I have a link, and I would like to return only the content between www. and .com

e.g. www.blablabla.com would return only blablabla

How could I do that? When I use grep '\.[a-zA-Z0-9\.-]*\.', it returns .blablabla. with the surrounding dots still attached.
- Admin about 8 years
  
  awk -F. '{print $2}'
- Admin about 8 years
  
  Homework problem?
pnom about 8 years

Thanks, that's what I wanted, but what does it do? Could you explain it a bit more, please? Also, -P uses Perl regular expressions; is there any way to do it with just grep's regular expressions?
KM. about 8 years

Not that I know of
GMaster about 8 years

If you don't want to use -P, there is no way to do this with grep's regular expressions alone. If you want to stick with grep, consider using tr to drop the dots, like this: echo 'www.blablabla.com' | grep -o '\.[a-zA-Z0-9\.-]*\.' | tr -d .
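One caveat with tr -d . is that it deletes every dot in the matched text, including any inside the name. A possible variation, sketched below, swaps tr for sed so that only the leading and trailing dot are removed. The function name middle_labels and the address www.foo.bar.com are made up for illustration.

```shell
# Hypothetical helper: same grep -o match as in the comment above, but only
# the outer dots are stripped, so dots inside the name survive.
middle_labels() {
  grep -o '\.[a-zA-Z0-9.-]*\.' | sed 's/^\.//; s/\.$//'
}

echo 'www.blablabla.com' | middle_labels
# blablabla

echo 'www.foo.bar.com' | middle_labels
# foo.bar
```

With tr -d . the second example would come out as foobar instead.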