Extract a URL between two strings in a file
Solution 1
You could use this (give it your file name as a final argument, or pipe the data in):
sed -n 's!^.*\^"\(http[^^]*\)"^.*!\1!p'
The potential gotcha for a beginner to REs is that ^ is an indicator for start of line, so you have to ensure you escape it (\^) if you want a literal up arrow at the start of your RE.
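A quick way to see the difference is to run both forms on a one-line string. The input a^b and the [start]/[caret] markers are arbitrary values chosen only for illustration.

```shell
#!/bin/sh
# Unescaped ^ at the start of an RE is the start-of-line anchor, so this
# substitution inserts text before the first character of the line.
anchored=$(printf '%s\n' 'a^b' | sed 's/^/[start]/')

# Escaped \^ matches a literal caret (up arrow) character instead.
literal=$(printf '%s\n' 'a^b' | sed 's/\^/[caret]/')

echo "$anchored"   # [start]a^b
echo "$literal"    # a[caret]b
```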
The RE pattern match can be explained as follows:
^.*\^" -- Match from start of line until we see the last possible up-arrow double-quote (^") that satisfies the rest of the pattern
\( -- Start a substitution block that can be substituted as \1
http[^^]* -- Match http followed by as many characters that are not ^ as possible ([^^] is a bracket expression: the first ^ negates it, the second is the literal character). The block stops just before the closing double quote, because the "^ that follows in the pattern must still match
\) -- End the substitution block
"^.* -- Match double-quote and up-arrow ("^), then as much as possible (until end of line)
This entire match (the whole line) is replaced by \1, which is the pattern block starting http.
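To run the command over a whole file, pass the file name to sed. In the sketch below, a temporary file created with mktemp stands in for your real data file (a hypothetical setup). It holds the line from the question plus a second made-up line with no URL, which shows that -n together with the p flag skips non-matching lines.

```shell
#!/bin/sh
# Sketch: apply Solution 1's sed command to a file.
# The temporary file is a stand-in for your real data file (hypothetical).
data=$(mktemp)
cat > "$data" <<'EOF'
"372"^""^"2015-09-03 06:59:44.475"^"NEW"^"N/A"^""^0^"105592"^"https://example-url.com"^"example-domain < MEN'S ULTRA < UltraSeriesViewAll (18)"^"New"^"MERCHANT_PROVIDED"
"373"^"a line with no URL field"
EOF

# -n turns off automatic printing and the p flag prints only lines where the
# substitution matched, so the line without an http field produces no output.
urls=$(sed -n 's!^.*\^"\(http[^^]*\)"^.*!\1!p' "$data")
echo "$urls"   # https://example-url.com

rm -f "$data"
```

With a real file, replace "$data" with its name. One URL is printed per matching line.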
Solution 2
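If your version of grep supports PCRE mode (the -P option, as in GNU grep), you could try:
grep -Po '(?<="\^")http.+?(?="\^")'
Here (?<="\^") and (?="\^") are lookbehind and lookahead assertions. The "^" delimiters must surround the match, but -o does not print them, and the non-greedy .+? stops at the first "^" that follows.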
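Below is a minimal sketch of Solution 2 run against a file. It assumes GNU grep compiled with PCRE support, because BSD/macOS grep has no -P option. As before, a temporary file is a hypothetical stand-in for your data file.

```shell
#!/bin/sh
# Sketch assuming GNU grep with PCRE support (-P); not portable to BSD grep.
data=$(mktemp)
cat > "$data" <<'EOF'
"372"^""^"2015-09-03 06:59:44.475"^"NEW"^"N/A"^""^0^"105592"^"https://example-url.com"^"example-domain < MEN'S ULTRA < UltraSeriesViewAll (18)"^"New"^"MERCHANT_PROVIDED"
EOF

# -P enables Perl-compatible regexes (needed for the lookarounds);
# -o prints only the matched text, one match per output line.
url=$(grep -Po '(?<="\^")http.+?(?="\^")' "$data")
echo "$url"   # https://example-url.com

rm -f "$data"
```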
Solution 3
Try this (cut splits the line on ^ with -d^ and prints the 9th field with -f9):
echo "372"^""^"2015-09-03 06:59:44.475"^"NEW"^"N/A"^""^0^"105592"^"https://example-url.com"^"example-domain < MEN'S ULTRA < UltraSeriesViewAll (18)"^"New"^"MERCHANT_PROVIDED" | cut -f9 -d^
Author: Chris Illssbilsworth
Updated on September 18, 2022

Comments
-
Chris Illssbilsworth (over 1 year ago):
I have a file in which each line looks like this:
"372"^""^"2015-09-03 06:59:44.475"^"NEW"^"N/A"^""^0^"105592"^"https://example-url.com"^"example-domain < MEN'S ULTRA < UltraSeriesViewAll (18)"^"New"^"MERCHANT_PROVIDED"
I want to extract the URLs in the file, for example:
https://example-url.com
I tried this regex with the sed command:
sed -n '/"^"http/,/"^"/p'
But it didn't solve my problem.
-
Centimane (over 8 years ago): I think you meant to wrap your echo command in single quotes ('). Otherwise you lose the double quotes when you echo.
-
SHW (over 8 years ago): The given line also contains a single quote ('), so I avoided that.
-
Centimane (over 8 years ago): But without the single quotes, fields like f10 become arrays. It doesn't really affect the URL itself, but it's easy enough to echo the result of this to get rid of the double quotes if needed.
-
roaima (over 8 years ago): The cut is useful but doesn't remove the double quotes surrounding the extracted field. tr -d '"' maybe?
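The sketch below combines Solution 3 with the tr suggestion for data read from a file rather than echoed. When cut reads the file directly, the 9th field keeps its double quotes, and tr -d '"' then strips them. The temporary file is a hypothetical stand-in for your data file.

```shell
#!/bin/sh
# Sketch: Solution 3 run against a file instead of an echoed string.
# The temporary file is a stand-in for your real data file (hypothetical).
data=$(mktemp)
cat > "$data" <<'EOF'
"372"^""^"2015-09-03 06:59:44.475"^"NEW"^"N/A"^""^0^"105592"^"https://example-url.com"^"example-domain < MEN'S ULTRA < UltraSeriesViewAll (18)"^"New"^"MERCHANT_PROVIDED"
EOF

# Read from a file, the 9th ^-separated field keeps its surrounding quotes.
quoted=$(cut -d'^' -f9 "$data")
echo "$quoted"   # "https://example-url.com"

# tr -d '"' deletes every double-quote character from cut's output.
url=$(cut -d'^' -f9 "$data" | tr -d '"')
echo "$url"      # https://example-url.com

rm -f "$data"
```

This approach relies on two assumptions: the URL is always the 9th field, and no field contains a literal ^. Also note that tr -d '"' would delete a double quote inside the URL itself, if one ever appeared.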