Extract URL between two strings in a file

Solution 1

You could use this

sed -n 's!^.*\^"\(http[^^]*\)"^.*!\1!p'

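A potential gotcha for anyone new to regular expressions is that ^ anchors the match to the start of the line, so you must escape it as \^ if you want a literal caret at the start of your RE. Elsewhere in a basic RE an unescaped ^ is taken literally (as in the "^ near the end of the pattern), although escaping it there is harmless.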

The RE pattern match can be explained as follows:

  • ^.*\^" -- Match from the start of the line up to the last caret and double quote ^" that still lets the rest of the pattern match
  • \( -- Start a capture group, which can be referenced as \1 in the replacement
  • http[^^]* -- Match http followed by as many characters that are not ^ as possible
  • \) -- End the capture group
  • "^.* -- Match a double quote and caret "^, then as much as possible (to the end of the line)

The entire match is replaced by \1, which is the captured text starting with http.
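
As a quick check, here is a minimal sketch that runs the expression against the sample line from the question. The variable names line and url and the here-document are only assumptions for illustration; in practice you would give sed your file name instead.

```shell
# Sample record from the question. The quoted here-document ('EOF') keeps
# every double quote, caret and apostrophe exactly as written.
line=$(cat <<'EOF'
"372"^""^"2015-09-03 06:59:44.475"^"NEW"^"N/A"^""^0^"105592"^"https://example-url.com"^"example-domain < MEN'S ULTRA < UltraSeriesViewAll (18)"^"New"^"MERCHANT_PROVIDED"
EOF
)

# -n suppresses the default output; the trailing p prints only the lines
# on which the substitution succeeded.
url=$(printf '%s\n' "$line" | sed -n 's!^.*\^"\(http[^^]*\)"^.*!\1!p')
printf '%s\n' "$url"
```

For this sample it should print https://example-url.com with no surrounding quotes. Lines without a matching URL field print nothing, because of -n.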

Solution 2

If your version of grep supports PCRE mode, you could try

grep -Po '(?<="\^")http.+?(?="\^")'
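
A sketch of how this might be run against a file, with the pieces of the pattern spelled out in comments. It assumes GNU grep built with PCRE support (the -P option is not part of POSIX grep); the temporary file and the variable names sample and urls are hypothetical.

```shell
# Hypothetical temporary file holding the sample record from the question.
sample=$(mktemp)
cat > "$sample" <<'EOF'
"372"^""^"2015-09-03 06:59:44.475"^"NEW"^"N/A"^""^0^"105592"^"https://example-url.com"^"example-domain < MEN'S ULTRA < UltraSeriesViewAll (18)"^"New"^"MERCHANT_PROVIDED"
EOF

# -P  use Perl-compatible regular expressions (GNU grep)
# -o  print only the matched text, not the whole line
# (?<="\^")  lookbehind: the match must be preceded by "^"
# http.+?    http followed by as few characters as possible
# (?="\^")   lookahead: the match must be followed by "^"
urls=$(grep -Po '(?<="\^")http.+?(?="\^")' "$sample")
printf '%s\n' "$urls"

rm -f "$sample"
```

Because the lookbehind and lookahead are not part of the match, the delimiters are left out of the output.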

Solution 3

Try this:

echo "372"^""^"2015-09-03 06:59:44.475"^"NEW"^"N/A"^""^0^"105592"^"https://example-url.com"^"example-domain < MEN'S ULTRA < UltraSeriesViewAll (18)"^"New"^"MERCHANT_PROVIDED" | cut -f9 -d^
Author by

Chris Illssbilsworth

Updated on September 18, 2022

Comments

  • Chris Illssbilsworth
    Chris Illssbilsworth over 1 year

    I have a file in which each line is like this

    "372"^""^"2015-09-03 06:59:44.475"^"NEW"^"N/A"^""^0^"105592"^"https://example-url.com"^"example-domain < MEN'S ULTRA < UltraSeriesViewAll (18)"^"New"^"MERCHANT_PROVIDED"
    

    I want to extract the URLs from the file, for example https://example-url.com

    I tried this regex with sed: sed -n '/"^"http/,/"^"/p'

    But it didn't solve my problem.

  • Centimane
    Centimane over 8 years
    I think you meant to wrap your echo command in single quotes '. Otherwise you lose the double quotes when you echo.
  • SHW
    SHW over 8 years
    The given string also contains a single quote ', so I avoided that.
  • Centimane
    Centimane over 8 years
    But without the single quotes, fields like f10 become arrays. It doesn't really affect the URL itself, but it's easy enough to echo the result to get rid of the double quotes if needed.
  • roaima
    roaima over 8 years
    The cut is useful but doesn't remove the double quotes surrounding the extracted field. tr -d '"' maybe?
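
Putting the comments above together, a minimal sketch that reads the record from a file (so the shell never touches its quotes) and strips the double quotes with tr. It assumes the URL is always the 9th caret-delimited field and that no earlier field contains a caret; the temporary file and the variable names sample and url are hypothetical.

```shell
# Hypothetical temporary file holding the sample record from the question.
# Reading from a file avoids the echo quoting problem entirely.
sample=$(mktemp)
cat > "$sample" <<'EOF'
"372"^""^"2015-09-03 06:59:44.475"^"NEW"^"N/A"^""^0^"105592"^"https://example-url.com"^"example-domain < MEN'S ULTRA < UltraSeriesViewAll (18)"^"New"^"MERCHANT_PROVIDED"
EOF

# cut picks the 9th field using ^ as the delimiter;
# tr -d '"' then deletes the double quotes around it.
url=$(cut -d'^' -f9 "$sample" | tr -d '"')
printf '%s\n' "$url"

rm -f "$sample"
```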