Sed to extract text between two strings

Solution 1

sed -n '/^START=A$/,/^END$/p' data

The -n option means "don't print by default"; the script then says "print every line from the one containing START=A through the next END".
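
As a quick illustration of what -n and the comma range do, here is a minimal sketch on a made-up three-line input (not the data file from the question; the variable names are only for illustration):

```shell
# Hypothetical input: three lines, a, b and c.
input=$(printf 'a\nb\nc\n')

# Without -n, sed prints every line once by default, and the p command
# prints the lines inside the range a second time.
without_n=$(printf '%s\n' "$input" | sed '/^b$/,/^c$/p')

# With -n, only the lines that p prints explicitly appear.
with_n=$(printf '%s\n' "$input" | sed -n '/^b$/,/^c$/p')

printf 'without -n:\n%s\n\nwith -n:\n%s\n' "$without_n" "$with_n"
```

Without -n the lines b and c each appear twice; with -n they appear once and a is dropped.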

You can also do it with awk:

A pattern may consist of two patterns separated by a comma; in this case, the action is performed for all lines from an occurrence of the first pattern through an occurrence of the second.

(from man awk on Mac OS X).

awk '/^START=A$/,/^END$/ { print }' data

Given a modified form of the data file in the question:

START=A
  xxx01
  xxx02
END
START=A
  xxx03
  xxx04
END
START=A
  xxx05
  xxx06
END
START=B
  xxx07
  xxx08
END
START=A
  xxx09
  xxx10
END
START=C
  xxx11
  xxx12
END
START=A
  xxx13
  xxx14
END
START=D
  xxx15
  xxx16
END

The output using GNU sed or Mac OS X (BSD) sed, and using GNU awk or BSD awk, is the same:

START=A
  xxx01
  xxx02
END
START=A
  xxx03
  xxx04
END
START=A
  xxx05
  xxx06
END
START=A
  xxx09
  xxx10
END
START=A
  xxx13
  xxx14
END

Note how I modified the data file so it is easier to see where the various blocks of data printed came from in the file.

If you have a different output requirement (such as 'only the first block between START=A and END', or 'only the last ...'), then you need to articulate that more clearly in the question.
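
For instance, if only the first or only the last START=A block were wanted, one possible approach (a sketch under that assumed requirement, using a cut-down hypothetical data file and illustrative variable names) would be:

```shell
# Hypothetical sample file; the leading START=B block checks that an
# END outside a START=A range is ignored.
data=$(mktemp)
cat > "$data" <<'EOF'
START=B
  xxx00
END
START=A
  xxx01
  xxx02
END
START=A
  xxx03
END
EOF

# First block only, sed: print lines in the range and quit at its END.
first_sed=$(sed -n '/^START=A$/,/^END$/{p;/^END$/q;}' "$data")

# First block only, awk: exit once the closing END has been printed.
first_awk=$(awk '/^START=A$/,/^END$/ { print; if ($0 == "END") exit }' "$data")

# Last block only, awk: buffer each START=A block and remember the most
# recently completed one; print it after the whole file has been read.
last_awk=$(awk '
  /^START=A$/     { buf = ""; inblk = 1 }
  inblk           { buf = buf $0 "\n" }
  inblk && /^END$/ { inblk = 0; last = buf }
  END             { printf "%s", last }
' "$data")

printf 'first:\n%s\n\nlast:\n%s\n' "$first_sed" "$last_awk"
rm -f "$data"
```

The sed version relies on q being inside the braces, so it only fires on an END that closes a START=A range.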

Solution 2

Basic version ...

sed -n '/START=A/,/END/p' yourfile

More robust version...

sed -n '/^ *START=A *$/,/^ *END *$/p' yourfile
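
To show why the anchored version is more robust, here is a small sketch on hypothetical data (the file contents and variable names are assumptions for illustration). The unanchored patterns match anywhere in a line, so a content line containing END closes the range early and START=AB is mistaken for START=A:

```shell
# Hypothetical data: one content line contains the letters END, and a
# second block starts with START=AB rather than START=A.
data=$(mktemp)
cat > "$data" <<'EOF'
START=A
  PENDING item
  xxx02
END
START=AB
  xxx03
END
EOF

# Basic version: /END/ matches "PENDING", and /START=A/ matches "START=AB".
basic=$(sed -n '/START=A/,/END/p' "$data")

# Anchored version: markers must fill the whole line, apart from
# optional leading or trailing spaces.
robust=$(sed -n '/^ *START=A *$/,/^ *END *$/p' "$data")

printf 'basic:\n%s\n\nrobust:\n%s\n' "$basic" "$robust"
rm -f "$data"
```

On this input the basic version loses the xxx02 line and the first END, and wrongly prints the START=AB block; the anchored version prints just the START=A block.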

Solution 3

Your sed expression has a space before ^END, i.e. / ^END/. So sed finds the starting pattern but never finds the ending pattern, and keeps printing until the end of the file. Use sed '/^START=A/, /^END/!d' input_file (note /^END/, without the leading space).
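
The effect of the stray space can be checked with a minimal sketch on a cut-down, hypothetical data file (the contents and variable names are illustrative). No line in this data matches / ^END/, so the range opened by the first START=A stays open to the end of the file:

```shell
# Hypothetical data: a START=A block in the middle of the file.
data=$(mktemp)
cat > "$data" <<'EOF'
START=B
  b1
END
START=A
  a1
END
START=D
  d1
END
EOF

# With the space, the end pattern never matches, so everything from the
# first START=A to the last line of the file is kept.
broken=$(sed '/^START=A/,/ ^END/!d' "$data")

# Without the space, the range closes at the next END line.
fixed=$(sed '/^START=A/,/^END/!d' "$data")

printf 'with space:\n%s\n\nwithout space:\n%s\n' "$broken" "$fixed"
rm -f "$data"
```

Here the first command also prints the trailing START=D block, while the second prints only the START=A block.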

Author: ranganath111

Updated on July 16, 2022

Comments

  • ranganath111
    ranganath111 almost 2 years

    Please help me in using sed. I have a file like below.

    START=A
      xxxxx
      xxxxx
    END
    START=A
      xxxxx
      xxxxx
    END
    START=A
      xxxxx
      xxxxx
    END
    START=B
      xxxxx
      xxxxx
    END
    START=A
      xxxxx
      xxxxx
    END
    START=C
      xxxxx
      xxxxx
    END
    START=A
      xxxxx
      xxxxx
    END
    START=D
      xxxxx
      xxxxx
    END
    

    I want to get the text between START=A and END. I used the command below.

    sed '/^START=A/, / ^END/!d' input_file
    

    The problem is that I am getting

    START=A
      xxxxx
      xxxxx
    END
    START=D
      xxxxx
      xxxxx
    END
    

    instead of

    START=A
      xxxxx
      xxxxx
    END
    

    Sed seems to match greedily.

    Please help me in resolving this.

    Thanks in advance.

    Can I use awk to achieve the above?

  • ranganath111
    ranganath111 almost 11 years
    Thanks for the reply. I need the text between START=A and the next END; the command above gives the data between START=A and the last END. I hope that explains my problem.
  • Jonathan Leffler
    Jonathan Leffler almost 11 years
    No, it doesn't. Both the awk and the sed scripts — at least on my machine with my copy of the data file you provided — print 5 blocks of data between START=A and END, and the blocks with START=B to END, START=C to END and START=D to END are all omitted from the output. Which platform are you testing on? Which version of sed are you using? Which version of awk are you using? (I note that your test data repeats verbatim the blocks between START=A and END. It would be much better if you had different lines in between so you could see which lines are being printed.)
  • Jonathan Leffler
    Jonathan Leffler almost 11 years
    Good point about the space in the sed regex, though it makes the quoted output even more puzzling (as in 'I cannot reproduce the quoted output with the original script, but drop the extraneous space and it works fine, albeit cackhanded'). You can at least simplify the last part of your awk script to /END/{flag=0} which might set flag to zero when it was already zero, but that does no harm. You can also use /START=A/,/END/{print} which is much simpler.
  • abasu
    abasu almost 11 years
    yea, /START=A/,/END/{print} this is much simpler, but it's already shown in your answer :) I was just playing around with a flag :). Actually, after the awk solution you have given, he does not need to do anything else. I'll remove my awk solution. It might lead to more confusion than doing any good :P
  • Vikrant Singh
    Vikrant Singh almost 8 years
    can you explain what , means in sed pattern string?
  • starfry
    starfry over 7 years
    @Vikrant - the , separates two parts of a range defined by two regexes so that the lines between the first pattern and the second pattern are returned.
  • Lennart Rolland
    Lennart Rolland over 7 years
    When I test this, the start and end tokens are included in the output, while I had the impression the OP wanted only the data BETWEEN them.
  • Jonathan Leffler
    Jonathan Leffler over 7 years
    @LennartRolland: The sample desired output specifically includes the START=A and END lines. If you don't want the start and end markers to appear, you can use sed like this: sed -n -e '/^START=A$/,/^END$/ { /^START=A$/d; /^END$/d; p; }'. Or, you can use awk like this: awk '/^START=A$/,/^END$/ { if ($0 != "START=A" && $0 != "END") print }' (same basic idea, though you can code the condition in a number of different ways if desired)
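
The marker-free variants from the last comment can be tried on a small hypothetical data file as follows. This is a sketch: the file contents and variable names are assumptions, and the awk line uses a flag instead of the range form, as one of the other ways to code the condition:

```shell
# Hypothetical sample file with two START=A blocks around a START=B block.
data=$(mktemp)
cat > "$data" <<'EOF'
START=A
  xxx01
  xxx02
END
START=B
  xxx07
END
START=A
  xxx09
END
EOF

# sed: inside the range, delete the two marker lines and print the rest.
inner_sed=$(sed -n -e '/^START=A$/,/^END$/ { /^START=A$/d; /^END$/d; p; }' "$data")

# awk: set a flag after START=A, clear it at END, print while it is set.
inner_awk=$(awk '/^START=A$/ { inside = 1; next } /^END$/ { inside = 0 } inside' "$data")

printf '%s\n' "$inner_sed"
rm -f "$data"
```

Both commands print only the indented lines from the START=A blocks, without the START=A and END lines themselves.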