Sed to extract text between two strings
Solution 1
sed -n '/^START=A$/,/^END$/p' data
The -n option means don't print by default; then the script says 'do print between the line containing START=A and the next END'.
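To see the range work without a data file, you can feed the same command from printf. This is a minimal sketch. The helper function sample and its three short blocks are hypothetical stand-ins for the data file.

```shell
# Hypothetical helper: emits two START=A blocks around one START=B block.
sample() {
  printf '%s\n' START=A xxx01 END START=B xxx02 END START=A xxx03 END
}

# -n suppresses sed's automatic printing; the address range selects every
# line from a START=A line through the next END line, and p prints it.
sample | sed -n '/^START=A$/,/^END$/p'
```

This should print only the two START=A blocks (START=A, xxx01, END, START=A, xxx03, END). After each END, the range waits for the next START=A, so the START=B block is skipped.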
You can also do it with awk:
A pattern may consist of two patterns separated by a comma; in this case, the action is performed for all lines from an occurrence of the first pattern through an occurrence of the second.
(from man awk on Mac OS X).
awk '/^START=A$/,/^END$/ { print }' data
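You can check the awk range the same way. This is again a sketch, with a hypothetical sample function standing in for the data file. The output should match the sed version.

```shell
# Hypothetical helper: same three-block sample, standing in for "data".
sample() {
  printf '%s\n' START=A xxx01 END START=B xxx02 END START=A xxx03 END
}

# pattern1,pattern2 runs the action on every line from a match of the
# first pattern through the next match of the second, then starts over.
sample | awk '/^START=A$/,/^END$/ { print }'
```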
Given a modified form of the data file in the question:
START=A
xxx01
xxx02
END
START=A
xxx03
xxx04
END
START=A
xxx05
xxx06
END
START=B
xxx07
xxx08
END
START=A
xxx09
xxx10
END
START=C
xxx11
xxx12
END
START=A
xxx13
xxx14
END
START=D
xxx15
xxx16
END
The output using GNU sed or Mac OS X (BSD) sed, and using GNU awk or BSD awk, is the same:
START=A
xxx01
xxx02
END
START=A
xxx03
xxx04
END
START=A
xxx05
xxx06
END
START=A
xxx09
xxx10
END
START=A
xxx13
xxx14
END
Note how I modified the data file so it is easier to see where the various blocks of data printed came from in the file.
If you have a different output requirement (such as 'only the first block between START=A and END', or 'only the last ...'), then you need to articulate that more clearly in the question.
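For example, if only the first block were wanted, one possible sketch (not part of the original answer) is to make sed quit as soon as the first range closes. The sample function is a hypothetical stand-in for the data file.

```shell
# Hypothetical helper: two START=A blocks around one START=B block.
sample() {
  printf '%s\n' START=A xxx01 END START=B xxx02 END START=A xxx03 END
}

# Inside the range, print each line; on the END that closes it, q stops
# sed entirely, so later START=A blocks are never reached.
sample | sed -n '/^START=A$/,/^END$/{p;/^END$/q;}'
```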
Solution 2
Basic version ...
sed -n '/START=A/,/END/p' yourfile
More robust version (anchored, allowing only spaces around the markers)...
sed -n '/^ *START=A *$/,/^ *END *$/p' yourfile
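To illustrate the difference, here is a sketch with hypothetical input. The markers are padded with spaces, and a near-miss line START=AB is also present.

```shell
# Hypothetical input: space-padded markers plus a near-miss START=AB.
sample() {
  printf '%s\n' '  START=A ' xxx01 ' END' START=AB xxx02 END
}

# Basic: unanchored, so START=AB also opens a range (all 6 lines print).
sample | sed -n '/START=A/,/END/p'

# Robust: anchored, allowing only spaces around the markers (3 lines print).
sample | sed -n '/^ *START=A *$/,/^ *END *$/p'
```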
Solution 3
Your sed expression has a space before ^END, i.e. / ^END/. So sed matches the starting pattern but never matches the ending pattern, and it keeps printing until the end of the file. Use:
sed '/^START=A/, /^END/!d' input_file
(notice /^END/).
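Here is a small sketch of the failure and the fix, using a hypothetical sample function as input. The question's command also has a space after the comma. That space is left out here so the sketch focuses on the regex. In a POSIX basic regular expression, a ^ that is not the first character is an ordinary character, so / ^END/ looks for a literal space, caret and END.

```shell
# Hypothetical input: one wanted block followed by an unwanted one.
sample() {
  printf '%s\n' START=A xxx01 END START=B xxx02 END
}

# Buggy end address: " ^END" never matches, so the range never closes
# and every line from START=A to the end of input survives the !d.
sample | sed '/^START=A/,/ ^END/!d'

# Fixed end address: the range closes at the first END line.
sample | sed '/^START=A/,/^END/!d'
```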
Comments
-
ranganath111 almost 2 years
Please help me use sed. I have a file like the one below.
START=A
xxxxx
xxxxx
END
START=A
xxxxx
xxxxx
END
START=A
xxxxx
xxxxx
END
START=B
xxxxx
xxxxx
END
START=A
xxxxx
xxxxx
END
START=C
xxxxx
xxxxx
END
START=A
xxxxx
xxxxx
END
START=D
xxxxx
xxxxx
END
I want to get the text between START=A and END. I used the command below.
sed '/^START=A/, / ^END/!d' input_file
The problem here is that I am getting
START=A
xxxxx
xxxxx
END
START=D
xxxxx
xxxxx
END
instead of
START=A
xxxxx
xxxxx
END
sed seems to match greedily.
Please help me resolve this.
Thanks in advance.
Can I use awk to achieve the above?
-
ranganath111 almost 11 years: Thanks for the reply. I need the text between START=A and the next END; the above gives the data between START=A and the last END. I hope you understand my problem.
-
Jonathan Leffler almost 11 years: No, it doesn't. Both the awk and the sed scripts (at least on my machine, with my copy of the data file you provided) print 5 blocks of data between START=A and END, and the blocks with START=B to END, START=C to END and START=D to END are all omitted from the output. Which platform are you testing on? Which version of sed are you using? Which version of awk are you using? (I note that your test data repeats verbatim the blocks between START=A and END. It would be much better if you had different lines in between so you could see which lines are being printed.)
-
Jonathan Leffler almost 11 years: Good point about the space in the sed regex, though it makes the quoted output even more puzzling (as in 'I cannot reproduce the quoted output with the original script, but drop the extraneous space and it works fine, albeit cack-handed'). You can at least simplify the last part of your awk script to /END/{flag=0}, which might set flag to zero when it was already zero, but that does no harm. You can also use /START=A/,/END/{print}, which is much simpler.
-
abasu almost 11 years: Yeah, /START=A/,/END/{print} is much simpler, but it's already shown in your answer :) I was just playing around with a flag :). Actually, after the awk solution you have given, he does not need to do anything else. I'll remove my awk solution. It might lead to more confusion than doing any good :P
-
Vikrant Singh almost 8 years: Can you explain what the , means in the sed pattern string?
-
starfry over 7 years: @Vikrant: the , separates the two parts of a range defined by two regexes, so that the lines from a match of the first pattern through the next match of the second pattern are returned.
-
Lennart Rolland over 7 years: When I test this, the start and end tokens are included in the output, while I had the impression the OP wanted only the data BETWEEN them.
-
Jonathan Leffler over 7 years: @LennartRolland: The sample desired output specifically includes the START=A and END lines. If you don't want the start and end markers to appear, you can use sed like this: sed -n -e '/^START=A$/,/^END$/ { /^START=A$/d; /^END$/d; p; }'. Or, you can use awk like this: awk '/^START=A$/,/^END$/ { if ($0 != "START=A" && $0 != "END") print }' (same basic idea, though you can code the condition in a number of different ways if desired).
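The flag-based awk approach mentioned in the comments can also be sketched. This is a reconstruction, because the original flag script was removed. The sample function is hypothetical. In this style, the order of the rules decides whether the marker lines are printed.

```shell
# Hypothetical helper: one START=A block and one START=B block.
sample() {
  printf '%s\n' START=A xxx01 xxx02 END START=B xxx03 END
}

# Markers included: set the flag before printing, clear it after printing.
sample | awk '/^START=A$/ { flag = 1 }
              flag        { print }
              /^END$/     { flag = 0 }'

# Markers excluded: clear the flag before printing, set it after printing.
sample | awk '/^END$/     { flag = 0 }
              flag        { print }
              /^START=A$/ { flag = 1 }'
```

The first command should print START=A, xxx01, xxx02, END. The second should print only xxx01 and xxx02.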