Replace a string containing newline characters
Solution 1
Three different sed commands:
sed '$!N;s/"[^"]*"\n<[^>]*>/other characters /;P;D'
sed -e :n -e '$!N;s/"[^"]*"\n<[^>]*>/other characters /;tn'
sed -e :n -e '$!N;/"$/{$!bn' -e '};s/"[^"]*"\n<[^>]*>/other characters /g'
All three build on the basic s/// substitution command:
s/"[^"]*"\n<[^>]*>/other characters /
All three also take care in how they handle the last line, because sed implementations tend to differ in their output in edge cases. That is the meaning of $!, an address that matches every line that is not (!) the last ($).
They also all use the N (next) command to append the next input line to pattern space following a \n newline character. Anyone who has used sed for a while will have learned to rely on the \n newline character, because the only way to get one in pattern space is to put it there explicitly.
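As a small illustration of $!N (the input words here are made up for the example), the following joins lines in pairs. The \n that N inserts is the only newline in pattern space, so s/\n/ / can safely target it:

```shell
# N appends the next input line to pattern space after a \n;
# s/\n/ / then turns that newline into a space.
# $!N skips N on the last line, so an unpaired final line is printed as-is.
out=$(printf '%s\n' one two three | sed '$!N;s/\n/ /')
printf '%s\n' "$out"
```

This should print "one two" followed by "three" on a line of its own.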
All three try to read in as little input as possible before taking action: sed acts as soon as it can and need not read an entire input file before doing so. Though they all use N, the three differ in their methods of recursion.
First Command
The first command employs a very simple N;P;D loop. These three commands are built in to any POSIX-compatible sed, and they complement one another nicely.
- N: as already mentioned, appends the next input line to pattern space following an inserted \n newline delimiter.
- P: like p, it prints pattern space, but only up to the first \n newline character. And so, given the following input/command:
printf %s\\n one two | sed '$!N;P;d'
sed prints only one. However, with...
- D: like d, it deletes pattern space and begins another line cycle. Unlike d, D deletes only up to the first \n newline in pattern space. If anything remains in pattern space after that \n newline character, sed begins the next line cycle with what remains. If the d in the previous example were replaced with a D, for example, sed would print both one and two.
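The difference between d and D can be checked side by side. This is only a sketch of the example described above, with the two results captured in shell variables (with_d and with_D are names chosen for this illustration):

```shell
# d discards the whole pattern space ("one\ntwo"), so "two" is never printed.
with_d=$(printf '%s\n' one two | sed '$!N;P;d')

# D removes only "one\n" and restarts the cycle with "two" still in
# pattern space, so P gets a chance to print it as well.
with_D=$(printf '%s\n' one two | sed '$!N;P;D')

printf 'with d:\n%s\nwith D:\n%s\n' "$with_d" "$with_D"
```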
This command recurses only for lines which do not match the s/// substitution statement. Because the s/// substitution removes the \n newline added with N, there is never anything remaining when sed deletes pattern space with D.
Tests could be used to apply P and/or D selectively, but there are other commands which fit better with that strategy. Because the recursion is implemented to handle consecutive lines which match only part of the replacement rule, consecutive sequences of lines matching both ends of the s/// substitution do not work well.
Given this input:
first "line"
<second>"line"
<second>"line"
<second>line and so on
...it prints...
first other characters "line"
<second>other characters line and so on
It does, however, handle
first "line"
second "line"
<second>line
...just fine.
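Both cases can be reproduced with a short script. The variable and function names below (input1, input2, first_cmd) are only for this sketch; the sed script itself is the first command from above:

```shell
# Consecutive matching lines: the case that does not work well.
input1='first "line"
<second>"line"
<second>"line"
<second>line and so on'

# A line that matches only the first half of the rule, then a full match.
input2='first "line"
second "line"
<second>line'

# The first command, wrapped in a function so it can be reused.
first_cmd() { sed '$!N;s/"[^"]*"\n<[^>]*>/other characters /;P;D'; }

out1=$(printf '%s\n' "$input1" | first_cmd)
out2=$(printf '%s\n' "$input2" | first_cmd)
printf '%s\n---\n%s\n' "$out1" "$out2"
```

In the second case the D leaves second "line" in pattern space, so the next cycle can still pair it with the line that follows.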
Second Command
This command is very similar to the third. Both employ a : label with the b (branch) or t (test) commands (as is also demonstrated in Joeseph R.'s answer here) and recurse back to it given certain conditions.
- -e :n -e: portable sed scripts delimit a : label definition with either a \n newline or a new inline -e expression.
  - :n: defines a label named n. This can be returned to at any time with either bn or tn.
- tn: the t (test) command branches to the specified label (or, if none is provided, to the end of the script for the current line cycle) if any s/// substitution has succeeded since the most recent input line was read or since t was last executed.
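A tiny loop, unrelated to the question's data, shows the label and t working together. The input aaa and the label name a are arbitrary choices for this sketch:

```shell
# Without a loop, s/a/b/ replaces only the first "a".
once=$(echo aaa | sed 's/a/b/')

# With :a ... ta, sed jumps back to the label after every successful
# substitution and falls through once s/a/b/ no longer matches.
looped=$(echo aaa | sed -e :a -e 's/a/b/;ta')

printf '%s\n%s\n' "$once" "$looped"
```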
In this command the recursion occurs for the matching lines. If sed successfully replaces the pattern with other characters, sed returns to the :n label and tries again. If no s/// substitution is performed, sed autoprints pattern space and begins the next line cycle.
This tends to handle consecutive sequences better. Where the first command failed, this one prints:
first other characters other characters other characters line and so on
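To reproduce that result, the four-line input from the first command's example can be fed to the second command. The variable names here are only for the sketch:

```shell
input='first "line"
<second>"line"
<second>"line"
<second>line and so on'

# Each successful s/// sends sed back to :n, where $!N pulls in one more
# line; the loop ends when the substitution fails, and sed autoprints.
out=$(printf '%s\n' "$input" |
  sed -e :n -e '$!N;s/"[^"]*"\n<[^>]*>/other characters /;tn')
printf '%s\n' "$out"
```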
Third Command
As mentioned, the logic here is very similar to the last, but the test is more explicit.
- /"$/bn: this is sed's test. Because the b (branch) command is a function of this address, sed will only branch back to :n after a \n newline is appended and pattern space still ends with a " double-quote.
As little as possible is done between N and b. In this way sed can very quickly gather exactly as much input as necessary to ensure that the following line cannot match your rule. The s/// substitution differs here in that it employs the g (global) flag, and so it does all necessary replacements at once. Given identical input, this command produces the same output as the second.
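The same four-line input can be run through the third command to check that claim. As before, the variable names are only illustrative:

```shell
input='first "line"
<second>"line"
<second>"line"
<second>line and so on'

# Keep appending lines while pattern space ends in a double-quote (and more
# input remains), then do every replacement in one pass with the g flag.
out=$(printf '%s\n' "$input" |
  sed -e :n -e '$!N;/"$/{$!bn' -e '};s/"[^"]*"\n<[^>]*>/other characters /g')
printf '%s\n' "$out"
```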
Solution 2
Well, I can think of a couple of simple ways, but neither involves grep (which doesn't do substitutions anyway) or sed.
- Perl
To replace each occurrence of "line"\n<second> with other characters, use:
$ perl -00pe 's/"line"\n<second>/other characters /g' file
first other characters line and so on
Or, to treat multiple, consecutive occurrences of "line"\n<second> as one, and replace all of them with a single other characters, use:
perl -00pe 's/(?:"line"\n<second>)+/other characters /g' file
Example:
$ cat file
first "line"
<second>"line"
<second>"line"
<second>line and so on
$ perl -00pe 's/(?:"line"\n<second>)+/other characters /g' file
first other characters line and so on
The -00 causes Perl to read the file in "paragraph mode", which means that "lines" are delimited by \n\n instead of \n; essentially, each paragraph is treated as a line. The substitution therefore matches across a newline.
- awk
$ awk -v RS="\n\n" -v ORS="" '{ sub(/"line"\n<second>/,"other characters ", $0); print; }' file
first other characters line and so on
The same basic idea: we set the record separator (RS) to \n\n so that a whole paragraph (here, the whole file) is read as one record, set the output record separator to nothing (otherwise an extra newline is printed), and then use the sub() function to make the replacement.
Solution 3
Read the whole file and do a global replacement:
sed -n 'H; ${x; s/"line"\n<second>/other characters /g; p}' <<END
first "line"
<second> line followed by "line"
<second> and last
END
first other characters line followed by other characters and last
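One thing to watch for: H on the first line appends to an empty hold space, which leaves a leading newline there, so the output of the command above starts with an empty line. The variant suggested in the comments avoids that by overwriting hold space on line 1 with 1h, and it also drops the braces. A sketch with a slightly simplified input (no space after <second>, chosen here so the result has single spaces):

```shell
input='first "line"
<second>line followed by "line"
<second>and last'

# H collects every line in hold space; 1h overwrites hold space on line 1
# so it does not begin with a newline; $!d ends the cycle quietly for all
# but the last line; x then brings the collected text back for s///g.
out=$(printf '%s\n' "$input" |
  sed 'H;1h;$!d;x;s/"line"\n<second>/other characters /g')
printf '%s\n' "$out"
```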
Solution 4
Here's a variant on glenn's answer that will work if you have multiple consecutive occurrences (works with GNU sed only):
sed ':x /"line"/N;s/"line"\n<second>/other characters/;/"line"/bx' your_file
The :x is just a label for branching. Basically, this checks the line after the substitution and, if it still matches "line", branches back to the :x label (that's what bx does), adds another line to the buffer and starts processing it.
BowPark
Updated on September 18, 2022
Comments
-
BowPark over 1 year
With the bash shell, in a file with rows like the following ones
first "line"
<second>line and so on
I would like to replace one or more occurrences of "line"\n<second> with other characters and obtain each time:
first other characters line and so on
So I have to replace a string both with special characters such as " and < and with a newline character.
After searching between the other answers, I found that sed can accept newlines in the right-hand side of the command (so, the other characters string), but not in the left.
Is there a way (simpler than this) to obtain this result with sed or grep?
-
mikeserv over 9 years
are you working w/ a mac? the \n newline statement you make is why i ask. people seldom ask if they can do s//\n/ as you can with GNU sed, though most other seds will reject that escape on the right hand side. still, the \n escape will work on the left in any POSIX sed and you can portably translate them like y/c/\n/ though it will have the same effect as s/c/\n/g and so isn't always as useful.
-
BowPark over 9 years
Yes. It works, but what if I have multiple occurrences?
-
BowPark over 9 years
Sorry for the trivial question, but what is the meaning of DATA and how do you receive the text input?
-
mikeserv over 9 years
@BowPark - In this example <<\DATA\ntext input\nDATA\n is baked in, but that is only text handed to sed by the shell in a here document. It would work as well like sed 'script' filename or process that writes to stdout | sed 'script'. Does that help?
-
Jeff Hewitt over 9 years
@mikeserv Please be specific about what you mean. It worked for me.
-
Jeff Hewitt over 9 years
@mikeserv I'm sorry, I really don't know what you're talking about. I copied the above code line back into my terminal and it worked correctly.
-
mikeserv over 9 years
retracted - this does apparently work in GNU sed, which takes its non-POSIX label handling far enough to accept a space as a delimiter for label declaration. You should note though, that any other sed will fail there - and will fail for N. GNU sed breaks POSIX guidelines to print pattern-space before quitting on an N on the last line, but POSIX makes it clear that if an N command is read on the last line nothing should be printed.
-
mikeserv over 9 years
If you edit the post to specify GNU I will reverse my vote and delete these comments. Also, it might be worth learning about GNU's v command which breaks in every other sed but is a no-op in GNU versions 4 and greater.
-
Angel Todorov over 9 years
Huh, right. Fixed
-
Jeff Hewitt over 9 years
@mikeserv Thanks for the comments. I edited the post. Please don't delete your comments as they may benefit someone else.
-
mikeserv over 9 years
in that case I will offer one more - this can be done portably like: sed -e :x -e '/"line"/{$!N' -e '};s/"line"\n<second>/other characters/;/"line"/bx'.
-
mikeserv over 9 years
sorry to nitpick again, but ${cmds} is GNU-specific - most other seds will require a \n newline or an -e break between p and }. You can avoid the brackets altogether - and portably - and even avoid inserting an extra \n newline character on the first line like: sed 'H;1h;$!d;x;s/"line"\n<second>/other characters /g'
-
terdon over 9 years
@mikeserv? Which one? The second is supposed to, the OP said they want "to replace one or more occurrences of", so eating the paragraph might well be what they expect.
-
mikeserv over 9 years
very good point. I guess I focused more on "and obtain each time", but I guess it is not clear if that should be one replacement per occurrence or one replacement per sequence of occurrences... @BowPark?
-
BowPark over 9 years
One replacement per occurrence is needed.
-
BowPark over 9 years
Yes it does, thank you! Why is every modified line doubled without D? (You used it as it is necessary; maybe I don't know sed very well)
-
terdon over 9 years
@BowPark OK, then the first perl approach or the awk should both work. Don't they give you the desired output?
-
BowPark over 9 years
I tested it and it seems not to be portable. It prints an extra newline at the beginning of the output, but the result is correct on GNU.
-
BowPark over 9 years
It works, thank you, but the third line with awk should be print;}' file. I need to avoid Perl and preferably use sed; anyway, you suggested good alternatives.
-
mikeserv over 9 years
@BowPark - you get doubles when omitting the D because D otherwise deletes from output what you now see doubled. I have just made an edit - and I may expand on that as well soon.
-
mikeserv over 9 years
@BowPark - ok, I've updated it and provided options. It might be a little easier to read/understand now. I also explicitly addressed the D thing.