Search replace in XML file with sed or awk
Solution 1
I think there are a couple of problems in your sed command:
- You don't use the -n option, so by default sed prints every line of input to the output (possibly modified by a sed command).
- You don't need the redirection < c3.xml, because sed recognizes the last argument as a filename.
- sed is not very well suited for matches over multiple lines. See for example here.
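To make the first point concrete, here is a minimal sketch of the difference between sed's default printing and -n. The two-line input is made up purely for illustration and is not taken from the question's file.

```shell
# Default behaviour: every input line is printed, whether it was changed or not.
all_lines=$(printf 'one\ntwo\n' | sed 's/two/2/')

# With -n, nothing is printed unless a command asks for it
# (here the p flag of the s command prints lines where a substitution happened).
only_match=$(printf 'one\ntwo\n' | sed -n 's/two/2/p')

printf 'without -n:\n%s\nwith -n:\n%s\n' "$all_lines" "$only_match"
```

This is why the command in the question appears to return the entire contents of the XML file: lines where the substitution does not match are still printed unchanged.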
The following seems to work on your example:
sed -n "/<fmreq:name>object_name<\/fmreq:name>/ {n;p}" c3.xml | sed "s/^\s*<fmreq:value>\(.*\)<\/fmreq:value>/\1/g"
Or, with only one sed invocation:
sed -n "/<fmreq:name>object_name<\/fmreq\:name>/ {n;s/^\s*<fmreq:value>\(.*\)<\/fmreq:value>/\1/g;p}" c3.xml
Breakdown of what this command does:
- The option -n tells sed not to print the pattern space after it has finished processing the line. Consequently, you need to use the command p explicitly to do so.
- /regex/ tells sed to execute the commands that follow only on the lines that match regex.
- The sed command n replaces the content of the pattern space with the next line of input, which is the one containing the value you are interested in.
- The sed command s/regex/replacement/ substitutes the first match of regex in the pattern space with replacement.
- The sed command p prints the line.
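The three steps from the question (extract, cross-reference, replace) could be chained along the following lines. This is only a sketch under assumptions: the temporary file stands in for c3.xml, the case table is a hypothetical lookup (only the Correspondence to YYZ pair comes from the question), and each value element is assumed to sit on the line directly after its name element, as in the sample.

```shell
# Recreate the sample from the question in a scratch file (stand-in for c3.xml).
xml_file=$(mktemp)
cat > "$xml_file" <<'EOF'
<fmreq:fileManagementRequestDetail xmlns:fmreq="http://foobar.com/filemanagement">
<fmreq:property>
<fmreq:name>form_category_cd</fmreq:name>
<fmreq:value>Memos</fmreq:value>
</fmreq:property>
<fmreq:property>
<fmreq:name>object_name</fmreq:name>
<fmreq:value>Correspondence</fmreq:value>
</fmreq:property>
</fmreq:fileManagementRequestDetail>
EOF

# Step 1: on the line after the object_name name element, strip the
# <fmreq:value> tags and print what is left.
object_name=$(sed -n '/<fmreq:name>object_name<\/fmreq:name>/ {
  n
  s/^[[:space:]]*<fmreq:value>\(.*\)<\/fmreq:value>.*/\1/p
}' "$xml_file")

# Step 2: hypothetical cross-reference table; replace with the real list.
case $object_name in
  Correspondence) new_value=YYZ ;;
  *)              new_value=UNMAPPED ;;   # placeholder for unknown values
esac

# Step 3: replace the value on the line after the form_category_cd name element.
# new_value must not contain /, & or \ because it is pasted into the s command.
updated=$(sed '/<fmreq:name>form_category_cd<\/fmreq:name>/ {
  n
  s/\(<fmreq:value>\).*\(<\/fmreq:value>\)/\1'"$new_value"'\2/
}' "$xml_file")

printf '%s\n' "$updated"
rm -f "$xml_file"
```

Step 3 deliberately omits -n so that all other lines pass through untouched and the output is the complete, modified document.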
Solution 2
Using XMLStarlet:
$ xml ed -u '//fmreq:property[fmreq:name="object_name"]/preceding-sibling::fmreq:property/fmreq:name' -v YYZ file.xml
<?xml version="1.0"?>
<fmreq:fileManagementRequestDetail xmlns:fmreq="http://foobar.com/filemanagement">
<fmreq:property>
<fmreq:name>YYZ</fmreq:name>
<fmreq:value>Memos</fmreq:value>
</fmreq:property>
<fmreq:property>
<fmreq:name>object_name</fmreq:name>
<fmreq:value>Correspondence</fmreq:value>
</fmreq:property>
</fmreq:fileManagementRequestDetail>
The first part of the XPath, //fmreq:property[fmreq:name="object_name"], locates the <fmreq:property> node whose <fmreq:name> child is object_name, and the /preceding-sibling::fmreq:property/fmreq:name bit locates the <fmreq:name> node of the preceding <fmreq:property> node.
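Where XMLStarlet cannot be installed, as on the server described in the question, a line-oriented awk sketch can approximate the same update. It is not a real XML parser: it assumes the layout of the sample (one element per line, each value directly after its name), the scratch file stands in for the real input, and the case table is a hypothetical lookup apart from the Correspondence to YYZ pair mentioned in the question. Unlike the XPath above, it updates the value element of form_category_cd, which is what the question asks for.

```shell
# Scratch copy of the sample from the question (stand-in for the real file).
xml_file=$(mktemp)
cat > "$xml_file" <<'EOF'
<fmreq:fileManagementRequestDetail xmlns:fmreq="http://foobar.com/filemanagement">
<fmreq:property>
<fmreq:name>form_category_cd</fmreq:name>
<fmreq:value>Memos</fmreq:value>
</fmreq:property>
<fmreq:property>
<fmreq:name>object_name</fmreq:name>
<fmreq:value>Correspondence</fmreq:value>
</fmreq:property>
</fmreq:fileManagementRequestDetail>
EOF

# Pass 1: print the text of the value element that follows the object_name name.
object_name=$(awk '
  found { gsub(/^[ \t]*<fmreq:value>|<\/fmreq:value>[ \t]*$/, ""); print; exit }
  /<fmreq:name>object_name<\/fmreq:name>/ { found = 1 }
' "$xml_file")

# Hypothetical cross-reference table; replace with the real list.
case $object_name in
  Correspondence) new_value=YYZ ;;
  *)              new_value=UNMAPPED ;;   # placeholder for unknown values
esac

# Pass 2: rewrite the value element on the line after form_category_cd.
# new_value should not contain & or backslashes, which sub() and -v interpret.
updated=$(awk -v new_value="$new_value" '
  in_target && /<fmreq:value>/ {
    sub(/<fmreq:value>.*<\/fmreq:value>/, "<fmreq:value>" new_value "</fmreq:value>")
  }
  { in_target = ($0 ~ /<fmreq:name>form_category_cd<\/fmreq:name>/); print }
' "$xml_file")

printf '%s\n' "$updated"
rm -f "$xml_file"
```

Two passes are used because form_category_cd appears before object_name in the sample, so the new value is not known yet when the line to change is first read.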
Bob Lyman
Updated on September 18, 2022
Comments
-
Bob Lyman over 1 year
So I have a task whereby I have to manipulate an XML file through a bash shell script.
Here are the steps:
- Query XML file for a value.
- Take the value and cross reference it to find a new value from a list.
- Replace the value of a different element with the new value.
Here is a sample of the XML with non-essential info removed:
<fmreq:fileManagementRequestDetail xmlns:fmreq="http://foobar.com/filemanagement">
<fmreq:property>
<fmreq:name>form_category_cd</fmreq:name>
<fmreq:value>Memos</fmreq:value>
</fmreq:property>
<fmreq:property>
<fmreq:name>object_name</fmreq:name>
<fmreq:value>Correspondence</fmreq:value>
</fmreq:property>
</fmreq:fileManagementRequestDetail>
I have to get the value from the value element under object_name, cross reference it, and then replace the value under the form_category_cd value element with the new value:
So if object_name -> value is Correspondence then the form_category_cd -> value might need to be YYZ.
Here's the rub, I can only use the tools available on our server as our operations group is restricting us to the tools at hand. It was a fight to get xmllint updated and then it got overruled. I'm on a version that does not support --xpath, which believe me is difficult on a good day. Also the version I have available doesn't support namespaces, so xmllint is out.
I've tried sed, but it seems to not like my regex even though every tester I try works fine.
Regex:
(<fmreq\:name>object_name<\/fmreq\:name>)(?:\n\s*)(<fmreq\:value>)(.*)(<\/fmreq\:value>)
I need to get group #3, but sed won't return it. Instead it returns the entire contents of the XML file.
sed -e 's/\(<fmreq\:name>object_name<\/fmreq\:name>\)\(?:\n\s*\)\(<fmreq\:value>\)\(.*\)\(<\/fmreq\:value>\)/\3/' < c3.xml
I'm not as familiar with awk / gawk, so I'm struggling to figure them out as well, but I'm open to them if a solution can be found.
Would love to have an awk / gawk solution just to make the boss happy since he's an old awk fan, but I'll take what I can get as I'm stumped.
Again I have to use the tools on hand and can't install anything new.
-
Jeff Schaller over 6 years
XML processing is best done with a tool made for the job (bash and sed and awk are not made for the job).
-
RobertL over 6 years
Where does "YYZ" come from?
-
seshoumara over 6 years
What's the format of the list file that holds the new values?
-
Bob Lyman over 6 years
YYZ is just an example value (also the name of a Rush tune).
-
Bob Lyman over 6 years
What's actually happening is that the value from object_name is eval'd against a set of known values, then the value under form_category_cd is replaced with that new value and the file is written out. Yes, I would love to not use sed and bash, but our operations group will not allow something like python or perl on the server because they don't have the resources to support them. Believe me, this is a battle that was fought and lost by a co-worker.
-
Bob Lyman over 6 years
The redirection is a leftover from the original command as the operation has to eventually replace another value with a redirect to another file.
-
Rastapopoulos over 6 years
Glad it helped. But yes, sed and regular expressions do take some time to get used to. :) All the best!