Search replace in XML file with sed or awk

Solution 1

I think that there are a couple of problems in your sed command:

  • You don't use the -n option, so by default sed just prints every line of input to the output (possibly modified by a sed command).

  • You don't need the redirection < c3.xml, because sed recognizes the last argument as a filename.

  • sed is not well suited to matching across multiple lines, because it reads its input one line at a time and applies the script to each line separately.
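The last point is easy to check on a toy input. The sketch below is only an illustration (the two-line input and the variable names are made up): a pattern containing \n never matches while the pattern space holds a single line, but it does once N has appended the next line. Separately, the (?:...) group in the question's regex is PCRE syntax, which sed's regular expressions do not support.

```shell
#!/bin/sh
# Hypothetical two-line input: "a" on the first line, "b" on the second.

# By default the pattern space holds one line, so \n can never match.
per_line=$(printf 'a\nb\n' | sed 's/a\nb/X/')

# N appends the next input line to the pattern space (joined by \n),
# so the same expression can now match across the line break.
joined=$(printf 'a\nb\n' | sed 'N;s/a\nb/X/')

printf '%s\n' "$per_line"   # a, then b, unchanged
printf '%s\n' "$joined"     # X
```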

The following seems to work on your example:

sed -n "/<fmreq:name>object_name<\/fmreq:name>/ {n;p}" c3.xml | sed "s/^\s*<fmreq:value>\(.*\)<\/fmreq:value>/\1/g"

Or, with only one sed invocation:

sed -n "/<fmreq:name>object_name<\/fmreq:name>/ {n;s/^\s*<fmreq:value>\(.*\)<\/fmreq:value>/\1/;p}" c3.xml

Breakdown of what this command does:

  • The option -n tells sed not to print the pattern space after it has finished processing the line. Consequently, you need to use the command p explicitly to do so.

  • /regex/ tells sed to execute the commands that follow only on the lines that match regex.

  • The sed command n replaces the content of the pattern space by the next line of input, which is the one containing the value you are interested in.

  • The sed command s/regex/replacement/ substitutes the first match of regex in the pattern space by replacement.

  • The sed command p prints the line.
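The breakdown above can be tried end to end on the sample from the question. What follows is a minimal sketch rather than a drop-in script: the temporary file, the case-statement lookup table, and the variable names are assumptions made for illustration, and [[:space:]] stands in for the GNU-only \s. It also assumes each element sits on its own line, as in the sample.

```shell
#!/bin/sh
# Sample document from the question, written to a temporary file.
xml_file=$(mktemp)
cat > "$xml_file" <<'EOF'
<fmreq:fileManagementRequestDetail xmlns:fmreq="http://foobar.com/filemanagement">
      <fmreq:property>
         <fmreq:name>form_category_cd</fmreq:name>
         <fmreq:value>Memos</fmreq:value>
      </fmreq:property>
      <fmreq:property>
         <fmreq:name>object_name</fmreq:name>
         <fmreq:value>Correspondence</fmreq:value>
      </fmreq:property>
</fmreq:fileManagementRequestDetail>
EOF

# Step 1: print the <fmreq:value> text found on the line after object_name.
object_name=$(sed -n '/<fmreq:name>object_name<\/fmreq:name>/ {
    n
    s/^[[:space:]]*<fmreq:value>\(.*\)<\/fmreq:value>.*/\1/
    p
}' "$xml_file")

# Step 2: hypothetical lookup table; the real mapping is not in the question.
case $object_name in
    Correspondence) new_value=YYZ ;;
    *)              new_value=UNKNOWN ;;
esac

# Step 3: rewrite the <fmreq:value> on the line after form_category_cd.
# Without -n, sed prints every line, so the whole document is emitted.
# new_value must not contain "/" or "&", which are special to s///.
updated=$(sed "/<fmreq:name>form_category_cd<\/fmreq:name>/ {
    n
    s/<fmreq:value>.*<\/fmreq:value>/<fmreq:value>$new_value<\/fmreq:value>/
}" "$xml_file")

printf '%s\n' "$object_name"   # Correspondence
printf '%s\n' "$updated"
rm -f "$xml_file"
```

Redirecting the last command's output to a new file, instead of capturing it in a variable, would give the rewritten document on disk.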

Solution 2

Using XMLStarlet:

$ xml ed -u '//fmreq:property[fmreq:name="object_name"]/preceding-sibling::fmreq:property/fmreq:name' -v YYZ file.xml
<?xml version="1.0"?>
<fmreq:fileManagementRequestDetail xmlns:fmreq="http://foobar.com/filemanagement">
  <fmreq:property>
    <fmreq:name>YYZ</fmreq:name>
    <fmreq:value>Memos</fmreq:value>
  </fmreq:property>
  <fmreq:property>
    <fmreq:name>object_name</fmreq:name>
    <fmreq:value>Correspondence</fmreq:value>
  </fmreq:property>
</fmreq:fileManagementRequestDetail>

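The first part of the XPath, //fmreq:property[fmreq:name="object_name"], locates the <fmreq:property> node whose <fmreq:name> child is object_name. The /preceding-sibling::fmreq:property/fmreq:name part then selects the <fmreq:name> node of each <fmreq:property> sibling that precedes it (only one in this example). Ending the path with fmreq:value instead would target the value element that the question asks to change.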
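XMLStarlet may not be an option on a server where nothing new can be installed, and the question also asks about awk. The following is a rough, line-oriented awk sketch of the same update, under loud assumptions: each element sits on its own line as in the sample, and the lookup table is a hypothetical two-column text file (its real format is not given in the question). It is not a real XML parser and will break on differently formatted input.

```shell
#!/bin/sh
# Sample document from the question, written to a temporary file.
xml_file=$(mktemp)
cat > "$xml_file" <<'EOF'
<fmreq:fileManagementRequestDetail xmlns:fmreq="http://foobar.com/filemanagement">
      <fmreq:property>
         <fmreq:name>form_category_cd</fmreq:name>
         <fmreq:value>Memos</fmreq:value>
      </fmreq:property>
      <fmreq:property>
         <fmreq:name>object_name</fmreq:name>
         <fmreq:value>Correspondence</fmreq:value>
      </fmreq:property>
</fmreq:fileManagementRequestDetail>
EOF

# Hypothetical lookup table, one "object_name new_code" pair per line.
map_file=$(mktemp)
printf '%s\n' 'Correspondence YYZ' 'Invoices ABC' > "$map_file"

# awk reads three inputs in order: the table, then the XML file twice,
# because form_category_cd appears before object_name in the document.
updated=$(awk '
    FNR == 1 { pass++ }                  # a new input file starts
    pass == 1 { map[$1] = $2; next }     # load the lookup table
    pass == 2 {                          # first XML read: find the key
        if (grab) {
            key = $0
            sub(/^[ \t]*<fmreq:value>/, "", key)
            sub(/<\/fmreq:value>.*$/, "", key)
            grab = 0
        }
        if ($0 ~ /<fmreq:name>object_name<\/fmreq:name>/) grab = 1
        next
    }
    {                                    # second XML read: rewrite and print
        if (fix && (key in map))
            sub(/<fmreq:value>.*<\/fmreq:value>/, "<fmreq:value>" map[key] "</fmreq:value>")
        fix = ($0 ~ /<fmreq:name>form_category_cd<\/fmreq:name>/)
        print
    }
' "$map_file" "$xml_file" "$xml_file")

printf '%s\n' "$updated"
rm -f "$xml_file" "$map_file"
```

Reading the XML file twice keeps the program simple; buffering the lines in an array and printing them in an END block would be an alternative single-read design.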

Author: Bob Lyman

Updated on September 18, 2022

Comments

  • Bob Lyman over 1 year

    So I have a task whereby I have to manipulate an XML file through a bash shell script.

    Here are the steps:

    1. Query XML file for a value.
    2. Take the value and cross reference it to find a new value from a list.
    3. Replace the value of a different element with the new value.

    Here is a sample of the XML with non-essential info removed:

    <fmreq:fileManagementRequestDetail xmlns:fmreq="http://foobar.com/filemanagement">
          <fmreq:property>
             <fmreq:name>form_category_cd</fmreq:name>
             <fmreq:value>Memos</fmreq:value>
          </fmreq:property>
          <fmreq:property>
             <fmreq:name>object_name</fmreq:name>
             <fmreq:value>Correspondence</fmreq:value>
          </fmreq:property>
    </fmreq:fileManagementRequestDetail>
    

    I have to get the value from the value element under object_name, cross reference it, and then replace the value under the form_category_cd value element with the new value:

    So if object_name -> value is Correspondence then the form_category_cd -> value might need to be YYZ.

    Here's the rub: I can only use the tools already available on our server, because our operations group is restricting us to the tools at hand. It was a fight to get xmllint updated, and then it got overruled. I'm on a version that does not support --xpath, which, believe me, is difficult on a good day. The version I have also doesn't support namespaces, so xmllint is out.

    I've tried sed, but it doesn't seem to like my regex, even though the regex works fine in every tester I try.

    Regex:

    (<fmreq\:name>object_name<\/fmreq\:name>)(?:\n\s*)(<fmreq\:value>)(.*)(<\/fmreq\:value>)
    

    I need to get group #3, but sed won't return it. Instead it returns the entire contents of the XML file.

    sed -e 's/\(<fmreq\:name>object_name<\/fmreq\:name>\)\(?:\n\s*\)\(<fmreq\:value>\)\(.*\)\(<\/fmreq\:value>\)/\3/' < c3.xml 
    

    I'm not as familiar with awk / gawk, so I'm struggling to figure those out as well, but I'm open to them if a solution can be found.

    Would love to have an awk / gawk solution just to make the boss happy since he's an old awk fan, but I'll take what I can get as I'm stumped.

    Again I have to use the tools on hand and can't install anything new.

    • Jeff Schaller over 6 years
      XML processing is best done with a tool made for the job (bash and sed and awk are not made for the job).
    • RobertL over 6 years
      Where does "YYZ" come from?
    • seshoumara over 6 years
      What's the format of the list file that holds the new values?
    • Bob Lyman over 6 years
      YYZ is just an example value (also the name of a Rush tune).
    • Bob Lyman over 6 years
      What's actually happening is that the value from object_name is evaluated against a set of known values, then the value under form_category_cd is replaced with that new value and the file is written out. Yes, I would love to not use sed and bash, but our operations group will not allow something like python or perl on the server because they don't have the resources to support them. Believe me, this is a battle that was fought and lost by a co-worker.
  • Bob Lyman over 6 years
    The redirection is a leftover from the original command as the operation has to eventually replace another value with a redirect to another file.
  • Rastapopoulos over 6 years
    Glad it helped. But yes, sed and regular expressions do take some time to get used to. :) All the best!