Simplest way to do basic xml parsing from unix command line

23,131

Solution 1

The following Linux command uses XPath to extract the specified value from each XML file under the current directory:

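find . -name "*.xml" | while IFS= read -r xml
do
  # Print the file name and the extracted value; awk drops files with no value.
  echo "$xml" "$(xmllint --xpath "/param-value/value/text()" "$xml" 2>/dev/null)" | awk 'NF>1'
done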

Example output for matching XML files:

./test1.xml asdf
./test4.xml 1234
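The XPath above returns the value of a root param-value element whatever its name is. If only the entry named Roles is wanted, a predicate can select it. This is a minimal sketch, not a tested recipe: it assumes xmllint is installed, that param-value elements may sit anywhere in the document (hence the leading //), and the function names roles_of and list_roles are made up for illustration.

```shell
#!/bin/sh
# Sketch: print "file value" for each XML file that has a non-empty Roles value.
# Assumption: xmllint (from libxml2) is on the PATH.

# roles_of FILE: print the text of <value> inside the param-value named Roles.
# string(...) yields an empty string when nothing matches.
roles_of() {
  xmllint --xpath "string(//param-value[name='Roles']/value)" "$1" 2>/dev/null
}

# list_roles DIR: walk DIR and print one line per file with a Roles value.
list_roles() {
  find "$1" -name '*.xml' | while IFS= read -r xml
  do
    value=$(roles_of "$xml")
    if [ -n "$value" ]; then
      printf '%s %s\n' "$xml" "$value"
    fi
  done
}
```

Running list_roles . from the directory that holds the files should give the same kind of listing as the loop above, restricted to Roles entries.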

Solution 2

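The ed subcommand of xmlstarlet updates (-u) the node selected by each XPath with a new value (-v) and prints the resulting document:

$ xmlstarlet ed -u /param-value/name -v Roles -u /param-value/value -v asdf data.xml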

<?xml version="1.0"?>
<param-value>
  <name>Roles</name>
  <description>some description</description>
  <value>asdf</value>
</param-value>
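The command above rewrites the document. For the read-only case described in the question, the sel subcommand may be a closer fit. A hedged sketch, assuming xmlstarlet is installed under that name (some distributions ship the binary as xml) and using a made-up helper name role_value:

```shell
#!/bin/sh
# Sketch: query instead of edit. Assumption: the xmlstarlet binary is on the PATH.

# role_value FILE: print the <value> of the param-value whose <name> is Roles.
# -t starts a template, -v prints the value of the XPath, -n adds a newline.
role_value() {
  xmlstarlet sel -t -v "//param-value[name='Roles']/value" -n "$1"
}
```

With the data.xml shown above, role_value data.xml should print the Roles value on a line of its own.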

Solution 3

I worked out a couple of solutions using basic perl/awk functionality (basically a poor man's parsing of the tags). If you see any improvements using only basic perl/awk functionality, let me know. I avoided dealing with multiline regular expressions by setting a flag when I see a particular tag. Kind of clumsy, but it works.

perl:

perl -ne '$h = 1 if m/Host/; $r = 1 if m/Role/; if ($h && m/<value>/) { $h = 0; print "hosts: ", $_ =~ /<value>(.*)</, "\n"}; if ($r && m/<value>/) { $r = 0; print "\nrole: ", $_ =~ /<value>(.*)</, "\n" }'

awk (this needs GNU awk, because the three-argument form of match() is a gawk extension):

awk '/Host/ {h = 1} /Role/ {r = 1} h && /<value>/ {h = 0; match($0, "<value>(.*)<", a); print "hosts: " a[1]} r && /<value>/ {r = 0; match($0, "<value>(.*)<", a); print "\nrole: " a[1]}'
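
For systems without gawk, the same flag idea can be written with plain sub() calls. The following is only a sketch under stated assumptions: each tag sits on its own line as in the question's sample, an empty value is written as <value></value> rather than <value/>, and the function name roles_if_hosts_empty is invented here. It prints the file name and the Roles value only when the Hosts value is empty, which is the filter the question describes.

```shell
#!/bin/sh
# Sketch using only POSIX awk features (no three-argument match()).
# Assumption: one tag per line, as in the question's sample XML.

# roles_if_hosts_empty FILE: print "FILE role" if Hosts is empty and Roles is set.
roles_if_hosts_empty() {
  awk '
    /<name>/ {                       # remember the most recent <name> text
      name = $0
      sub(/.*<name>/, "", name); sub(/<\/name>.*/, "", name)
    }
    /<value>/ {                      # a value belongs to the last name seen
      v = $0
      sub(/.*<value>/, "", v); sub(/<\/value>.*/, "", v)
      if (name == "Hosts" && v == "") empty_hosts = 1
      if (name == "Roles") role = v
    }
    END { if (empty_hosts && role != "") print FILENAME, role }
  ' "$1"
}
```

It could be combined with find, for example by calling roles_if_hosts_empty on each file name read from find . -name '*.xml'.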
Author by

jonderry

Updated on July 18, 2022

Comments

  • jonderry
    jonderry almost 2 years

    I'm searching for xml files that have certain properties. For example, files that contain the following pattern:

    <param-value>
      <name>Hosts</name>
      <description>some description</description>
      <value></value>
    </param-value>
    

    For such files, I'd like to parse the value of another tag, such as:

    <param-value>
      <name>Roles</name>
      <description>some description</description>
      <value>asdf</value>
    </param-value>
    

    And print out the file name along with "asdf". What's the simplest way to accomplish this from the command line?

    One approach I was thinking of was just using grep with the -l option to filter the matching files out, and then using xargs grep to extract the value of Roles. However, grep doesn't work well with multi-line regexes. I saw another question that showed it could be done with the -Pzo options, but didn't have any luck getting it to work in my case. Is there a simpler approach?

  • th3penguinwhisperer
    th3penguinwhisperer over 5 years
    I didn't know xmllint could be used to parse XML. To me this is the best answer because it's always installed, as it's a system dependency (at least on CentOS/Red Hat/...)