Scripting: what is the easiest way to extract a value from a tag in an XML file?

Solution 1

xml2 can convert xml to/from line-oriented format:

xml2 < pom.xml  | grep /project/version= | sed 's/.*=//'
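For readers without xml2, the idea of flattening XML into one `path=value` line per element can be sketched in Python. This is only an approximation of the line-oriented format, not a reimplementation of xml2: the helper names `local_name` and `flatten` and the shortened sample POM are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Shortened sample POM (an assumption for illustration, not the full file).
POM = """<project xmlns="http://maven.apache.org/POM/4.0.0">
    <artifactId>project-parent</artifactId>
    <version>1.0.74-SNAPSHOT</version>
    <dependencies>
        <dependency>
            <artifactId>joda-time</artifactId>
            <version>1.5.2</version>
        </dependency>
    </dependencies>
</project>"""


def local_name(tag):
    """Drop the '{namespace}' prefix that ElementTree puts on qualified tags."""
    return tag.rsplit("}", 1)[-1]


def flatten(element, prefix=""):
    """Yield one 'path=text' line for every element with non-blank text."""
    path = prefix + "/" + local_name(element.tag)
    text = (element.text or "").strip()
    if text:
        yield path + "=" + text
    for child in element:
        yield from flatten(child, path)


lines = list(flatten(ET.fromstring(POM)))

# Equivalent of the grep + sed steps: keep the matching line, drop the path.
version = [line.split("=", 1)[1] for line in lines
           if line.startswith("/project/version=")][0]
print(version)  # 1.0.74-SNAPSHOT
```

Because every line carries the full path, the nested `/project/dependencies/dependency/version` entries cannot be confused with `/project/version`.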

Solution 2

Another way: xmlgrep with an XPath expression:

xmlgrep --text_only '/project/version' pom.xml

Disadvantage: slow

Solution 3

Using Python

$ python -c 'from xml.etree.ElementTree import ElementTree; print(ElementTree(file="pom.xml").findtext("{http://maven.apache.org/POM/4.0.0}version"))'
1.0.74-SNAPSHOT
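The same lookup can be written as a small script, which may be easier to maintain than a one-liner. This is a minimal sketch for Python 3: the function name `project_version`, the `pom` prefix and the embedded sample are assumptions for illustration. `findtext` is called on the root element, so only direct children of `<project>` are considered.

```python
import io
import xml.etree.ElementTree as ET

# Prefix-to-URI map; the prefix "pom" is an arbitrary choice for this sketch.
NS = {"pom": "http://maven.apache.org/POM/4.0.0"}

# Shortened sample POM (an assumption for illustration).
SAMPLE_POM = b"""<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0">
    <version>1.0.74-SNAPSHOT</version>
    <dependencies>
        <dependency>
            <version>1.5.2</version>
        </dependency>
    </dependencies>
</project>"""


def project_version(source):
    """Return the text of /project/version, or None if it is absent.

    `source` may be a file name or a file-like object.
    """
    root = ET.parse(source).getroot()
    # Relative path from the root: matches direct children only.
    return root.findtext("pom:version", namespaces=NS)


print(project_version(io.BytesIO(SAMPLE_POM)))  # 1.0.74-SNAPSHOT
```

For a real file, `project_version("pom.xml")` should work the same way.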

Using xmlstarlet

$ xml sel -N x="http://maven.apache.org/POM/4.0.0" -t -m 'x:project/x:version' -v . pom.xml
1.0.74-SNAPSHOT

Using xmllint

$ echo -e 'setns x=http://maven.apache.org/POM/4.0.0\ncat /x:project/x:version/text()' | xmllint --shell pom.xml | grep -v /
1.0.74-SNAPSHOT

Solution 4

Clojure way. Requires only a JVM and the Clojure jar file:

java -cp clojure.jar clojure.main -e "(use 'clojure.xml) (->> (java.io.File. \"pom.xml\") (clojure.xml/parse) (:content) (filter #(= (:tag %) :version)) (first) (:content) (first) (println))"

Scala way:

java -Xbootclasspath/a:scala-library.jar -cp scala-compiler.jar scala.tools.nsc.MainGenericRunner -e 'import scala.xml._; println((XML.load(new java.io.FileInputStream("pom.xml")) match { case <project>{children @ _*}</project> => for (i <- children if (i  match { case <version>{children @ _*}</version> => true; case _ => false;  }))  yield i })(0) match { case <version>{Text(x)}</version> => x })'

Groovy way:

java -classpath groovy-all.jar groovy.ui.GroovyMain -e 'println (new XmlParser().parse(new File("pom.xml")).value().findAll({ it.name().getLocalPart()=="version" }).first().value().first())'

Solution 5

Here's an alternative in Perl

$ perl -MXML::Simple -e'print XMLin("pom.xml")->{version}."\n"'
1.0.74-SNAPSHOT

It works with the revised/extended example in the question, which has multiple "version" elements at different depths.

Author: Anthony Kong

Updated on September 18, 2022

Comments

  • Anthony Kong
    Anthony Kong over 1 year

    I want to read a pom.xml ('Project Object Model' of Maven) and extract the version information. Here is an example:

    <?xml version="1.0" encoding="UTF-8"?>
    <project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">

        <modelVersion>4.0.0</modelVersion>
        <groupId>com.mycompany</groupId>
        <artifactId>project-parent</artifactId>
        <name>project-parent</name>
        <version>1.0.74-SNAPSHOT</version>
        <dependencies>
            <dependency>
                <groupId>com.sybase.jconnect</groupId>
                <artifactId>jconnect</artifactId>
                <version>6.05-26023</version>
            </dependency>
            <dependency>
                <groupId>joda-time</groupId>
                <artifactId>joda-time</artifactId>
                <version>1.5.2</version>
            </dependency>
            <dependency>
                <groupId>com.sun.jdmk</groupId>
                <artifactId>jmxtools</artifactId>
                <version>1.2.1</version>
            </dependency>
            <dependency>
                <groupId>org.easymock</groupId>
                <artifactId>easymock</artifactId>
                <version>2.4</version>
            </dependency>
        </dependencies>
    </project>
    

    How can I extract the version '1.0.74-SNAPSHOT' from above?

    I would love to be able to do this with simple bash scripting, sed or awk. Otherwise, a simple Python script is preferred.

    EDIT

    1. Constraint

      The Linux box is in a corporate environment, so I can only use tools that are already installed (I can request a utility such as xml2, but that means going through a lot of red tape). Some of the solutions are very good (I have learned a few new tricks already), but they may not be applicable in this restricted environment.

    2. Updated XML listing

      I added the dependencies tag to the original listing. This shows that some hacky solutions may not work in this case.

    3. Distro

      The distro I am using is RHEL4

    • bbaja42
      bbaja42 over 12 years
    • Anthony Kong
      Anthony Kong over 12 years
      Not really. There are a lot of version tags in the XML (e.g. under the dependencies tag). I only want '/project/version'.
    • Vi.
      Vi. over 12 years
      Which XML-related tools and libraries are available? Are JVM-based solutions OK?
    • Anthony Kong
      Anthony Kong over 12 years
      As far as I can tell, xml2, xmlgrep and the Perl XML module are not present. Most Unix command-line utilities are present. The distro is Red Hat EL 4.
    • JStrahl
      JStrahl over 11 years
      (I couldn't add a comment so I have to reply as an answer, overkill somewhat) Some great answers can be found here..... stackoverflow.com/questions/2735548/…
  • Vi.
    Vi. over 12 years
    Slow (although faster than xmlgrep).
  • Anthony Kong
    Anthony Kong over 12 years
    Thanks for the suggestion, but unfortunately it will not return what I want. Please see the updated pom model.
  • Vi.
    Vi. over 12 years
    Returns "1.0.74-SNAPSHOT". Note that I changed the script after reading about multiple <version> things.
  • Vi.
    Vi. over 12 years
    Note: this solution is provided "just for fun" and is not intended to be used in an actual product. It is better to use the xml2/xmlgrep/XML::Simple solutions.
  • Anthony Kong
    Anthony Kong over 12 years
    Thanks! Even though it is 'just for fun', it is probably the 'most suitable' solution so far because it has the fewest dependencies: it only requires Perl ;-)
  • Vi.
    Vi. over 12 years
    What about doing it from Java? Using pom files implies having a JVM installed.
  • Anthony Kong
    Anthony Kong over 12 years
    The background is that I am building a SIT (system integration test) script around the existing maven process. Part of it requires knowing the version of the maven project. I really want to keep it simple and scripting is the way to go.
  • Anthony Kong
    Anthony Kong over 12 years
    This is awesome! Great idea!
  • David H
    David H over 12 years
    If xsltproc is on your system, and it probably is since libxslt is on RHEL4, then you can use it with the above stylesheet to output the tag, i.e. xsltproc x.xsl pom.xml.
  • kev
    kev over 12 years
    cat (//x:version)[1]/text() when using xmllint also works!
  • Vi.
    Vi. over 12 years
    Relies on the absence of parameters in elements and on extra <version>s appearing only inside dependencies.
  • Simon Sheehan
    Simon Sheehan over 12 years
    What exactly does this script do?
  • Samus_
    Samus_ over 12 years
    It loads the XML as a DOM structure using Python's minidom implementation: docs.python.org/library/xml.dom.minidom.html. The idea is to grab the <project> tag, which is unique, and then iterate over its child nodes (direct children only) to find the <version> tag that we're looking for, rather than other tags with the same name in other places.
  • fixer1234
    fixer1234 over 8 years
    Can you expand your answer to explain this? Thanks.
  • GAD3R
    GAD3R about 5 years
    command updated to xml_grep
  • Charlweed
    Charlweed almost 5 years
    PowerShell is now open source and runs on Linux and other platforms. We use it for building in preference to bash, cygwin and mingw64.
  • SMerrill8
    SMerrill8 about 4 years
    This does appear to work, but beware: what it does is set the field separator (FS) to the set of characters < and >; then it finds all lines with the word "packaging" in them and gives you the third field.
  • user5249203
    user5249203 over 3 years
    xml2 can be found at github.com/clone/xml2 - its original website etc. has disappeared.
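
Samus_'s comment above describes a minidom approach: take the unique <project> element and look only at its direct children for <version>. The script that comment refers to is not reproduced on this page, so the following is a sketch of the described idea rather than the original code. The helper name `direct_child_text` and the shortened sample are assumptions; the sample deliberately puts a dependency before the project's own version.

```python
from xml.dom import minidom

# Shortened sample POM (an assumption for illustration). The nested
# <version> comes first, so a document-wide search would find the wrong one.
POM = """<project xmlns="http://maven.apache.org/POM/4.0.0">
    <dependencies>
        <dependency>
            <artifactId>joda-time</artifactId>
            <version>1.5.2</version>
        </dependency>
    </dependencies>
    <version>1.0.74-SNAPSHOT</version>
</project>"""


def direct_child_text(doc, parent_tag, child_tag):
    """Return the text of the first direct child_tag under the first parent_tag."""
    parent = doc.getElementsByTagName(parent_tag)[0]
    for node in parent.childNodes:
        # Skip whitespace text nodes and comments; only elements have tagName.
        if node.nodeType == node.ELEMENT_NODE and node.tagName == child_tag:
            return "".join(child.data for child in node.childNodes
                           if child.nodeType == child.TEXT_NODE).strip()
    return None


doc = minidom.parseString(POM)
version = direct_child_text(doc, "project", "version")
print(version)  # 1.0.74-SNAPSHOT
```

Calling `doc.getElementsByTagName("version")[0]` directly would return the dependency's version here, which is why the walk is limited to direct children.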