How to execute XPath one-liners from shell?

Solution 1

You should try these tools:

  • xmlstarlet : can edit, select, transform... Not installed by default, xpath1
  • xmllint : often installed by default with libxml2-utils, xpath1 (check my wrapper to get the --xpath switch on very old releases and newline-delimited output on versions < 2.9.9)
  • xpath : installed via perl's module XML::XPath, xpath1
  • xml_grep : installed via perl's module XML::Twig, xpath1 (limited xpath usage)
  • xidel: xpath3
  • saxon-lint : my own project, wrapper over @Michael Kay's Saxon-HE Java library, xpath3

xmllint comes with libxml2-utils (it can be used as an interactive shell with the --shell switch).

saxon-lint uses SaxonHE 9.6, XPath 3.x (with backward compatibility).

Examples:

xmllint --xpath '//element/@attribute' file.xml
xmlstarlet sel -t -v "//element/@attribute" file.xml
xpath -q -e '//element/@attribute' file.xml
xidel -se '//element/@attribute' file.xml
saxon-lint --xpath '//element/@attribute' file.xml

Solution 2

You can also try my Xidel. It is not in a package in the repository, but you can just download it from the webpage (it has no dependencies).

It has simple syntax for this task:

xidel filename.xml -e '//element/@attribute' 

It is also one of the few tools listed here that support XPath 2.

Solution 3

One package that is very likely to be installed on a system already is python-lxml. If so, this is possible without installing any extra package:

python -c "from lxml.etree import parse; from sys import stdin; print('\n'.join(parse(stdin).xpath('//element/@attribute')))"
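
If lxml turns out not to be installed, a rough equivalent can be sketched with the standard library's xml.etree.ElementTree. ElementTree supports only a subset of XPath and cannot return attribute nodes directly, so this sketch walks the tree and reads the attribute from each matching element instead. The function name attribute_values and the sample document are hypothetical, chosen only for illustration.

```python
import io
import xml.etree.ElementTree as ET


def attribute_values(source, tag="element", attr="attribute"):
    """Return the value of `attr` for every <tag> element that carries it.

    `source` is a file name or file object, as accepted by ET.parse().
    """
    root = ET.parse(source).getroot()
    # root.iter(tag) walks the whole tree in document order, root included.
    return [el.get(attr) for el in root.iter(tag) if el.get(attr) is not None]


# Made-up sample input: two elements with the attribute, one without.
sample = io.BytesIO(
    b"<root><element attribute='a'/>"
    b"<x><element attribute='b'/><element/></x></root>"
)
print("\n".join(attribute_values(sample)))  # prints a and b on separate lines
```

To read from standard input, as the one-liner above does, pass sys.stdin (after importing sys) instead of the sample object.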

Solution 4

Saxon will do this not only for XPath 2.0, but also for XQuery 1.0 and (in the commercial version) 3.0. It doesn't come as a Linux package, but as a jar file. Syntax (which you can easily wrap in a simple script) is

java net.sf.saxon.Query -s:source.xml -qs://element/attribute

2020 UPDATE

Saxon 10.0 includes the Gizmo tool, which can be used interactively or in batch from the command line. For example

java net.sf.saxon.Gizmo -s:source.xml
/>show //element/@attribute
/>quit

Solution 5

While searching for a way to query Maven pom.xml files I came across this question. However, I had the following limitations:

  • must run cross-platform.
  • must exist on all major linux distributions without any additional module installation
  • must handle complex xml-files such as maven pom.xml files
  • simple syntax

I have tried many of the above without success:

  • python lxml.etree is not part of the standard python distribution
  • python xml.etree is part of it, but it does not handle complex maven pom.xml files well, for reasons unknown to me; I have not dug deep enough
  • xmllint does not work either, core dumps often on ubuntu 12.04 "xmllint: using libxml version 20708"
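
One possible explanation for the xml.etree trouble listed above, offered as an assumption rather than something this answer confirms: many pom.xml files declare the default namespace http://maven.apache.org/POM/4.0.0, and ElementTree only matches element names that are qualified with that namespace. A minimal sketch under that assumption follows; the function name pom_version and the sample POM are made up for illustration.

```python
import io
import xml.etree.ElementTree as ET

# Default namespace commonly declared by Maven POM files (assumed here).
POM_NS = {"m": "http://maven.apache.org/POM/4.0.0"}


def pom_version(source):
    """Return the text of /project/version, or None if it is absent."""
    root = ET.parse(source).getroot()  # root is the <project> element
    # Try the namespace-qualified name first, then fall back to the plain
    # name for POM files that declare no namespace.
    node = root.find("m:version", POM_NS)
    if node is None:
        node = root.find("version")
    return node.text if node is not None else None


# Made-up sample POM with the default namespace declared.
sample = io.BytesIO(
    b'<project xmlns="http://maven.apache.org/POM/4.0.0">'
    b"<modelVersion>4.0.0</modelVersion><version>1.2.3</version></project>"
)
print(pom_version(sample))  # prints 1.2.3
```

The explicit "is None" checks matter here: an Element with no children is falsy in ElementTree, so chaining the two lookups with "or" would skip a valid match.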

The solution I settled on, which is stable, short, mature and works on many platforms, is the REXML library built into Ruby:

ruby -r rexml/document -e 'include REXML; 
     puts XPath.first(Document.new($stdin), "/project/version/text()")' < pom.xml

What inspired me to find this one was the following articles:

Author by

clacke

Been working with LotusScript in Notes/Domino for over a decade, worked with WebSphere and related Java technologies for two years. After four years of web development and development support in Hong Kong I am now doing Continuous Integration work in Sweden.

Updated on March 17, 2022

Comments

  • clacke
    clacke over 2 years

    Is there a package out there, for Ubuntu and/or CentOS, that has a command-line tool that can execute an XPath one-liner like foo //element@attribute filename.xml or foo //element@attribute < filename.xml and return the results line by line?

    I'm looking for something that would allow me to just apt-get install foo or yum install foo and then just works out-of-the-box, no wrappers or other adaptation necessary.

    Here are some examples of things that come close:

    Nokogiri. If I write this wrapper I could call the wrapper in the way described above:

    #!/usr/bin/ruby
    
    require 'nokogiri'
    
    Nokogiri::XML(STDIN).xpath(ARGV[0]).each do |row|
      puts row
    end
    

    XML::XPath. Would work with this wrapper:

    #!/usr/bin/perl
    
    use strict;
    use warnings;
    use XML::XPath;
    
    my $root = XML::XPath->new(ioref => 'STDIN');
    for my $node ($root->find($ARGV[0])->get_nodelist) {
      print($node->getData, "\n");
    }
    

    xpath from XML::XPath returns too much noise, -- NODE -- and attribute = "value".

    xml_grep from XML::Twig cannot handle expressions that do not return elements, so cannot be used to extract attribute values without further processing.

    EDIT:

    echo cat //element/@attribute | xmllint --shell filename.xml returns noise similar to xpath.

    xmllint --xpath //element/@attribute filename.xml returns attribute = "value".

    xmllint --xpath 'string(//element/@attribute)' filename.xml returns what I want, but only for the first match.

    For another solution almost satisfying the question, here is an XSLT that can be used to evaluate arbitrary XPath expressions (requires dyn:evaluate support in the XSLT processor):

    <?xml version="1.0"?>
    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0"
        xmlns:dyn="http://exslt.org/dynamic" extension-element-prefixes="dyn">
      <xsl:output omit-xml-declaration="yes" indent="no" method="text"/>
      <xsl:template match="/">
        <xsl:for-each select="dyn:evaluate($pattern)">
          <xsl:value-of select="dyn:evaluate($value)"/>
          <xsl:value-of select="'&#10;'"/>
        </xsl:for-each> 
      </xsl:template>
    </xsl:stylesheet>
    

    Run with xsltproc --stringparam pattern //element/@attribute --stringparam value . arbitrary-xpath.xslt filename.xml.

    • Gilles Quenot
      Gilles Quenot over 11 years
      +1 for a good question and for the brainstorming about finding a simple and reliable way to print multiple results, each on its own line
    • miken32
      miken32 about 7 years
      Note that the "noise" from xpath is on STDERR and not STDOUT.
    • clacke
      clacke about 7 years
      @miken32 No. I wanted only the value for output. hastebin.com/ekarexumeg.bash
  • clacke
    clacke over 11 years
    Excellent! xmlstarlet sel -T -t -m '//element/@attribute' -v '.' -n filename.xml does exactly what I want!
  • clacke
    clacke over 11 years
    It does not seem to be available as a package, at least not in Ubuntu.
  • choroba
    choroba over 11 years
    @clacke: It is not, but it can be installed from CPAN by cpan XML::XSH2.
  • clacke
    clacke over 11 years
    SaxonB is in Ubuntu, package libsaxonb-java, but if I run saxonb-xquery -qs://element/@attribute -s:filename.xml I get SENR0001: Cannot serialize a free-standing attribute node, same problem as with e.g. xml_grep.
  • clacke
    clacke over 11 years
    Note: xmlstarlet was rumored to be abandoned, but is now under active development again.
  • Michael Kay
    Michael Kay over 11 years
    If you want to see full details of the attribute node selected by this query, use the -wrap option on the command line. If you just want the string value of the attribute, add /string() to the query.
  • clacke
    clacke over 11 years
    Thanks. Adding /string() gets closer. But it outputs an XML header and puts all the results on one row, so still no cigar.
  • Michael Kay
    Michael Kay about 11 years
    If you don't want an XML header, add the option !method=text.
  • cnst
    cnst over 10 years
    @choroba, I've tried that on OS X, but it failed to install, with some kind of makefile error.
  • choroba
    choroba over 10 years
    @cnst: Do you have XML::LibXML installed?
  • cnst
    cnst over 10 years
    @choroba, I don't know; but my point is that, cpan XML::XSH2 fails to install anything.
  • choroba
    choroba over 10 years
    @cnst: Well, it should have also told you why. I was just trying to find the cause.
  • clacke
    clacke over 10 years
    xml_grep2 -t //element@attribute filename.xml works and does what I expect it to (xml_grep --root //element@attribute --text_only filename.xml still doesn't, returns an "unrecognized expression" error). Great!
  • G. Cito
    G. Cito over 10 years
    What about xml_grep --pretty_print --root '//element[@attribute]' --text_only filename.xml? Not sure what is going on there or what XPath says about [] in this case, but surrounding an @attribute with square brackets works for xml_grep and xml_grep2.
  • clacke
    clacke over 10 years
    I mean //element/@attribute, not //element@attribute. Can't edit it apparently, but leaving it there rather than delete+replace to not confuse the history of this discussion.
  • clacke
    clacke over 10 years
    //element[@attribute] selects elements of type element that have an attribute attribute. I do not want the element, only the attribute. <element attribute='foo'/> should give me foo, not the full <element attribute='foo'/>.
  • clacke
    clacke over 10 years
    ... and --text_only in that context gives me the empty string in the case of an element like <element attribute='foo'/> with no text node inside.
  • clacke
    clacke about 10 years
    That's even narrower criteria than the question, so it definitely fits as an answer. I'm sure many people who ran into your situation will be helped by your research. I'm keeping xmlstarlet as the accepted answer, because it fits my wider criteria and it's really neat. But I will probably have use for your solution from time to time.
  • kevinarpe
    kevinarpe about 9 years
    Note: Some older versions of xmllint do not support the command line argument --xpath, but most seem to support --shell. Slightly dirtier output, but still useful in a bind.
  • tooomg
    tooomg almost 9 years
    I would add that to avoid quotes around the result, use puts instead of p in the Ruby command.
  • clacke
    clacke over 8 years
    Yes, I fell for my own assumption in the question, that XPath implies XML. This answer is a good complement to the others here, and thanks for letting me know about html5lib!
  • FrustratedWithFormsDesigner
    FrustratedWithFormsDesigner almost 8 years
    Xidel looks pretty cool, though you should probably mention that you are also the author of this tool that you recommend.
  • Ramakrishnan Kannan
    Ramakrishnan Kannan almost 8 years
    How to pass filename?
  • clacke
    clacke almost 8 years
    This works on stdin. That eliminates the need for including open() and close() in an already quite lengthy one-liner. To parse a file just run python -c "from lxml.etree import parse; from sys import stdin; print('\n'.join(parse(stdin).xpath('//element/@attribute')))" < my_file.xml and let your shell handle the file lookup, opening and closing.
  • igo
    igo almost 8 years
    To use namespace add it to -qs like this: '-qs:declare namespace mets="http://www.loc.gov/METS/";/mets:mets/mets:dmdSec'
  • Gilles Quenot
    Gilles Quenot almost 8 years
    Saxon and saxon-lint use xpath3 ;)
  • Pysis
    Pysis over 7 years
    I still seem to have trouble querying for node contents rather than an attribute. Can anyone provide an example for that? For some reason, I still find xmlstarlet difficult to figure out and get right between matching, value, and root, even just to view the document structure. Even with the first sel -t -m ... -v ... example from this page: arstechnica.com/information-technology/2005/11/linux-20051115/2, matching all but the last node and saving that one for the value expression as in my use case, I still can't get it to work; I just get blank output.
  • JJoao
    JJoao over 7 years
    Minor correction "Xml" instead of "xml" : sudo cpan App::Xml_grep2
  • JonnyRaa
    JonnyRaa over 6 years
    nice one on the version of xpath - I'd just run into this limitation of the otherwise excellent xmllint
  • clacke
    clacke over 6 years
    I used this today! Our build servers had neither lxml nor xmllint, or even Ruby. In the spirit of the format in my own answer, I wrote it as python3 -c "from xml.etree.ElementTree import parse; from sys import stdin; print(parse(stdin).find('.//element[subelement=\"value\"]/othersubelement').text)" <<< "$variable_containing_xml" in bash. .getroot() doesn't seem necessary.
  • clacke
    clacke over 6 years
    Not a one-liner, and lxml was already mentioned in two other answers years before yours.
  • JGFMK
    JGFMK about 6 years
    Xidel (0..8.win32.zip) shows up as having malware on Virustotal. So try at your own risk virustotal.com/#/file/…
  • clacke
    clacke about 6 years
    xmlstarlet is great, but the accepted and main ranking answer already mentions it. The information about how to handle namespaces might have been relevant as a comment, if at all. Anyone running into issues with namespaces and xmlstarlet can find an excellent discussion in the documentation
  • diemo
    diemo about 6 years
    Sure, @clacke, xmlstarlet has been mentioned several times, but so has the fact that it is hard to grasp and underdocumented. I spent an hour guessing how to get information out of nested elements. I wish I had had that example, which is why I am posting it here to spare others that loss of time (and the example is too long for a comment).
  • maoizm
    maoizm over 5 years
    great - I am going to add xidel to my personal wrench tool box
  • toon81
    toon81 over 5 years
    On my Linux Mint machine (a derivative of Ubuntu/Debian), xmllint doesn't come with libxml2 but with libxml2-utils.
  • Ivan Chau
    Ivan Chau about 5 years
    For xidel, download xidel-0.9.8.linux64.tar.gz, enter ./install.sh
  • Hubbitus
    Hubbitus almost 5 years
    Please also look at stackoverflow.com/questions/41114695/… if you wish to use xmllint on documents with namespaces
  • clacke
    clacke over 4 years
    And you're hosting on sourcehut! Nice!
  • Thomas W
    Thomas W about 4 years
    You might want to specify the classpath, e.g. like java -classpath /usr/share/java/saxonb.jar net.sf.saxon.Query -s:source.xml -qs://element/@attribute.
  • Vasan
    Vasan almost 4 years
    Nice! I had to run a recursive search for XML files with node(s) matching a given xpath query. Used xidel with find like so: find . -name "*.xml" -printf '%p : ' -exec xidel {} -s -e 'expr' \;
  • Reino
    Reino over 3 years
    @Vasan With a lot of XML files, running xidel once for each and every file is very inefficient! With the EXPath File Module xidel can do that much faster: xidel -se 'file:list(.,true(),"*.xml") ! concat(.," : ",doc(.)/{expr})'