How to execute XPath one-liners from shell?
Solution 1
You should try these tools:
- xmlstarlet: can edit, select, transform... Not installed by default, xpath1
- xmllint: often installed by default with libxml2-utils, xpath1 (check my wrapper to have the --xpath switch on very old releases and newline-delimited output (v < 2.9.9))
- xpath: installed via Perl's module XML::XPath, xpath1
- xml_grep: installed via Perl's module XML::Twig, xpath1 (limited xpath usage)
- xidel: xpath3
- saxon-lint: my own project, wrapper over @Michael Kay's Saxon-HE Java library, xpath3
xmllint comes with libxml2-utils (can be used as an interactive shell with the --shell switch).
xmlstarlet is xmlstarlet.
xpath comes with Perl's module XML::XPath.
xml_grep comes with Perl's module XML::Twig.
xidel is xidel.
saxon-lint uses Saxon-HE 9.6, XPath 3.x (+ backward compatibility).
Examples:
xmllint --xpath '//element/@attribute' file.xml
xmlstarlet sel -t -v "//element/@attribute" file.xml
xpath -q -e '//element/@attribute' file.xml
xidel -se '//element/@attribute' file.xml
saxon-lint --xpath '//element/@attribute' file.xml
Solution 2
You can also try my Xidel. It is not in a package in the repository, but you can just download it from the webpage (it has no dependencies).
It has simple syntax for this task:
xidel filename.xml -e '//element/@attribute'
And it is one of the few of these tools that support XPath 2.
Solution 3
One package that is very likely to be installed on a system already is python-lxml. If so, this is possible without installing any extra package:
python -c "from lxml.etree import parse; from sys import stdin; print('\n'.join(parse(stdin).xpath('//element/@attribute')))"
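If lxml turns out not to be installed, a rough standard-library equivalent can be sketched as follows. ElementTree supports only a subset of XPath and cannot select attribute nodes directly, so this sketch walks the matching elements and reads the attribute from each one. The function name `attribute_values` and the sample document are made up for illustration.

```python
import io
import xml.etree.ElementTree as ET


def attribute_values(source, tag, attribute):
    """Return the value of `attribute` for every `tag` element that has it.

    Roughly what //tag/@attribute selects; namespaces are not handled here.
    """
    root = ET.parse(source).getroot()
    # root.iter(tag) visits the root itself and all descendants in document order.
    return [el.attrib[attribute] for el in root.iter(tag) if attribute in el.attrib]


# Hypothetical sample input, standing in for a real file or stdin.
SAMPLE = b'<root><element attribute="foo"/><element/><element attribute="bar"/></root>'

values = attribute_values(io.BytesIO(SAMPLE), "element", "attribute")
print("\n".join(values))
```

Passing `sys.stdin` or a file name instead of the in-memory sample should let the same function be used in a pipeline.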
Solution 4
Saxon will do this not only for XPath 2.0, but also for XQuery 1.0 and (in the commercial version) 3.0. It doesn't come as a Linux package, but as a jar file. Syntax (which you can easily wrap in a simple script) is
java net.sf.saxon.Query -s:source.xml -qs://element/attribute
2020 UPDATE
Saxon 10.0 includes the Gizmo tool, which can be used interactively or in batch from the command line. For example
java net.sf.saxon.Gizmo -s:source.xml
/>show //element/@attribute
/>quit
Solution 5
In my search for a way to query Maven pom.xml files I ran across this question. However, I had the following limitations:
- must run cross-platform.
- must exist on all major linux distributions without any additional module installation
- must handle complex xml-files such as maven pom.xml files
- simple syntax
I have tried many of the above without success:
- python lxml.etree is not part of the standard python distribution
- xml.etree is, but it does not handle complex Maven pom.xml files well; I have not dug deep enough to find out why
- python xml.etree does not handle Maven pom.xml files, for an unknown reason
- xmllint does not work either; it often core dumps on Ubuntu 12.04 ("xmllint: using libxml version 20708")
The solution I came across that is stable, short, mature and works on many platforms is the REXML library built into Ruby:
ruby -r rexml/document -e 'include REXML;
puts XPath.first(Document.new($stdin), "/project/version/text()")' < pom.xml
What inspired me to find this one was the following articles:
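The answer leaves the xml.etree failure unexplained. One plausible cause, offered here as an assumption rather than something the answer confirms, is that Maven POMs normally declare a default namespace, and ElementTree only matches namespaced elements when the namespace is spelled out in the path. A minimal sketch with a made-up POM:

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed pom.xml; real files carry far more content.
POM = """<?xml version="1.0"?>
<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>
  <version>1.2.3</version>
</project>"""

root = ET.fromstring(POM)

# Without a namespace the lookup silently finds nothing.
print(root.find("version"))

# Mapping a prefix of our choosing ("m") to the POM namespace makes it match.
ns = {"m": "http://maven.apache.org/POM/4.0.0"}
version = root.find("m:version", ns).text
print(version)
```

If this is indeed the cause, the same prefix mapping would be needed for every step of a longer path, for example `m:parent/m:version`.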
clacke
Been working with LotusScript in Notes/Domino for over a decade, worked with WebSphere and related Java technologies for two years. After four years of web development and development support in Hong Kong I am now doing Continuous Integration work in Sweden.
Updated on March 17, 2022
Comments
-
clacke over 2 years
Is there a package out there, for Ubuntu and/or CentOS, that has a command-line tool that can execute an XPath one-liner like
foo //element@attribute filename.xml
or
foo //element@attribute < filename.xml
and return the results line by line?
I'm looking for something that would allow me to just
apt-get install foo
or
yum install foo
and then just works out-of-the-box, no wrappers or other adaptation necessary.
Here are some examples of things that come close:
Nokogiri. If I write this wrapper I could call the wrapper in the way described above:
#!/usr/bin/ruby
require 'nokogiri'
Nokogiri::XML(STDIN).xpath(ARGV[0]).each do |row|
  puts row
end
XML::XPath. Would work with this wrapper:
#!/usr/bin/perl
use strict;
use warnings;
use XML::XPath;
my $root = XML::XPath->new(ioref => 'STDIN');
for my $node ($root->find($ARGV[0])->get_nodelist) {
  print($node->getData, "\n");
}
xpath from XML::XPath returns too much noise, -- NODE -- and attribute = "value".
xml_grep from XML::Twig cannot handle expressions that do not return elements, so cannot be used to extract attribute values without further processing.
EDIT:
echo cat //element/@attribute | xmllint --shell filename.xml
returns noise similar to xpath.
xmllint --xpath //element/@attribute filename.xml
returns attribute = "value".
xmllint --xpath 'string(//element/@attribute)' filename.xml
returns what I want, but only for the first match.
For another solution almost satisfying the question, here is an XSLT that can be used to evaluate arbitrary XPath expressions (requires dyn:evaluate support in the XSLT processor):
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0"
    xmlns:dyn="http://exslt.org/dynamic" extension-element-prefixes="dyn">
  <xsl:output omit-xml-declaration="yes" indent="no" method="text"/>
  <xsl:template match="/">
    <xsl:for-each select="dyn:evaluate($pattern)">
      <xsl:value-of select="dyn:evaluate($value)"/>
      <xsl:value-of select="' '"/>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
Run with
xsltproc --stringparam pattern //element/@attribute --stringparam value . arbitrary-xpath.xslt filename.xml
-
Gilles Quenot over 11 years
+1 for a good question and for the brainstorming about finding a simple and reliable way to print multiple results, each on a new line
-
miken32 about 7 years
Note that the "noise" from xpath is on STDERR and not STDOUT.
-
clacke about 7 years
@miken32 No. I wanted only the value for output. hastebin.com/ekarexumeg.bash
-
-
clacke over 11 years
Excellent! xmlstarlet sel -T -t -m '//element/@attribute' -v '.' -n filename.xml does exactly what I want!
-
clacke over 11 years
It does not seem to be available as a package, at least not in Ubuntu.
-
choroba over 11 years
@clacke: It is not, but it can be installed from CPAN by cpan XML::XSH2.
-
clacke over 11 years
SaxonB is in Ubuntu, package libsaxonb-java, but if I run saxonb-xquery -qs://element/@attribute -s:filename.xml I get SENR0001: Cannot serialize a free-standing attribute node, same problem as with e.g. xml_grep.
-
clacke over 11 years
Note: xmlstarlet was rumored to be abandoned, but is now under active development again.
-
Michael Kay over 11 years
If you want to see full details of the attribute node selected by this query, use the -wrap option on the command line. If you just want the string value of the attribute, add /string() to the query.
-
clacke over 11 years
Thanks. Adding /string() gets closer. But it outputs an XML header and puts all the results on one row, so still no cigar.
-
Michael Kay about 11 years
If you don't want an XML header, add the option !method=text.
-
cnst over 10 years
@choroba, I've tried that on OS X, but it failed to install, with some kind of makefile error.
-
choroba over 10 years
@cnst: Do you have XML::LibXML installed?
-
cnst over 10 years
@choroba, I don't know; but my point is that cpan XML::XSH2 fails to install anything.
-
choroba over 10 years
@cnst: Well, it should have also told you why. I was just trying to find the cause.
-
clacke over 10 years
xml_grep2 -t //element@attribute filename.xml works and does what I expect it to (xml_grep --root //element@attribute --text_only filename.xml still doesn't, returns an "unrecognized expression" error). Great!
-
G. Cito over 10 years
What about xml_grep --pretty_print --root '//element[@attribute]' --text_only filename.xml ? Not sure what is going on there or what XPath says about [] in this case, but surrounding an @attribute with square brackets works for xml_grep and xml_grep2.
-
clacke over 10 years
I mean //element/@attribute, not //element@attribute. Can't edit it apparently, but leaving it there rather than delete+replace to not confuse the history of this discussion.
-
clacke over 10 years
//element[@attribute] selects elements of type element that have an attribute attribute. I do not want the element, only the attribute. <element attribute='foo'/> should give me foo, not the full <element attribute='foo'/>.
-
clacke over 10 years
... and --text_only in that context gives me the empty string in the case of an element like <element attribute='foo'/> with no text node inside.
-
clacke about 10 years
That's even narrower criteria than the question, so it definitely fits as an answer. I'm sure many people who ran into your situation will be helped by your research. I'm keeping xmlstarlet as the accepted answer, because it fits my wider criteria and it's really neat. But I will probably have use for your solution from time to time.
-
kevinarpe about 9 years
Note: Some older versions of xmllint do not support the command line argument --xpath, but most seem to support --shell. Slightly dirtier output, but still useful in a bind.
-
tooomg almost 9 years
I would add that to avoid quotes around the result, use puts instead of p in the Ruby command.
-
clacke over 8 years
Yes, I fell for my own assumption in the question, that XPath implies XML. This answer is a good complement to the others here, and thanks for letting me know about html5lib!
-
FrustratedWithFormsDesigner almost 8 years
Xidel looks pretty cool, though you should probably mention that you are also the author of this tool that you recommend.
-
Ramakrishnan Kannan almost 8 years
How to pass filename?
-
clacke almost 8 years
This works on stdin. That eliminates the need for including open() and close() in an already quite lengthy one-liner. To parse a file just run python -c "from lxml.etree import parse; from sys import stdin; print('\n'.join(parse(stdin).xpath('//element/@attribute')))" < my_file.xml and let your shell handle the file lookup, opening and closing.
-
igo almost 8 years
To use a namespace, add it to -qs like this: '-qs:declare namespace mets="http://www.loc.gov/METS/";/mets:mets/mets:dmdSec'
-
Gilles Quenot almost 8 years
Saxon and saxon-lint use xpath3 ;)
-
Pysis over 7 years
I still seem to have trouble querying for node contents, not an attribute. Can anyone provide an example for that? For some reason, I still find xmlstarlet difficult to figure out and get right between matching, value, root to just view the document structure, etc. Even with the first sel -t -m ... -v ... example from this page: arstechnica.com/information-technology/2005/11/linux-20051115/2, matching all but the last node and saving that one for the value expression like my use case, I still can't seem to get it, I just get blank output.
-
JJoao over 7 years
Minor correction, "Xml" instead of "xml": sudo cpan App::Xml_grep2
-
JonnyRaa over 6 years
nice one on the version of xpath - I'd just run into this limitation of the otherwise excellent xmllint
-
clacke over 6 years
I used this today! Our build servers had neither lxml nor xmllint, or even Ruby. In the spirit of the format in my own answer, I wrote it as python3 -c "from xml.etree.ElementTree import parse; from sys import stdin; print(parse(stdin).find('.//element[subelement=\"value\"]/othersubelement').text)" <<< "$variable_containing_xml" in bash. .getroot() doesn't seem necessary.
-
clacke over 6 years
-
JGFMK about 6 years
Xidel (0..8.win32.zip) shows up as having malware on Virustotal. So try at your own risk virustotal.com/#/file/…
-
clacke about 6 years
xmlstarlet is great, but the accepted and main ranking answer already mentions it. The information about how to handle namespaces might have been relevant as a comment, if at all. Anyone running into issues with namespaces and xmlstarlet can find an excellent discussion in the documentation
-
diemo about 6 years
Sure, @clacke, xmlstarlet has been mentioned several times, but also that it is hard to grasp, and underdocumented. I was guessing around for an hour how to get information out of nested elements. I wish I had had that example, that's why I am posting it here to avoid others that loss of time (and the example is too long for a comment).
-
maoizm over 5 years
great - I am going to add xidel to my personal wrench tool box
-
toon81 over 5 years
On my Linux Mint machine (a derivative of Ubuntu/Debian), xmllint doesn't come with libxml2 but with libxml2-utils.
-
Ivan Chau about 5 years
-
Hubbitus almost 5 years
Please look also at stackoverflow.com/questions/41114695/… if you wish to use xmllint on documents with namespaces
-
clacke over 4 years
And you're hosting on sourcehut! Nice!
-
Thomas W about 4 years
You might want to specify the classpath, e.g. like java -classpath /usr/share/java/saxonb.jar net.sf.saxon.Query -s:source.xml -qs://element/@attribute.
-
Vasan almost 4 years
Nice! I had to run a recursive search for XML files with node(s) matching a given xpath query. Used xidel with find like so: find . -name "*.xml" -printf '%p : ' -exec xidel {} -s -e 'expr' \;
-
Reino over 3 years
@Vasan With a lot of xml-files, running xidel for each and every xml-file is very inefficient! With the EXPath File Module xidel can do that much faster: xidel -se 'file:list(.,true(),"*.xml") ! concat(.," : ",doc(.)/{expr})'
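For readers without xidel, the same recursive search in a single process can be sketched with the Python standard library alone. This is an illustration, not a replacement for the xidel command above: ElementTree understands only a subset of XPath, and the function name `search_xml_files` and the temporary sample files are made up for the example.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path


def search_xml_files(top, path_expr):
    """Yield (file, text) for every element under `top` matching `path_expr`."""
    for xml_file in sorted(Path(top).rglob("*.xml")):
        try:
            root = ET.parse(xml_file).getroot()
        except ET.ParseError:
            continue  # skip files that are not well-formed XML
        for node in root.iterfind(path_expr):
            yield xml_file, (node.text or "").strip()


# Hypothetical directory tree, created only to demonstrate the function.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "sub").mkdir()
    (Path(tmp) / "a.xml").write_text("<r><name>one</name></r>")
    (Path(tmp) / "sub" / "b.xml").write_text("<r><name>two</name><name>three</name></r>")
    (Path(tmp) / "broken.xml").write_text("<r>")
    results = [(f.name, text) for f, text in search_xml_files(tmp, ".//name")]

for name, text in results:
    print(f"{name} : {text}")
```

Because everything runs in one interpreter, the per-file process start-up cost that Reino points out does not apply here either.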