Perl, how to parse XML file, xpath


Solution 1

This review points out that XML::XPath hasn't been updated since 2003, and recommends XML::LibXML instead:

use 5.010;
use strict;
use warnings;
use XML::LibXML;

my $dom = XML::LibXML->new->parse_file('data.xml');
for my $node ($dom->findnodes('/category/event/@name')) {
    say $node->toString;
}

See XML::LibXML::Parser and XML::LibXML::Node.
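
A small variation, offered as a sketch rather than the definitive way to do it: toString serializes an attribute node as name="value", whereas the documented to_literal method yields just the value. The inline XML string and the /^cat\d+$/ check below are illustrative assumptions standing in for data.xml and for whatever test you actually want to run on each value.

```perl
use 5.010;
use strict;
use warnings;
use XML::LibXML;

# Inline stand-in for data.xml (an assumption, so the sketch is self-contained).
my $xml = <<'XML';
<category name="a">
  <event name="cat1" />
  <event name="cat2" />
</category>
XML

my $dom = XML::LibXML->load_xml(string => $xml);

# to_literal gives the attribute value only, without the name="..." wrapper.
# The "" . forces a plain string, whatever to_literal returns.
my @names = map { "" . $_->to_literal } $dom->findnodes('/category/event/@name');

for my $name (@names) {
    # Placeholder test on each separate value.
    say $name if $name =~ /^cat\d+$/;
}
```

Each element of @names is an ordinary string, so it can be compared or matched like any other Perl scalar.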

Solution 2

The find method returns an XML::XPath::NodeSet object which is a collection of all the nodes found. I can't imagine what you can have done to see one long string with all of the attribute values.

Having retrieved the set of nodes, you work on its contents with methods like size, get_node and get_nodelist (see the docs I've linked above). get_nodelist returns a Perl list of, in this case, XML::XPath::Node::Attribute objects, which have methods of their own. This program should get you started:

use strict;
use warnings;
use feature 'say';  # say is not enabled by strict/warnings alone

use XML::XPath;

my $xp = XML::XPath->new(ioref => \*DATA);

my $names = $xp->find('/category/event/@name');

for my $node ($names->get_nodelist) {
  say $node->getNodeValue;
}


__DATA__
  <category name="a">
    <event name="cat1" />  
    <event name="cat2" />  
    <event name="cat3" />  
    <event name="cat4" />  
    <event name="cat5" />  
  </category>

OUTPUT

cat1
cat2
cat3
cat4
cat5
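
The other two methods mentioned above, size and get_node, can be sketched in the same way. This is an illustration under stated assumptions: the XML is passed inline through the xml option instead of being read from DATA, and the loop relies on get_node taking 1-based positions, as XPath positions do.

```perl
use strict;
use warnings;
use feature 'say';
use XML::XPath;

# Inline XML (an assumption) instead of a file or the DATA handle.
my $xp = XML::XPath->new(
    xml => '<category name="a"><event name="cat1"/><event name="cat2"/></category>',
);

my $names = $xp->find('/category/event/@name');

# size reports how many nodes are in the set.
say 'found ', $names->size, ' attributes';

# get_node is addressed by position, starting at 1.
my @values;
for my $i (1 .. $names->size) {
    push @values, $names->get_node($i)->getNodeValue;
}

say for @values;
```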
Author by

liverpaul

Updated on February 03, 2020

Comments

  • liverpaul
    liverpaul over 4 years

    I want to parse an XML file using Perl. I was able to do it using the XML::Simple module, but now I want to start using the XML::XPath module instead because it uses XPath expressions. From my limited knowledge I think XPaths will make future parsing easier, right? Here's the Perl code I have so far:

    use strict;
    use warnings;
    use XML::XPath;
    
    my $file = "data.xml";
    my $path = XML::XPath->new(filename => $file);
    
    my $name = $path->find('/category/event/@name');
    print $name."\n";
    

    My question is how do I separate each name attribute (category/event/@name) so that I can perform tests on each value I parse. At the moment I'm just getting a big string full of the parsed data, whereas I want several small strings that I can test. How can I do this? Thanks :-)

  • Borodin
    Borodin over 12 years
    Are you recommending XML::LibXML because you know it better, or because you think it has a genuine advantage over XML::XPath? As far as I know the latter works fine. It is also pure Perl, which makes it slower than LibXML but usable without the help of an external library.
  • daxim
    daxim over 12 years
    That's a hyper-link up there. Do follow it.
  • liverpaul
    liverpaul over 12 years
    Thanks for the reply. After reading the link posted by daxim I've decided to use XML::LibXML instead. It seems to be the best one out there, so as a beginner I think it would be better for me to learn a module that is better documented. I appreciate the introduction info you wrote, it helped me understand things a bit better :-)
  • liverpaul
    liverpaul over 12 years
    @daxim Thanks for the reply. I tried that and it worked, but not 100% the way I wanted. My output is name="attribute_value", but I want just attribute_value. Is there a way to output just the attribute_value without the name=""?
  • liverpaul
    liverpaul over 12 years
    After a bit more research I found that changing the line ´$node->toString´ to ´$node->to_literal´ gives me output of just the attribute_value with no name="". This is what I wanted. If this is a bad way to do things, please tell me, otherwise my question is answered. Thanks again for the help :-)
  • daxim
    daxim over 12 years
    No, calling the documented method to_literal is not a bad thing. - Please mark the answer as accepted.
  • Venkatesh
    Venkatesh almost 9 years
    Using XML::XPath, can we use ^ or * inside a path? For example: my $names = $xp->find('/category/eve*'); That is, inside category, search for tags whose names start with eve.
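
    On the last question: XPath 1.0 has no glob or regex syntax inside element names, but a predicate built on the standard starts-with() and name() functions expresses "tags starting with eve". The sketch below uses XML::LibXML, the module the thread settled on; the sample XML and the element names evening and other are made up for illustration. The expression itself is plain XPath 1.0, so it may well work with XML::XPath's find too, though that is not checked here.

```perl
use 5.010;
use strict;
use warnings;
use XML::LibXML;

# Hypothetical sample data with a mix of element names.
my $xml = <<'XML';
<category name="a">
  <event name="cat1" />
  <evening name="cat2" />
  <other name="cat3" />
</category>
XML

my $dom = XML::LibXML->load_xml(string => $xml);

# * matches any child element; the predicate keeps those whose
# name begins with "eve".
my @tags = map { $_->nodeName }
    $dom->findnodes(q{/category/*[starts-with(name(), 'eve')]});

say for @tags;
```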