Split XML file into multiple files


First off - I think it's quite a bad idea to parse XML with anything other than an XML parser. Regular expressions may look like they're going to work, but they're a really good way to write brittle code - XML that's semantically equivalent can differ in indentation, linefeeds and self-closing tags, so a pattern that matches one layout can miss another.
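
To make that brittleness concrete, here is a minimal sketch. The two sample strings, the regex and the `has_mm_child` helper are made up for illustration and are not taken from the question. Both strings describe the same `<unix>` element with an empty `<mm>` child, but a pattern written against the first layout misses the second, while XML::Twig sees the same structure in both.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

use XML::Twig;

# Two hypothetical documents that are equivalent as XML:
# one is indented with a self-closing tag, the other is on
# a single line with an explicit closing tag.
my @samples = (
    "<root>\n<unix>\n <mm />\n</unix>\n</root>",
    "<root><unix><mm></mm></unix></root>",
);

# A naive pattern written against the first layout.
my $regex = qr{<unix>\s*<mm />\s*</unix>};

# Ask a real parser the same question: does <unix> have an <mm> child?
sub has_mm_child {
    my ($xml) = @_;
    my $twig = XML::Twig->new;
    $twig->parse($xml);
    my $unix = $twig->root->first_child('unix');
    return ( $unix && $unix->first_child('mm') ) ? 1 : 0;
}

foreach my $xml (@samples) {
    my $by_regex  = ( $xml =~ $regex ) ? 1 : 0;
    my $by_parser = has_mm_child($xml);
    print "regex: $by_regex, parser: $by_parser\n";
}
```

The regex matches only the first sample, whereas the parser reports the `<mm>` child in both.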

So with that in mind - I would use Perl and the XML::Twig library. It's a pretty standard module - prebuilt packages are available for most distributions.

However perhaps most importantly of all - the XML you have posted is NOT valid. I'm going to assume that's because it's a sample, and not the real XML, and so you've missed a bit off. I'm using as my sample:

<root>
<unix>
 <mm />
</unix>
<osx>
 <nn />
</osx>
</root>

This code will do what you ask for, writing each child of the root element to a file named after its tag:

#!/usr/bin/env perl

use strict;
use warnings;

use XML::Twig;

my $twig = XML::Twig->new( 'pretty_print' => 'indented' );
$twig->parsefile("your_xml.xml");

foreach my $element ( $twig->root->children ) {
    my $tag = $element->tag;
    print "Processing $tag\n";

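    # Print to STDOUT for debugging
    print $element->sprint;

    # Print to output file; stop if the file cannot be opened,
    # since printing to a failed handle would die anyway
    open( my $output, ">", "$tag.xml" ) or die "Cannot open $tag.xml: $!";
    print {$output} $element->sprint;
    close($output) or warn "Cannot close $tag.xml: $!";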
}
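
One thing to watch for: the file is opened with ">", which truncates it, so if two children of the root share a tag name the later one overwrites the earlier one. If that can happen in your real data, a small helper can hand out distinct names. This is only a sketch - `output_name` and the `_2`, `_3` suffix scheme are my own invention, not part of XML::Twig.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: returns "unix.xml" for the first <unix>,
# "unix_2.xml" for the second, and so on.
my %seen;

sub output_name {
    my ($tag) = @_;
    my $count = ++$seen{$tag};
    return $count == 1 ? "$tag.xml" : "${tag}_$count.xml";
}

# Simulate the tags of three root children, two of which collide.
print output_name($_), "\n" for qw(unix osx unix);
```

In the loop above you would then open `output_name($tag)` instead of "$tag.xml".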

If, of course, the XML you posted is literally what you have, then it is broken XML and you should ideally go and hit whoever gave it to you with a rolled-up copy of the spec document. If that is impractical due to it being real life, then I would offer you this answer on Stack Overflow: https://stackoverflow.com/a/28913945/2566198


DisplayName
Author by

DisplayName

Updated on September 18, 2022

Comments

  • DisplayName
    DisplayName over 1 year

    I have an XML file that has different nodes, and I want to split it into separate files. The file looks like this:

    <unix>
     <mm>
    </unix>
    <osx>
     <nn>
    </osx>
    

    When I run the script, I want it to create one XML file called unix.xml, which contains this:

    <unix>
     <mm>
    </unix>
    

    And then another XML file called osx.xml, which contains this:

    <osx>
     <nn>
    </osx>
    
    • minorcaseDev
      minorcaseDev over 9 years
      This is not valid XML. An XML file has one root tag.