How to convert data stored in XML files into a relational database (MySQL)?

12,493

Solution 1

Databases are not the only way to search data. I can highly recommend Apache Solr

Keep your raw data as XML and search it using the Solr index

Solution 2

Importing XML files of the right format into a MySql database is easy:

https://dev.mysql.com/doc/refman/5.6/en/load-xml.html

This means, you typically have to transform your XML data into that kind of format. How you do this depends on the complexity of the transformation, what programming languages you know, and if you want to use XSLT (which is most probably a good idea).

From your former answers it seems you know Python, so http://xmlsoft.org/XSLT/python.html may be the right thing for you to start with.

Solution 3

Take a look at StAX instead of XSD for analyzing/extraction of data. It's stream based and can deal with huge XML files.

Solution 4

If you feel comfortable with Perl, I've had pretty good luck with XML::Twig module for processing really big XML files.

Basically, all you need is to setup few twig handlers and import your data into MySQL using DBI/DBD::mysql.

There is pretty good example on xmltwig.org.

Solution 5

If you comfortable with commercial products, you might want to have a look at Data Wizard for MySQL by the SQL Maestro Group.

This application is targeted especially at exporting and, of course, importing data from/ to MySQL databases. This also includes XML import. You can download a 30-day trial to check if this is what you are looking for.

I have to admit that I did not use the MySQL product line from them yet, but I had a good user experience with their Firebird Maestro and SQLite Maestro products.

Share:
12,493
E.Z.
Author by

E.Z.

.

Updated on June 04, 2022

Comments

  • E.Z.
    E.Z. almost 2 years

    I have a few XML files containing data for a research project which I need to run some statistics on. The amount of data is close to 100GB.

    The structure is not so complex (could be mapped to perhaps 10 tables in a relational model), and given the nature of the problem, this data will never be updated again, I only need it available in a place where it's easy to run queries on.

    I've read about XML databases, and the possibility of running XPATH-style queries on it, but I never used them and I'm not so comfortable with it. Having the data in a relational database would be my preferred choice.

    So, I'm looking for a way to covert the data stored in XML into a relational database (think of a big .sql file similar to the one generated by mysqldump, but anything else would do). The ultimate goal is to be able to run SQL queries for crunching the data.

    After some research I'm almost convinced I have to write it on my own. But I feel this is a common problem, and therefore there should be a tool which already does that.

    So, do you know of any tool that would transform XML data into a relational database?

    PS1:

    My idea would be something like (it can work differently, but just to make sure you get my point):

    1. Analyse the data structure (based on the XML themselves, or on a XSD)
    2. Build the relational database (tables, keys) based on that structure
    3. Generate SQL statements to create the database
    4. Generate SQL statements to create fill in the data

    PS2:

    I've seen some posts here in SO but still I couldn't find a solution. Microsoft's "Xml Bulk Load" tool seems to do something in that direction, but I don't have a MS SQL Server.

  • E.Z.
    E.Z. over 11 years
    Didn't know about it. Will definetely give it a try. Thanks!