Converting CSV to XML with an XSD

26,468

Solution 1

This seems like something that would be easy to do, but it's not. XML Schema is a document validation language, not a document production language. It doesn't tell you how to make a new document; it tells you whether or not the document that you made is valid. Those aren't the same thing by a long shot.

For instance, it's trivial to create a complex type in XML Schema that consists of a sequence of optional choices. A foo element can have either a bar or baz child, then either a baz or bat child, then a foo, bar, or bat child. That makes for a rule that can determine that both of these elements are valid:

<foo>
   <baz/>
   <baz/>
   <bar/>
</foo>

<foo>
   <foo>
      <bar/>
   </foo>
</foo>

At the same time, that rule gives you pretty much zero help in determining how to take a tuple of data items and create a foo element from it.
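To make the distinction concrete, here is a minimal Python sketch of the rule above, written as a checker rather than as a real schema. It is not taken from any schema library: the names FOO_MODEL and is_valid are invented for illustration, and it assumes bar, baz, and bat are always empty elements. It can answer "is this foo valid?" for both documents, but nothing in it says which children to emit for a given row of data.

```python
import re
import xml.etree.ElementTree as ET

# Content model of <foo> as a regular expression over child tag names:
# an optional (bar | baz), then an optional (baz | bat),
# then an optional (foo | bar | bat). Each tag is followed by a space.
FOO_MODEL = re.compile(r"(bar |baz )?(baz |bat )?(foo |bar |bat )?")


def is_valid(elem):
    """Return True if elem satisfies the illustrative foo rule."""
    if elem.tag != "foo":
        # Assumption for this sketch: bar, baz and bat have no children.
        return elem.tag in ("bar", "baz", "bat") and len(elem) == 0
    children = "".join(child.tag + " " for child in elem)
    if not FOO_MODEL.fullmatch(children):
        return False
    # Nested foo elements must satisfy the same rule.
    return all(is_valid(child) for child in elem)


first = ET.fromstring("<foo><baz/><baz/><bar/></foo>")
second = ET.fromstring("<foo><foo><bar/></foo></foo>")
print(is_valid(first), is_valid(second))
```

The checker accepts or rejects a finished tree; turning a tuple such as ("a", "b", "c") into one of the many valid trees needs extra mapping information that the rule does not contain.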

Generally, when someone asks this question, they're looking at one or two schemas they're using which define a relatively simple document structure. It seems intuitive that it should be easy to use those schemas as input to a mapping process. It probably is. What's not easy, or even possible, is a mapping process that can take any schema as an input.

What I've done instead, in my projects, is to simplify the problem. I've built programs that use CSV and XML and support schema validation, but in these programs, the schema is an output. I've defined a simple XML metadata format, e.g.:

<item name="foo" type="string" size="10" allowNulls="true" .../>
<item name="bar" type="date" allowNulls="false" .../>

Then I can use that metadata to control XML production from CSV input, and I can also use it to produce a schema that the XML my program produces will conform to. If I change my metadata, my XML and schema change accordingly.
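As a rough illustration of the idea (not my actual code), the Python sketch below drives both the XML production and the schema generation from the same metadata. Several things here are assumptions made for the example: the items wrapper element, the rows/row output element names, treating size as an xs:maxLength restriction, treating allowNulls="true" as minOccurs="0", and the metadata type names matching XSD built-in type names. Only the attributes shown above are used.

```python
import csv
import io
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"

# Hypothetical metadata document; the <items> wrapper is an assumption.
METADATA = """
<items>
  <item name="foo" type="string" size="10" allowNulls="true"/>
  <item name="bar" type="date" allowNulls="false"/>
</items>
"""


def load_items(metadata_xml):
    """Return one attribute dict per <item> element."""
    return [dict(item.attrib) for item in ET.fromstring(metadata_xml)]


def csv_to_xml(csv_text, items):
    """Build <rows><row>...</row></rows> from CSV text with a header line."""
    root = ET.Element("rows")
    for record in csv.DictReader(io.StringIO(csv_text)):
        row = ET.SubElement(root, "row")
        for item in items:
            value = record.get(item["name"]) or ""
            if value == "":
                if item.get("allowNulls") != "true":
                    raise ValueError(f"{item['name']} must not be empty")
                continue  # nullable and empty: omit the element
            ET.SubElement(row, item["name"]).text = value
    return root


def build_schema(items):
    """Generate an XSD (as a string) describing what csv_to_xml emits."""
    ET.register_namespace("xs", XS)

    def xs(tag):
        return f"{{{XS}}}{tag}"

    schema = ET.Element(xs("schema"))
    rows = ET.SubElement(schema, xs("element"), name="rows")
    rows_seq = ET.SubElement(ET.SubElement(rows, xs("complexType")), xs("sequence"))
    row = ET.SubElement(rows_seq, xs("element"), name="row",
                        minOccurs="0", maxOccurs="unbounded")
    row_seq = ET.SubElement(ET.SubElement(row, xs("complexType")), xs("sequence"))
    for item in items:
        optional = item.get("allowNulls") == "true"
        elem = ET.SubElement(row_seq, xs("element"), name=item["name"],
                             minOccurs="0" if optional else "1")
        if "size" in item:
            # Assumption: size means a maximum length on the base type.
            simple = ET.SubElement(elem, xs("simpleType"))
            restriction = ET.SubElement(simple, xs("restriction"),
                                        base="xs:" + item["type"])
            ET.SubElement(restriction, xs("maxLength"), value=item["size"])
        else:
            elem.set("type", "xs:" + item["type"])
    return ET.tostring(schema, encoding="unicode")


items = load_items(METADATA)
document = csv_to_xml("foo,bar\nabc,2020-10-27\n,2020-10-28\n", items)
print(ET.tostring(document, encoding="unicode"))
print(build_schema(items))
```

The sketch does not enforce size while writing the XML; validating the output against the generated schema would catch an over-long value.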

Of course, if the schemas are genuinely an input to your process (e.g. they're provided by a third party), this won't even start to help you.

Solution 2

Well, I don't really have a ready-made, out-of-the-box solution for this, but maybe:

  • read your CSV file with a library like FileHelpers; for this, you need to create a class MyDataType which describes the columns in the CSV, and you get an array of MyDataType

  • if you decorate that class with the proper XML serialization attributes like [XmlIgnore], [XmlAttribute] and so forth, you might be able to just simply serialize out the resulting array of MyDataType into an XML that conforms to your XML schema

  • or if that doesn't work, you could create another class that maps to your XML requirements (generate it from the XSD you have), and just simply define a mapping between the two types MyDataType (from your CSV) and MyXmlDataType (for your XML) with something like AutoMapper

It's not boiler-plate - but fairly close, and you could possibly make that pretty much a "framework" to just simply plug in your own types (if you need to do this frequently).
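The shape of that pipeline can be sketched in a few lines. Note that this is a Python analogue, not FileHelpers, XmlSerializer, or AutoMapper code: the column names (id, name, internal_note), the Items/Item element names, and the helper functions are all hypothetical, and the mapping step is written by hand where AutoMapper would otherwise sit.

```python
import csv
import io
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class MyDataType:
    """One field per CSV column (hypothetical columns)."""
    id: str
    name: str
    internal_note: str


@dataclass
class MyXmlDataType:
    """Shaped after the target XML (hypothetical structure)."""
    Id: str    # written as an attribute
    Name: str  # written as a child element


def read_csv(text):
    # The CSV header names must match the MyDataType field names exactly.
    return [MyDataType(**row) for row in csv.DictReader(io.StringIO(text))]


def to_xml_type(rec):
    # Hand-written mapping; internal_note is deliberately left out,
    # playing the role of an ignored property.
    return MyXmlDataType(Id=rec.id, Name=rec.name)


def serialize(records):
    root = ET.Element("Items")
    for rec in records:
        item = ET.SubElement(root, "Item", Id=rec.Id)
        ET.SubElement(item, "Name").text = rec.Name
    return ET.tostring(root, encoding="unicode")


rows = read_csv("id,name,internal_note\n1,Widget,skip me\n")
print(serialize([to_xml_type(r) for r in rows]))
```

Only the two record types and the mapping function change from one CSV/XSD pair to the next, which is the part that could be turned into a small framework.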

Solution 3

Microsoft Excel is able to export XML: http://office.microsoft.com/en-us/excel-help/export-xml-data-HP010206401.aspx

I had some problems with creating an exportable XSD format, but this is a really great tool once you've got it working.

Solution 4

If your XSLT engine is compliant with XSLT version 2, then the best solution is here:

Author by

Casey

Long time developer and hacker currently working as a security consultant in the Application Security field. My focus is on attacking web applications and code audits. I also do mobile application assessments and develop a lot of training around offensive web hacking techniques.

Updated on October 27, 2020

Comments

  • Casey
    Casey over 3 years

    I am trying to find a reusable way of taking a CSV file and generating an XML file from it that conforms to a specified XSD. I haven't really found a reusable approach for this. I have used Altova MapForce, which lets me import a CSV file and an XSD, do the mapping, and then generate code from this, but the code needs to be regenerated whenever the XSD changes. Altova also produces a lot of code.

    My ideal solution would be a set of Java classes that I can give a CSV file and an XSD, and get an XML file back. I can't find anything like this, though, and I'm thinking about potentially creating something.

    Ideas? Is there something here using XSLT based on this question?

    Thanks.

  • Casey
    Casey over 14 years
    I like your solution, but it is not going to work too well for my current needs. The schema, while not provided by a third party, is subject to change (albeit not very often), but the users will always be using an Excel template that we provide them with. I can see a few areas where I think this would be very useful though! Thanks!
  • Sean B. Durkin
    Sean B. Durkin over 12 years
    I don't agree that it is not an easy thing to do. It is an easy thing to do. It is a common problem and it has been solved (for XSLT v2 users).
  • Robert Rossney
    Robert Rossney over 12 years
    Sure, you can go through a schema and, by skipping optional elements and always picking the first option any time there's a choice, you can produce a document that complies with the schema. But a schema can only tell you what XML document to generate from a CSV file if there's other metadata (e.g. headings in the CSV files, and rules for mapping headings to element names, and conventions about optional elements) besides what an XML schema contains. Without that, it's not only not easy, it's not possible.