Parse text file with XSLT

Solution 1

If you can use XSLT 2.0, you can read the file with unparsed-text() and split it into name/value pairs with xsl:analyze-string.

Text File (Do not use the text file as direct input to the XSLT.)

!ITEM_NAME
Item value
!ANOTHER_ITEM
Its value
!TEST_BANG
Here's a value with !bangs!!!

XSLT 2.0 (Apply this stylesheet to any XML input; the simplest choice is the stylesheet itself. Change text-uri to point to your text file, and change text-encoding if the file is not ISO-8859-1.)

<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:xs="http://www.w3.org/2001/XMLSchema" exclude-result-prefixes="xs">
    <xsl:output indent="yes"/>
    <xsl:strip-space elements="*"/>

    <xsl:param name="text-encoding" as="xs:string" select="'iso-8859-1'"/>
    <xsl:param name="text-uri" as="xs:string" select="'file:///C:/Users/dhaley/Desktop/test.txt'"/>

    <xsl:template name="text2xml">
        <xsl:variable name="text" select="unparsed-text($text-uri, $text-encoding)"/>
        <xsl:analyze-string select="$text" regex="!(.*)\n(.*)">
            <xsl:matching-substring>
                <xsl:element name="{normalize-space(regex-group(1))}">
                    <xsl:value-of select="normalize-space(regex-group(2))"/>
                </xsl:element>
            </xsl:matching-substring>
        </xsl:analyze-string>
    </xsl:template>

    <xsl:template match="/">
        <document>
            <xsl:choose>
                <xsl:when test="unparsed-text-available($text-uri, $text-encoding)">
                    <xsl:call-template name="text2xml"/>
                </xsl:when>
                <xsl:otherwise>
                    <xsl:variable name="error">
                        <xsl:text>Error reading "</xsl:text>
                        <xsl:value-of select="$text-uri"/>
                        <xsl:text>" (encoding "</xsl:text>
                        <xsl:value-of select="$text-encoding"/>
                        <xsl:text>").</xsl:text>
                    </xsl:variable>
                    <xsl:message><xsl:value-of select="$error"/></xsl:message>
                    <xsl:value-of select="$error"/>
                </xsl:otherwise>
            </xsl:choose>
        </document>
    </xsl:template>
</xsl:stylesheet>

XML Output

<document>
   <ITEM_NAME>Item value</ITEM_NAME>
   <ANOTHER_ITEM>Its value</ANOTHER_ITEM>
   <TEST_BANG>Here's a value with !bangs!!!</TEST_BANG>
</document>
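
To try the regular expression from the stylesheet above without setting up a file, the same xsl:analyze-string call can be run against an inline string. This is only a sketch: the variable name sample and its contents are assumptions made for illustration, and &#10; stands in for the newline that unparsed-text() would return.

```xml
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output indent="yes"/>

    <!-- Hypothetical inline sample standing in for the text file.
         &#10; is a character reference for a newline. -->
    <xsl:variable name="sample"
        select="'!ITEM_NAME&#10;Item value&#10;!ANOTHER_ITEM&#10;Its value'"/>

    <xsl:template match="/">
        <document>
            <!-- "." does not match a newline by default, so group 1 is the
                 rest of the "!" line and group 2 is the whole next line. -->
            <xsl:analyze-string select="$sample" regex="!(.*)\n(.*)">
                <xsl:matching-substring>
                    <xsl:element name="{normalize-space(regex-group(1))}">
                        <xsl:value-of select="normalize-space(regex-group(2))"/>
                    </xsl:element>
                </xsl:matching-substring>
                <!-- No xsl:non-matching-substring: text between matches
                     (here, the newline separating the pairs) is dropped. -->
            </xsl:analyze-string>
        </document>
    </xsl:template>
</xsl:stylesheet>
```

Applied to any XML input, this should produce the ITEM_NAME and ANOTHER_ITEM elements shown in the output above. Because normalize-space() is applied to both groups, a stray carriage return from Windows line endings would be trimmed as well.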

Solution 2

This XSLT 2.0 transformation:

<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>

 <xsl:variable name="vText" select=
 "replace(unparsed-text('file:///c:/temp/delete/text.txt'),'\r','')"/>

 <xsl:template match="/">
  <document>
      <xsl:analyze-string select="$vText" regex="(!(.+?)\n([^\n]+))+">
       <xsl:matching-substring>
         <xsl:element name="{regex-group(2)}">
                <xsl:sequence select="regex-group(3)"/>
         </xsl:element>
       </xsl:matching-substring>
       <xsl:non-matching-substring><xsl:sequence select="."/></xsl:non-matching-substring>
      </xsl:analyze-string>
  </document>
 </xsl:template>
</xsl:stylesheet>

when applied to any XML document (the input is not used), with the provided text residing in the local file C:\temp\delete\Text.txt:

!ITEM_NAME
Item value
!ANOTHER_ITEM
Its value
...

produces the wanted, correct result:

<document>
   <ITEM_NAME>Item value</ITEM_NAME>
   <ANOTHER_ITEM>Its value</ANOTHER_ITEM>
...
</document>

To test more completely, we put this text in the file:

As is text
!ITEM_NAME
Item value
!ANOTHER_ITEM
Its value
As is text2
!TEST_BANG
Here's a value with !bangs!!!
!TEST2_BANG
 !!!Here's a value with !more~ !bangs!!!
As is text3

The transformation again produces the wanted, correct result:

<document>As is text
<ITEM_NAME>Item value</ITEM_NAME>
<ANOTHER_ITEM>Its value</ANOTHER_ITEM>
As is text2
<TEST_BANG>Here's a value with !bangs!!!</TEST_BANG>
<TEST2_BANG> !!!Here's a value with !more~ !bangs!!!</TEST2_BANG>
As is text3
</document>
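
One possible tightening, offered as a sketch rather than as part of the original answer: the regular expression above can match a "!" that appears in the middle of an "as is" line. Anchoring the pattern with ^ and $ and the "m" (multi-line) flag restricts matches to lines that start with "!". The inline variable raw and its contents are hypothetical test data.

```xml
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output omit-xml-declaration="yes"/>

    <!-- Hypothetical inline sample with Windows line endings (&#13;&#10;). -->
    <xsl:variable name="raw"
        select="'As is text&#13;&#10;!ITEM_NAME&#13;&#10;Item value&#13;&#10;wow! a stray bang&#13;&#10;last line'"/>

    <!-- Same carriage-return stripping as in the transformation above. -->
    <xsl:variable name="vText" select="replace($raw, '\r', '')"/>

    <xsl:template match="/">
        <document>
            <!-- With flags="m", ^ and $ match at the start and end of each line,
                 so only a "!" in the first column starts a name/value pair. -->
            <xsl:analyze-string select="$vText" regex="^!(.+)\n(.+)$" flags="m">
                <xsl:matching-substring>
                    <xsl:element name="{regex-group(1)}">
                        <xsl:sequence select="regex-group(2)"/>
                    </xsl:element>
                </xsl:matching-substring>
                <!-- Everything else is copied through unchanged. -->
                <xsl:non-matching-substring>
                    <xsl:sequence select="."/>
                </xsl:non-matching-substring>
            </xsl:analyze-string>
        </document>
    </xsl:template>
</xsl:stylesheet>
```

With this sample, the line "wow! a stray bang" stays in the pass-through text, whereas the unanchored pattern would try to build an element from " a stray bang", which is not a valid element name.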
Author by

sblandin

Updated on June 04, 2022

Comments

  • sblandin
    sblandin almost 2 years

    I have a plain text file structured like this:

    !ITEM_NAME
    Item value
    !ANOTHER_ITEM
    Its value
    ...
    

    Is it possible to use XSLT to produce a file similar to:

    <?xml version="1.0" encoding="UTF-8" ?>
    <document>
      <ITEM_NAME>Item value</ITEM_NAME>
      <ANOTHER_ITEM>Its value</ANOTHER_ITEM>
      ...
    </document>
    

    EDIT

    I am sorry I didn't state this clearly before: I am trying to accomplish this transformation with the Visual Studio 2005 XSLT engine. I have tried both of the provided solutions, and I am sure they are correct, but Visual Studio 2005 doesn't recognize the unparsed-text function.
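
    For an engine limited to XSLT 1.0, which has no unparsed-text(), one workaround is to pass the text in as XML. The following is a minimal sketch under stated assumptions, not a tested Visual Studio 2005 solution: it assumes the text file has first been wrapped in a hypothetical <text> root element (with any & and < characters escaped or placed in a CDATA section), and the template name pairs is illustrative.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="xml" indent="yes"/>

    <!-- Assumed input: <text>!ITEM_NAME
         Item value
         ...</text> -->
    <xsl:template match="/text">
        <document>
            <xsl:call-template name="pairs">
                <!-- Append a newline so the last line is also terminated. -->
                <xsl:with-param name="rest" select="concat(., '&#10;')"/>
            </xsl:call-template>
        </document>
    </xsl:template>

    <!-- Consumes $rest one line at a time, recursing until it is empty. -->
    <xsl:template name="pairs">
        <xsl:param name="rest"/>
        <xsl:if test="contains($rest, '&#10;')">
            <xsl:variable name="line" select="substring-before($rest, '&#10;')"/>
            <xsl:variable name="after" select="substring-after($rest, '&#10;')"/>
            <xsl:choose>
                <!-- A "!" line names the element; the next line is its value. -->
                <xsl:when test="starts-with($line, '!')">
                    <xsl:element name="{normalize-space(substring($line, 2))}">
                        <xsl:value-of select="normalize-space(
                            substring-before(concat($after, '&#10;'), '&#10;'))"/>
                    </xsl:element>
                    <!-- Skip the value line and continue. -->
                    <xsl:call-template name="pairs">
                        <xsl:with-param name="rest"
                            select="substring-after($after, '&#10;')"/>
                    </xsl:call-template>
                </xsl:when>
                <!-- Any other line (including blank ones) is skipped. -->
                <xsl:otherwise>
                    <xsl:call-template name="pairs">
                        <xsl:with-param name="rest" select="$after"/>
                    </xsl:call-template>
                </xsl:otherwise>
            </xsl:choose>
        </xsl:if>
    </xsl:template>
</xsl:stylesheet>
```

    The sketch recurses once per line, so very large inputs may run into recursion-depth limits, and a "!" line whose remainder is not a valid XML name will raise an error.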