Sunday, January 23, 2011

XSLT transformation to create a text file from Multiterm XML format

<?xml version="1.0"?>
<xsl:stylesheet version="1.0" 
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text" encoding="utf-8" />

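<xsl:template match="/*">
  <!-- one line per concept -->
  <xsl:for-each select="conceptGrp">
    <!-- every term of every language, each followed by a tab character -->
    <xsl:for-each select="languageGrp/termGrp/term">
      <xsl:value-of select="."/><xsl:text>&#9;</xsl:text>
    </xsl:for-each>
    <!-- line break after each concept -->
    <xsl:text>&#10;</xsl:text>
  </xsl:for-each>
</xsl:template>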

</xsl:stylesheet>
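To make the output format concrete, here is a minimal sketch in Python (standard library only) of the same extraction the stylesheet performs: one line per conceptGrp, with every term followed by a tab. It is not part of the original workflow. The function name multiterm_to_tab_text and the sample glossary are hypothetical, and the sample only uses the element names that the stylesheet itself selects. Python's bundled expat parser reads UTF-16 input that starts with a byte-order mark, so this sketch can take the Multiterm export without re-encoding it.

```python
import xml.etree.ElementTree as ET


def multiterm_to_tab_text(xml_bytes: bytes) -> str:
    """Sketch of what mtbl2txt.xsl produces: one line per conceptGrp,
    each term followed by a tab character."""
    # expat detects the encoding (UTF-8, or UTF-16 with a byte-order mark)
    root = ET.fromstring(xml_bytes)
    lines = []
    # same relative paths as the stylesheet's select attributes
    for concept in root.findall("conceptGrp"):
        terms = concept.findall("languageGrp/termGrp/term")
        # "".join(itertext()) is the element's string value, like xsl:value-of
        lines.append("".join("".join(t.itertext()) + "\t" for t in terms) + "\n")
    return "".join(lines)


# Hypothetical sample modeled on the element names used by the stylesheet
SAMPLE = """<?xml version="1.0" encoding="UTF-16"?>
<mtf>
  <conceptGrp>
    <concept>1</concept>
    <languageGrp>
      <language type="English" lang="EN"/>
      <termGrp><term>glossary</term></termGrp>
    </languageGrp>
    <languageGrp>
      <language type="German" lang="DE"/>
      <termGrp><term>Glossar</term></termGrp>
    </languageGrp>
  </conceptGrp>
</mtf>
"""

if __name__ == "__main__":
    # encode("utf-16") writes a byte-order mark, as a UTF-16 export would have
    print(repr(multiterm_to_tab_text(SAMPLE.encode("utf-16"))))
```

For the sample this returns 'glossary\tGlossar\t\n'. Because the stylesheet writes the tab after every term, each line also ends with a tab before the line break.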


NOTE:
  • The <xsl:text> element after <xsl:value-of select="."/> must contain a single tab character. The tab separates the terms of a concept and is essential for the structure of the text file. Writing it as the character reference &#9; prevents it from being silently turned into spaces when the stylesheet is copied or pasted.
  • The msxsl processor does not seem to handle UTF-16-encoded files, but there is a workaround, described below.
So, after you export the XML file from Multiterm, convert its encoding from UTF-16 to UTF-8 (for example in UltraEdit, NoteTab Pro 6.1, or even MS Word or OpenOffice Writer). Also change the encoding declaration at the top of the file from encoding="utf-16" to encoding="utf-8".
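If you prefer a script to a text editor for this step, the conversion can also be sketched in Python. This is an illustrative sketch, not a Multiterm feature. The function utf16_xml_to_utf8 and the script name to_utf8.py are hypothetical, and the regular expression only rewrites an encoding="utf-16" (in any letter case, with single or double quotes) inside the XML declaration at the very start of the file.

```python
import re
import sys
from pathlib import Path

# The encoding pseudo-attribute of the XML declaration at the start of the text
_DECL_ENCODING = re.compile(r"""\A(<\?xml[^>]*?encoding=["'])utf-16(["'])""", re.IGNORECASE)


def utf16_xml_to_utf8(data: bytes) -> bytes:
    """Re-encode a UTF-16 XML export as UTF-8 and update its declaration."""
    # the "utf-16" codec reads the byte-order mark and drops it
    text = data.decode("utf-16")
    text = _DECL_ENCODING.sub(r"\1utf-8\2", text, count=1)
    # plain "utf-8" (not "utf-8-sig") writes no byte-order mark
    return text.encode("utf-8")


if __name__ == "__main__" and len(sys.argv) == 3:
    # usage: python to_utf8.py export-utf16.xml multitermglossary.xml
    src, dst = sys.argv[1], sys.argv[2]
    Path(dst).write_bytes(utf16_xml_to_utf8(Path(src).read_bytes()))
```

The script works on raw bytes, so line endings are left exactly as Multiterm wrote them. Only the encoding and the declaration change.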
After saving the file, and with the stylesheet above saved as mtbl2txt.xsl, you can run the following from the command line:
msxsl multitermglossary.xml mtbl2txt.xsl -o textfile.csv
If you don't want to deal with encoding conversions, download and install AltovaXML (it is free). Add its installation folder to the PATH environment variable so that you can call it from any folder, and run from the command line:
AltovaXML -xslt1 mtbl2txt.xsl -in multitermglossary.xml -out textfile.csv
AltovaXML recognizes that the input file is encoded in UTF-16 and writes the output file in UTF-8, as declared in this line of the stylesheet:
<xsl:output method="text" encoding="utf-8" />

Or, to output the terms as an HTML table instead:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<!-- UTF-8 output so that terms in any language display correctly -->
<xsl:output method="html" encoding="utf-8" />
  <xsl:template match="/">
    <html>
    <body>
      <table border="1">
        <tr bgcolor="#9acd32">
          <th>Terms</th>
        </tr>
        <!-- one row per language group of every concept -->
        <xsl:for-each select="mtf/conceptGrp/languageGrp">
          <tr>
            <td>
              <!-- XSLT 1.0: string value of the first term in the group -->
              <xsl:value-of select="termGrp/term"/>
            </td>
          </tr>
        </xsl:for-each>
      </table>
    </body>
    </html>
  </xsl:template>
</xsl:stylesheet>
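One detail of the table stylesheet is easy to miss. In XSLT 1.0, xsl:value-of applied to a node-set outputs only the string value of the first node, so each row shows only the first term of its language group. Assuming synonyms are stored as additional termGrp elements in the same languageGrp, they do not appear in the table. The Python sketch below (hypothetical function table_cells, standard library only) models which cell values the stylesheet produces. It only illustrates that behavior and is not a replacement for the stylesheet.

```python
import xml.etree.ElementTree as ET


def table_cells(xml_bytes: bytes) -> list[str]:
    """Model of the table stylesheet: one cell value per languageGrp."""
    root = ET.fromstring(xml_bytes)
    # select="mtf/conceptGrp/languageGrp" is evaluated from the document node,
    # so nothing is selected unless the root element is <mtf>
    if root.tag != "mtf":
        return []
    cells = []
    for group in root.findall("conceptGrp/languageGrp"):
        # XSLT 1.0 xsl:value-of on a node-set: string value of the FIRST node only
        first = group.find("termGrp/term")
        cells.append("" if first is None else "".join(first.itertext()))
    return cells


# Hypothetical sample: two English synonyms stored as separate termGrp elements
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<mtf>
  <conceptGrp>
    <languageGrp>
      <language type="English" lang="EN"/>
      <termGrp><term>car</term></termGrp>
      <termGrp><term>automobile</term></termGrp>
    </languageGrp>
    <languageGrp>
      <language type="German" lang="DE"/>
      <termGrp><term>Auto</term></termGrp>
    </languageGrp>
  </conceptGrp>
</mtf>"""

if __name__ == "__main__":
    print(table_cells(SAMPLE))  # "automobile" does not appear
```

If every term should get its own row, one option is to loop over termGrp/term with an inner xsl:for-each instead of a single xsl:value-of.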

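To have a web browser apply the stylesheet when you open the XML file, add this line to the XML file, right after the XML declaration:
<?xml-stylesheet type="text/xsl" href="TradosMultiterm.xsl"?>
Here TradosMultiterm.xsl is the file name under which you saved the table stylesheet above. A relative href like this one is resolved against the folder that contains the XML file.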
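The processing instruction belongs in the prolog: after the XML declaration and before the root element. If you add it to many exports, a small script can do this. The Python sketch below (hypothetical function add_stylesheet_pi) inserts the line right after the declaration, or at the very start when the file has no declaration.

```python
import re

# The XML declaration, if present, is the very first thing in the document.
# "\s" after "xml" keeps this from matching <?xml-stylesheet ...?>.
_XML_DECL = re.compile(r"\A<\?xml\s.*?\?>", re.DOTALL)


def add_stylesheet_pi(xml_text: str, href: str) -> str:
    """Insert an xml-stylesheet processing instruction after the XML declaration."""
    pi = f'<?xml-stylesheet type="text/xsl" href="{href}"?>'
    match = _XML_DECL.match(xml_text)
    if match is None:
        # no declaration: the instruction can simply come first
        return pi + "\n" + xml_text
    end = match.end()
    return xml_text[:end] + "\n" + pi + xml_text[end:]


if __name__ == "__main__":
    doc = '<?xml version="1.0" encoding="UTF-16"?>\n<mtf/>'
    print(add_stylesheet_pi(doc, "TradosMultiterm.xsl"))
```

To apply this to the export itself, you could read the file with Path.read_text(encoding="utf-16") and write the result back with the same encoding, so that the declaration still matches the file.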