Sunday, January 23, 2011

Word macro to convert tab-delimited CSV to Trados MultiTerm XML format

As the name implies, this Word macro converts a tab-delimited CSV term list into the Trados MultiTerm XML format.
The input format should be TERM<TAB>TERM<ENTER>, that is, one source term and one target term per paragraph, separated by a tab. The text must not contain the characters "<" or ">"; the macro checks for them and stops if it finds any. Beware: when pasting into Word, do not leave any paragraph marks without terms (empty paragraphs), especially at the end of the file.
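The input rules above can be checked before running the conversion. The following is only a minimal sketch, not part of the original macro: the name CheckTermList is made up for illustration, and it assumes the term list is the only content of the active document.

```vba
' Hypothetical pre-flight check (not part of the original macro).
' Assumes the active document contains nothing but the term list.
Sub CheckTermList()
    Dim para As Paragraph
    Dim txt As String
    Dim paraNo As Long

    For Each para In ActiveDocument.Paragraphs
        paraNo = paraNo + 1
        txt = para.Range.Text
        ' Drop the trailing paragraph mark before inspecting the text
        If Right$(txt, 1) = vbCr Then txt = Left$(txt, Len(txt) - 1)

        If Len(txt) = 0 Then
            MsgBox "Paragraph " & paraNo & " is empty."
            Exit Sub
        ElseIf UBound(Split(txt, vbTab)) <> 1 Then
            ' Exactly one tab yields exactly two parts (upper bound 1)
            MsgBox "Paragraph " & paraNo & " does not contain exactly one tab."
            Exit Sub
        ElseIf InStr(txt, "<") > 0 Or InStr(txt, ">") > 0 Then
            MsgBox "Paragraph " & paraNo & " contains ""<"" or "">""."
            Exit Sub
        End If
    Next para

    MsgBox "Input looks fine."
End Sub
```

The check stops at the first problem it finds, so it may need to be run again after each fix.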
The output is a simple MultiTerm XML file; the source and target languages may of course vary.

The output structure should look like this:
<mtf>
<conceptGrp>
<languageGrp>
<language type="Deutsch" lang="DE" />
<termGrp><term>Deutsch</term></termGrp>
</languageGrp>
<languageGrp>
<language type="Romanian" lang="RO" />
<termGrp><term>Română</term></termGrp>
</languageGrp>
</conceptGrp>
</mtf>

Word macro:

Sub csv2multiterm()
    Selection.HomeKey Unit:=wdStory
    Selection.Find.ClearFormatting
    Selection.Find.Replacement.ClearFormatting
    With Selection.Find
        .Text = "^p^p"
        .Replacement.Text = "^p"
        .Forward = True
        .Wrap = wdFindContinue
        .Format = False
        .MatchCase = False
        .MatchWholeWord = False
        .MatchWildcards = False
        .MatchSoundsLike = False
        .MatchAllWordForms = False
    End With
    Selection.Find.Execute Replace:=wdReplaceAll
    Selection.Find.ClearFormatting
    ' Abort if the text contains "<", which would break the XML
    With Selection.Find
        .Text = "<"
        .Replacement.Text = ""
        .Forward = True
        .Wrap = wdFindContinue
        .Format = False
        .MatchCase = False
        .MatchWholeWord = False
        .MatchWildcards = False
        .MatchSoundsLike = False
        .MatchAllWordForms = False
    End With
    ' Run the search first: Execute returns True when a match is found
    If Selection.Find.Execute Then
        MsgBox """<"" detected! Delete tags and proceed"
        Exit Sub
    End If
    ' Abort if the text contains ">", which would break the XML
    With Selection.Find
        .Text = ">"
        .Replacement.Text = ""
        .Forward = True
        .Wrap = wdFindContinue
        .Format = False
        .MatchCase = False
        .MatchWholeWord = False
        .MatchWildcards = False
        .MatchSoundsLike = False
        .MatchAllWordForms = False
    End With
    ' Run the search first: Execute returns True when a match is found
    If Selection.Find.Execute Then
        MsgBox """>"" detected! Delete tags and proceed"
        Exit Sub
    End If
    Selection.HomeKey Unit:=wdStory
    Selection.TypeParagraph
    Selection.Find.ClearFormatting
    Selection.Find.Replacement.ClearFormatting
    With Selection.Find
        .Text = "^p"
        .Replacement.Text = _
            "</term></termGrp>^p</languageGrp>^p</conceptGrp>^p<conceptGrp>^p<languageGrp>^p<language type=""Deutsch"" lang=""DE"" />^p<termGrp><term>"
        .Forward = True
        .Wrap = wdFindContinue
        .Format = False
        .MatchCase = False
        .MatchWholeWord = False
        .MatchWildcards = False
        .MatchSoundsLike = False
        .MatchAllWordForms = False
    End With
    Selection.Find.Execute Replace:=wdReplaceAll
    Selection.Find.ClearFormatting
    Selection.Find.Replacement.ClearFormatting
    With Selection.Find
        .Text = "^t"
        .Replacement.Text = _
            "</term></termGrp>^p</languageGrp>^p<languageGrp>^p<language type=""Romanian"" lang=""RO"" />^p<termGrp><term>"
        .Forward = True
        .Wrap = wdFindContinue
        .Format = False
        .MatchCase = False
        .MatchWholeWord = False
        .MatchWildcards = False
        .MatchSoundsLike = False
        .MatchAllWordForms = False
    End With
    Selection.Find.Execute Replace:=wdReplaceAll
    Selection.HomeKey Unit:=wdStory
    Selection.MoveDown Unit:=wdLine, Count:=3, Extend:=wdExtend
    Selection.Delete
    Selection.EndKey Unit:=wdStory
    Selection.MoveUp Unit:=wdLine, Count:=3, Extend:=wdExtend
    Selection.HomeKey Unit:=wdLine, Extend:=wdExtend
    Selection.Delete
    Selection.HomeKey Unit:=wdStory
    Selection.TypeText ("<mtf>")
    Selection.TypeParagraph
    Selection.EndKey Unit:=wdStory
    Selection.TypeText ("</mtf>")
End Sub

After that, you can replace the source and target languages with the ones you need. For instance, replace the source language
<language type="Deutsch" lang="DE" />
with <language type="English" lang="EN" />
and the target language
<language type="Romanian" lang="RO" />
with <language type="English" lang="EN" />
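The language swap described above can also be done with a small macro instead of by hand. This is only a sketch under stated assumptions: ReplaceLanguageTag, LanguageTag and SwitchSourceToEnglish are hypothetical names, not part of the original macro. It is also an assumption that Word's "smart quotes" autoformat option can curl the straight quotes during a replace, which is why the sketch switches that option off temporarily.

```vba
' Hypothetical helper: swaps one <language ... /> declaration for another
' everywhere in the active document.
Sub ReplaceLanguageTag(oldType As String, oldLang As String, _
                       newType As String, newLang As String)
    Dim rng As Range
    Dim smartQuotes As Boolean

    ' Assumption: Replace may honour the smart-quote setting, so remember
    ' it and switch it off to keep straight quotes in the XML attributes
    smartQuotes = Options.AutoFormatAsYouTypeReplaceQuotes
    Options.AutoFormatAsYouTypeReplaceQuotes = False

    Set rng = ActiveDocument.Content
    With rng.Find
        .ClearFormatting
        .Replacement.ClearFormatting
        .Text = LanguageTag(oldType, oldLang)
        .Replacement.Text = LanguageTag(newType, newLang)
        .Forward = True
        .Wrap = wdFindStop
        .MatchCase = True
        .MatchWildcards = False
        .Execute Replace:=wdReplaceAll
    End With

    ' Restore the user's original setting
    Options.AutoFormatAsYouTypeReplaceQuotes = smartQuotes
End Sub

' Builds a declaration such as <language type="Deutsch" lang="DE" />
Private Function LanguageTag(langType As String, langCode As String) As String
    LanguageTag = "<language type=""" & langType & """ lang=""" & langCode & """ />"
End Function

' Usage example: turn the German source language into English
Sub SwitchSourceToEnglish()
    ReplaceLanguageTag "Deutsch", "DE", "English", "EN"
End Sub
```

Working on a Range rather than on the Selection leaves the cursor where it was, which is why this sketch differs in style from the recorded macro above.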