Showing posts with label Convert. Show all posts

Sunday, March 12, 2023

Conversion tools and difference checkers

Conversion tools:
TBX convert: On this page, you can convert between several glossary file types: UTX-Simple, GlossML, TBXGlossary, OLIF. TBX (TermBase eXchange) is a family of XML-based languages for the interchange of terminological information (called TMLs, for Terminological Markup Languages; also informally called “dialects” of TBX). All TBX dialects share a core structure, in which information is represented on one of three structural levels: concept, language, and term.
UTF-16 to UTF-8 Converter
Glossary Converter lets you convert between MultiTerm termbases and other terminology formats by simple drag and drop, with minimal user interaction. It supports xls, xlsx, csv, txt, tbx, utx, MultiTerm export files and tmx.
TBX Utilities: This is a collection of tools for working with TermBase eXchange (TBX), an open, XML-based standard for exchanging structured terminological data, submitted for adoption under ISO 30042 Technical Committee 37.
TBX Resources: TBX Resources is dedicated to helping you use the industry-standard TBX format with your terminological data. Here you’ll find tutorials and tools for using and converting to and from TBX.
Other TBX downloads and tools
Converting TBX files to XLS/CSV format
TXT
AntFile Converter: A freeware tool to convert PDF and Word (DOCX) files into plain text for use in corpus tools like AntConc.
EncodeAnt is a freeware character encoding detection and conversion tool. EncodeAnt takes an input list of text files (e.g. .txt) and attempts to auto-detect the character encoding that the files use. The character encoding can also be set manually. EncodeAnt also has an option to auto-convert the character encoding of the files to UTF-8, which is a standard used in most corpus research. The converted files are saved in a separate folder leaving the original files untouched.
Difference checkers:
Winmerge.org: WinMerge is an Open Source differencing and merging tool for Windows. WinMerge can compare both folders and files, presenting differences in a visual text format that is easy to understand and handle.
DiffEngineX is a fast and scalable compare utility that finds the differences between the formulae, constants, defined names, cell comments and Visual Basic VBA code contained in either two whole Excel workbooks or selected worksheets on Windows. It can align similar rows and columns across two different Excel spreadsheets. It works with xls, xlsx, xlsm and xlsb files. xla and xlam add-ins need to be converted first into xls and xlsm files before DiffEngineX can compare them. Excel 2003, 2007, 2010 or 2013 is required for this spreadsheet comparison tool to work.
ExcelDiff analyzes multiple Microsoft Excel files (.csv, .xls, .xlsx, .xlsm, .xlsb) and shows their differences graphically, down to the cell level.
KDiff3

Source: inmyownterms.com

Sunday, February 11, 2018

Regex to convert every second paragraph mark to TAB

After downloading the TBX from IATE, I managed to get a simple text file with the entries separated by new lines.

Verwaltungsvorschriften
norme administrative
optische Datenträger
suporți optici
Feuerwerkskörper
focuri de artificii

I wanted to get a TSV, a tab-delimited file, out of it. Normally I would have opened the file in Word, converted it to a table using the paragraph mark as the delimiter, and then pasted the text back into Notepad++, but with RegEx it is simpler and quicker. Make sure you select "Regular expression" as the search mode in the Find/Replace dialog and leave ". matches newline" unchecked. The pattern assumes Windows line endings (\r\n).

Find:
\r\n(.*)\r\n
Replace with:
\t$1\r\n

The result is:
Zuständigkeit der Mitgliedstaaten    competența statelor membre ale Uniunii Europene
Verwaltungsvorschriften    norme administrative
optische Datenträger    suporți optici
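
For anyone who prefers a script to the Notepad++ dialog, the same find/replace can be sketched in Python. This is only a rough equivalent, assuming the file uses Windows line endings (\r\n) as in the pattern above; the function name pair_lines_to_tsv is made up for illustration.

```python
import re


def pair_lines_to_tsv(text: str) -> str:
    """Join every two consecutive lines with a TAB (hypothetical helper name)."""
    # Same idea as the Notepad++ pattern: a line break, the second line of a
    # pair, and the line break that ends it. [^\r\n]* stands in for .* so the
    # match can never run across a line break.
    return re.sub(r"\r\n([^\r\n]*)\r\n", r"\t\1\r\n", text)


# The three term pairs from the sample above, one term per line.
sample = (
    "Verwaltungsvorschriften\r\n"
    "norme administrative\r\n"
    "optische Datenträger\r\n"
    "suporți optici\r\n"
    "Feuerwerkskörper\r\n"
    "focuri de artificii\r\n"
)

tsv = pair_lines_to_tsv(sample)
print(tsv)
```

As with the Notepad++ version, the last line has to end with a line break; otherwise the final pair is left unjoined.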