Sunday, February 11, 2018

Regex to convert every second paragraph mark to TAB

After downloading the TBX from IATE, I managed to get a simple text file with one term per line, each German term followed on the next line by its Romanian equivalent:

Verwaltungsvorschriften
norme administrative
optische Datenträger
suporți optici
Feuerwerkskörper
focuri de artificii

I wanted to turn it into a TSV (tab-separated) file. Normally I would have opened the file in Word, converted the text to a table using the paragraph mark as the delimiter, and then pasted the result back into Notepad++, but with a regular expression it is simpler and quicker. In the Notepad++ Replace dialog, make sure the search mode is set to Regular expression and that ". matches newline" is unchecked; otherwise .* would run across line breaks.

Find:
\r\n(.*)\r\n
Replace with:
\t$1\r\n

The result is:
Zuständigkeit der Mitgliedstaaten    competența statelor membre ale Uniunii Europene
Verwaltungsvorschriften    norme administrative
optische Datenträger    suporți optici
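
For anyone who prefers to script this instead of using Notepad++, the same find-and-replace can be sketched in Python with the standard re module. This is only a minimal illustration, not part of the original workflow: the function name pair_lines_to_tsv and the sample string are made up for the example, and it assumes Windows (CR LF) line endings, just like the pattern above.

```python
import re


def pair_lines_to_tsv(text: str) -> str:
    """Join every two consecutive lines with a tab (hypothetical helper).

    Uses the same pattern as the Notepad++ search: the line break after the
    first line of a pair becomes a tab, the second line is kept via group 1,
    and the line break after it is preserved.
    """
    # "." does not match "\n" by default, so (.*) stays within one line.
    return re.sub(r"\r\n(.*)\r\n", r"\t\1\r\n", text)


# Sample input taken from the term list above, with CR LF line endings.
sample = (
    "Verwaltungsvorschriften\r\n"
    "norme administrative\r\n"
    "optische Datenträger\r\n"
    "suporți optici\r\n"
    "Feuerwerkskörper\r\n"
    "focuri de artificii\r\n"
)

print(pair_lines_to_tsv(sample))
```

One thing to watch for, in the script and in Notepad++ alike: the pattern needs a line break after the second line of each pair, so if the file does not end with a line break, the last pair is left unjoined. Adding an empty line at the end of the file before running the replacement avoids that.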