Monday, February 12, 2018

Symbol codes Romanian and Moldovan

About Romanian and Moldovan

Romanian (spoken in Romania) is a Romance language related to Spanish and French, but it uses accented letters similar to those of other languages in Central Europe.
Moldovan (or Moldavian) is a very closely related language spoken in the Republic of Moldova. It has sometimes been written in Cyrillic, but is currently written in the same Western alphabet as Romanian.

Romanian Language Links

Status of Moldovan

In 2013, the Republic of Moldova declared that Romanian and Moldovan were two names for the same language. The term Romanian was to be used for official purposes. (Library of Congress News Watch).
Some people in Moldova may still use the term Moldovan, and there are some dialectal differences. For instance, the 2017 website of the President of Moldova uses the term "Moldovan." However, the general Moldovan Government site uses the term "Romanian."

Moldovan Links

Recommended Fonts

Latin-2 (Central European) Encoding

Although these languages use the Western alphabet, Romanian includes accented letters (e.g. ă, ş, ţ) which may not be found in all fonts.
Note: The term Central European is sometimes used to refer to the languages which use accented letters not common in Western European languages.

Common Fonts

Many common fonts such as Times New Roman, Arial, Helvetica, Comic Sans, Calibri, Cambria, Palatino and many more do include these characters.

Third Party Fonts

Below are some additional third party Unicode fonts which include Central European characters.
  • SIL Fonts – The SIL has created multiple fonts with IPA characters including:
    • Andika – Designed for new readers. It could be suitable for some students with reading disorders.
    • Doulos SIL – Includes Greek and Cyrillic
    • Charis SIL – A font family that includes Greek and Cyrillic
    • Gentium – From SIL. Very readable
  • Quivira – Modelled on Garamond and includes ancient language, basic Cyrillic/Armenian/Georgian and math/astronomical symbols.
Note: Many fonts designed to include phonetic characters or Greek and Western letters include Central European characters. Additional Central European or Extended Latin fonts may be available online, but users should be sure they are properly encoded fonts before installing them.

Typing Romanian

Windows

Microsoft provides keyboard utilities for Central European languages which allow you to type Central European Characters.
Note: Neither the Windows International Keyboard nor the ALT code repertoire includes Central European characters.
  1. See detailed keyboard activation instructions for different versions of the Windows operating system.
  2. To see where the critical keys are, go to the Microsoft Keyboard Layouts Page.
  3. You can also input characters from the Character Map. This can be useful if you need to insert characters into only a few words.

Macintosh

Extended Keyboard Codes

You can activate the Extended Keyboard to input Central European characters.

Mac Extended Codes
V = any vowel, C = any consonant

ACCENT        SAMPLE    TEMPLATE
Breve         ă, Ă      Option+B, V
Cedilla       ş, Ş      Option+C, C
Circumflex    â, Â      Option+6, V
Example 1: To input the lower case ŏ (o-breve) hold down the Option key, then the B key. Release both keys then type lowercase o.
Example 2: To input the capital Ŏ, hold down the Option key, then the B key. Release both keys then type capital O.

Romanian Mac Keyboard Utilities

Apple also has keyboard utilities for most Central European languages. See instructions for activating a Macintosh keyboard for more details.

 

Web Development Romanian Encoding

Test Sites

If you have your browser configured correctly, the Web sites below should display
the correct characters.
Note: If a site displays gibberish, see the Browser Setup page for debugging information.

Historical Encodings

Unicode (utf-8) is the preferred encoding for Web sites. However, the following historic encodings may still be encountered.
  • win-1250 (aka "Windows Encoding")
  • iso-8859-2 (aka "Latin-2")
  • iso-8859-1 (Latin 1 w/ alternate spelling) – AVOID
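
As a rough illustration of what these encodings mean in practice, the Python sketch below round-trips a Romanian word through the two Central European code pages and shows why Latin-1 is best avoided. It assumes Python's standard codec names cp1250 and iso8859_2; the helper name to_utf8 is made up for this example.

```python
def to_utf8(raw: bytes, legacy_encoding: str) -> bytes:
    """Decode bytes from a legacy code page and re-encode them as UTF-8."""
    return raw.decode(legacy_encoding).encode("utf-8")


word = "două"  # contains ă (U+0103)

# Both Central European code pages store ă as a single byte.
win_bytes = word.encode("cp1250")
iso_bytes = word.encode("iso8859_2")

# Converting either legacy byte string yields the same UTF-8 bytes.
utf8_from_win = to_utf8(win_bytes, "cp1250")
utf8_from_iso = to_utf8(iso_bytes, "iso8859_2")

# Latin-1 has no code point for ă, so encoding fails outright.
try:
    word.encode("latin-1")
    latin1_ok = True
except UnicodeEncodeError:
    latin1_ok = False
```

If a legacy page is decoded with the wrong code page, the accented letters usually come out as the wrong characters rather than raising an error, so it is worth checking a few known words after converting.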

Language Tags

Language Tags allow browsers and other software to process Romanian text more efficiently. The following lists codes for Romanian and Moldovan.
  • ro (Romanian)
  • ro-RO (Romanian as used in Romania)
  • ro-MD (Romanian as used in Moldova)
  • mo (Moldovan DEPRECATED)

Status of mo Language Code

The code mo was officially deprecated on November 3, 2008. However, it may still be used in some contexts.
For instance, there is an archived Wikipedia page mo.wikipedia.org (Moldovan Wikipedia) which was written in the Cyrillic alphabet. In contrast, ro.wikipedia.org (Romanian Wikipedia) is written in the Western Latin alphabet.
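
If older content still carries the mo code, a small clean-up step can rewrite it before the tag is used. The Python sketch below is only an illustration: the function name normalize_lang_tag and the choice to map mo to ro-MD are assumptions based on the list of codes above, and it handles only simple language or language-region tags.

```python
# Hypothetical mapping, chosen for illustration from the codes listed above.
DEPRECATED_TAGS = {"mo": "ro-MD"}


def normalize_lang_tag(tag: str) -> str:
    """Return a language tag with conventional casing and no deprecated code."""
    parts = tag.replace("_", "-").split("-")
    language = parts[0].lower()
    region = parts[1].upper() if len(parts) > 1 else None

    if language in DEPRECATED_TAGS:
        if region is None:
            # A bare deprecated code takes the full replacement tag.
            return DEPRECATED_TAGS[language]
        # Keep the region that was given, replace only the language part.
        language = DEPRECATED_TAGS[language].split("-")[0]

    return f"{language}-{region}" if region else language
```

For example, normalize_lang_tag("mo") gives "ro-MD", while normalize_lang_tag("RO_ro") gives "ro-RO".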

Inserting Unicode Character Codes for HTML

The HTML Entity Codes

Use these codes to input accented letters in HTML. For instance, if you want
to type două you would type dou&#259;
Be sure the appropriate Encodings and Language Tags are used.
NOTE: Because these are Unicode characters, the formatting may not exactly match that of the surrounding text depending on the browser.
Accented Vowels
Vwl    Entity Code
Â      &Acirc; (194)
â      &acirc; (226)
Ă      &#258; (Capital A breve)
ă      &#259; (Lower A breve)
Î      &Icirc; (206)
î      &icirc; (238)

Consonants with Cedilla
Cns    Entity Code
Ş      &#350; (Capital S cedilla)
ş      &#351; (Lower S cedilla)
Ţ      &#354; (Capital T cedilla)
ţ      &#355; (Lower T cedilla)
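
For longer texts, the entity codes do not have to be typed by hand. As a minimal sketch, the Python snippet below converts every non-ASCII character to a decimal entity and back, using only the standard library; the function name to_numeric_entities is made up for this example.

```python
import html


def to_numeric_entities(text: str) -> str:
    """Replace every non-ASCII character with a decimal entity such as &#259;."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


# Plain letters pass through untouched; only ă is rewritten.
encoded = to_numeric_entities("două")

# html.unescape understands both named and numeric entities.
decoded = html.unescape("&acirc; &#351; &#355;")
```

Here encoded holds dou&#259; and decoded holds the three letters â ş ţ. Note that this approach always produces numeric codes, never named ones such as &acirc;.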
 

European Quote Marks

Many modern texts use American style quotes, but if you wish to include European style quote marks, here are the codes. Note that these codes may not work in older browsers.
Entity Codes for Quotation Marks
Sym    HTML Entity Code
«      &laquo; (left angle)
»      &raquo; (right angle)
‹      &lsaquo; (left single angle)
›      &rsaquo; (right single angle)
„      &bdquo; (bottom quote)
‚      &sbquo; (single bottom quote)
“      &ldquo; (left curly quote)
‘      &lsquo; (left single curly quote)
”      &rdquo; (right curly quote)
’      &rsquo; (right single curly quote)
–      &ndash; (en dash)
—      &mdash; (em dash)

Romanian Language Links

Moldovan

Central European Computing

Linux/Unix

Linux is used in the region so a search for specific issues may be useful.

Free XLIFF Editors

OmegaT

OmegaT is a Java-based translation tool that supports many file formats, including XLIFF documents.

Open Language Tools - Olanto

The Open Language Tools project provides a Java-based XLIFF editor, along with filters for various file formats.

Qt Linguist

Qt Linguist is the translation tool for the Qt environment. It is designed to work with Qt TS files, but also supports PO and XLIFF documents.

Virtaal

Virtaal is the translation tool of the Translate Toolkit. It is designed to work with PO files, but can also work with XLIFF documents and a number of other formats.

FelixCat XLIFF Translator

XLIFF Translator is a free (but not open-source) XLIFF editor that is part of the Felix TM system.

Lokalize

Lokalize is a KDE application designed as an XLIFF editor. Lokalize can run under Windows with the whole KDE environment installed. The handbook for Lokalize is at: http://docs.kde.org/development/en/kdesdk/lokalize/index.html

Poedit

Beginning with version 2.2, Poedit also supports XLIFF 1.2 and the newer XLIFF 2 format, in both the free open-source version and the paid PRO version.

Ocelot

Xliffie

Xliff-translator-tool

brightec Online XLIFF Editor (Web App)

www.beyondf.com/tranzapp/translate (Web)

Weblate

Heartsome

Other online tools: SmartCat, MateCat, MemSource

Sunday, February 11, 2018

Style guides for the Romanian language

Other style guides


Quality assurance tools for translators

Quality assurance for translation is an important step before delivering a text to the customer. Here are some QA programs:

  • Xbench
  • QA Distiller
  • Verifika
  • ErrorSpy
    • Error Spy Online
  • Linguistic Toolbox
  • TQAUDITOR
  • CheckMate from the Okapi framework
  • QA tools in the CAT programs of your choice

Regex to convert every second paragraph mark to TAB

After downloading the TBX from IATE, I managed to get a simple text file with the entries separated by new lines.

Verwaltungsvorschriften
norme administrative
optische Datenträger
suporți optici
Feuerwerkskörper
focuri de artificii

I wanted to get a TSV, a tab-delimited file, out of it. Normally I would have opened the file in Word, converted it to a table using the paragraph mark as the delimiter and then pasted the text back into Notepad++, but with RegEx it is simpler and quicker. Make sure you select Regular Expression as the search mode in the Find/Replace dialog.

Find:
\r\n(.*)\r\n
Replace with:
\t$1\r\n

The result is:
Zuständigkeit der Mitgliedstaaten    competența statelor membre ale Uniunii Europene
Verwaltungsvorschriften    norme administrative
optische Datenträger    suporți optici
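
The same find-and-replace can also be run outside Notepad++. Here is a minimal Python sketch of it, assuming the text uses Windows line endings (\r\n) and ends with a line break; the function name pairs_to_tsv is made up for this example, and Python writes the backreference as \1 instead of $1.

```python
import re


def pairs_to_tsv(text: str) -> str:
    """Join every two consecutive lines with a tab (source term, target term)."""
    # Same pattern as in the Notepad++ dialog: the line break before the
    # second line of each pair becomes a tab, the one after it is kept.
    return re.sub(r"\r\n(.*)\r\n", r"\t\1\r\n", text)


sample = (
    "Verwaltungsvorschriften\r\n"
    "norme administrative\r\n"
    "optische Datenträger\r\n"
    "suporți optici\r\n"
)

tsv = pairs_to_tsv(sample)
```

If the last line of the file has no line break after it, the final pair is not joined, so it is safest to make sure the file ends with an empty line first.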