Monday, February 12, 2018

Symbol codes Romanian and Moldovan

About Romanian and Moldovan

Romanian (spoken in Romania) is a Romance language related to Spanish and French, but uses letters similar to those of other languages in Central Europe.
Moldovan (or Moldavian) is a very closely related language spoken in the Republic of Moldova. It was sometimes written in Cyrillic, but is currently written in the same Western alphabet as Romanian.

Romanian Language Links

Status of Moldovan

In 2013, the Republic of Moldova declared that Romanian and Moldovan were two names for the same language. The term Romanian was to be used for official purposes. (Library of Congress News Watch).
Some people in Moldova may still use the term Moldovan, and there are some dialectal differences. For instance, as of 2017 the website of the President of Moldova uses the term "Moldovan." However, the general Moldovan Government site uses the term "Romanian."

Moldovan Links

Recommended Fonts

Latin-2 (Central European) Encoding

Although Romanian and Moldovan use the Western alphabet, they include accented letters (e.g. ă, ş, ţ) which may not be found in all fonts.
Note: The term Central European is sometimes used to refer to the languages which use accented letters not common in Western European languages.

Common Fonts

Many common fonts such as Times New Roman, Arial, Helvetica, Comic Sans, Calibri, Cambria, Palatino and many more include these characters.

Third Party Fonts

Below are some additional third party Unicode fonts which include Central European characters.
  • SIL Fonts – The SIL has created multiple fonts with IPA characters including:
    • Andika – Designed for new readers. It could be suitable for some students with reading disorders.
    • Doulos SIL – Includes Greek, Cyrillic
    • Charis SIL – Font family and includes Greek, Cyrillic
    • Gentium – From SIL. Very readable
  • Quivira – Modelled on Garamond and includes ancient language, basic Cyrillic/Armenian/Georgian and math/astronomical symbols.
Note: Many fonts designed to include phonetic characters or Greek and Western letters include Central European characters. Additional Central European or Extended Latin fonts may be available online, but users should be sure they are properly encoded fonts before installing them.

Typing Romanian


Microsoft provides keyboard utilities for Central European languages which allow you to type Central European Characters.
Note: Neither the Windows International Keyboard nor the ALT code repertoire includes Central European characters.
  1. See detailed keyboard activation instructions for different versions of the Windows operating system.
  2. To see where the critical keys are, go to the Microsoft Keyboard Layouts Page.
  3. You can also input characters from the Character Map. This can be useful if you need to insert characters into only a few words.


Extended Keyboard Codes

You can activate the Extended Keyboard to input Central European characters.

Mac Extended Codes
V = any vowel, C = any consonant

Breve ă, Ă Option+B, V
Cedilla ş, Ş Option+C, C
Circumflex â, Â Option+6, V
Example 1: To input the lower case ŏ (o-breve), hold down the Option key, then the B key. Release both keys, then type lowercase o.
Example 2: To input the capital Ŏ, hold down the Option key, then the B key. Release both keys, then type capital O.

Romanian Mac Keyboard Utilities

Apple also has keyboard utilities for most Central European languages. See instructions for activating a Macintosh keyboard for more details.


Web Development Romanian Encoding

Test Sites

If you have your browser configured correctly, the Web sites below should display
the correct characters.
Note: If a site displays gibberish, see the Browser Setup page for debugging information.

Historical Encodings

Unicode (utf-8) is the preferred encoding for Web sites. However, the following historic encodings may still be encountered.
  • win-1250 (aka "Windows Encoding")
  • iso-8859-2 (aka "Latin-2")
  • iso-8859-1 (Latin 1 w/ alternate spelling) – AVOID
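To see why the choice of encoding matters, here is a small Python sketch that decodes the same bytes three ways. The sample bytes are made up for illustration; the codec names are the ones Python's standard library uses for these encodings.

```python
# The same four bytes, as they might arrive from an older Romanian web page.
raw = b"dou\xe3"  # 0xE3 is "ă" in both iso-8859-2 and windows-1250

as_latin2 = raw.decode("iso-8859-2")   # Latin-2
as_win1250 = raw.decode("cp1250")      # win-1250
as_latin1 = raw.decode("iso-8859-1")   # wrong guess: 0xE3 becomes "ã"

print(as_latin2, as_win1250, as_latin1)

# Once decoded correctly, re-encode as UTF-8 for a modern Web page.
utf8_bytes = as_latin2.encode("utf-8")
```

Decoding as Latin-1 does not fail; it silently produces a different letter, which is presumably where the "alternate spelling" comes from.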

Language Tags

Language Tags allow browsers and other software to process Romanian text more efficiently. The following lists codes for Romanian and Moldovan.
  • ro (Romanian)
  • ro-RO (Romanian as used in Romania)
  • ro-MD (Romanian as used in Moldova)
  • mo (Moldovan DEPRECATED)

Status of mo Language Code

The code mo was officially deprecated on November 3, 2008. However, it may still be used in some contexts.
For instance, there is an archived Wikipedia page (Moldovan Wikipedia) which was written in the Cyrillic alphabet. In contrast (Romanian Wikipedia) is written in the Western Latin alphabet.
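If you process older content that is still tagged mo, a small helper can rewrite the tag. This is only a sketch: the function name and the lookup table are hypothetical, and mapping mo to ro is an assumption based on the two names referring to the same language.

```python
# Hypothetical lookup table: deprecated primary subtag -> replacement.
# Mapping "mo" to "ro" is an assumption, not an official rule quoted here.
DEPRECATED_TAGS = {"mo": "ro"}

def normalize_language_tag(tag):
    """Replace a deprecated primary subtag, keeping any region suffix."""
    primary, sep, rest = tag.partition("-")
    primary = primary.lower()
    primary = DEPRECATED_TAGS.get(primary, primary)
    return primary + sep + rest

print(normalize_language_tag("mo"))
print(normalize_language_tag("ro-MD"))
```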

Inserting Unicode Character Codes for HTML

The HTML Entity Codes

Use these codes to input accented letters in HTML. For instance, if you want to type două you would type dou&#259;
Be sure the appropriate Encodings and Language Tags are used.
NOTE: Because these are Unicode characters, the formatting may not exactly match that of the surrounding text depending on the browser.
Accented Vowels
Vwl  Entity Code
â    &acirc; (226)
Ă    &#258;   Capital A breve
ă    &#259;   Lowercase A breve
Î    &Icirc; (206)
î    &icirc; (238)
Consonants with Cedilla
Cns  Entity Code
Ş    &#350;   Capital S cedilla
ş    &#351;   Lower S cedilla
Ţ    &#354;   Capital T cedilla
ţ    &#355;   Lower T cedilla
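If you have many words to convert, the numeric codes can be generated rather than typed by hand. Below is a minimal Python sketch; the function name is made up, and it relies on the standard "xmlcharrefreplace" error handler, which turns every non-ASCII character into a decimal reference.

```python
import html

def to_numeric_entities(text):
    """Replace every non-ASCII character with a decimal &#NNN; reference."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")

encoded = to_numeric_entities("două")
decoded = html.unescape(encoded)  # turn the references back into letters

print(encoded)
print(decoded)
```

Note that this produces numeric codes only, so â comes out as &#226; rather than the named form.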

European Quote Marks

Many modern texts use American style quotes, but if you wish to include European style quote marks, here are the codes. Note that these codes may not work in older browsers.
Entity Codes for Quotation Marks
Sym  HTML Entity Code
«    &laquo; (left angle)
»    &raquo; (right angle)
‹    &lsaquo; (left single angle)
›    &rsaquo; (right single angle)
„    &bdquo; (bottom quote)
‚    &sbquo; (single bottom quote)
“    &ldquo; (left curly quote)
‘    &lsquo; (left single curly quote)
”    &rdquo; (right curly quote)
’    &rsquo; (right single curly quote)
–    &ndash; (en dash)
—    &mdash; (em dash)

Romanian Language Links


Central European Computing


Linux is used in the region so a search for specific issues may be useful.

Free Xliff Editors


OmegaT is a Java-based translation tool that supports many file formats, including XLIFF documents.

Open Language Tools - Olanto

The Open Language Tools project provides a Java-based XLIFF editor, along with filters for various file formats.

Qt Linguist

Qt Linguist is the translation tool for the Qt environment. It is designed to work with Qt TS files, but also supports PO and XLIFF documents.


Virtaal is the translation tool of the Translate Toolkit. It is designed to work with PO files, but can also work with XLIFF documents and a number of other formats.

FelixCat XLIFF Translator

(Not open-source, but free) XLIFF Translator is a free XLIFF editor that is part of the Felix TM system.


Lokalize is a KDE application designed as an XLIFF editor. Lokalize can run under Windows with the whole KDE environment installed. The handbook for Lokalize is at:

brightec Online XLIFF Editor (Web App)
Other online tools: SmartCat, MateCat, MemSource

Sunday, February 11, 2018

Style guides for the Romanian language

Other style guides

Quality assurance tools for translators

Quality assurance for translation is an important step before delivering a text to the customer. Here are some QA programs:

  • QA Distiller
  • Error Spy Online
  • Linguistic Toolbox
  • CheckMate from the Okapi framework
  • QA tools in the CAT programs of your choice

Regex to convert every second paragraph mark to TAB

After downloading the TBX from IATE, I managed to get a simple text file with the entries separated by new lines.

norme administrative
optische Datenträger
suporți optici
focuri de artificii

I wanted to get a TSV, a tab-delimited file, out of it. Normally I would have opened the file in Word, converted it to a table using the paragraph delimiter, and then pasted the text back into Notepad++, but with RegEx it is simpler and quicker. Make sure you select Regular Expressions in the find-replace mask.

Replace with:

The result is:
Zuständigkeit der Mitgliedstaaten    competența statelor membre ale Uniunii Europene
Verwaltungsvorschriften    norme administrative
optische Datenträger    suporți optici
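The same conversion can be sketched in Python. This is not the Notepad++ expression used above; the pattern and the function name are my own, and it assumes the file has no blank lines between entries.

```python
import re

def pairs_to_tsv(text):
    """Join each line with the line that follows it, separated by a TAB."""
    # Two consecutive non-empty lines become "first<TAB>second".
    return re.sub(r"([^\r\n]+)\r?\n([^\r\n]+)", r"\1\t\2", text)

sample = (
    "Verwaltungsvorschriften\n"
    "norme administrative\n"
    "optische Datenträger\n"
    "suporți optici\n"
)
print(pairs_to_tsv(sample))
```

Because each match consumes two lines, only every second line break is replaced and the remaining ones still separate the rows.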

Thursday, January 25, 2018

Python for regex search and replace

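# import the needed modules (re is for regex)
import os, re

# set the working directory for a shortcut
# (the original path is not given; '.' is a placeholder for your own folder)
os.chdir('.')

# open the source file and read it
with open('file.txt', 'r', encoding='utf-8') as fh:
    subject = fh.read()

# create the pattern object. r means the string is sent as raw so we don't have to escape our escape characters
# the pattern matches "(" followed by any run of digits and a comma
pattern = re.compile(r'\(([0-9])*,')
# do the replace
result = pattern.sub("('',", subject)

# write the file
with open('file.txt', 'w', encoding='utf-8') as f_out:
    f_out.write(result)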

Python re.sub Examples

See also Python re.match
Example for re.sub() usage in Python


import re
result = re.sub(pattern, repl, string, count=0, flags=0)

Simple Examples

num = re.sub(r'abc', '', input)              # Delete pattern abc
num = re.sub(r'abc', 'def', input)           # Replace pattern abc -> def
num = re.sub(r'\s+', ' ', input)             # Collapse runs of whitespace into one space
num = re.sub(r'abc(def)ghi', r'\1', input)   # Replace a string with a part of itself

Python re.match Examples

See also Python re.sub
Note that re.match() matches from the start of the string. Use re.search() when you want to match anywhere in a string.
  • Use re.search() if you want to search anywhere inside a string
  • Use re.sub() if you want to substitute substrings.
  • Use re.split() if you want to extract fields when you have a common field separator.


Ad-hoc match

import re
result = re.match(pattern, string, flags=0)

Pre-compiled pattern

Use this if you use a pattern multiple times.
import re
pattern = re.compile('some pattern')
result = pattern.match(string [, pos [, end]])

Simple Examples

result = re.match(r'abc', input)     # Check whether the string starts with 'abc'
result = re.match(r'^\w+$', input)   # Ensure string is one word
pattern = re.compile('abc')          # Same as first example
result = pattern.match(input)
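The difference between these functions is easiest to see side by side. A short sketch with made-up sample strings:

```python
import re

text = "xyz abc"

at_start = re.match(r"abc", text)    # None: "abc" is not at the start
anywhere = re.search(r"abc", text)   # match object: "abc" found at index 4
fields = re.split(r",", "a,b,c")     # split on a common separator

print(at_start)
print(anywhere.start())
print(fields)
```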

Choose between neural and statistical Google translation model

The premium features of the Google Cloud Translation API have been made generally available in the Standard Edition. All users have access to the robust translation features available using the Neural Machine Translation (NMT) model, as well as the capabilities of the Phrase-Based Machine Translation (PBMT) model. There is no difference in pricing between the standard, PBMT model, and the NMT model.
By default, when you make a translation request to the Google Cloud Translation API, your text is translated using the NMT model. If the NMT model is not supported for the requested language translation pair, or if you explicitly request it, the PBMT model is used.
You can specify which model to use for translation by using the model query parameter. Specify base to use the PBMT model, and nmt to use the NMT model.
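As a rough illustration, the request URL could be assembled like this. Only the model parameter and its two values come from the text above; the endpoint address, the other parameter names (q, target, key) and the function name are assumptions that should be checked against the current API documentation. No request is actually sent.

```python
from urllib.parse import urlencode

# Assumed REST endpoint; verify against the Cloud Translation documentation.
ENDPOINT = "https://translation.googleapis.com/language/translate/v2"

def build_translate_url(text, target, model="nmt", api_key="YOUR_API_KEY"):
    """Build a query URL that selects the translation model explicitly."""
    if model not in ("nmt", "base"):
        raise ValueError("model must be 'nmt' (NMT) or 'base' (PBMT)")
    # q, target and key are assumed parameter names.
    params = {"q": text, "target": target, "model": model, "key": api_key}
    return ENDPOINT + "?" + urlencode(params)

print(build_translate_url("Good morning", "ro", model="base"))
```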

Friday, January 19, 2018

HTML Codes for Romanian Language Characters

Even if your site is written in English only and does not include multilingual translations, you may need to add Romanian language characters to that site on certain pages or for certain words. The list below includes the HTML codes necessary to use Romanian characters that are not in the standard character set and are not found on a keyboard's keys.
Not all browsers support all these codes (mainly, older browsers may cause problems - newer browsers should be fine), so be sure to test your HTML codes before you use them.
Some Romanian characters may be part of the Unicode character set, so you need to declare that in the head of your documents:
<meta http-equiv="content-type" content="text/html;charset=utf-8" />
Here are the different characters you may need to use.
Display  Friendly Code  Numerical Code  Hex Code  Description
Ă                       &#258;          &#x102;   Capital A-breve
ă                       &#259;          &#x103;   Lowercase a-breve
Â        &Acirc;        &#194;          &#xC2;    Capital A-circumflex
â        &acirc;        &#226;          &#xE2;    Lowercase a-circumflex
Î        &Icirc;        &#206;          &#xCE;    Capital I-circumflex
î        &icirc;        &#238;          &#xEE;    Lowercase i-circumflex
Ș                       &#536;          &#x218;   Capital S-comma
ș                       &#537;          &#x219;   Lowercase s-comma
Ş                       &#350;          &#x15E;   Capital S-cedilla
ş                       &#351;          &#x15F;   Lowercase s-cedilla
Ț                       &#538;          &#x21A;   Capital T-comma
ț                       &#539;          &#x21B;   Lowercase t-comma
Ţ                       &#354;          &#x162;   Capital T-cedilla
ţ                       &#355;          &#x163;   Lowercase t-cedilla
Using these characters is simple: in the HTML markup, place the special character code where you want the Romanian character to appear.
These codes work like other HTML special character codes, which let you add characters that are not found on a traditional keyboard and therefore cannot simply be typed into the HTML.
Remember, these character codes may be used on an English language website if you need to display a word with one of these characters.
They would also be used in HTML that displays full Romanian translations, whether you coded a full Romanian version of the site by hand or used a more automated approach to multilingual webpages, such as Google Translate.
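A quick way to check any numerical or hex code is to derive it from the character itself, since both are just the Unicode code point written in decimal or hexadecimal. A minimal Python sketch (the function name is made up):

```python
def html_codes(char):
    """Return the decimal and hexadecimal HTML codes for one character."""
    code_point = ord(char)
    return f"&#{code_point};", f"&#x{code_point:X};"

# Code points written as escapes: A-breve, a-breve, S-comma, s-comma,
# T-comma, t-comma.
for ch in "\u0102\u0103\u0218\u0219\u021a\u021b":
    decimal, hexadecimal = html_codes(ch)
    print(ch, decimal, hexadecimal)
```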
Romanian special characters:[ăĂ,âÂ,îÎ,şŞ,ţŢ] or (if your browser can't deal with ISO-8859-2 charset).

Unicode (ro)
If you want to use the Unicode (UTF-8) charset, be aware that for older browsers you might have to set View->Encoding to UTF-8 or Central European.

Latin-1 Supplement, Latin Extended-A, Latin Extended-B
1. First you need to tell the browser to use Unicode. You do that by adding the meta line of code after the opening html tag:
<meta http-equiv="content-type" content="text/html; charset=UTF-8" />


2. In your HTML code you would use "&#NNN;", where NNN is the number in the middle row in the above table.
For reference the special chars were retrieved from these UNICODE charts:
3. Text example using UTF-8 Romanian characters:
  • Tudor Arghezi -- O zi
4. Perl script to convert Romanian texts from ISO-8859-2 to UTF-8:
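The Perl script itself is not reproduced here. As a stand-in, this is a minimal Python sketch of the same kind of conversion; the function name and file paths are hypothetical, and it assumes the source file really is ISO-8859-2.

```python
def convert_latin2_to_utf8(src_path, dst_path):
    """Read a file as ISO-8859-2 and write the same text as UTF-8."""
    with open(src_path, "r", encoding="iso-8859-2") as src:
        text = src.read()
    with open(dst_path, "w", encoding="utf-8") as dst:
        dst.write(text)
    return text
```

For example, convert_latin2_to_utf8("old.txt", "new.txt") would leave old.txt untouched and write the converted text to new.txt.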
ISO-8859-2
0. If you can't see the ISO-8859-2 charset correctly, you could upgrade your browser to either the latest Internet Explorer, Netscape, Opera, or Mozilla. Lynx 2.8x (although not supporting Javascript) seems to do at least a decent job at approximating those non-ASCII characters...
1. Add this "meta" tag right after your opening "html" tag in order to tell the viewer/browser to use the proper charset:

<meta http-equiv="content-type" content="text/html; charset=iso-8859-2">

2. The Romanian ş should have a comma underneath, not a cedilla (ISO-8859-2 only has the Turkish version of "sh"). See Unicode for the correct glyphs.
3. Text examples using ISO-8859-2 Romanian characters:
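Once text has been moved to Unicode, the cedilla letters inherited from ISO-8859-2 can be swapped for the comma-below forms. A minimal Python sketch; the table and function names are made up, and whether you want this substitution depends on your fonts and target software.

```python
# Map the cedilla forms to the comma-below forms.
CEDILLA_TO_COMMA = str.maketrans({
    "\u015e": "\u0218",  # capital S-cedilla -> capital S-comma
    "\u015f": "\u0219",  # lowercase s-cedilla -> lowercase s-comma
    "\u0162": "\u021a",  # capital T-cedilla -> capital T-comma
    "\u0163": "\u021b",  # lowercase t-cedilla -> lowercase t-comma
})

def use_comma_below(text):
    """Return text with s/t-cedilla replaced by s/t-comma."""
    return text.translate(CEDILLA_TO_COMMA)

print(use_comma_below("\u015fi \u0163ar\u0103"))
```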

Romanian Keyboard Control using your browser

Romanian Keyboard Control in Mac OS X

Open System Preferences and select the International icon under the Personal category. Select the Input Menu and scroll down to Romanian. Select it and the "Show input menu in menu bar" checkbox below in order to be able to switch between different keyboard layouts. Close System Preferences. Now you're able to switch to/from the Romanian keyboard layout using the top menu bar. Typically the special characters map as illustrated below:

Romanian Keyboard Control in Windows

Control Panel->Keyboard ->Language (95/98) ->Add...->Romanian->OK

->Input Locales (NT/XP/2000)
If you also check the Enable indicator on taskbar box on the same Input Locales property page, you'll be able to easily switch between RO and EN by left-clicking the symbol on the taskbar and choosing the desired keyboard mapping.
This setting allows the following mapping of Romanian characters on a standard QWERTY 101-AT US keyboard:
Pronunciation guide - as close as I can describe it in plain English
  • The pronunciation of ă is similar to the final vowel in the word "mother".
  • â and î are pronounced the same. The sound is similar to the Russian ы. Also, î is used only as the first letter of a word, and â only when it is not the first letter. Here is a lame sound clip of me saying "Câmpina cânt" (it sounds like KHM-PE-NAH KHNT).
  • The last 2 special characters(ş and ţ) displayed are pronounced as "sh" and "ts" in English. 

The below list includes HTML character codes in Romanian which are usually not found in the standard character set. Should you wish to produce any of the unique characters below from the ASCII Romanian Library, use the respective codes when writing your HTML code.
In order to use these characters, simply place the unique character code where you'd like the character to appear. Codes as such play a key part in translation projects.

Capital A-breve
Lowercase a-breve
Capital A-circumflex
Lowercase a-circumflex
Capital I-circumflex
Lowercase i-circumflex
Capital S-comma
Lowercase s-comma
Capital S-cedilla
Lowercase s-cedilla
Capital T-comma
Lowercase t-comma
Capital T-cedilla
Lowercase t-cedilla

Monday, January 15, 2018

Batch find and replace non-breaking spaces

About the non-breakable space, you can use this string in the Find what box:
(\d)[followed by a space]

and this string in the Replace with box:
$1[followed by non-breakable space, i.e., ALT+0160]

Note: The square brackets ([]) and the text inside them are just descriptive. You have to remove them and replace them with the spaces that I can't show here.

Tick Use and select Regular expressions.

Then press the Replace button for each instance you wish to replace.

Of course, you can try with other search and replace patterns, like:
Find what: (\d) (mg)
Replace with: $1[ALT+0160]$2
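For whole files, the same two replacements can be sketched in Python. The function names are made up; U+00A0 is the non-breaking space that ALT+0160 produces.

```python
import re

NBSP = "\u00a0"  # non-breaking space

def protect_digit_spaces(text):
    """Replace the space after every digit with a non-breaking space."""
    return re.sub(r"(\d) ", r"\g<1>" + NBSP, text)

def protect_unit(text, unit="mg"):
    """Replace the space between a digit and the given unit only."""
    pattern = r"(\d) (" + re.escape(unit) + r")\b"
    return re.sub(pattern, r"\g<1>" + NBSP + r"\g<2>", text)

print(repr(protect_digit_spaces("5 mg and 10 ml")))
print(repr(protect_unit("5 mg and 10 ml")))
```

The second function is the narrower of the two: it leaves "10 ml" alone because only the mg unit was requested.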

Non-breaking hyphen:
Alt +2011

Saturday, January 13, 2018

Microsoft Serialization.xsd Schema for API response

<?xml version="1.0" encoding="utf-8"?>
 <xs:schema xmlns:tns="" attributeFormDefault="qualified" elementFormDefault="qualified" targetNamespace="" xmlns:xs="">
   <xs:element name="anyType" nillable="true" type="xs:anyType" />
   <xs:element name="anyURI" nillable="true" type="xs:anyURI" />
   <xs:element name="base64Binary" nillable="true" type="xs:base64Binary" />
   <xs:element name="boolean" nillable="true" type="xs:boolean" />
   <xs:element name="byte" nillable="true" type="xs:byte" />
   <xs:element name="dateTime" nillable="true" type="xs:dateTime" />
   <xs:element name="decimal" nillable="true" type="xs:decimal" />
   <xs:element name="double" nillable="true" type="xs:double" />
   <xs:element name="float" nillable="true" type="xs:float" />
   <xs:element name="int" nillable="true" type="xs:int" />
   <xs:element name="long" nillable="true" type="xs:long" />
   <xs:element name="QName" nillable="true" type="xs:QName" />
   <xs:element name="short" nillable="true" type="xs:short" />
   <xs:element name="string" nillable="true" type="xs:string" />
   <xs:element name="unsignedByte" nillable="true" type="xs:unsignedByte" />
   <xs:element name="unsignedInt" nillable="true" type="xs:unsignedInt" />
   <xs:element name="unsignedLong" nillable="true" type="xs:unsignedLong" />
   <xs:element name="unsignedShort" nillable="true" type="xs:unsignedShort" />
   <xs:element name="char" nillable="true" type="tns:char" />
   <xs:simpleType name="char">
     <xs:restriction base="xs:int" />
   </xs:simpleType>
   <xs:element name="duration" nillable="true" type="tns:duration" />
   <xs:simpleType name="duration">
     <xs:restriction base="xs:duration">
       <xs:pattern value="\-?P(\d*D)?(T(\d*H)?(\d*M)?(\d*(\.\d*)?S)?)?" />
       <xs:minInclusive value="-P10675199DT2H48M5.4775808S" />
       <xs:maxInclusive value="P10675199DT2H48M5.4775807S" />
     </xs:restriction>
   </xs:simpleType>
   <xs:element name="guid" nillable="true" type="tns:guid" />
   <xs:simpleType name="guid">
     <xs:restriction base="xs:string">
       <xs:pattern value="[\da-fA-F]{8}-[\da-fA-F]{4}-[\da-fA-F]{4}-[\da-fA-F]{4}-[\da-fA-F]{12}" />
     </xs:restriction>
   </xs:simpleType>
   <xs:attribute name="FactoryType" type="xs:QName" />
   <xs:attribute name="Id" type="xs:ID" />
   <xs:attribute name="Ref" type="xs:IDREF" />
 </xs:schema>
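The guid pattern in the schema can be tried out directly in Python. This is only a sketch for checking values by hand, not a replacement for real schema validation; the function name is made up, and fullmatch is used because XSD patterns must match the whole value.

```python
import re

# Pattern copied from the guid simpleType in the schema.
GUID_PATTERN = re.compile(
    r"[\da-fA-F]{8}-[\da-fA-F]{4}-[\da-fA-F]{4}-[\da-fA-F]{4}-[\da-fA-F]{12}"
)

def is_guid(value):
    """Return True if the whole string matches the schema's guid pattern."""
    return GUID_PATTERN.fullmatch(value) is not None

print(is_guid("123e4567-e89b-12d3-a456-426614174000"))
print(is_guid("not-a-guid"))
```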