Friday, January 19, 2018

HTML Codes for Romanian Language Characters

Even if your site is written in English only and does not include multilingual translations, you may need to add Romanian language characters to that site on certain pages or for certain words. The list below includes the HTML codes necessary to use Romanian characters that are not in the standard character set and are not found on a keyboard's keys.
Not all browsers support all these codes (mainly, older browsers may cause problems - newer browsers should be fine), so be sure to test your HTML codes before you use them.
These Romanian characters are part of the Unicode character set, so you need to declare a Unicode encoding (UTF-8) in the head of your documents:
<meta http-equiv="content-type" content="text/html;charset=utf-8" />
Here are the different characters you may need to use.
Display  Friendly Code  Numerical Code  Hex Code  Description
Ă                       &#258;          &#x102;   Capital A-breve
ă                       &#259;          &#x103;   Lowercase a-breve
Â        &Acirc;        &#194;          &#xC2;    Capital A-circumflex
â        &acirc;        &#226;          &#xE2;    Lowercase a-circumflex
Î        &Icirc;        &#206;          &#xCE;    Capital I-circumflex
î        &icirc;        &#238;          &#xEE;    Lowercase i-circumflex
Ș                       &#536;          &#x218;   Capital S-comma
ș                       &#537;          &#x219;   Lowercase s-comma
Ş                       &#350;          &#x15E;   Capital S-cedilla
ş                       &#351;          &#x15F;   Lowercase s-cedilla
Ț                       &#538;          &#x21A;   Capital T-comma
ț                       &#539;          &#x21B;   Lowercase t-comma
Ţ                       &#354;          &#x162;   Capital T-cedilla
ţ                       &#355;          &#x163;   Lowercase t-cedilla
Using these characters is simple. In the HTML markup, you would place these special character codes where you want the Romanian character to appear.
These are used similarly to other HTML special character codes that allow you to add characters that are also not found on the traditional keyboard, and therefore cannot be simply typed into the HTML in order to display on a web page.
Remember, these character codes may be used on an English language website if you need to display a word with one of these characters.
 These characters would also be used in HTML that was actually displaying full Romanian translations, whether you actually coded those pages by hand and had a full Romanian version of the site, or if you used a more automated approach to multilingual webpages and went with a solution like Google Translate.
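The numerical and hex codes in the table are the Unicode code points of the characters written as HTML character references. As a minimal sketch (the function names and the ROMANIAN_CHARS constant are made up for this illustration), they can be generated and checked with Python's standard library:

```python
import html

# Romanian letters that are not plain ASCII (comma-below forms of S and T).
ROMANIAN_CHARS = "ĂăÂâÎîȘșȚț"

def to_decimal_entity(ch):
    """Return the decimal character reference, e.g. '&#258;' for 'Ă'."""
    return f"&#{ord(ch)};"

def to_hex_entity(ch):
    """Return the hexadecimal character reference, e.g. '&#x102;' for 'Ă'."""
    return f"&#x{ord(ch):X};"

def escape_romanian(text):
    """Replace only the Romanian special characters with decimal references."""
    return "".join(
        to_decimal_entity(c) if c in ROMANIAN_CHARS else c for c in text
    )

if __name__ == "__main__":
    for ch in ROMANIAN_CHARS:
        print(ch, to_decimal_entity(ch), to_hex_entity(ch))
    escaped = escape_romanian("Țară")
    # html.unescape turns the references back into the original characters.
    print(escaped, "->", html.unescape(escaped))
```

On a page that is already served as UTF-8 the characters can also be written directly; the references are mainly useful when the file encoding cannot be relied on.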
Romanian special characters:[ăĂ,âÂ,îÎ,şŞ,ţŢ] or (if your browser can't deal with ISO-8859-2 charset).

Unicode (ro)
If you want to use the Unicode (UTF-8) charset, be aware that in older browsers you might have to set View->Encoding to UTF-8 or Central European.

Latin-1 Supplement, Latin Extended-A, Latin Extended-B
1. First you need to tell the browser to use Unicode. You do that by adding this meta line inside the head element of the document:
<meta http-equiv="content-type" content="text/html; charset=UTF-8" />


2. In your HTML code you would use "&#NNN;", where NNN is the number in the Numerical Code column of the table above.
For reference the special chars were retrieved from these UNICODE charts:
3. Text example using UTF-8 Romanian characters:
  • Tudor Arghezi -- O zi
4. Perl script to convert Romanian texts from ISO-8859-2 to UTF-8:
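The Perl script itself is not reproduced here. As a rough Python sketch of the same kind of conversion (this is not the original script; the function name and the optional cedilla-to-comma step are assumptions made for illustration), the standard codecs are enough:

```python
# Map the cedilla forms found in ISO-8859-2 to the comma-below forms
# that Romanian actually uses (available in Unicode, not in ISO-8859-2).
CEDILLA_TO_COMMA = str.maketrans("ŞşŢţ", "ȘșȚț")

def latin2_to_utf8(data, fix_diacritics=True):
    """Decode ISO-8859-2 bytes and return the same text encoded as UTF-8."""
    text = data.decode("iso-8859-2")
    if fix_diacritics:
        text = text.translate(CEDILLA_TO_COMMA)
    return text.encode("utf-8")

if __name__ == "__main__":
    # Build a small ISO-8859-2 sample in memory instead of reading a file.
    sample = "Ţară şi oraş".encode("iso-8859-2")
    print(latin2_to_utf8(sample).decode("utf-8"))
```

To convert a file, read it in binary mode, pass the bytes through the function, and write the result back in binary mode.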
ISO-8859-2
0. If you can't see the ISO-8859-2 charset correctly, you could upgrade your browser to the latest Internet Explorer, Netscape, Opera, or Mozilla. Lynx 2.8x (although not supporting JavaScript) seems to do at least a decent job of approximating those non-ASCII characters.
1. Add this "meta" tag inside the head element of the document in order to tell the viewer/browser to use the proper charset:

<meta http-equiv="content-type" content="text/html; charset=iso-8859-2">

2. The Romanian letter ș is correctly written with a comma underneath, not a cedilla, but ISO-8859-2 only has the cedilla form ş (the Turkish version of "sh"). See Unicode for the correct glyphs.
3. Text examples using ISO-8859-2 Romanian characters:

Romanian Keyboard Control using your browser

Romanian Keyboard Control in Mac OS/X

Open System Preferences and select the International icon under the Personal category. Select the Input Menu tab and scroll down to Romanian. Select it, along with the "Show input menu in menu bar" checkbox below, so that you can switch between keyboard layouts. Close System Preferences. You can now switch to and from the Romanian keyboard layout using the top menu bar. Typically the special characters map as illustrated below:

Romanian Keyboard Control in Windows

Control Panel -> Keyboard -> Language (95/98) or Input Locales (NT/XP/2000) -> Add... -> Romanian -> OK
If you also check the "Enable indicator on taskbar" box on the same Input Locales property page, you'll be able to switch easily between RO and EN by left-clicking the symbol on the taskbar and choosing the desired keyboard mapping.
This setting allows the following mapping of Romanian characters on a standard QWERTY 101-AT US keyboard:
Pronunciation guide - as close as I can describe it in plain English
  • The pronunciation of ă is similar to the final vowel in the word "mother".
  • â and î are pronounced the same. The sound is similar to the Russian ы. Also, î is used only as the first letter of a word, and â only when it is not the first letter. Here is a lame sound clip of me saying "Câmpina cânt" (it sounds like KHM-PE-NAH KHNT).
  • The last 2 special characters displayed (ş and ţ) are pronounced as "sh" and "ts" in English.

The table above lists the HTML character codes for the Romanian characters that are usually not found in the standard character set: capital and lowercase a-breve, a-circumflex, i-circumflex, s-comma, s-cedilla, t-comma, and t-cedilla.
To use any of these characters, simply place its code in your HTML where you'd like the character to appear. Codes like these play a key part in translation projects.

Monday, January 15, 2018

Batch find and replace non-breaking spaces

To replace ordinary spaces after digits with non-breaking spaces, use this string in the Find what box:
(\d)[followed by a space]

and this string in the Replace with box:
$1[followed by non-breakable space, i.e., ALT+0160]

Note: The square brackets ([]) and the text inside them are only descriptive. Remove them and type the actual space or non-breaking space instead, since I can't show those characters here.

Tick Use and select Regular expressions.

Then press the Replace button for each instance you wish to replace.

Of course, you can try with other search and replace patterns, like:
Find what: (\d) (mg)
Replace with: $1[ALT+0160]$2
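For batches of plain-text files, the same two patterns can be sketched in Python. This is only an illustration outside the editor's Find/Replace dialog; the function names are made up, and Python writes group references as \1 and \2 rather than $1 and $2:

```python
import re

NBSP = "\u00a0"  # non-breaking space, the character typed with ALT+0160

def protect_digit_spaces(text):
    """Replace the space after any digit with a non-breaking space."""
    return re.sub(r"(\d) ", r"\1" + NBSP, text)

def protect_mg(text):
    """Replace only the space between a digit and the unit 'mg'."""
    return re.sub(r"(\d) (mg)", r"\1" + NBSP + r"\2", text)

if __name__ == "__main__":
    sample = "Take 5 mg twice, up to 10 ml per day."
    # repr() makes the inserted \xa0 characters visible.
    print(repr(protect_digit_spaces(sample)))
    print(repr(protect_mg(sample)))
```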

Non-breaking hyphen:
U+2011 (in Microsoft Word, type 2011 and then press Alt+X)

Saturday, January 13, 2018

Microsoft Serialization.xsd Schema for API response

<?xml version="1.0" encoding="utf-8"?>
 <xs:schema xmlns:tns="" attributeFormDefault="qualified" elementFormDefault="qualified" targetNamespace="" xmlns:xs="">
   <xs:element name="anyType" nillable="true" type="xs:anyType" />
   <xs:element name="anyURI" nillable="true" type="xs:anyURI" />
   <xs:element name="base64Binary" nillable="true" type="xs:base64Binary" />
   <xs:element name="boolean" nillable="true" type="xs:boolean" />
   <xs:element name="byte" nillable="true" type="xs:byte" />
   <xs:element name="dateTime" nillable="true" type="xs:dateTime" />
   <xs:element name="decimal" nillable="true" type="xs:decimal" />
   <xs:element name="double" nillable="true" type="xs:double" />
   <xs:element name="float" nillable="true" type="xs:float" />
   <xs:element name="int" nillable="true" type="xs:int" />
   <xs:element name="long" nillable="true" type="xs:long" />
   <xs:element name="QName" nillable="true" type="xs:QName" />
   <xs:element name="short" nillable="true" type="xs:short" />
   <xs:element name="string" nillable="true" type="xs:string" />
   <xs:element name="unsignedByte" nillable="true" type="xs:unsignedByte" />
   <xs:element name="unsignedInt" nillable="true" type="xs:unsignedInt" />
   <xs:element name="unsignedLong" nillable="true" type="xs:unsignedLong" />
   <xs:element name="unsignedShort" nillable="true" type="xs:unsignedShort" />
   <xs:element name="char" nillable="true" type="tns:char" />
   <xs:simpleType name="char">
     <xs:restriction base="xs:int" />
   </xs:simpleType>
   <xs:element name="duration" nillable="true" type="tns:duration" />
   <xs:simpleType name="duration">
     <xs:restriction base="xs:duration">
       <xs:pattern value="\-?P(\d*D)?(T(\d*H)?(\d*M)?(\d*(\.\d*)?S)?)?" />
       <xs:minInclusive value="-P10675199DT2H48M5.4775808S" />
       <xs:maxInclusive value="P10675199DT2H48M5.4775807S" />
     </xs:restriction>
   </xs:simpleType>
   <xs:element name="guid" nillable="true" type="tns:guid" />
   <xs:simpleType name="guid">
     <xs:restriction base="xs:string">
       <xs:pattern value="[\da-fA-F]{8}-[\da-fA-F]{4}-[\da-fA-F]{4}-[\da-fA-F]{4}-[\da-fA-F]{12}" />
     </xs:restriction>
   </xs:simpleType>
   <xs:attribute name="FactoryType" type="xs:QName" />
   <xs:attribute name="Id" type="xs:ID" />
   <xs:attribute name="Ref" type="xs:IDREF" />
 </xs:schema>

Python Scripts to translate using Microsoft

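# Install additional libraries: termcolor for coloured output, bs4 and lxml to parse the response.
# The leading "!" is notebook syntax; in a regular shell, run "pip install ..." instead.
!pip install termcolor
!pip install bs4
!pip install lxml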
from termcolor import colored
from bs4 import BeautifulSoup
import requests
# Using Python for Text Translation with Microsoft Cognitive Services
# Specify the subscription key (placeholder; replace with your own valid key)
subscriptionKey = 'ENTER KEY HERE'
# Specify URLs for Cognitive Services - Translator Text API
translateUrl = ''
cognitiveServiceUrl = ''
# Request Access Token
requestHeader = {'Ocp-Apim-Subscription-Key': subscriptionKey}
responseResult = requests.post(cognitiveServiceUrl, headers=requestHeader)
token = responseResult.text
print ("Access Token")
print (token)
# Original Text
text = "Créez des applications intelligentes et stratégiques avec une plateforme de base de données évolutive et hybride qui intègre tout ce qu'il vous faut : performances in-memory et sécurité avancée pour les analyses au sein de la base de données."
# Specify source and target language
srcLanguage = "fr"
targetLanguage = "en"
# Define Parameters
params = {'appid': 'Bearer '+token, 'text': text, 'from': srcLanguage, 'to': targetLanguage}
requestHeader = {'Accept': 'application/xml'}
# Invoke Cognitive Services to perform translation
responseResult = requests.get(translateUrl, params=params, headers=requestHeader )
# Show original and target text
print(colored('Original Text\n', 'green'))
print(colored(text, 'green'))
print ("\n")
print(colored('Translated Text\n', 'blue'))
soup = BeautifulSoup(responseResult.text,"lxml")
print(colored(soup.get_text(), 'blue'))
Or, even easier:
# -*- coding: utf-8 -*-

import http.client, urllib.parse

# *** Update or verify the following values. ***

# Replace the subscriptionKey string value with your valid subscription key.
subscriptionKey = 'ENTER KEY HERE'

host = ''
path = '/V2/Http.svc/Translate'

target = 'fr-fr'
text = 'Hello'

params = '?to=' + target + '&text=' + urllib.parse.quote (text)

def get_suggestions ():

    headers = {'Ocp-Apim-Subscription-Key': subscriptionKey}
    conn = http.client.HTTPSConnection(host)
    conn.request ("GET", path + params, None, headers)
    response = conn.getresponse ()
    return response.read ()

result = get_suggestions ()
print (result.decode("utf-8"))

Translate response
A successful response is returned in XML, as shown in the following example:
<string xmlns="">Salut</string>

Microsoft Translator Customization Features

Until a few years ago, automatic translation solutions only offered two approaches when it came to translating your content: use a default translation engine of the kind that powers major translation sites and apps, or build your own customized system painfully from scratch.
In 2012, Microsoft Translator broke this inflexible model with the launch of the Microsoft Translator Hub. This is just one instance of a broader class of work Microsoft is pursuing around artificial intelligence, and our vision for more personal computing experiences and enhanced productivity aided by systems that increasingly can see, hear, speak, understand and even begin to reason. The Hub allowed users to create as many custom systems as needed by combining Microsoft's enormous translation corpus with their own previously translated documents, such as internal or external websites, brochures, white papers, etc.
There are 4 general levels of customization now available to Microsoft Translator API users, with corresponding increases in resource investment and translation quality.

  1. New: Use a Standard Category instead of the default one - Our new standard categories allow you to easily customize the context of your translation by narrowing the scope of the statistical analysis that Microsoft Translator uses to translate your text. Simply speaking, with standard categories, you can tell Microsoft Translator what type of content is being translated in order to improve its accuracy. The first two standard categories we are announcing today are "tech" and "speech", with more on the way.
    • The "tech" category will improve translation quality on all computer-related content (software, hardware, networking...) and has been built with the vast amount of data collected over the years within Microsoft as we translated product help files, documentation, and customer support for our users, and from other sources such as TAUS. The list of languages for which the tech category is supported can be found here.
    • The "speech" category was developed in the last 18 months as we built Skype Translator. For Skype Translator to work properly, it was critical to be able to translate spoken text, which in most cases can be very different from the written text. The languages that are supported in this category are the same speech translation languages that are available for Skype Translator and Microsoft Translator apps for iOS and Android. As new speech languages are released for these applications, the equivalent "speech" category will become available for text translation in our core Translator API as well.
    It's easy to start using standard categories in your translations — just set the value to "tech" or "speech" for the "category" parameter of your translation method if you are using the API, or in the Category ID box in any of our supported products, such as the Document Translator. The default value "general", can be omitted — just select your new standard category to begin receiving your customized translations. These neural network models are available for all speech languages through the Microsoft Translator Speech API, on the try and compare site, and through the text API by using the ‘generalnn’ category ID.
    In addition to standard categories, we also developed a "social media" filter that we can enable server-side upon demand. This Client ID level filter has been developed to convert texts and instant messages to proper English to improve translations quality. For instance, once passed through the filter, "R u here?" would become "Are you here?" — which will obviously translate much better than the original. Please note that, for now, only an English texting filter exists.
  2. New: Upload a Custom Dictionary - You can customize your translations further with dictionaries. Dictionaries allow you to make your own foreign language word lists so that the terminology that is unique to your business or industry will translate just the way you want. For instance, if you have a product name that you want translated in a certain way in French, (or not translated at all, if it's a brand name) just add the product name and the corresponding French translation to your Hub dictionary. Every time you use the Microsoft Translator API with the custom category ID obtained from the Translator Hub, you will get your customized translation. To get your translations up and running, all you need to do is upload a simple Excel spreadsheet with your word list to the Translator Hub website and train the system. You can start with as little as one dictionary entry. The custom category you create with your dictionary can be built on top of the general or the standard (speech or tech) categories and remains valid even when you customize your system with one of the following options.
  3. New: Train a System with 1,000 - 5,000 Parallel Sentences - The third level of customization is to add pre-translated content to your custom category. Today, we are introducing the ability to train a system with as few as 1,000 parallel sentences (pre-translated sentences in the original and target language). By training a system with parallel sentences, you can go beyond just a simple list of translated words and phrases. Instead, the Hub tunes all of its internal parameters to produce translations that are similar to the test sentences you provided. By providing the Hub with at least 1,000 parallel sentences, you can help the Hub choose translations that match your organization's terminology and tone better than the standard categories. If you have created content in another language, such as webpages or documentation, you can use it to improve your translations. Obviously, the more sentences you have, the better the translations. You can use this customization mechanism alone or in combination with a custom dictionary.
  4. Train a system with more than 5,000 Parallel Sentences - As was possible since the Hub launched, but now starting with only 5,000 sentences rather than 10,000 previously, you can use any amount of parallel sentences above 5,000 to customize your translations. With more than 5,000 parallel sentences you can begin to create a system that is learning new terms and phrases in the right context and tone of your business. This leads to a better, more customized translation. Add a dictionary for even better results if you have a corpus of less than 50,000 parallel sentences. If you have more than 50,000 parallel sentences, you will be able to build a system that can give fully customized results. At this level, the machine has learned your terminology in context through parallel sentences, so the dictionary will be less helpful, and can be reduced to the new terms as you develop new topics in your source content.
With more than 50,000 parallel sentences, ideally in the 100s of thousands of sentences, the Hub enables you to create brand new language systems. Many of the Microsoft Translator supported languages were developed by Community Partners including the languages Hmong Daw, Yucatec Maya, Queretaro Otomi, Welsh, and Kiswahili.
Once you have trained and deployed your new customized system, it is available to use in all category ID-enabled Microsoft Translator products, such as the on-premise version of SharePoint, Office apps for PowerPoint and Word, the Document Translator, and the Multilingual App Toolkit, and many translation memory tools from our partners. The Hub can help improve translation quality for a wide variety of scenarios such as web localization, customer support, and internal communications, whether online or in apps.
After your translated content is published, you can engage your community of users to refine the translation by using the Collaborative Translation Framework (CTF). CTF allows you to use human translation to edit the output of the translated content or to manage crowdsourced edits to your content so that you can refine it over time. The Hub can import these human corrections easily so you can incorporate them in training a better-customized translation system.
To start using the Translator Hub to customize your system, simply visit the Translator Hub website and register a workspace. You can invite as many other people as you like into your workspace to collaborate on improving your translation system. When you are ready to deploy a custom system, you will need to sign up for an account with Microsoft Translator. You can register for a free 2 million character per month subscription to get you started. After you have registered, you can go to the Translator Hub website to start customizing!
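The "category" parameter described in item 1 of the list above could be added to the query string of the earlier Translate request roughly as follows. This is a sketch only: the helper name is hypothetical, and the "to" and "text" parameter names are taken from the script above rather than from a checked API reference.

```python
from urllib.parse import urlencode

def build_translate_query(text, target, category=None):
    """Build a query string like the 'params' value in the script above.

    'category' may be a standard category such as "tech" or "speech", or a
    custom category ID from the Translator Hub. When it is omitted, the
    default "general" category applies.
    """
    params = {"to": target, "text": text}
    if category:
        params["category"] = category
    return "?" + urlencode(params)

if __name__ == "__main__":
    print(build_translate_query("Hello", "fr"))
    print(build_translate_query("Reboot the router", "fr", category="tech"))
```

The returned string can be appended to the request path in the same way as the params variable in the shorter script.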

Tikal - list of available filter configurations

The command:
tikal -listconf
lists all the filter configurations currently available.
List of all filter configurations available:

- okf_odf = XML OpenDocument files (e.g. use inside documents).
- okf_mosestext = Default Moses Text configuration.
- okf_tradosrtf = Configuration for Trados-tagged RTF files - READING ONLY.
- okf_rainbowkit = Configuration for Rainbow translation kit.
- okf_rainbowkit-package = Configuration for Rainbow translation kit package.
- okf_rainbowkit-noprompt = Configuration for Rainbow translation kit (without prompt).
- okf_mif = Adobe FrameMaker MIF documents
- okf_archive = Configuration for archive files
- okf_transifex = Transifex project with prompt when starting
- okf_transifex-noPrompt = Transifex project without prompt when starting
- okf_xini = Configuration for XINI documents from ONTRAM
- okf_xini-noOutputSegmentation = Configuration for XINI documents from ONTRAM (fields in the output are not segmented)
- okf_itshtml5 = Configuration for standard HTML5 documents.
- okf_txml = Wordfast Pro TXML documents
- okf_txml-fillEmptyTargets = Wordfast Pro TXML documents with empty targets filled on output.
- okf_wiki = Text with wiki-style markup
- okf_doxygen = Doxygen-commented Text Documents
- okf_transtable = Default TransTable configuration.
- okf_simplification = Configuration for extracting resources from an XML file. Resources and then codes are simplified.
- okf_simplification-xmlResources = Configuration for extracting resources from an XML file. Resources are simplified.
- okf_simplification-xmlCodes = Configuration for extracting resources from an XML file. Codes are
- okf_xliff2 = Configuration for XLIFF-2 documents.
- okf_icml = Adobe InDesign ICML documents
- okf_markdown = Markdown files
- okf_pdf = Configuration for PDF documents
- okf_sdlpackage = SDL Trados 2017 SDLPPX and SDLRPX files
- okf_table = Table-like files such as tab-delimited, CSV, fixed-width columns, etc.
- okf_table_csv = Comma-separated values, optional header with field names.
- okf_table_catkeys = Haiku CatKeys resource files
- okf_table_src-tab-trg = 2-column (source + target), tab separated files.
- okf_table_fwc = Fixed-width columns table padded with white-spaces.
- okf_table_tsv = Columns, separated by one or more tabs.
- okf_plaintext = Plain text files.
- okf_plaintext_trim_trail = Text files; trailing spaces and tabs removed from extracted lines.
- okf_plaintext_trim_all = Text files; leading and trailing spaces and tabs removed from extracted
- okf_plaintext_paragraphs = Text files extracted by paragraphs (separated by 1 or more empty lines
- okf_plaintext_spliced_backslash = Spliced lines filter with the backslash character (\) used as the splicer.
- okf_plaintext_spliced_underscore = Spliced lines filter with the underscore character (_) used as the splicer.
- okf_plaintext_spliced_custom = Spliced lines filter with a user-defined splicer.
- okf_plaintext_regex_lines = Plain Text Filter using regex-based linebreak search. Extracts by lin
- okf_plaintext_regex_paragraphs = Plain Text Filter using regex-based linebreak search. Extracts by paragraphs.
- okf_xml = Configuration for generic XML documents (default ITS rules).
- okf_xml-resx = Configuration for Microsoft RESX documents (without binary data).
- okf_xml-MozillaRDF = Configuration for Mozilla RDF documents.
- okf_xml-JavaProperties = Configuration for Java Properties files in XML.
- okf_xml-AndroidStrings = Configuration for Android Strings XML documents.
- okf_xml-WixLocalization = Configuration for WiX (Windows Installer XML) Localization files.
- okf_xml-AppleStringsdict = Configuration for Apple Stringsdict files
- okf_html = HTML or XHTML documents
- okf_html-wellFormed = XHTML and well-formed HTML documents
- okf_tmx = Configuration for Translation Memory eXchange (TMX) documents.
- okf_dtd = Configuration for XML DTD documents (entities content)
- okf_json = Configuration for JSON files
- okf_idml = Adobe InDesign IDML documents
- okf_ttx = Configuration for Trados TTX documents.
- okf_properties = Java properties files (Output used \uHHHH escapes)
- okf_properties-outputNotEscaped = Java properties files (Characters in the output encoding are not escaped)
- okf_properties-skypeLang = Skype language properties files (including support for HTML codes)
- okf_properties-html-subfilter = Java Property content processed by an HTML subfilter
- okf_phpcontent = Default PHP Content configuration.
- okf_openoffice = ODT, ODS, ODP, ODG, OTT, OTS, OTP, OTG documents
- okf_vignette = Default Vignette Export/Import Content configuration.
- okf_vignette-nocdata = Vignette files without CDATA sections.
- okf_openxml = Microsoft Office documents (DOCX, DOCM, DOTX, DOTM, PPTX, PPTM, PPSX, PPSM, POTX, P
- okf_pensieve = Configuration for Pensieve translation memories.
- okf_xliff = Configuration for XML Localisation Interchange File Format (XLIFF) documents.
- okf_xliff-sdl = Configuration for SDL XLIFF documents. Supports SDL specific metadata
- okf_ts = Configuration for Qt TS files.
- okf_regex = Default Regex configuration.
- okf_regex-srt = Configuration for SRT (Sub-Rip Text) sub-titles files.
- okf_regex-textLine = Configuration for text files where each line is a text unit
- okf_regex-textBlock = Configuration for text files where text units are separated by 2 or more li
- okf_regex-macStrings = Configuration for Macintosh .strings files.
- okf_po = Standard bilingual PO files
- okf_po-monolingual = Monolingual PO files (msgid is a real ID, not the source text).
- okf_yaml = YAML files
- okf_xmlstream = Large XML Documents
- okf_xmlstream-dita = DITA XML
- okf_xmlstream-JavaPropertiesHTML = Java Properties XML with Embedded HTML
- okf_versifiedtxt = Versified Text Document
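
If you need to look up a configuration ID in a script, a listing in the "- id = description" format shown above can be parsed into a dictionary. This is a minimal sketch with a made-up function name; it assumes each configuration sits on a single line, so console-wrapped lines would need to be rejoined first.

```python
import re

# One configuration per line: "- okf_xxx = Description".
CONFIG_LINE = re.compile(r"^- (\S+) = (.*)$")

def parse_listconf(output):
    """Return a dict mapping filter configuration IDs to descriptions."""
    configs = {}
    for line in output.splitlines():
        match = CONFIG_LINE.match(line.strip())
        if match:
            configs[match.group(1)] = match.group(2)
    return configs

if __name__ == "__main__":
    sample = (
        "List of all filter configurations available:\n"
        "\n"
        "- okf_html = HTML or XHTML documents\n"
        "- okf_po = Standard bilingual PO files\n"
    )
    for config_id, description in parse_listconf(sample).items():
        print(config_id, "->", description)
```

With Tikal on the PATH, the text to parse could come from subprocess.run(["tikal", "-listconf"], capture_output=True, text=True).stdout.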

Thursday, January 11, 2018

AHK Script to translate segment with Google Translate in Across

SendMode Input
Send ^a
Sleep, 500
Send ^c
ClipWait, 2
google_url:="" clipboard
Run %google_url%
Sleep, 2000
MouseMove 966, 297 ; replace with default mouse position for the window
Send ^c
Sleep, 500
Send ^w ; close tab in Firefox
Sleep, 500
StringCaseSense On
StringReplace, clipboard, clipboard, ș, ş, All ; replace new with old diacritics, if needed
StringReplace, clipboard, clipboard, Ș, Ş, All
StringReplace, clipboard, clipboard, ț, ţ, All
StringReplace, clipboard, clipboard, Ț, Ţ, All
WinActivate Across Translator Premium Edition v6.3 7557_en ; replace with current Across version
Sleep, 500
Send ^v

Thursday, January 4, 2018

Deepl - Command Line Language Translator Tool for Linux

DeepL is a command line tool which delivers text translation capabilities to your console. DeepL Translator is developed by the German tech company DeepL and is available to everyone free of charge.
DeepL translator is based on very advanced neural machine translation that delivers translations of unmatched quality. When users enter a text, DeepL's artificial intelligence is able to capture even the slightest nuances and reproduce them in translation, unlike any other service.
In this article we will install the DeepL translator command line tool and see how it works. Under the hood, the tool makes API calls to DeepL's main website, so whenever we translate something, it sends the request to the main website and gets the results back. Your server or machine must therefore have an active internet connection for this tool to work. Translate Shell is another tool that performs the same function. Let's start with the installation.

Installation of Deepl Translator command line tool

Step 1: Before installing DeepL, we need to install nodejs version 6.0 or higher, which is a prerequisite for the DeepL translator tool. By default, Linux distributions don't come with the node PPA configured, so we will configure the PPA first and then install nodejs. Skip this step if you already have nodejs 6.0 or higher installed. Create a file /etc/apt/sources.list.d/nodesource.list and add the content shown below.
$ sudo vi /etc/apt/sources.list.d/nodesource.list
deb xenial main
deb-src xenial main
Execute the steps below to install nodejs
$ curl -s | sudo apt-key add -
$ sudo apt-get update
$ sudo apt-get install nodejs
Step 2: Install Yarn package dependency manager if it is not installed. Execute below commands to install yarn
$ curl -sS | sudo apt-key add -
$ echo "deb stable main" | sudo tee /etc/apt/sources.list.d/yarn.list
$ sudo apt-get update
$ sudo apt-get install yarn
Step 3: Finally, execute the command below to install Deepl translator on your machine.
$ yarn global add deepl-translator-cli
You can check installation status by checking the version of Deepl. Execute below command to check the version of Deepl installed.
$ deepl --version
That's it. We have successfully installed deepl translator. Now let's explore it.

Usage of Deepl Translator command line tool

Deepl translator supports the languages below at the time of writing.
  • English (EN)
  • German (DE)
  • French (FR)
  • Spanish (ES)
  • Italian (IT)
  • Dutch (NL)
  • Polish (PL)
The neural networks are already training to master more languages like Mandarin, Japanese, and Russian.
Apart from translation, Deepl can also detect the input language. So, basically, Deepl works in two modes: translate and detect.
  • Translation
    1. To translate a sentence or word, use the syntax below
$ deepl translate -t 'FR' "Hey, What's going on?"
Hé, qu'est-ce qui se passe?
Here, FR is the ISO code for the French language, and Deepl has given the output in French. The parameter breakdown of the above translate command is shown below.
deepl translate -t '${TARGET_LANGUAGE_ISO_CODE}' '${INPUT STRING}'
  • Detection
    1. You can detect the language of a specific sentence as shown below using deepl translator.
$ deepl detect "Batman può essere chiunque"
Italian (IT)
Here, Deepl has detected the input sentence as Italian. The parameter breakdown of the above detect command is shown below.
deepl detect '${INPUT STRING}'
You can execute below command to get help from command line
$ deepl -help
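
If you want to call the tool from a script rather than typing the commands by hand, the two modes above could be wrapped roughly like this. It is a sketch under assumptions: the function names are hypothetical, only the translate -t and detect invocations shown above are used, and the deepl command must already be installed for run_deepl to return anything.

```python
import shutil
import subprocess

def build_deepl_args(mode, text, target=None):
    """Build the argument list for the translate and detect modes shown above."""
    if mode == "translate":
        if not target:
            raise ValueError("translate mode needs a target language code")
        return ["deepl", "translate", "-t", target, text]
    if mode == "detect":
        return ["deepl", "detect", text]
    raise ValueError(f"unknown mode: {mode}")

def run_deepl(mode, text, target=None):
    """Run the CLI and return its output, or None if deepl is not installed."""
    if shutil.which("deepl") is None:
        return None
    result = subprocess.run(
        build_deepl_args(mode, text, target),
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()

if __name__ == "__main__":
    print(run_deepl("translate", "Hey, What's going on?", target="FR"))
    print(run_deepl("detect", "Batman può essere chiunque"))
```

Passing the arguments as a list avoids shell quoting problems with apostrophes in the input text.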

Wednesday, December 27, 2017

Open source translation tools to localize your project

Localization plays a central role in the ability to customize an open source project to suit the needs of users around the world. Besides coding, language translation is one of the main ways people around the world contribute to and engage with open source projects.
There are tools specific to the language services industry (surprised to hear that's a thing?) that enable a smooth localization process with a high level of quality. Categories that localization tools fall into include:
  • Computer-assisted translation (CAT) tools
  • Machine translation (MT) engines
  • Translation management systems (TMS)
  • Terminology management tools
  • Localization automation tools
The proprietary versions of these tools can be quite expensive. A single license for SDL Trados Studio (the leading CAT tool) can cost thousands of euros, and even then it is only useful for one individual and the customizations are limited (and psst, they cost more, too). Open source projects looking to localize into many languages and streamline their localization processes will want to look at open source tools to save money and get the flexibility they need with customization. I've compiled this high-level survey of many of the open source localization tool projects out there to help you decide what to use.

Computer-assisted translation (CAT) tools


OmegaT CAT tool. Here you see the translation memory (Fuzzy Matches) and terminology recall (Glossary) features at work. OmegaT is licensed under the GNU General Public License version 3+.
CAT tools are a staple of the language services industry. As the name implies, CAT tools help translators perform the tasks of translation, bilingual review, and monolingual review as quickly as possible and with the highest possible consistency through reuse of translated content (also known as translation memory). Translation memory and terminology recall are two central features of CAT tools. They enable a translator to reuse previously translated content from old projects in new projects. This allows them to translate a high volume of words in a shorter amount of time while maintaining a high level of quality through terminology and style consistency. This is especially handy for localization, as text in a lot of software and web UIs is often the same across platforms and applications. CAT tools are standalone pieces of software though, requiring translators that use them to work locally and merge to a central repository.
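To make the idea of translation memory reuse concrete, here is a toy fuzzy lookup built on Python's difflib. It is only a sketch: real CAT tools use their own matching algorithms and file formats, and the memory entries, function name, and 0.75 cutoff are made up for illustration.

```python
import difflib

# A tiny English -> Romanian translation memory (illustrative data only).
TRANSLATION_MEMORY = {
    "Open file": "Deschide fișierul",
    "Save file": "Salvează fișierul",
    "Close window": "Închide fereastra",
}

def fuzzy_lookup(segment, memory, cutoff=0.75):
    """Return (stored source, stored translation) for the closest match, or None."""
    matches = difflib.get_close_matches(segment, list(memory), n=1, cutoff=cutoff)
    if not matches:
        return None
    source = matches[0]
    return source, memory[source]

if __name__ == "__main__":
    # "Open files" is close to the stored "Open file", so it is offered for reuse.
    print(fuzzy_lookup("Open files", TRANSLATION_MEMORY))
    # Nothing in the memory resembles "Quit", so there is no suggestion.
    print(fuzzy_lookup("Quit", TRANSLATION_MEMORY))
```

The translator would then edit the suggested translation instead of starting from scratch.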
Tools to check out:

Machine translation (MT) engines

MT engines automate the transfer of text from one language to another. MT is broken up into three primary methodologies: rules-based, statistical, and neural (which is the new player). The most widespread MT methodology is statistical, which (in very brief terms) draws conclusions about the interconnectedness of a pair of languages by running statistical analyses over annotated bilingual corpus data using n-gram models. When a new source language phrase is introduced to the engine for translation, it looks within its analyzed corpus data to find statistically relevant equivalents, which it produces in the target language. MT can be useful as a productivity aid to translators, changing their primary task from translating a source text to a target text to post-editing the MT engine's target language output. I don't recommend using raw MT output in localizations, but if your community is trained in the art of post-editing, MT can be a useful tool to help them make large volumes of contributions.
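The n-gram statistics mentioned above can be illustrated with a toy example. The sketch below covers only the counting step, on a tiny monolingual corpus invented for illustration; a real statistical engine works on aligned bilingual data and adds smoothing, alignment, and decoding, none of which is shown here. The function names are hypothetical.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the list of n-token tuples found in a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Toy corpus (an assumption for illustration), already split into tokens.
corpus = [
    "open the file".split(),
    "save the file".split(),
    "close the window".split(),
]

bigram_counts = Counter(bg for sentence in corpus for bg in ngrams(sentence, 2))
unigram_counts = Counter(token for sentence in corpus for token in sentence)


def bigram_probability(first, second):
    """Estimate P(second | first) by relative frequency, without smoothing."""
    if unigram_counts[first] == 0:
        return 0.0
    return bigram_counts[(first, second)] / unigram_counts[first]


print(bigram_probability("the", "file"))
```

In this corpus "the" occurs three times and is followed by "file" twice, so the estimate is two thirds.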
Tools to check out:

Translation management systems (TMS)


Mozilla's Pontoon translation management system user interface. With WYSIWYG editing, you can translate content in context and simultaneously perform translation and quality assurance. Pontoon is licensed under the BSD 3-clause New or Revised License.
TMS tools are web-based platforms that allow you to manage a localization project and enable translators and reviewers to do what they do best. Most TMS tools aim to automate many manual parts of the localization process by including version control system (VCS) integrations, cloud services integrations, project reporting, as well as the standard translation memory and terminology recall features. These tools are most amenable to community localization or translation projects, as they allow large groups of translators and reviewers to contribute to a project. Some also use a WYSIWYG editor to give translators context for their translations. This added context improves translation accuracy and cuts down on the amount of time a translator has to wait between doing the translation and reviewing the translation within the user interface.
Tools to check out

Terminology management tools


Brigham Young University's BaseTerm tool displays the new-term entry dialogue window. BaseTerm is licensed under the Eclipse Public License.
Terminology management tools give you a GUI to create terminology resources (known as termbases) to add context and ensure translation consistency. These resources are consumed by CAT tools and TMS platforms to aid translators in the process of translation. For languages in which a term could be either a noun or a verb based on the context, terminology management tools allow you to add metadata for a term that labels its gender, part of speech, and monolingual definition, as well as context clues. Terminology management is often an underserved, but no less important, part of the localization process. In both the open source and proprietary ecosystems, there are only a handful of options available.
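As a rough picture of what a termbase entry might hold, here is a small Python sketch. The TermEntry class, its field names, and the sample entries are assumptions for illustration only; they do not reflect BaseTerm's data model or any termbase exchange format.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TermEntry:
    """One hypothetical termbase entry with the metadata described above."""
    term: str
    language: str
    part_of_speech: str
    definition: str
    gender: Optional[str] = None  # not every language marks gender
    context: List[str] = field(default_factory=list)


# Two entries for the same English word, told apart by part of speech.
termbase = [
    TermEntry("record", "en", "noun", "A stored set of related data.",
              context=["Delete the record."]),
    TermEntry("record", "en", "verb", "To capture audio or video.",
              context=["Record a message."]),
]


def lookup(term, part_of_speech):
    """Return the entries matching both the term and its part of speech."""
    return [entry for entry in termbase
            if entry.term == term and entry.part_of_speech == part_of_speech]


for entry in lookup("record", "verb"):
    print(entry.definition, entry.context)
```

Keeping part of speech on the entry lets a tool show the translator the verb sense when the source string is a command, and the noun sense when it is a label.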
Tools to check out

Localization automation tools


The Ratel and Rainbow components of the Okapi Framework. Photo courtesy of the Okapi Framework. The Okapi Framework is licensed under the Apache License version 2.0.
Localization automation tools facilitate the way you process localization data. This can include text extraction, file format conversion, tokenization, VCS synchronization, term extraction, pre-translation, and various quality checks over common localization standard file formats. In some tool suites, like the Okapi Framework, you can create automation pipelines for performing various localization tasks. These pipelines are useful in a variety of situations, but their main utility is in the time they save by automating many tasks. They can also move you closer to a more continuous localization process.
Tools to check out

Change default code page of Windows console to UTF-8

Running chcp 65001 in the command prompt before using any tools helps, but is there any way to set it as the default code page?

Changing the HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Nls\CodePage\OEMCP value to 65001 appears to make the system unable to boot in my case.
A proposed alternative is to set HKEY_LOCAL_MACHINE\Software\Microsoft\Command Processor\Autorun to @chcp 65001>nul

Batch file (.bat):

REM change CHCP to UTF-8
CHCP 65001
Save this as switch.bat in C:\Windows\System32, then create a shortcut to cmd.exe on the desktop.
In the shortcut's properties, change the target to: C:\Windows\System32\cmd.exe /k switch

Note that CHCP prints "Active code page: 65001" to stdout. So if you run something like CHCP 65001 && mycommand.exe, the code page message appears at the start of the output. To suppress it, use CHCP 65001 >nul && mycommand.exe

Reg file:
Windows Registry Editor Version 5.00
  1. Value must be in hex
  2. Top line must be included exactly as is
  3. HKEY_CURRENT_USER cannot be abbreviated
  4. dword cannot be omitted
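As a sketch of what these notes imply, the following Python snippet builds the .reg text for a given code page, converting the decimal value to the eight-digit hex form a dword entry expects. It assumes the same key and value name as the other commands in this answer (Console\%SystemRoot%_system32_cmd.exe and CodePage); the function name reg_file_for_code_page is hypothetical.

```python
def reg_file_for_code_page(code_page):
    """Build .reg text that sets the console CodePage value for the current user."""
    # dword values in a .reg file are written as eight hexadecimal digits.
    hex_value = format(code_page, "08x")
    lines = [
        "Windows Registry Editor Version 5.00",  # top line, exactly as is
        "",
        # Key assumed from the REG ADD command in this answer, unabbreviated.
        r"[HKEY_CURRENT_USER\Console\%SystemRoot%_system32_cmd.exe]",
        f'"CodePage"=dword:{hex_value}',
    ]
    return "\n".join(lines) + "\n"


print(reg_file_for_code_page(65001))
```

For 65001 the hex form is fde9, so the value line reads "CodePage"=dword:0000fde9. Review the generated text before importing it into the registry.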

Command Prompt:
REG ADD HKCU\Console\%SystemRoot^%_system32_cmd.exe /v CodePage /t REG_DWORD /d 65001
  1. Value can be in dec or hex
  2. %SystemRoot% must be escaped
  3. REG_DWORD cannot be omitted

PowerShell:
New-Item -ErrorAction Ignore HKCU:\Console\%SystemRoot%_system32_cmd.exe
Set-ItemProperty HKCU:\Console\%SystemRoot%_system32_cmd.exe CodePage 65001
  1. Value can be in dec or hex
  2. -Type DWord is assumed with PowerShell 3+
  3. Can use ni in place of New-Item
  4. Can use sp in place of Set-ItemProperty
  5. Can use -ea 0 in place of -ErrorAction Ignore

Cygwin:
regtool add '\HKEY_CURRENT_USER\Console\%SystemRoot%_system32_cmd.exe'
regtool set '\HKEY_CURRENT_USER\Console\%SystemRoot%_system32_cmd.exe\CodePage' 65001
  1. Value can be in dec or hex
  2. Can use / in place of \
  4. Can use user in place of HKEY_CURRENT_USER