Thursday, January 25, 2018

Python for regex search and replace

# import the needed modules (re is for regular expressions)
import os
import re

# set the working directory as a shortcut
os.chdir('D:/test')

# open the source file and read it (open() replaces Python 2's file())
with open('file.txt', 'r') as fh:
    subject = fh.read()

# compile the pattern; the r prefix makes it a raw string, so backslashes
# reach the regex engine without being escaped a second time
pattern = re.compile(r'\([0-9]*,')
# do the replace: "(" + optional digits + "," becomes "('',"
result = pattern.sub("('',", subject)

# write the result back to the same file
with open('file.txt', 'w') as f_out:
    f_out.write(result)
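To see what the substitution does without touching any files, here is a minimal sketch that applies the pattern to an in-memory string. It uses `\([0-9]*,`, which matches the same text as the pattern above; the sample line is made up for illustration.

```python
import re

# "(" followed by zero or more digits and then ","
pattern = re.compile(r'\([0-9]*,')

# Hypothetical sample input, e.g. a line from an SQL dump
sample = "INSERT INTO t VALUES (12,'a'),(345,'b'),(,'c');"

# Each match, including "(," with no digits, becomes "('',"
result = pattern.sub("('',", sample)
print(result)  # INSERT INTO t VALUES ('','a'),('','b'),('','c');
```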

Python re.sub Examples

See also Python re.match
Example for re.sub() usage in Python

Syntax

import re
result = re.sub(pattern, repl, string, count=0, flags=0)

Simple Examples

result = re.sub(r'abc', '', text)               # Delete pattern abc
result = re.sub(r'abc', 'def', text)            # Replace pattern abc -> def
result = re.sub(r'\s+', ' ', text)              # Collapse each run of whitespace into one space
result = re.sub(r'abc(def)ghi', r'\1', text)    # Replace a string with a part of itself
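A runnable version of these examples on a made-up string, including the `count` and `flags` arguments from the syntax line. Two details matter in the replacement string: group references need a raw string (`r'\1'`, because a plain `'\1'` is the control character `\x01`), and `'\s'` is not a whitespace class there. Since Python 3.7, unknown escapes such as `\s` in the replacement raise an error, so use a literal `' '`.

```python
import re

text = "abcdef  xyz\t\tabcdefghi"

deleted = re.sub(r'abc', '', text)             # delete every "abc"
replaced = re.sub(r'abc', 'def', text)         # "abc" -> "def"
collapsed = re.sub(r'\s+', ' ', text)          # each whitespace run -> one space
partial = re.sub(r'abc(def)ghi', r'\1', text)  # keep only the captured group
# count=1 stops after the first replacement; IGNORECASE lets "ABC" match "abc"
first_only = re.sub(r'ABC', 'X', text, count=1, flags=re.IGNORECASE)

for value in (deleted, replaced, collapsed, partial, first_only):
    print(repr(value))
```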
 

Python re.match Examples

See also Python re.sub
Note that re.match() only matches at the start of the string.
  • Use re.search() if you want to search anywhere inside a string
  • Use re.sub() if you want to substitute substrings.
  • Use re.split() if you want to extract fields when you have a common field separator.

Syntax

Ad-hoc match

import re
result = re.match(pattern, string, flags=0)

Pre-compiled pattern

Use this if you use a pattern multiple times.
import re
pattern = re.compile('some pattern')
result = pattern.match(string[, pos[, endpos]])

Simple Examples

result = re.match(r'abc', text)      # Check whether the string starts with 'abc'
result = re.match(r'^\w+$', text)    # Ensure the string is one word
pattern = re.compile('abc')          # Same as the first example, but pre-compiled
result = pattern.match(text)
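The difference between re.match(), re.search() and the pos/endpos arguments of a compiled pattern is easiest to see side by side. A small sketch on made-up strings:

```python
import re

text = "xyzabc"

at_start = re.match(r'abc', text)    # None: "abc" is not at position 0
anywhere = re.search(r'abc', text)   # finds "abc" starting at index 3
print(at_start, anywhere.start())    # None 3

# Pre-compiled pattern: pos tells match() where to start looking
word = re.compile(r'\w+')
from_pos = word.match("  hello", 2)
print(from_pos.group())              # hello

# endpos cuts the string off, so only "hel" is visible to the pattern
short = word.match("hello", 0, 3)
print(short.group())                 # hel

# fullmatch requires the whole string to match
print(re.fullmatch(r'\w+', "one word"))  # None
```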
 

Choose between neural and statistical Google translation model

The premium features of the Google Cloud Translation API have been made generally available in the Standard Edition. All users have access to the translation features of the Neural Machine Translation (NMT) model as well as those of the Phrase-Based Machine Translation (PBMT) model. Pricing is the same for the PBMT and NMT models.
By default, when you make a translation request to the Google Cloud Translation API, your text is translated using the NMT model. If the NMT model does not support the requested language pair, or if you explicitly request PBMT, the PBMT model is used.
You can specify which model to use for translation with the model query parameter. Specify "base" to use the PBMT model and "nmt" to use the NMT model.
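As a sketch of how the model parameter fits into a request, the helper below only builds a GET URL for the basic (v2) REST endpoint and does not send it. The helper name and the `YOUR_API_KEY` placeholder are hypothetical; you would send the URL with any HTTP client and a valid key.

```python
from urllib.parse import urlencode

# Basic (v2) REST endpoint of the Cloud Translation API
ENDPOINT = "https://translation.googleapis.com/language/translate/v2"

def build_translate_url(text, target, model="nmt", source=None, api_key="YOUR_API_KEY"):
    """Build a GET URL that requests a specific model: "nmt" (default) or "base" (PBMT)."""
    if model not in ("nmt", "base"):
        raise ValueError("model must be 'nmt' or 'base'")
    params = {"q": text, "target": target, "model": model, "key": api_key}
    if source:
        params["source"] = source  # optional; omit it to let the API detect the language
    return ENDPOINT + "?" + urlencode(params)

url = build_translate_url("Bună ziua", target="en", model="base", source="ro")
print(url)
```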

Friday, January 19, 2018

HTML Codes for Romanian Language Characters

Even if your site is written in English only and does not include multilingual translations, you may need to add Romanian language characters to that site on certain pages or for certain words. The list below includes the HTML codes necessary to use Romanian characters that are not in the standard character set and are not found on a keyboard's keys.
Not all browsers support all these codes (mainly, older browsers may cause problems - newer browsers should be fine), so be sure to test your HTML codes before you use them.
To make sure Romanian characters display correctly, declare the UTF-8 (Unicode) character set in the head of your documents:
<meta http-equiv="content-type" content="text/html;charset=utf-8" />
Here are the different characters you may need to use.
Display  Friendly code  Numerical code  Hex code  Description
Ă        -              &#258;          &#x102;   Capital A-breve
ă        -              &#259;          &#x103;   Lowercase a-breve
Â        &Acirc;        &#194;          &#xC2;    Capital A-circumflex
â        &acirc;        &#226;          &#xE2;    Lowercase a-circumflex
Î        &Icirc;        &#206;          &#xCE;    Capital I-circumflex
î        &icirc;        &#238;          &#xEE;    Lowercase i-circumflex
Ș        -              &#536;          &#x218;   Capital S-comma
ș        -              &#537;          &#x219;   Lowercase s-comma
Ş        -              &#350;          &#x15E;   Capital S-cedilla
ş        -              &#351;          &#x15F;   Lowercase s-cedilla
Ț        -              &#538;          &#x21A;   Capital T-comma
ț        -              &#539;          &#x21B;   Lowercase t-comma
Ţ        -              &#354;          &#x162;   Capital T-cedilla
ţ        -              &#355;          &#x163;   Lowercase t-cedilla
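If you need these codes for whole words rather than single characters, Python can produce them for you. A minimal sketch (the helper names are hypothetical): the built-in "xmlcharrefreplace" error handler emits the decimal form, the hex form is written out by hand, and html.unescape converts either back.

```python
import html

def to_decimal_refs(text):
    # Every non-ASCII character becomes a decimal reference such as &#536;
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")

def to_hex_refs(text):
    # Same idea, written out by hand to get the &#xHHH; form
    return "".join(ch if ord(ch) < 128 else "&#x%X;" % ord(ch) for ch in text)

word = "\u0218tefan \u0219i \u021aar\u0103"  # "Ștefan și Țară" (comma-below letters)
print(to_decimal_refs(word))  # &#536;tefan &#537;i &#538;ar&#259;
print(to_hex_refs(word))      # &#x218;tefan &#x219;i &#x21A;ar&#x103;
print(html.unescape(to_decimal_refs(word)) == word)  # True
```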
Using these characters is simple. In the HTML markup, you would place these special character codes where you want the Romanian character to appear.
These are used similarly to other HTML special character codes that allow you to add characters that are also not found on the traditional keyboard, and therefore cannot be simply typed into the HTML in order to display on a web page.
Remember, these character codes may be used on an English-language website if you need to display a word with one of these characters.
These characters would also be used in HTML that displays full Romanian translations, whether you coded those pages by hand as a full Romanian version of the site or used a more automated approach to multilingual webpages, such as Google Translate.
Source: https://www.thoughtco.com
Romanian special characters: ăĂ, âÂ, îÎ, şŞ, ţŢ.

Unicode (ro)
ISO-10646
If you want to use the Unicode (UTF-8) charset, be aware that in older browsers you might have to set View->Encoding to UTF-8 or Central European.

Unicode block        Char  Code
Latin-1 Supplement   Â     194
Latin-1 Supplement   â     226
Latin-1 Supplement   Î     206
Latin-1 Supplement   î     238
Latin Extended-A     Ă     258
Latin Extended-A     ă     259
Latin Extended-A     Ş     350
Latin Extended-A     ş     351
Latin Extended-A     Ţ     354
Latin Extended-A     ţ     355
Latin Extended-B     Ș     536
Latin Extended-B     ș     537
Latin Extended-B     Ț     538
Latin Extended-B     ț     539
1. First, tell the browser to use Unicode by adding this meta tag at the start of the document head:
<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=UTF-8" />

...

2. In your HTML code, use "&#NNN;", where NNN is the decimal code from the table above.
For reference, the special characters were taken from the Unicode Latin-1 Supplement, Latin Extended-A, and Latin Extended-B charts.
3. Text example using UTF-8 Romanian characters:
  • Tudor Arghezi -- O zi
4. Perl script to convert Romanian texts from ISO-8859-2 to UTF-8: roconvert.pl
ISO-8859-2
0. If you can't see the ISO-8859-2 charset correctly, you could upgrade your browser to the latest Internet Explorer, Netscape, Opera, or Mozilla. Lynx 2.8x (although it does not support JavaScript) seems to do at least a decent job of approximating those non-ASCII characters.
1. Add this "meta" tag to the head of your document, right after the opening "html" tag, to tell the browser to use the proper charset:

<meta http-equiv="content-type" content="text/html; charset=iso-8859-2">

2. The Romanian ş has a comma underneath, not a cedilla (ISO-8859-2 only has the Turkish cedilla version of "sh"). See Unicode for the correct glyphs.
3. Text examples using ISO-8859-2 Romanian characters:

Romanian Keyboard Control using your browser

Romanian Keyboard Control in Mac OS/X

Open System Preferences and select the International icon under the Personal category. Select the Input Menu tab and scroll down to Romanian. Select it, and also tick the "Show input menu in menu bar" checkbox below so that you can switch between keyboard layouts. Close System Preferences. You can now switch to and from the Romanian keyboard layout from the top menu bar. Typically, the special characters map as follows:
US:  [  {  ]  }  \  |  ;  :  '  "
RO:  ă  Ă  î  Î  â  Â  ş  Ş  ţ  Ţ

Romanian Keyboard Control in Windows

Windows 95/98: Control Panel -> Keyboard -> Language -> Add... -> Romanian -> OK

Windows NT/2000/XP: Control Panel -> Keyboard -> Input Locales -> Add... -> Romanian -> OK
If you also check the Enable indicator on taskbar box on the same Input Locales property page, you can easily switch between RO and EN by left-clicking the language indicator on the taskbar and choosing the desired keyboard mapping.
This setting allows the following mapping of Romanian characters on a standard QWERTY 101-AT US keyboard:
US:  [  {  ]  }  \  |  ;  :  '  "
RO:  ă  Ă  î  Î  â  Â  ş  Ş  ţ  Ţ
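To check what a sequence of US-layout keystrokes produces under this mapping, the table can be expressed as a Python translation table. This is only a sketch of the mapping listed above (which uses the cedilla ş and ţ); actual keyboard layouts vary between operating system versions, and the function name is hypothetical.

```python
# US key -> character, following the mapping table above (cedilla ş/ţ as listed)
US_TO_RO = str.maketrans({
    "[": "\u0103",   # ă
    "{": "\u0102",   # Ă
    "]": "\u00ee",   # î
    "}": "\u00ce",   # Î
    "\\": "\u00e2",  # â
    "|": "\u00c2",   # Â
    ";": "\u015f",   # ş
    ":": "\u015e",   # Ş
    "'": "\u0163",   # ţ
    '"': "\u0162",   # Ţ
})

def typed_with_ro_layout(keys: str) -> str:
    """Return what US-layout keystrokes would produce under the mapping above."""
    return keys.translate(US_TO_RO)

print(typed_with_ro_layout("'ar["))     # ţară
print(typed_with_ro_layout("p\\ine"))   # pâine
```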
Pronunciation guide (as close as I can describe it in plain English)
  • The pronunciation of ă is similar to the final vowel in the word "mother".
  • â and î are pronounced the same; the sound is similar to the Russian ы. As a rule, î is used as the first letter of a word and â elsewhere in a word. For example, "Câmpina cânt" sounds roughly like KHM-PE-NAH KHNT.
  • The last two special characters (ş and ţ) are pronounced like "sh" and "ts" in English.
Source: http://www.marinel.net

The list below includes HTML character codes for Romanian characters that are usually not found in the standard character set. Should you wish to produce any of these characters, use the respective codes when writing your HTML code.
To use these characters, simply place the character code where you'd like the character to appear. Codes like these play a key part in translation projects.
Character  Decimal  Hex      Named entity  Description
Ă          &#258;   &#x102;  -             Capital A-breve
ă          &#259;   &#x103;  -             Lowercase a-breve
Â          &#194;   &#xc2;   &Acirc;       Capital A-circumflex
â          &#226;   &#xe2;   &acirc;       Lowercase a-circumflex
Î          &#206;   &#xce;   &Icirc;       Capital I-circumflex
î          &#238;   &#xee;   &icirc;       Lowercase i-circumflex
Ș          &#536;   &#x218;  -             Capital S-comma
ș          &#537;   &#x219;  -             Lowercase s-comma
Ş          &#350;   &#x15e;  -             Capital S-cedilla
ş          &#351;   &#x15f;  -             Lowercase s-cedilla
Ț          &#538;   &#x21a;  -             Capital T-comma
ț          &#539;   &#x21b;  -             Lowercase t-comma
Ţ          &#354;   &#x162;  -             Capital T-cedilla
ţ          &#355;   &#x163;  -             Lowercase t-cedilla

Monday, January 15, 2018

Batch find and replace non-breaking spaces


To replace the normal space after a digit with a non-breaking space, use this string in the Find what box:
(\d)[followed by a space]

and this string in the Replace with box:
$1[followed by a non-breaking space, i.e., ALT+0160]

Note: The texts in brackets and the square brackets themselves ([]) are just descriptive. Remove them and type the actual spaces they describe, which I can't show here.

Tick Use and select Regular expressions.

Then press the Replace button for each instance you want to replace.

Of course, you can try with other search and replace patterns, like:
Find what: (\d) (mg)
Replace with: $1[ALT+0160]$2
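The same two replacements can be scripted outside the editor. A minimal Python sketch (the function names are hypothetical): Python writes the group reference as `\1` where the editor uses `$1`, and the non-breaking space is the character U+00A0 that Alt+0160 types.

```python
import re

NBSP = "\u00a0"  # non-breaking space, the character typed with Alt+0160

def nbsp_after_digits(text):
    # (\d) followed by a normal space -> the digit followed by a non-breaking space
    return re.sub(r"(\d) ", r"\1" + NBSP, text)

def nbsp_before_unit(text, unit="mg"):
    # (\d) (mg) -> digit, non-breaking space, unit; re.escape guards special characters
    return re.sub(r"(\d) (" + re.escape(unit) + ")", r"\1" + NBSP + r"\2", text)

print(repr(nbsp_after_digits("take 10 mg, then 5 ml")))
print(repr(nbsp_before_unit("take 10 mg, then 5 ml")))
```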



Non-breaking hyphen:
U+2011 (for example, in Microsoft Word type 2011 and then press Alt+X)

Saturday, January 13, 2018

Microsoft Serialization.xsd Schema for API response

<?xml version="1.0" encoding="utf-8"?>
 <xs:schema xmlns:tns="http://schemas.microsoft.com/2003/10/Serialization/" attributeFormDefault="qualified" elementFormDefault="qualified" targetNamespace="http://schemas.microsoft.com/2003/10/Serialization/" xmlns:xs="http://www.w3.org/2001/XMLSchema">
   <xs:element name="anyType" nillable="true" type="xs:anyType" />
   <xs:element name="anyURI" nillable="true" type="xs:anyURI" />
   <xs:element name="base64Binary" nillable="true" type="xs:base64Binary" />
   <xs:element name="boolean" nillable="true" type="xs:boolean" />
   <xs:element name="byte" nillable="true" type="xs:byte" />
   <xs:element name="dateTime" nillable="true" type="xs:dateTime" />
   <xs:element name="decimal" nillable="true" type="xs:decimal" />
   <xs:element name="double" nillable="true" type="xs:double" />
   <xs:element name="float" nillable="true" type="xs:float" />
   <xs:element name="int" nillable="true" type="xs:int" />
   <xs:element name="long" nillable="true" type="xs:long" />
   <xs:element name="QName" nillable="true" type="xs:QName" />
   <xs:element name="short" nillable="true" type="xs:short" />
   <xs:element name="string" nillable="true" type="xs:string" />
   <xs:element name="unsignedByte" nillable="true" type="xs:unsignedByte" />
   <xs:element name="unsignedInt" nillable="true" type="xs:unsignedInt" />
   <xs:element name="unsignedLong" nillable="true" type="xs:unsignedLong" />
   <xs:element name="unsignedShort" nillable="true" type="xs:unsignedShort" />
   <xs:element name="char" nillable="true" type="tns:char" />
   <xs:simpleType name="char">
     <xs:restriction base="xs:int" />
   </xs:simpleType>
   <xs:element name="duration" nillable="true" type="tns:duration" />
   <xs:simpleType name="duration">
     <xs:restriction base="xs:duration">
       <xs:pattern value="\-?P(\d*D)?(T(\d*H)?(\d*M)?(\d*(\.\d*)?S)?)?" />
       <xs:minInclusive value="-P10675199DT2H48M5.4775808S" />
       <xs:maxInclusive value="P10675199DT2H48M5.4775807S" />
     </xs:restriction>
   </xs:simpleType>
   <xs:element name="guid" nillable="true" type="tns:guid" />
   <xs:simpleType name="guid">
     <xs:restriction base="xs:string">
       <xs:pattern value="[\da-fA-F]{8}-[\da-fA-F]{4}-[\da-fA-F]{4}-[\da-fA-F]{4}-[\da-fA-F]{12}" />
     </xs:restriction>
   </xs:simpleType>
   <xs:attribute name="FactoryType" type="xs:QName" />
   <xs:attribute name="Id" type="xs:ID" />
   <xs:attribute name="Ref" type="xs:IDREF" />
 </xs:schema>
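The guid and duration patterns in this schema can be reused to pre-check values on the client side. A small sketch with hypothetical helper names: XSD patterns are implicitly anchored to the whole value, so it uses re.fullmatch rather than re.match or re.search. It does not enforce the minInclusive/maxInclusive bounds.

```python
import re

# Patterns copied from the schema above
GUID = re.compile(r"[\da-fA-F]{8}-[\da-fA-F]{4}-[\da-fA-F]{4}-[\da-fA-F]{4}-[\da-fA-F]{12}")
DURATION = re.compile(r"\-?P(\d*D)?(T(\d*H)?(\d*M)?(\d*(\.\d*)?S)?)?")

def is_guid(value: str) -> bool:
    # fullmatch mirrors the implicit anchoring of XSD patterns
    return GUID.fullmatch(value) is not None

def is_duration(value: str) -> bool:
    return DURATION.fullmatch(value) is not None

print(is_guid("0f8fad5b-d9cb-469f-a165-70867728950e"))  # True
print(is_duration("P10675199DT2H48M5.4775807S"))        # True
print(is_duration("1 day"))                              # False
```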

Python Scripts to translate using Microsoft

# Install the extra libraries first: pip install termcolor beautifulsoup4
# (in a Jupyter notebook: !pip install termcolor beautifulsoup4)
# termcolor colours the console output; BeautifulSoup extracts the text from the XML response
from termcolor import colored
from bs4 import BeautifulSoup
import requests

# Using Python for text translation with Microsoft Cognitive Services
# Specify the subscription key
subscriptionKey = "ENTER YOUR COGNITIVE API KEY"
# Specify URLs for Cognitive Services - Translator Text API
translateUrl = 'https://api.microsofttranslator.com/v2/http.svc/Translate'
cognitiveServiceUrl = 'https://api.cognitive.microsoft.com/sts/v1.0/issueToken'

# Request an access token
requestHeader = {'Ocp-Apim-Subscription-Key': subscriptionKey}
responseResult = requests.post(cognitiveServiceUrl, headers=requestHeader)
responseResult.raise_for_status()  # stop here if the key was rejected
token = responseResult.text
print("Access Token")
print(token)

# Original text
text = "Créez des applications intelligentes et stratégiques avec une plateforme de base de données évolutive et hybride qui intègre tout ce qu'il vous faut : performances in-memory et sécurité avancée pour les analyses au sein de la base de données."
print(text)
# Specify source and target language
srcLanguage = "fr"
targetLanguage = "en"

# Define parameters; the token is passed as a Bearer token in appid
params = {'appid': 'Bearer ' + token, 'text': text, 'from': srcLanguage, 'to': targetLanguage}
requestHeader = {'Accept': 'application/xml'}
# Invoke Cognitive Services to perform the translation
responseResult = requests.get(translateUrl, params=params, headers=requestHeader)
responseResult.raise_for_status()

# Show original and translated text
print(colored('Original Text\n', 'green'))
print(colored(text, 'green'))
print("\n")
print(colored('Translated Text\n', 'blue'))
# "html.parser" ships with Python, so lxml does not need to be installed
soup = BeautifulSoup(responseResult.text, "html.parser")
print(colored(soup.get_text(), 'blue'))
 
Or, even easier:
 
# -*- coding: utf-8 -*-

import http.client
import urllib.parse

# *** Update or verify the following values. ***

# Replace the subscriptionKey string value with your valid subscription key.
subscriptionKey = 'ENTER KEY HERE'

host = 'api.microsofttranslator.com'
path = '/V2/Http.svc/Translate'

target = 'fr-fr'
text = 'Hello'

# Build the query string; quote() percent-encodes the text
params = '?to=' + target + '&text=' + urllib.parse.quote(text)

def translate():
    """Send the Translate request and return the raw XML response body."""
    headers = {'Ocp-Apim-Subscription-Key': subscriptionKey}
    conn = http.client.HTTPSConnection(host)
    conn.request("GET", path + params, None, headers)
    response = conn.getresponse()
    body = response.read()
    conn.close()
    return body

result = translate()
print(result.decode("utf-8"))

Translate response
A successful response is returned in XML, as shown in the following example:
<string xmlns="http://schemas.microsoft.com/2003/10/Serialization/">Salut</string>
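To get just the translated text out of this response, a minimal sketch with the standard library's ElementTree (the function name is hypothetical). The root element is `string` in the Serialization namespace, so ElementTree reports its tag as `{namespace}string`. ET.fromstring also accepts the raw bytes returned by `response.read()`.

```python
import xml.etree.ElementTree as ET

SERIALIZATION_NS = "http://schemas.microsoft.com/2003/10/Serialization/"

def parse_translate_response(xml_text) -> str:
    """Return the translated text from a V2 Translate XML response."""
    root = ET.fromstring(xml_text)
    if root.tag != "{%s}string" % SERIALIZATION_NS:
        raise ValueError("unexpected root element: " + root.tag)
    return root.text or ""  # an empty <string/> has no text

sample = '<string xmlns="http://schemas.microsoft.com/2003/10/Serialization/">Salut</string>'
print(parse_translate_response(sample))  # Salut
```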
 

Microsoft Translator Customization Features


Until a few years ago, automatic translation solutions only offered two approaches when it came to translating your content — use a default translation engine that powers major translation sites and apps such as Bing.com/translator, or build your own customized system painfully from scratch.
In 2012, Microsoft Translator broke this inflexible model with the launch of the Microsoft Translator Hub. This is just one instance of a broader class of work Microsoft is pursuing around artificial intelligence, and our vision for more personal computing experiences and enhanced productivity aided by systems that increasingly can see, hear, speak, understand and even begin to reason. The Hub allowed users to create as many custom systems as needed by combining Microsoft's enormous translation corpus with their own previously translated documents, such as internal or external websites, brochures, white papers, etc.
There are 4 general levels of customization now available to Microsoft Translator API users, with corresponding increases in resource investment and translation quality.

  1. New: Use a Standard Category instead of the default one - Our new standard categories allow you to easily customize the context of your translation by narrowing the scope of the statistical analysis that Microsoft Translator uses to translate your text. Simply speaking, with standard categories, you can tell Microsoft Translator what type of content is being translated in order to improve its accuracy. The first two standard categories we are announcing today are "tech" and "speech", with more on the way.
    • The "tech" category will improve translation quality on all computer-related content (software, hardware, networking...) and has been built with the vast amount of data collected over the years within Microsoft as we translated product help files, documentation, and customer support for our users, and from other sources such as TAUS. The list of languages for which the tech category is supported can be found here.
    • The "speech" category was developed in the last 18 months as we built Skype Translator. For Skype Translator to work properly, it was critical to be able to translate spoken text, which in most cases can be very different from the written text. The languages that are supported in this category are the same speech translation languages that are available for Skype Translator and Microsoft Translator apps for iOS and Android. As new speech languages are released for these applications, the equivalent "speech" category will become available for text translation in our core Translator API as well.
    It's easy to start using standard categories in your translations: just set the "category" parameter of your translation method to "tech" or "speech" if you are using the API, or enter the value in the Category ID box in any of our supported products, such as the Document Translator. The default value, "general", can be omitted; just select your new standard category to begin receiving your customized translations. These neural network models are available for all speech languages through the Microsoft Translator Speech API, on the try-and-compare site translate.ai, and through the text API by using the "generalnn" category ID.
    In addition to standard categories, we also developed a "social media" filter that we can enable server-side upon demand. This Client ID level filter has been developed to convert texts and instant messages to proper English to improve translations quality. For instance, once passed through the filter, "R u here?" would become "Are you here?" — which will obviously translate much better than the original. Please note that, for now, only an English texting filter exists.
  2. New: Upload a Custom Dictionary - You can customize your translations further with dictionaries. Dictionaries allow you to make your own foreign language word lists so that the terminology that is unique to your business or industry will translate just the way you want. For instance, if you have a product name that you want translated in a certain way in French, (or not translated at all, if it's a brand name) just add the product name and the corresponding French translation to your Hub dictionary. Every time you use the Microsoft Translator API with the custom category ID obtained from the Translator Hub, you will get your customized translation. To get your translations up and running, all you need to do is upload a simple Excel spreadsheet with your word list to the Translator Hub website and train the system. You can start with as little as one dictionary entry. The custom category you create with your dictionary can be built on top of the general or the standard (speech or tech) categories and remains valid even when you customize your system with one of the following options.
  3. New: Train a System with 1,000 - 5,000 Parallel Sentences - The third level of customization is to add pre-translated content to your custom category. Today, we are introducing the ability to train a system with as few as 1,000 parallel sentences (pre-translated sentences in the original and target language). By training a system with parallel sentences, you can go beyond a simple list of translated words and phrases. Instead, the Hub tunes all of its internal parameters to produce translations that are similar to the test sentences you provided. By providing the Hub with at least 1,000 parallel sentences, you can help it choose translations that match your organization's terminology and tone better than the standard categories do. If you have created content in another language, such as webpages or documentation, you can use it to improve your translations. The more sentences you have, the better the translations. You can use this customization mechanism alone or in combination with a custom dictionary.
  4. Train a system with more than 5,000 Parallel Sentences - As has been possible since the Hub launched, but now starting with only 5,000 sentences rather than the previous 10,000, you can use any number of parallel sentences above 5,000 to customize your translations. With more than 5,000 parallel sentences, you can begin to create a system that learns new terms and phrases in the right context and tone for your business. This leads to a better, more customized translation. If your corpus has fewer than 50,000 parallel sentences, add a dictionary for even better results. If you have more than 50,000 parallel sentences, you will be able to build a system that can give fully customized results. At this level, the machine has learned your terminology in context through parallel sentences, so the dictionary will be less helpful and can be reduced to the new terms as you develop new topics in your source content.
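As a sketch of where the category value from item 1 goes, the helper below builds a query string for the V2 Translate path used in the scripts earlier in this post. It assumes the `category` query parameter described above; the helper name is hypothetical, and the subscription key header would be added as in those scripts.

```python
from urllib.parse import urlencode

# V2 Translate endpoint from the scripts earlier in this post
TRANSLATE_URL = "https://api.microsofttranslator.com/V2/Http.svc/Translate"

def build_translate_request(text, to_lang, category="general", from_lang=None):
    """Return a request URL that selects a standard or custom category."""
    params = {"text": text, "to": to_lang}
    if from_lang:
        params["from"] = from_lang
    # "general" is the default and can be omitted; other values select a standard
    # category ("tech", "speech") or a custom category ID obtained from the Hub
    if category and category != "general":
        params["category"] = category
    return TRANSLATE_URL + "?" + urlencode(params)

print(build_translate_request("Click OK to continue", "de", category="tech"))
```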
With more than 50,000 parallel sentences, ideally in the 100s of thousands of sentences, the Hub enables you to create brand new language systems. Many of the Microsoft Translator supported languages were developed by Community Partners including the languages Hmong Daw, Yucatec Maya, Queretaro Otomi, Welsh, and Kiswahili.
Once you have trained and deployed your new customized system, it is available in all category ID-enabled Microsoft Translator products, such as the on-premise version of SharePoint, Office apps for PowerPoint and Word, the Document Translator, the Multilingual App Toolkit, and many translation memory tools from our partners. The Hub can help improve translation quality for a wide variety of scenarios such as web localization, customer support, and internal communications, whether online or in apps.
After your translated content is published, you can engage your community of users to refine the translation by using the Collaborative Translation Framework (CTF). CTF allows you to use human translation to edit the output of the translated content or to manage crowdsourced edits to your content so that you can refine it over time. The Hub can import these human corrections easily so you can incorporate them in training a better-customized translation system.
To start using the Translator Hub to customize your system, simply visit www.microsoft.com/translator/hub.aspx, and register a workspace. You can invite as many other people as you like into your workspace to collaborate on improving your translation system. When you are ready to deploy a custom system, you will need to sign up for an account with Microsoft Translator. You can register for a free 2 million character per month subscription to get you started. After you have registered, you can go to the Translator Hub website to start customizing!
Source: blogs.msdn.microsoft.com

Tikal - list of available filter configurations

The command:
tikal -listconf
lists all the filter configurations currently available.
List of all filter configurations available:

- okf_odf = XML OpenDocument files (e.g. use inside OpenOffice.org documents).
- okf_mosestext = Default Moses Text configuration.
- okf_tradosrtf = Configuration for Trados-tagged RTF files - READING ONLY.
- okf_rainbowkit = Configuration for Rainbow translation kit.
- okf_rainbowkit-package = Configuration for Rainbow translation kit package.
- okf_rainbowkit-noprompt = Configuration for Rainbow translation kit (without prompt).
- okf_mif = Adobe FrameMaker MIF documents
- okf_archive = Configuration for archive files
- okf_transifex = Transifex project with prompt when starting
- okf_transifex-noPrompt = Transifex project without prompt when starting
- okf_xini = Configuration for XINI documents from ONTRAM
- okf_xini-noOutputSegmentation = Configuration for XINI documents from ONTRAM (fields in the output are not segmented)
- okf_itshtml5 = Configuration for standard HTML5 documents.
- okf_txml = Wordfast Pro TXML documents
- okf_txml-fillEmptyTargets = Wordfast Pro TXML documents with empty targets filled on output.
- okf_wiki = Text with wiki-style markup
- okf_doxygen = Doxygen-commented Text Documents
- okf_transtable = Default TransTable configuration.
- okf_simplification = Configuration for extracting resources from an XML file. Resources and then codes are simplified.
- okf_simplification-xmlResources = Configuration for extracting resources from an XML file. Resources are simplified.
- okf_simplification-xmlCodes = Configuration for extracting resources from an XML file. Codes are simplified.
- okf_xliff2 = Configuration for XLIFF-2 documents.
- okf_icml = Adobe InDesign ICML documents
- okf_markdown = Markdown files
- okf_pdf = Configuration for PDF documents
- okf_sdlpackage = SDL Trados 2017 SDLPPX and SDLRPX files
- okf_table = Table-like files such as tab-delimited, CSV, fixed-width columns, etc.
- okf_table_csv = Comma-separated values, optional header with field names.
- okf_table_catkeys = Haiku CatKeys resource files
- okf_table_src-tab-trg = 2-column (source + target), tab separated files.
- okf_table_fwc = Fixed-width columns table padded with white-spaces.
- okf_table_tsv = Columns, separated by one or more tabs.
- okf_plaintext = Plain text files.
- okf_plaintext_trim_trail = Text files; trailing spaces and tabs removed from extracted lines.
- okf_plaintext_trim_all = Text files; leading and trailing spaces and tabs removed from extracted lines.
- okf_plaintext_paragraphs = Text files extracted by paragraphs (separated by 1 or more empty lines).
- okf_plaintext_spliced_backslash = Spliced lines filter with the backslash character (\) used as the splicer.
- okf_plaintext_spliced_underscore = Spliced lines filter with the underscore character (_) used as the splicer.
- okf_plaintext_spliced_custom = Spliced lines filter with a user-defined splicer.
- okf_plaintext_regex_lines = Plain Text Filter using regex-based linebreak search. Extracts by lines.
- okf_plaintext_regex_paragraphs = Plain Text Filter using regex-based linebreak search. Extracts by paragraphs.
- okf_xml = Configuration for generic XML documents (default ITS rules).
- okf_xml-resx = Configuration for Microsoft RESX documents (without binary data).
- okf_xml-MozillaRDF = Configuration for Mozilla RDF documents.
- okf_xml-JavaProperties = Configuration for Java Properties files in XML.
- okf_xml-AndroidStrings = Configuration for Android Strings XML documents.
- okf_xml-WixLocalization = Configuration for WiX (Windows Installer XML) Localization files.
- okf_xml-AppleStringsdict = Configuration for Apple Stringsdict files
- okf_html = HTML or XHTML documents
- okf_html-wellFormed = XHTML and well-formed HTML documents
- okf_tmx = Configuration for Translation Memory eXchange (TMX) documents.
- okf_dtd = Configuration for XML DTD documents (entities content)
- okf_json = Configuration for JSON files
- okf_idml = Adobe InDesign IDML documents
- okf_ttx = Configuration for Trados TTX documents.
- okf_properties = Java properties files (Output used \uHHHH escapes)
- okf_properties-outputNotEscaped = Java properties files (Characters in the output encoding are not escaped)
- okf_properties-skypeLang = Skype language properties files (including support for HTML codes)
- okf_properties-html-subfilter = Java Property content processed by an HTML subfilter
- okf_phpcontent = Default PHP Content configuration.
- okf_openoffice = OpenOffice.org ODT, ODS, ODP, ODG, OTT, OTS, OTP, OTG documents
- okf_vignette = Default Vignette Export/Import Content configuration.
- okf_vignette-nocdata = Vignette files without CDATA sections.
- okf_openxml = Microsoft Office documents (DOCX, DOCM, DOTX, DOTM, PPTX, PPTM, PPSX, PPSM, POTX, POTM, XLSX, XLSM, XLTX, XLTM, VSDX, VSDM).
- okf_pensieve = Configuration for Pensieve translation memories.
- okf_xliff = Configuration for XML Localisation Interchange File Format (XLIFF) documents.
- okf_xliff-sdl = Configuration for SDL XLIFF documents. Supports SDL specific metadata
- okf_ts = Configuration for Qt TS files.
- okf_regex = Default Regex configuration.
- okf_regex-srt = Configuration for SRT (Sub-Rip Text) sub-titles files.
- okf_regex-textLine = Configuration for text files where each line is a text unit
- okf_regex-textBlock = Configuration for text files where text units are separated by 2 or more line-breaks.
- okf_regex-macStrings = Configuration for Macintosh .strings files.
- okf_po = Standard bilingual PO files
- okf_po-monolingual = Monolingual PO files (msgid is a real ID, not the source text).
- okf_yaml = YAML files
- okf_xmlstream = Large XML Documents
- okf_xmlstream-dita = DITA XML
- okf_xmlstream-JavaPropertiesHTML = Java Properties XML with Embedded HTML
- okf_versifiedtxt = Versified Text Document
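To use this list from a script, for example to check that a configuration ID exists before calling Tikal, a minimal Python sketch can parse the "- id = description" lines. The function names are hypothetical; running `tikal -listconf` assumes Tikal is on the PATH (on Windows you may need to pass the full path to the tikal script). The line breaks inside some entries above presumably come from the console window width and should not appear in captured output.

```python
import re
import subprocess

# One entry per line: "- <configuration id> = <description>"
ENTRY = re.compile(r"^- (\S+) = (.*)$")

def parse_listconf(output: str) -> dict:
    """Map each filter configuration ID to its description."""
    configs = {}
    for line in output.splitlines():
        m = ENTRY.match(line.strip())
        if m:
            configs[m.group(1)] = m.group(2)
    return configs

def list_tikal_configs(tikal: str = "tikal") -> dict:
    """Run 'tikal -listconf' and parse its output."""
    result = subprocess.run([tikal, "-listconf"], capture_output=True, text=True, check=True)
    return parse_listconf(result.stdout)
```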

Thursday, January 11, 2018

AHK Script to translate segment with Google Translate in Across

#NoEnv
SendMode Input
; Ctrl+Alt+I: translate the current Across segment (German -> Romanian) with Google Translate
^!I::
Send ^a                    ; select the whole segment
Sleep, 500
clipboard := ""            ; empty the clipboard so ClipWait waits for the new copy
Send ^c
ClipWait, 2
google_url := "http://translate.google.com/#de/ro/" clipboard
Run %google_url%
Sleep, 2000
MouseMove 966, 297         ; replace with the default mouse position for the window
clipboard := ""
Send ^c
ClipWait, 2
Send ^w                    ; close the tab in Firefox
Sleep, 500
StringCaseSense On         ; keep upper- and lowercase replacements separate
StringReplace, clipboard, clipboard, ș, ş, All  ; replace new (comma-below) with old (cedilla) diacritics, if needed
StringReplace, clipboard, clipboard, Ș, Ş, All
StringReplace, clipboard, clipboard, ț, ţ, All
StringReplace, clipboard, clipboard, Ț, Ţ, All
WinActivate Across Translator Premium Edition v6.3 7557_en  ; replace with your current Across version
Sleep, 500
Send ^v
return
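The diacritics step of the script (comma-below letters to the older cedilla letters) is useful outside AutoHotkey too. A minimal Python equivalent, with a hypothetical function name; like StringCaseSense On, it treats upper- and lowercase letters separately.

```python
# Comma-below letters (current Romanian standard) -> older cedilla letters
COMMA_TO_CEDILLA = str.maketrans({
    "\u0219": "\u015f",  # ș -> ş
    "\u0218": "\u015e",  # Ș -> Ş
    "\u021b": "\u0163",  # ț -> ţ
    "\u021a": "\u0162",  # Ț -> Ţ
})

def to_cedilla(text: str) -> str:
    """Replace comma-below ș/ț (and capitals) with cedilla ş/ţ."""
    return text.translate(COMMA_TO_CEDILLA)

print(to_cedilla("\u0218tiin\u021b\u0103"))  # Ştiinţă
```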

Thursday, January 4, 2018

Deepl - Command Line Language Translator Tool for Linux

deepl-translator-cli is a command line tool that brings DeepL's text translation capabilities to your console. DeepL Translator is developed by the German tech company DeepL and is available to everyone, free of charge, on www.DeepL.com.
DeepL Translator is based on neural machine translation. According to DeepL, its artificial intelligence captures even subtle nuances of the input text and reproduces them in the translation.
In this article, we will install the DeepL translator command line tool and see how it works. The tool works by making API calls to the main website (www.deepl.com): whenever you translate something, it sends the request to the website and gets the results back. Your server or machine therefore needs an active internet connection for the tool to work. Translate Shell is another tool that does the same job. Let's start with the installation.

Installation of Deepl Translator command line tool

Step 1: DeepL requires Node.js version 6.0 or higher. Linux distributions don't come with the NodeSource repository configured by default, so we will configure it first and then install Node.js 6. Skip this step if you already have Node.js 6.0 or higher installed. The repository lines below are for Ubuntu 16.04 (xenial). Create the file /etc/apt/sources.list.d/nodesource.list and add the content shown below.
$ sudo vi /etc/apt/sources.list.d/nodesource.list
deb https://deb.nodesource.com/node_6.x xenial main
deb-src https://deb.nodesource.com/node_6.x xenial main
Then run the following commands to install Node.js:
$ curl -s https://deb.nodesource.com/gpgkey/nodesource.gpg.key | sudo apt-key add -
$ sudo apt-get update
$ sudo apt-get install nodejs
Step 2: Install the Yarn package manager if it is not already installed. Run the following commands to install Yarn:
$ curl -sS https://dl.yarnpkg.com/debian/pubkey.gpg | sudo apt-key add -
$ echo "deb https://dl.yarnpkg.com/debian/ stable main" | sudo tee /etc/apt/sources.list.d/yarn.list
$ sudo apt-get update
$ sudo apt-get install yarn
Step 3: Finally, run the following command to install the DeepL translator on your machine.
$ yarn global add deepl-translator-cli
You can verify the installation by checking the installed version of DeepL:
$ deepl --version
1.0.1
That's it. We have successfully installed deepl translator. Now let's explore it.

Usage of Deepl Translator command line tool

At the time of writing, DeepL Translator supports the following languages:
  • English (EN)
  • German (DE)
  • French (FR)
  • Spanish (ES)
  • Italian (IT)
  • Dutch (NL)
  • Polish (PL)
According to DeepL, its neural networks are already being trained on more languages, such as Mandarin, Japanese, and Russian.
Apart from translation, DeepL can also detect the input language, so the tool works in two modes: translate and detect.
  • Translation
    1. To translate a sentence or word, use the following syntax:
$ deepl translate -t 'FR' "Hey, What's going on?"
Hé, qu'est-ce qui se passe?
Here, FR is the ISO code for French, and DeepL has returned the output in French. The general form of the translate command is:
deepl translate -t '<TARGET_LANGUAGE_ISO_CODE>' '<INPUT_STRING>'
  • Detection
    1. You can detect the language of a sentence with DeepL as shown below:
$ deepl detect "Batman può essere chiunque"
Italian (IT)
Here, DeepL has detected that the input sentence is Italian. The general form of the detect command is:
deepl detect '<INPUT_STRING>'
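If you want to call the tool from a script, a minimal Python sketch can wrap it with subprocess. The helper names are hypothetical, and it assumes only the translate -t and detect forms shown above. Passing the arguments as a list avoids shell quoting problems with apostrophes such as the one in "What's".

```python
import shutil
import subprocess

def deepl_command(mode, text, target=None):
    """Build the argument list for the deepl CLI ('translate' or 'detect')."""
    if mode == "translate":
        if not target:
            raise ValueError("translate needs a target language code, e.g. 'FR'")
        return ["deepl", "translate", "-t", target, text]
    if mode == "detect":
        return ["deepl", "detect", text]
    raise ValueError("mode must be 'translate' or 'detect'")

def run_deepl(mode, text, target=None):
    """Run the CLI and return its output (needs deepl on the PATH and internet access)."""
    if shutil.which("deepl") is None:
        raise RuntimeError("deepl not found; install it with 'yarn global add deepl-translator-cli'")
    result = subprocess.run(deepl_command(mode, text, target),
                            capture_output=True, text=True, check=True)
    return result.stdout.strip()
```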
To get help from the command line, run:
$ deepl -help
Source: linoxide.com