Showing posts with label Neural Machine Translation. Show all posts

Friday, August 29, 2025

Python Script to Download Bergamot Models

import os
import tarfile

import httpx

# Model list source: https://data.statmt.org/bergamot/models/models.json

# List of model URLs
urls = [
    "https://data.statmt.org/bergamot/models/csen/csen.student.base.v1.cd5418ba6a412fc7.tar.gz",
    "https://data.statmt.org/bergamot/models/csen/csen.student.tiny11.v1.8f603aded58f0a3c.tar.gz",
    "https://data.statmt.org/bergamot/models/csen/encs.student.base.v1.db770d87e491b0dc.tar.gz",
    "https://data.statmt.org/bergamot/models/csen/encs.student.tiny11.v1.b5c1ff605296b0e5.tar.gz",
    "https://data.statmt.org/bergamot/models/deen/deen.student.base.v2.caa7c0ce3c8eaf05.tar.gz",
    "https://data.statmt.org/bergamot/models/deen/deen.student.tiny11.v2.9f70fcb17bf9572d.tar.gz",
    "https://data.statmt.org/bergamot/models/deen/ende.student.base.v2.37b172bc9b594f9b.tar.gz",
    "https://data.statmt.org/bergamot/models/deen/ende.student.tiny11.v2.93821e13b3c511b5.tar.gz",
    "https://data.statmt.org/bergamot/models/esen/esen.student.tiny11.v1.09576f06d0ad805e.tar.gz",
    "https://data.statmt.org/bergamot/models/esen/enes.student.tiny11.v1.a7203a8f8e9daea8.tar.gz",
    "https://data.statmt.org/bergamot/models/eten/eten.student.tiny11.v1.38de61c668e42f36.tar.gz",
    "https://data.statmt.org/bergamot/models/eten/enet.student.tiny11.v1.0b8f835b0c154aaa.tar.gz",
    "https://data.statmt.org/bergamot/models/isen/isen.student.base.v2.536d6b8808a5c076.tar.gz",
    "https://data.statmt.org/bergamot/models/isen/isen.student.tiny11.v2.829203cf37b7bdc4.tar.gz",
    "https://data.statmt.org/bergamot/models/nben/nben.student.tiny11.v1.e410ce34f8337aab.tar.gz",
    "https://data.statmt.org/bergamot/models/nnen/nnen.student.tiny11.v1.0efa37c16887eea4.tar.gz",
    "https://data.statmt.org/bergamot/models/bgen/bgen.student.tiny11.v1.f9c89a3a25ff8dca.tar.gz",
    "https://data.statmt.org/bergamot/models/bgen/enbg.student.tiny11.v1.3ea060c1b76470a7.tar.gz",
    "https://data.statmt.org/bergamot/models/plen/plen.student.tiny11.v1.87148203cbda2842.tar.gz",
    "https://data.statmt.org/bergamot/models/plen/enpl.student.tiny11.v1.c33219daa12e7872.tar.gz",
    "https://data.statmt.org/bergamot/models/fren/fren.student.tiny11.v1.dccea16d03c0a389.tar.gz",
    "https://data.statmt.org/bergamot/models/fren/enfr.student.tiny11.v1.805d112122af03d0.tar.gz",
    "https://data.statmt.org/bergamot/models/hbseng/hbseng.student.tiny11.v1.fa8a29e01a5332ba.tar.gz",
    "https://data.statmt.org/bergamot/models/slen/slen.student.tiny11.v1.d029034e49c3bb08.tar.gz",
    "https://data.statmt.org/bergamot/models/mken/mken.student.tiny11.v1.dd03ef56f4695c7b.tar.gz",
    "https://data.statmt.org/bergamot/models/mten/mten.student.tiny11.v1.4089a5a036eff1c3.tar.gz",
    "https://data.statmt.org/bergamot/models/tren/tren.student.tiny11.v1.d7728d17a313230a.tar.gz",
    "https://data.statmt.org/bergamot/models/sqen/sqen.student.tiny11.v1.6ead0c9b236f942b.tar.gz",
    "https://data.statmt.org/bergamot/models/caen/caen.student.tiny11.v1.edaf67d1938e80d3.tar.gz",
    "https://data.statmt.org/bergamot/models/elen/elen.student.tiny11.v1.0006442831596378.tar.gz",
    "https://data.statmt.org/bergamot/models/uken/uken.student.tiny11.v1.108d04d1e160153a.tar.gz",
]

# Create a folder to store all models
os.makedirs("models", exist_ok=True)

for url in urls:
    filename = os.path.basename(url)
    folder_name = filename.replace(".tar.gz", "")
    folder_path = os.path.join("models", folder_name)
    os.makedirs(folder_path, exist_ok=True)

    print(f"📥 Downloading {filename}...")
    archive_path = os.path.join(folder_path, filename)

    # Stream to disk instead of holding the whole archive in memory;
    # follow redirects and stop on HTTP errors instead of saving an error page.
    with httpx.stream("GET", url, follow_redirects=True, timeout=60.0) as response:
        response.raise_for_status()
        with open(archive_path, "wb") as f:
            for chunk in response.iter_bytes():
                f.write(chunk)

    print(f"📦 Extracting to {folder_path}...")
    with tarfile.open(archive_path, "r:gz") as tar:
        # The "data" filter (Python 3.12+, backported to recent 3.8-3.11
        # patch releases) rejects unsafe member paths such as "../x".
        if hasattr(tarfile, "data_filter"):
            tar.extractall(path=folder_path, filter="data")
        else:
            tar.extractall(path=folder_path)

    os.remove(archive_path)
    print(f"✅ Done: {folder_name}\n")

print("🎉 All models downloaded and extracted!")



Thursday, August 28, 2025

Bergamot Translator Linux and Python - Bergamot and TranslateLocally Models

Running bergamot-translator on Linux

$ sudo apt install libpcre2-dev libopenblas-dev
$ git clone --recursive git@github.com:browsermt/bergamot-translator.git
$ cd bergamot-translator
$ mkdir build && cd build
$ cmake ..
$ make -j
Example config.yml:
bergamot-mode: native
models:
  - firefox-translations-models/models/prod/esen/model.esen.intgemm.alphas.bin
vocabs:
  - firefox-translations-models/models/prod/esen/vocab.esen.spm
  - firefox-translations-models/models/prod/esen/vocab.esen.spm
shortlist:
    - firefox-translations-models/models/prod/esen/lex.50.50.esen.s2t.bin
    - false
beam-size: 1
normalize: 1.0
word-penalty: 0
max-length-break: 128
mini-batch-words: 1024
workspace: 128
max-length-factor: 2.0
skip-cost: true
cpu-threads: 0
quiet: false
quiet-translation: false
gemm-precision: int8shiftAlphaAll
alignment: soft 

where esen is the language pair for the translation, in this case es→en (Spanish to English).

The model, vocab, and shortlist files should be taken from the firefox-translations-models repository, which stores them with Git LFS. Some docs still point to Google Cloud Storage for downloads, but those links are stale.
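To switch language pairs without hand-editing the YAML, the config can be generated from the pair code. This is a minimal sketch: make_config is a hypothetical helper, and it assumes the firefox-translations-models layout and the file names used in the example above (model.<pair>.intgemm.alphas.bin, vocab.<pair>.spm, lex.50.50.<pair>.s2t.bin). Pairs that ship separate source and target vocabularies would need different vocab paths.

```python
from pathlib import PurePosixPath

# Decoding options copied verbatim from the example config.yml above.
DEFAULT_OPTIONS = [
    ("beam-size", "1"),
    ("normalize", "1.0"),
    ("word-penalty", "0"),
    ("max-length-break", "128"),
    ("mini-batch-words", "1024"),
    ("workspace", "128"),
    ("max-length-factor", "2.0"),
    ("skip-cost", "true"),
    ("cpu-threads", "0"),
    ("quiet", "false"),
    ("quiet-translation", "false"),
    ("gemm-precision", "int8shiftAlphaAll"),
    ("alignment", "soft"),
]


def make_config(pair: str, root: str = "firefox-translations-models/models/prod") -> str:
    """Return config.yml text for a language pair such as "esen" (es -> en).

    Hypothetical helper: assumes one shared vocabulary file per pair,
    named like the example config above.
    """
    base = PurePosixPath(root) / pair
    model = base / f"model.{pair}.intgemm.alphas.bin"
    vocab = base / f"vocab.{pair}.spm"
    shortlist = base / f"lex.50.50.{pair}.s2t.bin"
    lines = [
        "bergamot-mode: native",
        "models:",
        f"  - {model}",
        "vocabs:",
        f"  - {vocab}",  # source vocabulary
        f"  - {vocab}",  # target vocabulary (same shared file)
        "shortlist:",
        f"    - {shortlist}",
        "    - false",
    ]
    lines += [f"{key}: {value}" for key, value in DEFAULT_OPTIONS]
    return "\n".join(lines) + "\n"
```

Writing the file is then a one-liner, e.g. Path("config.yml").write_text(make_config("esen"), encoding="utf-8") with Path imported from pathlib.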

Pipe some data through bergamot-translator:

echo "Hola mundo" | ./bergamot-translator --model-config-paths config.yml
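The same pipe can be driven from Python when several sentences need translating. This is a sketch under assumptions: the binary name and flag are taken from the command above, the binary is assumed to print one translated line per input line on stdout, and log output is assumed to go to stderr. build_command and translate_lines are hypothetical helpers.

```python
import subprocess
from typing import List


def build_command(binary: str = "./bergamot-translator",
                  config: str = "config.yml") -> List[str]:
    """Argument list equivalent to the shell command above."""
    return [binary, "--model-config-paths", config]


def translate_lines(lines: List[str],
                    binary: str = "./bergamot-translator",
                    config: str = "config.yml") -> List[str]:
    """Send sentences on stdin (one per line) and read translations from stdout.

    Assumes one output line per input line; stderr is captured separately
    and ignored here.
    """
    completed = subprocess.run(
        build_command(binary, config),
        input="\n".join(lines) + "\n",
        capture_output=True,
        encoding="utf-8",
        check=True,  # raise CalledProcessError if the binary exits non-zero
    )
    return completed.stdout.splitlines()
```

Usage: translate_lines(["Hola mundo", "Buenos días"]) returns the translated lines in input order, provided the one-line-per-sentence assumption holds.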
---

Requirement: Python <= 3.10 (wheels are not available for newer versions)

pip install bergamot

import bergamot

config = bergamot.ServiceConfig(numWorkers=4)
service = bergamot.Service(config)
model = service.modelFromConfigPath("bergamot.config.yml")
options = bergamot.ResponseOptions(
    alignment=False, qualityScores=False, HTML=False
)
response = service.translate(model, bergamot.VectorString([
    "In the last 3 months, over 80 arrestees were released from the Central Booking facility without being formally charged.",
    "Since its inception, The Onion has become a veritable news parody empire.",
    "The hostel’s guests were mostly citizens of the United Arab Emirates.",
]), options)

for r in response:
    print(r.target.text)
 

bergamot.config.yml:

# To imitate production setting, these Marian options are set according to
# https://github.com/mozilla/firefox-translations/blob/main/extension/controller/translation/translationWorker.js
# For reference, see https://github.com/mozilla/firefox-translations-models/blob/main/evals/translators/bergamot.sh

bergamot-mode: wasm
models:
  - ./model.enro.intgemm.alphas.bin
vocabs:
  - ./vocab.enro.spm
  - ./vocab.enro.spm
shortlist:
    - ./lex.50.50.enro.s2t.bin
    - false
beam-size: 1
normalize: 1.0
word-penalty: 0
max-length-break: 128
mini-batch-words: 1024
workspace: 128
max-length-factor: 2.0
skip-cost: true
cpu-threads: 4
quiet: false
quiet-translation: false
gemm-precision: int8shiftAlphaAll
alignment: soft 

TranslateLocally-compatible models:
https://translatelocally.com/models.json
Firefox models:
https://github.com/mozilla/firefox-translations-models/tree/main/models 

Wednesday, February 12, 2025

Local Translation API in Chrome-based Browsers

Client-side Translator API with a model built into Chrome (from version 131).

Possible use case: a customer support chat that lets users type in their first language while the support agent receives a real-time translation. Explainer: github.com/webmachinelearning/translation-api

The Translator API has two important methods:

  • canTranslate(): Checks if a translation model for your language pair is ready. Returns "readily" if the model is already available on device, "after-download" if the browser first needs to download the model, and "no" if translation is not possible.
  • createTranslator(): This sets up your Translator object asynchronously. If the model needs downloading, it'll wait until it's ready.

The Translator object has just one method:

  • translate(): Feed it the source text, and it outputs the translated version.

As this is experimental and Chrome-specific for now, be sure to wrap all your code in feature detection.

async function translateOnDevice(input) {
  // Feature detection: bail out if the on-device Translator API is missing.
  const supportsOnDevice = 'translation' in self && 'createTranslator' in self.translation;
  if (!supportsOnDevice) {
    return;
  }

  const parameters = { sourceLanguage: 'en', targetLanguage: 'pt' };
  const modelState = await self.translation.canTranslate(parameters);
  if (modelState === 'no') {
    return;
  }
  // Resolves once the model is available (downloading it first if needed).
  const onDeviceTranslator = await self.translation.createTranslator(parameters);

  const result = await onDeviceTranslator.translate(input);
  if (!result) {
    throw new Error('Failed to translate');
  }
  return result;
}

The model needs time to become available to the user. You can approach this in two ways:

  • Wait to enable your translation-powered UI elements once the model is ready.
  • Start with server-side AI for translation, then switch to client-side once the model has downloaded.

Sign up for the Translator API origin trial to enable your translation features for all users on your origin in Chrome. To give feedback on the API, open an issue on the explainer repository.

Use the Translator API in Chrome to translate text in the browser, using local AI models.

Translation of content on the web has typically required using a cloud service. First, the source content is uploaded to a server, which runs the translation to a target language, then the resulting text is downloaded and returned to the user. By running translation on the client, you save the time required by server trips and the cost of hosting the translation service.

Join the Translator API origin trial, running in Chrome from version 131 (including the Dev and Canary channels).

While you always know the target language for translations, you may not always know the source language, such as in user-generated content. For such cases, the Translator API proposal includes both the Translator API and the Language Detector API, also available in an origin trial. Sign up for both origin trials to use these APIs together.

To start using the Translator API, follow these steps:

  1. Acknowledge Google's Generative AI Prohibited Uses Policy.
  2. Go to the Translator API origin trial.
  3. Click Register and fill out the form.
    • In the Web origin field, provide your origin or extension ID, chrome-extension://YOUR_EXTENSION_ID.
  4. To submit, click Register.
  5. Copy the token provided, and add it to every web page on your origin or file for your Extension, on which you want the trial to be enabled.
  6. Start using the Translator API.

Learn more about how to get started with origin trials.

To access the Translator API on localhost during the origin trial, you must update Chrome to the latest version. Then, follow these steps:

  1. Go to chrome://flags/#translation-api.
  2. Select Enabled.
    • To try more language pairs, select Enabled without language pack limit.
  3. Click Relaunch or restart Chrome.

To determine if the Translator API is supported, run the following feature detection snippet.

if ('ai' in self && 'translator' in self.ai) {
  // The Translator API is supported.
}

Translation is managed with language packs, downloaded on demand. A language pack is like a dictionary for a given language. A translator is configured with two options:

  • sourceLanguage: The current language for the text.
  • targetLanguage: The final language the text should be translated into.

Use BCP 47 language short codes as strings. For example, 'es' for Spanish or 'fr' for French.

 const translatorCapabilities = await self.ai.translator.capabilities();
 translatorCapabilities.languagePairAvailable('es', 'fr');
 // 'readily'

The languagePairAvailable() function can return any of the following results:

  • no: It's not possible for this browser to translate as requested.
  • readily: The browser can translate as requested.
  • after-download: The browser can perform the translation, but only after it downloads the relevant model or language packs.

You can listen for model download progress using the downloadprogress event:

const translator = await self.ai.translator.create({
  sourceLanguage: 'es',
  targetLanguage: 'fr',
  monitor(m) {
    m.addEventListener('downloadprogress', (e) => {
      console.log(`Downloaded ${e.loaded} of ${e.total} bytes.`);
    });
  },
});

If the download fails, then downloadprogress events stop being emitted and the ready promise is rejected.

To create a translator, call the asynchronous self.ai.translator.create() function. It requires an options object with two fields, one for the sourceLanguage and one for the targetLanguage.

// Create a translator that translates from English to French.
const translator = await self.ai.translator.create({
  sourceLanguage: 'en',
  targetLanguage: 'fr',
});

Once you have a translator, call the asynchronous translate() function to translate your text.

await translator.translate('Where is the next bus stop, please?');
// "Où est le prochain arrêt de bus, s'il vous plaît ?"

The following limitations apply during the origin trial.

At this time, up to three language packs can be downloaded for translation. We're committed to expanding the range of supported languages in future releases while maintaining high standards for user privacy. You can confirm whether the language pair you need is supported with the languagePairAvailable() function.

It's possible that certain less frequently used language pairs may be used for fingerprinting. For example, it's more common to translate between English and Spanish than between less common languages, such as Gaelic and Catalan. A less common language pair could be considered a data point for user identification.

During the origin trial, we're limiting the potential translatable language pairs to protect user privacy. Language pairs must meet the following criteria:

  • Both the source and the destination language are set as preferred languages in Chrome.
  • Or, one of the languages is set as a preferred language in Chrome, and the other is among the following popular languages:
    • English (en)
    • Mandarin Chinese (zh; simplified) or Taiwanese Mandarin (zh-Hant; traditional)
    • Japanese (ja)
    • Portuguese (pt)
    • Russian (ru)
    • Spanish (es)
    • Turkish (tr)
    • Hindi (hi)
    • Vietnamese (vi)
    • Bengali (bn)

For local prototyping, you can bypass these checks by running Chrome with the command line option --disable-features=TranslationAPIAcceptLanguagesCheck. Alternatively, set chrome://flags/#translation-api to Enabled without language pack limit.

Visit chrome://on-device-translation-internals/ to manually install and uninstall language packs.

Translations are processed sequentially. If you send large amounts of text to be translated, subsequent translations are blocked until the earlier ones complete.

For the best responsiveness, split large texts into smaller chunks and consider displaying a loading interface, such as a spinner, to convey that a translation is ongoing.
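Because translations are processed one after another, long input can be split into chunks and translated piece by piece, so the first results appear sooner. The splitting step does not depend on the browser API, so here it is sketched in Python like the other examples on this page. The 1,000-character default and the naive regex sentence splitter are assumptions, not limits documented for the Translator API; chunk_text is a hypothetical helper. In the browser, each chunk would then be passed to translator.translate() in turn.

```python
import re
from typing import List


def chunk_text(text: str, max_chars: int = 1000) -> List[str]:
    """Split text into chunks of at most max_chars, breaking after sentences.

    A single sentence longer than max_chars is cut hard at max_chars.
    """
    # Naive sentence split: keep the end punctuation, split on the whitespace after it.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        # Hard-split sentences that do not fit into one chunk on their own.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate  # sentence still fits into the current chunk
        else:
            chunks.append(current)
            current = sentence  # start a new chunk with this sentence
    if current:
        chunks.append(current)
    return chunks
```

For example, chunk_text("One. Two! Three? Four.", max_chars=10) groups the first two sentences and puts the others in their own chunks.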

During the origin trial, the Translator API is only supported from the main thread. We intend to support it in web workers once the API is widely available.

You can see the Translator API, used in combination with the Language Detector API, in the Translator and Language Detector API playground.

We're working to standardize the Translator API, to ensure cross-browser compatibility.

Our API proposal received community support and has moved to the W3C Web Incubator Community Group for further discussion. The Chrome team requested feedback from the W3C Technical Architecture Group and asked Mozilla and WebKit for their standards positions.

Start testing the Translator API now by joining the origin trial and share your feedback. Your input can directly impact how we build and implement future versions of this API, and all built-in AI APIs.

Install the Language Detector and Translator APIs on Chrome:

  1. Go to chrome://flags/#language-detection-api.
  2. Select Enabled.
  3. Go to chrome://flags/#translation-api.
  4. Select Enabled without language pack limit to try more language pairs.
  5. Click Relaunch or restart Chrome.
  6. Open a new tab and go to chrome://components.
  7. Find Chrome TranslateKit.
  8. Click the "Check for update" button to download the language model. The version number should update.
  9. (Optional) Open a new tab and go to chrome://on-device-translation-internals/.
  10. (Optional) Install language pairs.

 

Tuesday, January 7, 2025

Friday, September 8, 2023

MemoQ Unsigned OPUSCAT Plug-in Warning

To get rid of the memoQ unsigned plug-in warning that OPUS-CAT triggers, create a ClientDevConfig.xml file under ProgramData/MemoQ containing only the following code:

<?xml version="1.0" encoding="utf-8"?>
<ClientDevConfig>
  <LoadUnsignedPlugins>true</LoadUnsignedPlugins>
</ClientDevConfig>

Source: https://github.com/Helsinki-NLP/OPUS-CAT
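Creating the file by hand is easy, but on several machines a small script avoids typos in the XML. A minimal sketch: write_client_dev_config is a hypothetical helper that reads the PROGRAMDATA environment variable (normally C:\ProgramData on Windows) unless a base folder is passed in. Writing under ProgramData may require administrator rights.

```python
import os
from pathlib import Path
from typing import Optional

# Same content as the XML snippet above.
CLIENT_DEV_CONFIG = """<?xml version="1.0" encoding="utf-8"?>
<ClientDevConfig>
  <LoadUnsignedPlugins>true</LoadUnsignedPlugins>
</ClientDevConfig>
"""


def write_client_dev_config(base_dir: Optional[str] = None) -> Path:
    """Write ClientDevConfig.xml into <base_dir>/MemoQ and return its path.

    base_dir defaults to the PROGRAMDATA environment variable
    (normally C:\\ProgramData on Windows).
    """
    if base_dir is None:
        base_dir = os.environ["PROGRAMDATA"]  # KeyError outside Windows
    target_dir = Path(base_dir) / "MemoQ"
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / "ClientDevConfig.xml"
    target.write_text(CLIENT_DEV_CONFIG, encoding="utf-8")
    return target
```

On Windows, calling write_client_dev_config() with no argument targets %programdata%\MemoQ; restart memoQ afterwards so it picks up the setting.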


Installation:

  1. Download and unpack the zip file.
  2. Copy the ClientDevConfig.xml file to your %programdata%/MemoQ folder, which is C:\ProgramData\MemoQ on a default Windows installation. Depending on the setup, memoQ may also use C:\Users\$USERNAME\AppData\Roaming\MemoQ or C:\Users\$USERNAME\AppData\Local\MemoQ. Both ProgramData and AppData are hidden folders, so you may need to enable "Show hidden items" in the Windows Explorer settings.
  3. Copy the .dll file to the Addins folder of your MemoQ installation (C:\Program Files\memoQ\memoQ-9\Addins). Make sure to delete any previous plugin files.

Troubleshooting:

  • If an error message regarding CAS policy is displayed upon MemoQ startup, unzip it using 3rd party software (such as WinRar). This is because Windows flags .dll files that have been downloaded on their own or inside a .zip file that is unpacked using Windows Explorer.
  • If you notice incompatibility issues or other errors, please open an issue in the OPUS-CAT repository.

Saturday, June 3, 2023

ChatGPT for Translation

ChatGPT's interactive nature makes it a standout translation tool. With other translation tools, you provide a text, you get a translation, and that's it. Whether it is the best translation you can get doesn't matter—you're stuck with it.

With ChatGPT, you can customize translations to suit your specific needs and give feedback on the adjustments you'd like to see. For example, you can adjust the tone and style and take into account cultural connotations and regional differences in the meaning of words, something purpose-built translation tools like Google Translate cannot do.
All you need to do is provide the text you want to translate, specify the language you want to translate it to, and ChatGPT will handle the rest.

1. Provide Context

One of the key advantages of ChatGPT over popular translation tools like Google Translate is its ability to accurately consider the context of a text when generating translations. Considering context can be the difference between simply translating individual words in a sentence and generating a translation that truly reflects the author's or speaker's intention.

Take the Spanish sentence “Gracias por preguntar, pero estoy bastante seguro aquí” for instance. Google Translate produces "Thanks for asking, but I'm pretty sure here" as the translation. While this isn't entirely wrong, depending on the context, the sentence could mean, "Thanks for asking, but I'm safe here."
Of course, Google Translate will provide the same translation no matter how many times you attempt to translate it, because it has no way to recognize contextual nuance. As per the screenshot above, ChatGPT will attempt to provide the most accurate translation depending on the provided context. Providing context can significantly improve the quality of your translation. If you are not sure how to provide context, here are some ideas:

    "Translate [text to translate in Filipino] to English from the perspective of a native Filipino speaker" should try to maintain as many cultural connotations as possible in a translation.
    "Translate [text to translate] to English from the perspective of someone discussing the COVID-19 pandemic" should use appropriate medical terms instead of generic words.
    "Translate [text to translate] to English. The text discusses a battle during WWII" should use appropriate military and historical terms.

2. Declare the Type of Text

Another important factor that can increase the accuracy of your translation is outright declaring the kind of text you're trying to translate. For example, is it an idiom, a song, a financial document, or an ordinary text? Simply letting ChatGPT know what you're trying to translate gives the chatbot an edge toward providing more accurate translations.

Instead of simply using a prompt like "Translate [text to translate] to [target language]", you should ideally use alternatives like:

    Translate the [financial report | poem | song | Bible portion | proverb] in quotes to [target language]
    Translate [text to translate] to [target language]. The text to be translated is a [military report | medical document | drug prescription]

The prompts above or similar ones help ChatGPT use relevant or industry-specific context when generating a translation. Although ChatGPT sometimes recognizes the right niche words to use for translation, you'll have to explicitly prompt it to do so using type declaration in some cases.
3. Use Style Transfer

Sometimes, when translating text, the translation might be too technical or simply inappropriate for the target audience. Using style transfer in ChatGPT can help adjust the tone and style of a translation to match the target audience or industry. So, if you're translating a legal document, the translation could retain the author's original meaning while using more layman's wording. In the example below, I translated a soccer commentary from Spanish to English, first without style transfer and then using style transfer.
 The translation above uses the closest English equivalent of the corresponding Spanish words, while the one below uses words suitable for an audience not acquainted with soccer terms. However, it is interesting to note that both translations are considered accurate.
An English translation using style transfer

To use style transfer when translating, use prompts like:

    Translate [text to translate] to [target language] in layman's terms.
    Translate [text to translate] to [target language] for a [grade 5] audience
    Translate [text to translate] to [target language]. Use style transfer to make the translated text suitable for a [target audience]
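When many texts go through ChatGPT (for example via a script), the tips so far (context, type of text, style transfer) can be combined into one prompt template. A minimal sketch: build_translation_prompt is a hypothetical helper, and the exact wording it produces is an assumption modeled on the example prompts above, not a tested recipe. Sending the prompt to ChatGPT is not shown.

```python
from typing import Optional


def build_translation_prompt(
    text: str,
    target_language: str,
    context: Optional[str] = None,
    text_type: Optional[str] = None,
    audience: Optional[str] = None,
) -> str:
    """Assemble a translation prompt from the tips above (hypothetical helper)."""
    # Tip 2: declare the type of text, falling back to a generic "text".
    parts = [f"Translate the {text_type or 'text'} in quotes to {target_language}."]
    if context:
        # Tip 1: give context, e.g. subject matter or the speaker's perspective.
        parts.append(f"Context: {context}.")
    if audience:
        # Tip 3: style transfer towards a target audience.
        parts.append(f"Make the translation suitable for {audience}.")
    parts.append(f'"{text}"')
    return " ".join(parts)
```

For instance, build_translation_prompt("Hola mundo", "English") yields: Translate the text in quotes to English. "Hola mundo"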

4. Account for Regional Differences

Some words may have different meanings or connotations depending on the region or country of the speaker. For instance, the English sentence "I'm going to play football" could translate to "我要去踢足球 (Wǒ yào qù tī zúqiú)" in Chinese. While this seems like a perfect translation, if the speaker was American, the translation could be wrong. By saying "football," an American speaker would likely be referring to the rugby-style sport called American football instead of the football known by the rest of the world.

Regular translation tools have no way to account for this potential misinterpretation. ChatGPT, on the other hand, can provide varying translations depending on the speaker's origin.

We prompted ChatGPT to translate "I'm going to play football" into Chinese. As expected, it produced "我要去踢足球 (Wǒ yào qù tī zúqiú)." In Chinese, "zúqiú" means "football," which refers to soccer rather than the rugby-style sport.
ChatGPT translation accounting for regional differences in meaning

We repeated the translation prompt but added hints about the speaker's origin and possible intent. ChatGPT changed the translation to "我要去踢橄榄球 (Wǒ yào qù tī gǎnlǎnqiú)," this time using "gǎnlǎnqiú" which is the Chinese term for American football and better reflects the potential intent of the speaker.
5. Use Summarized Translation

Sometimes, you don't want to read the entirety of a text; you just want to understand the message the author or speaker is trying to get across. ChatGPT is one of the few translation tools you can rely on for situations like this. To get a summarized translation, ask ChatGPT to provide a "summarized" or "condensed" translation of the target text. Some example prompts include:

    Provide a descriptive but condensed translation of [text to translate] in Spanish.
    Provide a summarized translation of [text to translate] in French.
    Provide a summarized translation of [text to translate] in English.
    Translate this article into Dutch, but only include the key points.

6. Use a Fine-Tuned Instance of ChatGPT

Using a fine-tuned instance of ChatGPT is one of the best ways to utilize ChatGPT as a translation tool. It opens up almost endless possibilities for translation using the AI chatbot. But how can you fine-tune ChatGPT for translation?

You can do it in several ways. A key component of fine-tuning ChatGPT for translation is laying out rules the chatbot must follow when translating any text you provide. For instance, you can provide word-translation pairs or text-translation pairs. Strictly speaking, this is in-context prompting rather than fine-tuning: the model's weights do not change, and the rules only guide its output. Here's an example below:

While trying to translate a Pidgin text into English, we ran into some wrongly translated words. Providing the word-translation pairs below made ChatGPT update its translation of the words in subsequent translations.
Fine-tuning ChatGPT for translation tasks

You can also make ChatGPT translations more accurate by providing several large texts and their verified translations. You can then prompt ChatGPT to deduce the right translation of words and phrases from the provided samples and apply it when translating text involving a similar language pair. While you can use significantly longer texts to fine-tune ChatGPT translations, below is an illustration of how it works using a short paragraph.
Providing a parallel corpus of text for ChatGPT

We achieved improved translation with every prompt without taking any further steps.
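The word-translation pairs and the parallel samples can also be prepended to every request automatically. A minimal sketch: build_glossary_prompt is a hypothetical helper and the prompt wording is an assumption. As with the chat example, this is in-context prompting; nothing about the model itself changes. The Pidgin entries in the usage line are only illustrative.

```python
from typing import Mapping, Sequence, Tuple


def build_glossary_prompt(
    text: str,
    target_language: str,
    glossary: Mapping[str, str],
    examples: Sequence[Tuple[str, str]] = (),
) -> str:
    """Prepend word pairs and verified translations to a prompt (hypothetical helper)."""
    lines = [f"When translating into {target_language}, always use these translations:"]
    for source, target in glossary.items():
        lines.append(f"- {source} => {target}")
    if examples:
        # Optional parallel samples the model can learn terminology from.
        lines.append("Use these verified translations as a reference:")
        for source, target in examples:
            lines.append(f"Source: {source}")
            lines.append(f"Translation: {target}")
    lines.append(f"Now translate the following text into {target_language}:")
    lines.append(text)
    return "\n".join(lines)
```

Usage: build_glossary_prompt("Wetin you want?", "English", {"wetin": "what"}, [("I dey come", "I'm coming")]) returns a multi-line prompt that can be pasted into ChatGPT.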

Don't Rely Solely on Machine Translation

While ChatGPT is an impressive translation tool, it's important to remember that it is still a machine and may not always produce the best translation. So don't rely solely on it, especially for important or sensitive documents. Instead, try a combination of tools, and whenever possible, consider using a professional translator to proofread to ensure accuracy.

Source: makeuseof.com

Other resources: https://chatdico.com/

Sunday, December 27, 2020

Open Source Machine Translation Systems

 

NMT

System | Team | Description | Link | Framework/Language
Tensor2Tensor | Google Brain | Library of deep learning models and datasets designed to make deep learning more accessible and accelerate ML research | https://github.com/tensorflow/tensor2tensor | TensorFlow
Fairseq | Facebook Research | Facebook AI Research Sequence-to-Sequence Toolkit written in Python | https://github.com/pytorch/fairseq | PyTorch
facebookresearch/fairseq | Facebook Research | Facebook AI Research Sequence-to-Sequence Toolkit | https://github.com/facebookresearch/fairseq | Lua
tensorflow/nmt | Google Brain | TensorFlow Neural Machine Translation Tutorial | https://github.com/tensorflow/nmt | TensorFlow
OpenNMT-tf | OpenNMT | Neural machine translation and sequence learning using TensorFlow | https://github.com/OpenNMT/OpenNMT-tf | TensorFlow
OpenNMT-py | OpenNMT | Open Source Neural Machine Translation in PyTorch | https://github.com/OpenNMT/OpenNMT-py | PyTorch
THUMT | Tsinghua Natural Language Processing Group | Transformer, multi-GPU training and decoding, distributed training | https://github.com/THUNLP-MT/THUMT | TensorFlow/Theano
NiuTrans.NMT | NiuTrans | Transformer and FFN-LM based on NiuTrans.Tensor by the NiuTrans Team | https://github.com/NiuTrans/NiuTrans.Tensor | C/C++
MARIANNMT | Adam Mickiewicz University | Pure C++ with minimal dependencies, one engine for GPU/CPU training and decoding | https://marian-nmt.github.io/ | C++
Seq2Seq | Britz, Denny and Goldie, Anna and Luong, Thang and Le, Quoc | A general-purpose encoder-decoder framework for TensorFlow | https://github.com/google/seq2seq | TensorFlow
NEMATUS | The Natural Language Processing Group at the University of Edinburgh | Support for RNN and Transformer architectures, multi-GPU support, server mode | https://github.com/EdinburghNLP/nematus | TensorFlow
Sockeye | Awslabs | A sequence-to-sequence framework for Neural Machine Translation | https://awslabs.github.io/sockeye/ | MXNet
CytonMT | Wang, Xiaolin and Utiyama, Masao and Sumita, Eiichiro | An Efficient Neural Machine Translation Open-source Toolkit Implemented in C++ | https://github.com/arthurxlw/cytonMt | C++
OpenSeq2Seq | NVIDIA | Modular architecture, support for mixed-precision training, fast Horovod-based distributed training | https://nvidia.github.io/OpenSeq2Seq/html/index.html | TensorFlow
nmtpytorch | The Language and Speech Team of Le Mans University | Various end-to-end neural architectures | https://github.com/lium-lst/nmtpytorch | PyTorch
DL4MT | Cho Lab at NYU CS and CDS | A multi-encoder, multi-decoder or multi-way NMT model | https://github.com/nyu-dl/dl4mt-multi | Theano
ModernMT | Trombetti, Marco and Caroselli, Davide and Bertoldi, Nicola | A context-aware, incremental and distributed general-purpose Neural Machine Translation technology based on the Fairseq Transformer model | https://github.com/ModernMT/MMT | PyTorch
UnsupervisedMT | Facebook Research | Seq2seq, biLSTM + attention, Transformer. Ability to share an arbitrary number of parameters. Denoising auto-encoder training | https://github.com/facebookresearch/UnsupervisedMT | PyTorch

SMT

System | Team | Description | Link | Framework/Language
Moses | moses-smt | A free software, statistical machine translation engine that can be used to train statistical models of text translation from a source language to a target language | http://www.statmt.org/moses/ | C++
GIZA++ | moses-smt | An SMT toolkit that is used to train IBM Models 1-5 and an HMM word alignment model | https://github.com/moses-smt/giza-pp | C++
NiuTrans.SMT | NiuTrans | An open-source statistical machine translation system developed by a joint team from the NLP Lab at Northeastern University and the NiuTrans Team, written entirely in C++ | https://github.com/NiuTrans/NiuTrans.SMT | C/C++
UCAM-SMT | The MT group in Cambridge | The Cambridge Statistical Machine Translation system | http://ucam-smt.github.io/ | C++
Jane | RWTH Aachen University | Supports state-of-the-art techniques for phrase-based and hierarchical phrase-based machine translation | http://www-i6.informatik.rwth-aachen.de/jane/ | C++
Phrasal | Stanford NLP Group | A state-of-the-art statistical phrase-based machine translation system | https://nlp.stanford.edu/phrasal/ | Java
cdec | The Language Technologies Institute at Carnegie Mellon University | A decoder, aligner, and learning framework for SMT and similar structured prediction models | http://www.cdec-decoder.org/ | C++
JOSHUA | Juri Ganitkevitch and Matt Post | An SMT decoder for phrase-based, hierarchical, and syntax-based machine translation | https://cwiki.apache.org/confluence/display/JOSHUA/ | Java
Source: https://github.com/NiuTrans/MT-paper-lists

Sunday, February 16, 2020

Autohotkey for Google Translate

^!r::Reload ; Assign Ctrl+Alt+R as a hotkey to restart the script.
#x:: ; Win+X: translate the current segment in memoQ, Across or Trados Studio with Google Translate
FileEncoding, utf-8
WinGetActiveTitle, Title
IfInString, Title, memoQ
{
Send ^+s
Sleep, 200
SendInput ^{F8}
Sleep, 300
SendInput ^a
Sleep, 300
SendInput ^c
Clipwait
Send ^c
Sleep, 300
Clipwait
Sleep, 200
}
IfInString, Title, Across
{
Send !{PgDn}
Sleep, 200
SendInput ^a
Sleep, 300
}
IfInString, Title, Trados Studio
{
Sleep, 300
Send ^{Ins}
Sleep, 200
SendInput ^a
Sleep, 300
}
SendInput ^c
Clipwait
Send ^c
Sleep, 300
Clipwait
Sleep, 300
Clipwait
Sleep, 300
searchtext := clipboard
;StringReplace, searchtext, searchtext, .%A_SPACE%, ._, All ;StringReplace, searchtext, searchtext, %A_SPACE%., _., All
;StringReplace, searchtext, searchtext, `;%A_SPACE%, `;_, All, ; `%5F - for underscore
;StringReplace, searchtext, searchtext, `.%A_SPACE%, `._, All ;Msgbox %searchtexturlencoded%
searchtexturlencoded := UriEncode(searchtext)
;searchtexturlencoded := URLEncoding(searchtext)
; StringReplace, searchtext, searchtext, %A_SPACE%&%A_SPACE%, `%26, All
; StringReplace, searchtext, searchtext, %A_SPACE%, +, All
Sleep, 300
RunWait, wget.exe -U "Mozilla/5.0 Chrome/62.0.3202.94" "http://translate.googleapis.com/translate_a/single?client=gtx&sl=auto&tl=de&dt=t&q=%searchtexturlencoded%" -O source.out.txt,, Hide
FileRead, targettext, source.out.txt ;Msgbox %targettext%
StringTrimLeft, targettext, targettext, 4
StringTrimRight, targettext, targettext, 73
quotes := ""","""
Loop, 4 ;MsgBox, Iteration number is %A_Index%.
{
addedvalue := (A_Index - 1)
textbreak = "`,null`,null`,%addedvalue%`]`,`[" ;MsgBox Textbreak %textbreak%
IfInString, targettext, %textbreak%
{
StringReplace, targettext, targettext, %textbreak%, ‡, All ;Msgbox targettext . %targettext%
}
}
occurences =
Loop, Parse, targettext, ‡
{
loopingtext = %A_LoopField% ;Msgbox Loopfield: %loopingtext%
StringLen, targetlength, loopingtext
StringGetPos, posghilimele, loopingtext, %quotes% ;MsgBox Quote position %posghilimele% Msgbox Target length %targetlength%
StringLen, targetlength, loopingtext
StringTrimRight, occurence, loopingtext, (targetlength - posghilimele)
occurences .= occurence ;Msgbox occurence after add %occurences% ;occurences := occurences . occurence
}
StringReplace, occurences, occurences, ș, ş, All
StringReplace, occurences, occurences, Ș, Ş, All
StringReplace, occurences, occurences, ț, ţ, All
StringReplace, occurences, occurences, Ț, Ţ, All
StringReplace, occurences, occurences, %A_SPACE%._%A_SPACE%, .%A_SPACE%, All
StringReplace, occurences, occurences, `;%A_SPACE%_, `;%A_SPACE%, All
StringReplace, occurences, occurences, `;%A_SPACE%%A_SPACE%, `;%A_SPACE%, All
StringReplace, occurences, occurences, `._, `.%A_SPACE%, All
occurences := UnSlashUnicode(occurences)
Clipboard = %occurences% ;googleout := Clipboard
Send ^v
Reload
; IfInString, Title, memoQ {Send ^{Enter} ;Send ^+{Enter}Send !{Up}}
return

uriDecode(str) {
    ; Replace each %XX escape with the character it encodes
    Loop
        If RegExMatch(str, "i)(?<=%)[\da-f]{1,2}", hex)
            StringReplace, str, str, `%%hex%, % Chr("0x" . hex), All
        Else Break
    return, str
}

UriEncode(Uri, RE="[0-9A-Za-z]"){
    VarSetCapacity(Var,StrPut(Uri,"UTF-8"),0),StrPut(Uri,&Var,"UTF-8")
    While Code:=NumGet(Var,A_Index-1,"UChar")
    Res.=(Chr:=Chr(Code))~=RE?Chr:Format("%{:02X}",Code)
    return, Res
}

UnSlashUnicode(s) ; unslash unicode sequences like \u0026
{
rx = \\u([0-9a-fA-F]{4})
pos = 0
loop
{
pos := RegExMatch(s,rx,m,pos+1)
if (pos = 0)
break
StringReplace, s, s, %m%, % Chr("0x" . SubStr(m,3,4))
}
return, s
}
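The script extracts the translation by trimming a fixed number of characters and splitting on ",null,null,N],[" markers, which breaks as soon as the response layout shifts. Parsing the response as JSON is sturdier; here is a sketch in Python. It assumes the unofficial translate_a/single endpoint returns a JSON array whose first element lists segments with the translated text first. That endpoint is undocumented and may change or be rate-limited; build_gtx_url and parse_gtx_response are hypothetical helpers.

```python
import json
from urllib.parse import quote, urlencode

# Unofficial, undocumented endpoint used by the AutoHotkey script above.
GTX_URL = "http://translate.googleapis.com/translate_a/single"

# Same clean-up as the script: comma-below s/t (U+0219, U+0218, U+021B, U+021A)
# become the cedilla forms (U+015F, U+015E, U+0163, U+0162).
CEDILLA_MAP = str.maketrans({
    "\u0219": "\u015f", "\u0218": "\u015e",
    "\u021b": "\u0163", "\u021a": "\u0162",
})


def build_gtx_url(text: str, target: str = "de") -> str:
    """Build the query the script passes to wget (sl=auto, dt=t)."""
    params = {"client": "gtx", "sl": "auto", "tl": target, "dt": "t", "q": text}
    # quote (not quote_plus) encodes spaces as %20, like UriEncode in the script.
    return f"{GTX_URL}?{urlencode(params, quote_via=quote)}"


def parse_gtx_response(raw: str) -> str:
    """Join the translated segments of a response body.

    Assumes a JSON array whose first element is a list of segments, each
    starting with the translated text. json.loads also decodes \\uXXXX
    escapes, which the script handles with UnSlashUnicode.
    """
    data = json.loads(raw)
    translated = "".join(seg[0] for seg in data[0] if seg and isinstance(seg[0], str))
    return translated.translate(CEDILLA_MAP)
```

The body returned for build_gtx_url(text) can be fetched with any HTTP client and passed to parse_gtx_response; the fetch itself is left out because the endpoint's behavior is not guaranteed.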

Friday, January 3, 2020

Translations of the German states (Länder) into Romanian, with capitals

State in German | State in Romanian | Capital
Baden-Württemberg | Baden-Württemberg | Stuttgart
Freistaat Bayern | Statul Liber Bavaria | München
Berlin | Berlin | Berlin
Brandenburg | Brandenburg | Potsdam
Freie und Hansestadt Bremen | Orașul Liber și Hanseatic Brema | Bremen (Brema); the state comprises Bremen and Bremerhaven
Freie und Hansestadt Hamburg | Orașul Liber și Hanseatic Hamburg | Hamburg
Hessen | Hessa | Wiesbaden
Mecklenburg-Vorpommern | Mecklenburg - Pomerania Inferioară | Schwerin
Niedersachsen | Saxonia Inferioară | Hannover (Hanovra)
Nordrhein-Westfalen | Renania de Nord - Westfalia | Düsseldorf
Rheinland-Pfalz | Renania-Palatinat | Mainz
Saarland | Saarland | Saarbrücken
Freistaat Sachsen | Statul Liber Saxonia | Dresden (Dresda)
Sachsen-Anhalt | Saxonia-Anhalt | Magdeburg
Schleswig-Holstein | Schleswig-Holstein | Kiel
Freistaat Thüringen | Statul Liber Turingia | Erfurt