Monday, February 27, 2017

Free/open-source machine translation software

Rule-based systems

  • Apertium, a free/open-source rule-based machine translation platform.
  • Matxin, a free/open-source rule-based machine translation system for Basque.
  • OpenLogos, a free/open-source version of the historical Logos machine translation system.
  • Anusaaraka, an English-Hindi machine translation system.

Statistical machine translation systems


  • Moses, a statistical machine translation system.
  • Marie, an n-gram-based statistical machine translation decoder.
  • Joshua, an open-source decoder for statistical translation models based on synchronous context-free grammars.
  • Phramer, an open-source statistical phrase-based machine translation decoder.
  • GREAT, a decoder based on stochastic finite-state transducers, which includes a training toolkit.
  • The Thot toolkit includes a decoder as of 2014.
  • Travatar is a tree-to-string statistical machine translation system.
  • CDEC is a decoder, aligner, and model optimizer for statistical machine translation and other structured prediction models based on (mostly) context-free formalisms. It was written by Chris Dyer at the Language Technologies Institute at Carnegie Mellon University.

Training translation models

  • Giza++ is a tool to train translation models for statistical machine translation (see also the related mkcls tool, which trains word classes).
  • Thot includes a toolkit to train phrase-based models for statistical machine translation.

Language models

  • IRSTLM, a free/open-source language modelling tool that can be used with Moses in place of SRILM, which is not free.
  • RandLM, space-efficient n-gram-based language models built using randomized representations (Bloom filters, etc.).
  • Kenneth Heafield's software for the fast filtering of ARPA-format language models to multiple vocabularies.
  • Holger Schwenk's Continuous Space Language Model toolkit (CSLM) works by projecting the word indices onto a continuous space and using a probability estimator operating on this space.
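The randomized representations mentioned for RandLM can be illustrated with a small Bloom filter over n-grams. This is only a sketch of the general idea, not RandLM's implementation or API: the class name `NgramBloomFilter`, the filter size, and the use of SHA-256 as the hash function are all assumptions made for the example. A Bloom filter never misses an n-gram that was inserted, but it may occasionally report an unseen n-gram as present, which is the price paid for the compact storage.

```python
import hashlib


class NgramBloomFilter:
    """Approximate set membership for n-grams (illustrative, hypothetical class)."""

    def __init__(self, num_bits=1 << 16, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, ngram):
        # Derive num_hashes bit positions by salting one hash with a seed byte.
        key = " ".join(ngram).encode("utf-8")
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(bytes([seed]) + key).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, ngram):
        for pos in self._positions(ngram):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, ngram):
        # True only if every bit for this n-gram is set (false positives possible).
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(ngram)
        )


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


bloom = NgramBloomFilter()
bigrams = ngrams("the cat sat on the mat".split(), 2)
for gram in bigrams:
    bloom.add(gram)

print(("the", "cat") in bloom)  # inserted n-grams are always reported as present
```

A real randomized language model also has to store counts or probabilities, not just membership; the sketch leaves that part out.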


Evaluation

  • Kenneth Heafield's scripts, which make it easy to score machine translation output using NIST's implementations of the BLEU and NIST metrics, as well as TER and METEOR.
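To give a rough idea of what a BLEU score measures, here is a minimal sketch of an unsmoothed, single-reference, sentence-level variant. It is not the code from the scripts listed above, and the function names `ngram_counts` and `sentence_bleu` are made up for this example. Standard BLEU is computed over a whole test corpus, usually with the official scoring tools, so the numbers from this sketch are for illustration only.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count the contiguous n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(hypothesis, reference, max_n=4):
    """Unsmoothed single-reference BLEU for one tokenized sentence pair."""
    if not hypothesis:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        hyp = ngram_counts(hypothesis, n)
        ref = ngram_counts(reference, n)
        # Clipped matches: each hypothesis n-gram counts at most as often
        # as it occurs in the reference.
        overlap = sum(min(count, ref[gram]) for gram, count in hyp.items())
        total = sum(hyp.values())
        if overlap == 0 or total == 0:
            return 0.0  # without smoothing, any zero precision gives zero
        log_precisions.append(math.log(overlap / total))
    # Brevity penalty: penalize hypotheses shorter than the reference.
    brevity = min(1.0, math.exp(1 - len(reference) / len(hypothesis)))
    return brevity * math.exp(sum(log_precisions) / max_n)


reference = "the cat sat on the mat".split()
print(sentence_bleu(reference, reference))               # exact match
print(sentence_bleu("the cat sat on".split(), reference))  # short hypothesis
```

The second call shows the brevity penalty at work: every n-gram of the short hypothesis appears in the reference, yet the score drops below that of the exact match because the hypothesis is shorter.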

Other software

  • RIA is a tool for the automatic induction of transfer rules for transfer-based statistical machine translation using dependency structures.
  • Chaski, a distributed phrase-based machine translation training tool based on Hadoop.
  • Grammatical Framework, a free/open-source programming language used to create grammars for multilingual applications.

Example-based machine translation systems

Multi-engine machine translation / system combination

Aligners and translation models

  • Giza++: training of statistical translation models.
  • Anymalign, a multilingual sub-sentential aligner.
  • Ventsislav Zhechev's sub-tree aligner, which can be used for the automatic generation of parallel treebanks.

Web services around machine translation

  • Tradubi is an open-source Ajax-based web application for social translation built upon Apertium (may be tested online).

Distributed machine translation

  • ScaleMT (no release yet, browse at the Apertium Subversion repository) is a free/open-source framework for building scalable machine translation web services.

Quality estimation

  • Quest++, an open-source tool for translation quality estimation developed by Lucia Specia's group at the University of Sheffield (note that the current version still has one important non-free dependency: SRILM).

Other useful tools

... that may be used to build machine translation systems
  • Freeling, a free/open-source suite of language analyzers.
  • Bitextor, an automatic bitext harvester.
  • Foma, a finite-state machine toolkit and library.
  • HFST, Helsinki Finite State Technology for natural-language morphologies.
  • VISL CG-3, the constraint grammar parser at the Visual Interactive Syntax Learning project of Syddansk Universitet: browse Subversion repository, source snapshots.