Thursday, August 28, 2025

Bergamot-translator Linux and Python

Running bergamot-translator on Linux

$ git clone git@github.com:browsermt/bergamot-translator.git
$ cd bergamot-translator
$ sudo apt install libpcre2-dev libopenblas-dev
$ mkdir build && cd build
$ cmake ..
$ make -j

config.yml:

bergamot-mode: native
models:
  - firefox-translations-models/models/prod/esen/model.esen.intgemm.alphas.bin
vocabs:
  - firefox-translations-models/models/prod/esen/vocab.esen.spm
  - firefox-translations-models/models/prod/esen/vocab.esen.spm
shortlist:
    - firefox-translations-models/models/prod/esen/lex.50.50.esen.s2t.bin
    - false
beam-size: 1
normalize: 1.0
word-penalty: 0
max-length-break: 128
mini-batch-words: 1024
workspace: 128
max-length-factor: 2.0
skip-cost: true
cpu-threads: 0
quiet: false
quiet-translation: false
gemm-precision: int8shiftAlphaAll
alignment: soft

where esen is the language pair for the translation, in this case es→en (Spanish to English).
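
Only the language-pair code changes between such configs, so the file can be generated. A minimal sketch, assuming other pairs follow the same firefox-translations-models/models/prod/<pair>/ layout and share one vocab file as esen does (this may not hold for every pair); render_config is a hypothetical helper, not part of bergamot-translator:

```python
# Marian options copied verbatim from the config above; they do not depend on the pair.
FIXED_OPTIONS = """\
beam-size: 1
normalize: 1.0
word-penalty: 0
max-length-break: 128
mini-batch-words: 1024
workspace: 128
max-length-factor: 2.0
skip-cost: true
cpu-threads: 0
quiet: false
quiet-translation: false
gemm-precision: int8shiftAlphaAll
alignment: soft
"""


def render_config(pair, models_root="firefox-translations-models/models/prod"):
    """Return config text for a pair code such as "esen" (hypothetical helper).

    Assumes the directory layout and file naming used in the esen example.
    """
    base = f"{models_root}/{pair}"
    return (
        "bergamot-mode: native\n"
        "models:\n"
        f"  - {base}/model.{pair}.intgemm.alphas.bin\n"
        "vocabs:\n"
        f"  - {base}/vocab.{pair}.spm\n"
        f"  - {base}/vocab.{pair}.spm\n"
        "shortlist:\n"
        f"    - {base}/lex.50.50.{pair}.s2t.bin\n"
        "    - false\n"
        + FIXED_OPTIONS
    )


if __name__ == "__main__":
    # Print the config for es→en; redirect to config.yml to use it.
    print(render_config("esen"), end="")
```

Running the script and redirecting its output to config.yml should reproduce the file above.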

The model, vocab, and shortlist files should be sourced from the firefox-translations-models repository, with git-lfs. Some docs still point to Google Cloud Storage for downloads, but those links are stale.
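
If the repository is cloned without git-lfs set up, the paths exist but hold small text pointer files instead of the binaries. A sketch that flags this before running the translator; is_lfs_pointer and check_model_files are hypothetical helpers, and the check relies on the git-lfs pointer format, whose first line is "version https://git-lfs.github.com/spec/v1":

```python
from pathlib import Path

# First bytes of every git-lfs pointer file.
LFS_HEADER = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path):
    """Return True if the file starts with the git-lfs pointer header."""
    with open(path, "rb") as f:
        return f.read(len(LFS_HEADER)) == LFS_HEADER


def check_model_files(paths):
    """Return (path, problem) tuples for files that are missing or not fetched."""
    problems = []
    for p in map(Path, paths):
        if not p.is_file():
            problems.append((str(p), "missing"))
        elif is_lfs_pointer(p):
            problems.append((str(p), "git-lfs pointer, run `git lfs pull`"))
    return problems
```

Passing the three paths from the config to check_model_files gives an empty list when everything was fetched correctly.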

Pipe some data through bergamot-translator:

$ echo "Hola mundo" | ./bergamot-translator --model-config-paths config.yml
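
The same pipe can be driven from a script. A minimal sketch using only the standard library, assuming the binary sits at ./bergamot-translator as in the command above and writes the translation to stdout; build_command and translate_text are hypothetical names:

```python
import subprocess


def build_command(config_path, binary="./bergamot-translator"):
    """Build the argument list used in the shell example above."""
    return [binary, "--model-config-paths", config_path]


def translate_text(text, config_path="config.yml", binary="./bergamot-translator"):
    """Send text to the translator on stdin and return whatever it prints on stdout."""
    result = subprocess.run(
        build_command(config_path, binary),
        input=text,           # plays the role of `echo ... |`
        capture_output=True,
        text=True,
        check=True,           # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout
```

Note that echo appends a trailing newline, so translate_text("Hola mundo\n") is the closer equivalent of the shell command.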
---

Using bergamot-translator from Python

Requirement: Python <= 3.10 (wheels are not available for newer versions)
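
A quick way to check the interpreter before installing. This is only a sketch of the requirement stated above; wheels_available is a hypothetical helper, and the lower bound of supported versions is not known here, so only the upper bound is checked:

```python
import sys

# Upper bound taken from the requirement above; the lower bound is not checked.
MAX_SUPPORTED = (3, 10)


def wheels_available(version_info=None):
    """Return True if the (major, minor) version is at most 3.10."""
    major_minor = tuple((version_info or sys.version_info)[:2])
    return major_minor <= MAX_SUPPORTED


if __name__ == "__main__":
    print("wheels expected:", wheels_available())
```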

pip install bergamot

import bergamot

# Start a translation service with 4 worker threads.
config = bergamot.ServiceConfig(numWorkers=4)
service = bergamot.Service(config)

# Load the model described by bergamot.config.yml (listed below).
model = service.modelFromConfigPath("bergamot.config.yml")

# Plain-text translation only: no alignments, quality scores, or HTML handling.
options = bergamot.ResponseOptions(
    alignment=False, qualityScores=False, HTML=False
)

# Translate a batch of inputs; the result is iterated one response per input.
responses = service.translate(model, bergamot.VectorString([
    "In the last 3 months, over 80 arrestees were released from the Central Booking facility without being formally charged.",
    "Since its inception, The Onion has become a veritable news parody empire.",
    "The hostel’s guests were mostly citizens of the United Arab Emirates.",
]), options)

for response in responses:
    print(response.target.text)
 

bergamot.config.yml:

# To imitate production setting, these Marian options are set according to
# https://github.com/mozilla/firefox-translations/blob/main/extension/controller/translation/translationWorker.js
# For reference, see https://github.com/mozilla/firefox-translations-models/blob/main/evals/translators/bergamot.sh

bergamot-mode: wasm
models:
  - ./model.enro.intgemm.alphas.bin
vocabs:
  - ./vocab.enro.spm
  - ./vocab.enro.spm
shortlist:
    - ./lex.50.50.enro.s2t.bin
    - false
beam-size: 1
normalize: 1.0
word-penalty: 0
max-length-break: 128
mini-batch-words: 1024
workspace: 128
max-length-factor: 2.0
skip-cost: true
cpu-threads: 4
quiet: false
quiet-translation: false
gemm-precision: int8shiftAlphaAll
alignment: soft