Running bergamot-translator on Linux
$ git clone git@github.com:browsermt/bergamot-translator.git
$ cd bergamot-translator
$ mkdir build
$ cd build
$ sudo apt install libpcre2-dev libopenblas-dev
$ cmake ..
$ make -j
YAML config file (config.yml):
bergamot-mode: native
models:
- firefox-translations-models/models/prod/esen/model.esen.intgemm.alphas.bin
vocabs:
- firefox-translations-models/models/prod/esen/vocab.esen.spm
- firefox-translations-models/models/prod/esen/vocab.esen.spm
shortlist:
- firefox-translations-models/models/prod/esen/lex.50.50.esen.s2t.bin
- false
beam-size: 1
normalize: 1.0
word-penalty: 0
max-length-break: 128
mini-batch-words: 1024
workspace: 128
max-length-factor: 2.0
skip-cost: true
cpu-threads: 0
quiet: false
quiet-translation: false
gemm-precision: int8shiftAlphaAll
alignment: soft
Here esen is the language pair for the translation, in this case es→en (Spanish to English).
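Since only the pair code changes from one language pair to the next, the config can be generated rather than edited by hand. A minimal sketch, assuming every pair uses the same file naming as esen above (not every pair in the repository is guaranteed to); make_config is a hypothetical helper, not part of bergamot-translator.

```python
from pathlib import PurePosixPath

# Options copied verbatim from the esen config above.
FIXED_OPTIONS = """\
beam-size: 1
normalize: 1.0
word-penalty: 0
max-length-break: 128
mini-batch-words: 1024
workspace: 128
max-length-factor: 2.0
skip-cost: true
cpu-threads: 0
quiet: false
quiet-translation: false
gemm-precision: int8shiftAlphaAll
alignment: soft
"""


def make_config(pair, models_root="firefox-translations-models/models/prod"):
    """Return config text for a pair code such as 'esen'.

    Assumption: the file names follow the pattern seen for esen.
    """
    base = PurePosixPath(models_root) / pair
    model = base / f"model.{pair}.intgemm.alphas.bin"
    vocab = base / f"vocab.{pair}.spm"
    shortlist = base / f"lex.50.50.{pair}.s2t.bin"
    return (
        "bergamot-mode: native\n"
        "models:\n"
        f"- {model}\n"
        "vocabs:\n"
        f"- {vocab}\n"  # source vocabulary
        f"- {vocab}\n"  # target vocabulary (same file in the config above)
        "shortlist:\n"
        f"- {shortlist}\n"
        "- false\n"
        + FIXED_OPTIONS
    )
```

Writing the result of make_config("esen") to config.yml should give the same file as the listing above.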
The models/vocabs/shortlist files should be sourced from the firefox-translations-models
repository, using git-lfs. Some docs still point to Google Cloud Storage for downloads,
but those are stale.
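A clone made without git-lfs leaves small text pointer files where the binaries should be, which is easy to miss until the model fails to load. A minimal sketch for spotting that, relying on the git-lfs pointer format (a text file whose first line starts with "version https://git-lfs.github.com/spec/v1"); is_lfs_pointer and unfetched_files are hypothetical helper names.

```python
from pathlib import Path

# First bytes of every git-lfs pointer file.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path):
    """Return True if the file looks like an unfetched git-lfs pointer."""
    with open(path, "rb") as f:
        head = f.read(len(LFS_POINTER_PREFIX))
    return head == LFS_POINTER_PREFIX


def unfetched_files(root):
    """List files under root that are still git-lfs pointers."""
    return sorted(
        p for p in Path(root).rglob("*") if p.is_file() and is_lfs_pointer(p)
    )
```

If unfetched_files("firefox-translations-models/models/prod/esen") returns anything, running git lfs pull inside the models checkout should replace the pointers with the real files.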
Pipe some data through bergamot-translator:
echo "Hola mundo" | ./bergamot-translator --model-config-paths config.yml
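The same pipe can be driven from a script. A minimal sketch using only the standard library; it assumes the binary and config.yml are in the current directory, as in the command above, and run_translator is a hypothetical wrapper name.

```python
import subprocess

# Default command mirrors the shell invocation above.
DEFAULT_CMD = ("./bergamot-translator", "--model-config-paths", "config.yml")


def run_translator(text, cmd=DEFAULT_CMD):
    """Send text to the command's stdin and return what it prints to stdout.

    Raises subprocess.CalledProcessError if the command exits non-zero.
    """
    result = subprocess.run(
        list(cmd),
        input=text,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Calling run_translator("Hola mundo\n") corresponds to the echo pipeline; passing a different cmd makes it easy to point at a binary elsewhere on disk.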
---
Requirement: Python <= 3.10 (wheels are not available for newer versions)
pip install bergamot
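Because pip will not find a wheel on a newer interpreter, it can save time to check the version first. A small sketch; the 3.10 ceiling is taken from the requirement above and may change if newer wheels are published, and within_version_ceiling is a hypothetical helper name.

```python
import sys

MAX_SUPPORTED = (3, 10)  # ceiling stated in the requirement above


def within_version_ceiling(version=None):
    """Return True if the (major, minor) version is at or below the ceiling.

    Defaults to the running interpreter's version.
    """
    major, minor = tuple(version or sys.version_info)[:2]
    return (major, minor) <= MAX_SUPPORTED
```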
import bergamot

config = bergamot.ServiceConfig(numWorkers=4)
service = bergamot.Service(config)
model = service.modelFromConfigPath("bergamot.config.yml")
options = bergamot.ResponseOptions(
    alignment=False, qualityScores=False, HTML=False
)
response = service.translate(model, bergamot.VectorString([
    "In the last 3 months, over 80 arrestees were released from the Central Booking facility without being formally charged.",
    "Since its inception, The Onion has become a veritable news parody empire.",
    "The hostel’s guests were mostly citizens of the United Arab Emirates.",
]), options)
for r in response:
    print(r.target.text)
bergamot.config.yml:
# To imitate production setting, these Marian options are set according to
# https://github.com/mozilla/firefox-translations/blob/main/extension/controller/translation/translationWorker.js
# For reference, see https://github.com/mozilla/firefox-translations-models/blob/main/evals/translators/bergamot.sh
bergamot-mode: wasm
models:
- ./model.enro.intgemm.alphas.bin
vocabs:
- ./vocab.enro.spm
- ./vocab.enro.spm
shortlist:
- ./lex.50.50.enro.s2t.bin
- false
beam-size: 1
normalize: 1.0
word-penalty: 0
max-length-break: 128
mini-batch-words: 1024
workspace: 128
max-length-factor: 2.0
skip-cost: true
cpu-threads: 4
quiet: false
quiet-translation: false
gemm-precision: int8shiftAlphaAll
alignment: soft
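A misnamed or missing file in the config is an easy mistake to make, so it can be worth checking the paths before loading the model. A minimal sketch that pulls the entries out of the models, vocabs and shortlist lists and reports those that do not exist. It uses a deliberately simple line-based reader that only understands the flat layout shown above (not general YAML), and it resolves relative paths against a base_dir argument, since how the translator itself resolves them is not covered here. listed_paths and missing_files are hypothetical helper names.

```python
from pathlib import Path

PATH_KEYS = ("models", "vocabs", "shortlist")


def listed_paths(config_text):
    """Collect the '- item' entries found under the path-bearing keys."""
    paths, current = [], None
    for raw in config_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("- "):
            item = line[2:].strip()
            # The shortlist list also carries a boolean flag, not a path.
            if current in PATH_KEYS and item not in ("true", "false"):
                paths.append(item)
        else:
            # A 'key:' or 'key: value' line starts a new section.
            current = line.split(":", 1)[0].strip()
    return paths


def missing_files(config_path, base_dir="."):
    """Return the listed paths that do not exist relative to base_dir."""
    text = Path(config_path).read_text(encoding="utf-8")
    return [p for p in listed_paths(text) if not (Path(base_dir) / p).exists()]
```

An empty list from missing_files("bergamot.config.yml") means every referenced file was found under base_dir.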