Thursday, August 28, 2025

Bergamot-translator Linux and Python

Running bergamot-translator on Linux

$ git clone --recursive git@github.com:browsermt/bergamot-translator.git
$ cd bergamot-translator
$ sudo apt install libpcre2-dev libopenblas-dev
$ mkdir build
$ cd build
$ cmake ..
$ make -j

config.yml:

bergamot-mode: native
models:
  - firefox-translations-models/models/prod/esen/model.esen.intgemm.alphas.bin
vocabs:
  - firefox-translations-models/models/prod/esen/vocab.esen.spm
  - firefox-translations-models/models/prod/esen/vocab.esen.spm
shortlist:
  - firefox-translations-models/models/prod/esen/lex.50.50.esen.s2t.bin
  - false
beam-size: 1
normalize: 1.0
word-penalty: 0
max-length-break: 128
mini-batch-words: 1024
workspace: 128
max-length-factor: 2.0
skip-cost: true
cpu-threads: 0
quiet: false
quiet-translation: false
gemm-precision: int8shiftAlphaAll
alignment: soft 

where esen is the language pair for the translation, in this case es→en (Spanish to English).
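
The same file can be generated for other pairs. The sketch below is only an illustration: `split_pair` and `build_config` are hypothetical helper names, and it assumes every pair follows the same file-naming pattern as esen (one shared vocab file and a lex.50.50 shortlist), which may not hold for every model in the repository.

```python
# Template mirroring the config.yml shown above; only the paths vary per pair.
CONFIG_TEMPLATE = """\
bergamot-mode: native
models:
  - {base}/model.{pair}.intgemm.alphas.bin
vocabs:
  - {base}/vocab.{pair}.spm
  - {base}/vocab.{pair}.spm
shortlist:
  - {base}/lex.50.50.{pair}.s2t.bin
  - false
beam-size: 1
normalize: 1.0
word-penalty: 0
max-length-break: 128
mini-batch-words: 1024
workspace: 128
max-length-factor: 2.0
skip-cost: true
cpu-threads: 0
quiet: false
quiet-translation: false
gemm-precision: int8shiftAlphaAll
alignment: soft
"""


def split_pair(pair):
    """Split a four-letter pair such as 'esen' into ('es', 'en')."""
    if len(pair) != 4 or not pair.isalpha():
        raise ValueError(f"expected a four-letter language pair, got {pair!r}")
    return pair[:2], pair[2:]


def build_config(pair, model_dir=None):
    """Return config text for `pair`, with files located in `model_dir`."""
    split_pair(pair)  # validates the pair; raises ValueError if malformed
    if model_dir is None:
        # Assumed layout of a firefox-translations-models checkout.
        model_dir = f"firefox-translations-models/models/prod/{pair}"
    return CONFIG_TEMPLATE.format(base=model_dir.rstrip("/"), pair=pair)
```

For example, `build_config("esen")` reproduces the file above, and `build_config("enro", ".")` points at model files in the current directory.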

The model, vocab, and shortlist files should be sourced from the firefox-translations-models repository, using git-lfs. Some docs still point to Google Cloud Storage for downloads, but those links are stale.
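
One easy mistake with git-lfs is ending up with small pointer stubs instead of the real binaries (for instance when git-lfs was not installed at clone time). A minimal sketch for spotting that, assuming the standard git-lfs pointer format whose first line starts with `version https://git-lfs.github.com/spec/v1`; the helper names are hypothetical:

```python
from pathlib import Path

# First bytes of a git-lfs pointer file (a text stub, not the real binary).
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def looks_like_lfs_pointer(head):
    """Return True if the leading bytes of a file match a git-lfs pointer."""
    return head.startswith(LFS_POINTER_PREFIX)


def check_model_files(paths):
    """Map each path to 'missing', 'lfs-pointer', or 'ok'."""
    status = {}
    for path in paths:
        p = Path(path)
        if not p.is_file():
            status[path] = "missing"
            continue
        with p.open("rb") as fh:
            head = fh.read(len(LFS_POINTER_PREFIX))
        status[path] = "lfs-pointer" if looks_like_lfs_pointer(head) else "ok"
    return status
```

Any file reported as `lfs-pointer` still needs to be fetched with `git lfs pull` inside the models repository.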

Pipe some data through bergamot-translator:

echo "Hola mundo" | ./bergamot-translator --model-config-paths config.yml
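
The same pipe can be driven from a script. This is a rough sketch using only the binary name and flag from the command above; `bergamot_command` and `translate_text` are hypothetical helper names, and it assumes the binary reads source text on stdin and writes the translation to stdout, as the shell pipe suggests.

```python
import subprocess


def bergamot_command(config_path="config.yml", binary="./bergamot-translator"):
    """Build the argument list used in the shell example above."""
    return [binary, "--model-config-paths", config_path]


def translate_text(text, config_path="config.yml", binary="./bergamot-translator"):
    """Pipe `text` through the translator and return whatever it prints."""
    result = subprocess.run(
        bergamot_command(config_path, binary),
        input=text,
        capture_output=True,
        encoding="utf-8",  # send and receive UTF-8 text
        check=True,        # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout
```

Calling `translate_text("Hola mundo\n")` from the build directory would then be equivalent to the echo pipeline.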

Using bergamot-translator from Python

Requirement: Python <= 3.10 (wheels are not available for newer versions)
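
A quick way to check the interpreter before trying the install. This is only a sketch: the 3.10 ceiling is taken from the note above, and `wheels_available` is a hypothetical helper name.

```python
import sys

# Newest Python version for which wheels exist, per the requirement above.
MAX_SUPPORTED = (3, 10)


def wheels_available(version=None):
    """Return True if the given (or running) Python is 3.10 or older."""
    major_minor = tuple((version or sys.version_info)[:2])
    return major_minor <= MAX_SUPPORTED
```

If `wheels_available()` returns False, create a virtual environment with an older interpreter first.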

pip install bergamot

import bergamot

# Start a translation service with four workers.
config = bergamot.ServiceConfig(numWorkers=4)
service = bergamot.Service(config)

# Load the model described by bergamot.config.yml.
model = service.modelFromConfigPath("bergamot.config.yml")

# Plain-text translation: no alignments, quality scores, or HTML handling.
options = bergamot.ResponseOptions(
    alignment=False, qualityScores=False, HTML=False
)

# Translate a batch of sentences in one call.
response = service.translate(model, bergamot.VectorString([
    "In the last 3 months, over 80 arrestees were released from the Central Booking facility without being formally charged.",
    "Since its inception, The Onion has become a veritable news parody empire.",
    "The hostel’s guests were mostly citizens of the United Arab Emirates.",
]), options)

# Print the translated text of each response.
for r in response:
    print(r.target.text)
 

bergamot.config.yml:

# To imitate production setting, these Marian options are set according to
# https://github.com/mozilla/firefox-translations/blob/main/extension/controller/translation/translationWorker.js
# For reference, see https://github.com/mozilla/firefox-translations-models/blob/main/evals/translators/bergamot.sh

bergamot-mode: wasm
models:
  - ./model.enro.intgemm.alphas.bin
vocabs:
  - ./vocab.enro.spm
  - ./vocab.enro.spm
shortlist:
  - ./lex.50.50.enro.s2t.bin
  - false
beam-size: 1
normalize: 1.0
word-penalty: 0
max-length-break: 128
mini-batch-words: 1024
workspace: 128
max-length-factor: 2.0
skip-cost: true
cpu-threads: 4
quiet: false
quiet-translation: false
gemm-precision: int8shiftAlphaAll
alignment: soft 


Wednesday, August 27, 2025

Energy Glossary DE-RO (German to Romanian)

 

Heizkraftwerk centrală termică de încălzire
Abbaurate rată de descompunere
Abfallbehandlungsanlage instalaţie de tratare a reziduurilor
Abgabe transmitere
Absorber instalaţie de absorbţie
Additiv aditiv
Altholzsortimente sortimente de lemn utilizat
anfallend apărut
Anlieferung livrare
Annahmebereich zonă de alimentare
Aufbereitungsanlage instalaţie de tratare
Ausbeute exploatare
Ausbringung evacuare
Ausgangsstoff materie primă
Basismaterial material de bază
Bedienkomfort confort de utilizare
BHKW unitate de cogenerare
Bioabfall resturi vegetale
Biogas biogaz
Biomasse biomasă
Blockheizkraftwerk centrală termică combinată
Brikettierungsanlagen instalaţii de brichetare
Bunker buncăr
CO2-Ausstoß evacuare CO2
CO2-neutral fără CO2
combined heat and power / Kraft-Wärme-Kopplung CHP / cogenerare
Dampfaustausch schimb de aburi
Dampfkessel cazan cu aburi
Deponie depozit de reziduuri
Deponiefraktion fracţiunea depozitului de reziduuri
Durchforstung rărirea arboretului
Eigenkapitalbeschaffung procurarea capitalului propriu
Einkommensquelle sursă de venit
eisenhaltig feros
Elektrizitätsnetz reţea de electricitate
Emissionsgrenzwert valoare limită a emisiilor
Endlager depozit final
Energiepflanze plantă energetică
Energieträger suport energetic
Energieversorgungskonzept concept de alimentare cu energie
entschwefelt desulfurizat
erneuerbar regenerabil
Ersatzbrennstoff combustibil alternativ
Fassungsvermögen capacitate
Faulraum autoclavă [ch.]
Faulraum autoclavă de devulcanizare
Faulraum bazin de fermentare
Faulraum cazan fierbător
Faulraum devulcanizator
Faulraum digestor [plast.]
Faulraum fierbător [term.]
Ferment ferment
Fermenter fermenter
Festmist gunoi solid de grajd
Feststoff material solid
Feuerraum cameră de ardere
Feuerraum focar
Feuerwärmeleistung putere calorică de ardere
Flotatfett grăsime de flotaţie
fossile Energiequelle sursă fosilă de energie
Frischdampf abur proaspăt
Gärrestlager depozit pentru resturile de fermentare
gasdicht etanş
Gasertrag debit de gaz, producţie de gaz
Gaslager depozit de acumulare a gazelor
Getreidespelzen pleavă de cereale
Großflügelrührwerk malaxor cu pale de mari dimensiuni
Grünschnittpellets pelete din tăierea plantelor
Gülle must de bălegar
Halogenverbindung compus cu halogen
Hausmüll gunoi menajer
Heizöl combustibil lichid de încălzire
Heizöl păcură
Hochdruckdampfkessel cazan cu aburi de înaltă presiune
Holzbriketts brichete de lemn
Inputstoff material de input
Kesselhaus sala cazanelor
Kesselwasser apă din cazan
Klärschlamm nămol de la staţia de epurare
Kleinfeuerungsanlagen instalaţii de ardere de mici dimensiuni
klimaschädlich dăunător climei
Kohlekraftwerk centrală termică pe cărbuni
Kohlendioxid dioxid de carbon
Kondensat condens
konventionelles Kraftwerk centrală termică convenţională
Konvertierung convertire
Kraft-Wärme-Koppelungseinheiten unităţi de cogenerare
Kraft-Wärme-Kopplung cuplare curent-căldură
Kraft-Wärme-Kopplung / KWK cogenerare
Landschaftspflege amenajarea peisajului
Luftkondensator condensator de aer
Maissilage însilozarea porumbului
Maschinenhaus sala maşinilor
Methan metan
mikrobiell microbian
Mist gunoi de grajd
Nachgärer fermenter secundar
Nachgärung fermentare ulterioară
Nachwachsende Rohstoffe resurse primare regenerabile
naturbelassenen Biomassen biomase naturale
NaWaRo resurse primare regenerabile
Nebenprodukt produs auxiliar
nicht eisenhaltig neferos
Nutzvolumen volum util
Olivenkern sâmbure de măsline
organische Reststoffe resturi organice
Pelletheizung încălzire cu peleţi
Pelletierung peletizare
Preisspirale spirala preţurilor
Privathaushalt gospodărie privată
Produktionsablauf desfăşurarea producţiei
Produktionsstandort locaţie de producţie
Prozessoptimierung optimizarea producţiei
Prozessdampf abur de proces
Prozesswasser apă de proces
Pumpsystem sistem de pompare
Rauchgas gaz de ardere
Reaktionsturm turn de reacţie
refuse-derived fuel RDF
regenerativ regenerativ
Restholz resturi de material lemnos
Rindenbriketts brichete din coajă de lemn
Rohfaser fibră brută
Rohrschlange serpentină din ţeavă
Rost grilaj
Rostschlacke zgură
sauerstofffrei anaerob
Saugzug tiraj artificial
Saugzug tiraj forţat
Schadstoff substanţă dăunătoară
Schadstoffe noxe
Schallschutzcontainer container cu protecţie fonică
Schlackebunker buncăr pentru zgură
Schlauchfilter filtru cu furtun
Schwefeloxid oxid de sulf
Schwermetall metal greu
Schwermetalle metale grele
Sekundärluft aer secundar
Silage însilozare
Silo siloz
Speiseabfälle resturi alimentare
Speisereste resturi alimentare
Speisewasser apă de alimentare
Spurengas gaz remanent
Stickgas azot
Stickoxid oxid de azot
Stickstoff azot
Stößel bătător
Stößel berbec de sonetă
Stößel tachet
Stößel cap de mortezat
Stößel pisălog
Strauchschnitt tăierea arbuştilor
Strombedarf necesar de curent
substituieren substitui
Substrat substrat
Symbiose simbioză
Tauchmotor motor submersibil
Treibhausgas gaz de seră
Turbine turbină
Turbogeneratorsatz unitate turbogeneratoare
unter Luftabschluss fără aer
Verbrennungsprozess proces de ardere
Vergärung fermentare
Versorger companie de alimentare cu energie
Verweildauer durată de menţinere
Verwertung valorificare
Vorgrube rezervor preliminar
Waldrestholz resturi de lemn din pădure
Wärmetauscher schimbător de căldură
Wasserbad baie de apă
Wertschöpfungskette lanţ de valoare adăugată
Wirtschaftsdünger îngrăşăminte economice
Wirtschaftsdünger fertilizant industrial
zersetzen descompune
Zufeuerung zu fossilen Brennstoffen co-firing / combustie combinată cu combustibili fosili
Zersetzung descompunere

Monday, August 18, 2025

ISO 639 Language Codes

Columns: ISO Language Names | Set 1 | Set 2/T | Set 2/B (listed only where it differs from Set 2/T) | Set 3 | Endonym(s) | Other Name(s) | Notes
Abkhazian ab abk abk Аҧсуа; Apsua; აფსუა Abkhaz
Afar aa aar aar Qafar af

Afrikaans af afr afr Afrikaans

Akan ak aka aka + 2 Ákán
Twi is tw/twi, Fanti is fat
Albanian sq sqi alb sqi + 4 Shqip called "Albanian Phylozone" in 639-6
Amharic am amh amh አማርኛ (Amarəñña)

Arabic ar ara ara + 28 اَلْعَرَبِيَّةُ (al-ʿarabiyyah)

Standard Arabic is arb
Aragonese an arg arg Aragonés

Armenian hy hye arm hye Հայերեն (Hayeren)
ISO 639-3 code hye is for Eastern Armenian, hyw is for Western Armenian, and xcl is for Classical Armenian
Assamese as asm asm অসমীয়া (Ôxômiya) Asamiya
Avaric av ava ava Авар мацӏ; اوار ماض (Avar maz) Avar
Avestan ae ave ave Upastawakaēna

Aymara ay aym aym + 2 Aymara Aymaran
Azerbaijani az aze aze + 2 Azərbaycan dili; آذربایجان دیلی; Азәрбајҹан дили Azeri
Bambara bm bam bam بَمَنَنكَن ;ߓߡߊߣߊ߲ߞߊ߲ (Bamanankan) Bamana; Bamanankan
Bashkir ba bak bak Башҡорт теле; Başqort tele Bashkort
Basque eu eus baq eus Euskara/Euskera

Belarusian be bel bel Беларуская мова (Biełaruskaja mova)

Bengali bn ben ben বাংলা (Bāŋlā) Bangla
Bislama bi bis bis Bislama
Language formed from English and Vanuatuan languages, with some French influence.
Bosnian bs bos bos Босански (Bosanski) Bosniak Member language of Serbo-Croatian with code sh deprecated in 2000
Breton br bre bre Brezhoneg

Bulgarian bg bul bul Български (Bulgarski)

Burmese my mya bur mya မြန်မာစာ (Mrãmācā) Myanmar
Catalan, Valencian ca cat cat Català; Valencià

Central Khmer km khm khm ខេមរភាសា; (Khémôrôphéasa) Khmer; Cambodian
Chamorro ch cha cha Finu' Chamoru

Chechen ce che che Нохчийн мотт; (Noxçiyn mott) Chechnyan; Chechnian
Chichewa, Chewa, Nyanja ny nya nya Chichewa; Chinyanja

Chinese zh zho chi zho + 19 中文 (Zhōngwén); 汉语; 漢語 (Hànyǔ)


Church Slavonic, Old Slavonic, Old Church Slavonic cu chu chu Славе́нскїй ѧ҆зы́къ
In use by the Eastern Orthodox Church
Chuvash cv chv chv Чӑвашла (Çăvaşla)

Cornish kw cor cor Kernowek

Corsican co cos cos Corsu

Cree cr cre cre + 6 ᓀᐦᐃᔭᐁᐧᐃᐧᐣ (Nehiyawewin)

Croatian hr hrv hrv Hrvatski Crovatian Member language of Serbo-Croatian with code sh deprecated in 2000
Czech cs ces cze ces Čeština Czechian
Danish da dan dan Dansk

Divehi, Dhivehi, Maldivian dv div div ދިވެހި (Dhivehi)

Dutch, Flemish nl nld dut nld Nederlands
Flemish is not to be confused with the closely related West Flemish which is referred to as "Vlaams" and has the code vls in ISO 639-3
Dzongkha dz dzo dzo རྫོང་ཁ་ (Dzongkha) Bhutanese
English en eng eng English

Esperanto eo epo epo Esperanto

Estonian et est est + 2 Eesti keel

Ewe ee ewe ewe Èʋegbe

Faroese fo fao fao Føroyskt Faeroese
Fijian fj fij fij Na Vosa Vakaviti

Finnish fi fin fin Suomi

French fr fra fre fra Français

Fulah ff ful ful + 9 𞤊𞤵𞤤𞤬𞤵𞤤𞤣𞤫 ;ࢻُلْࢻُلْدٜ; Fulfulde
𞤆𞤵𞤤𞤢𞥄𞤪 ;ݒُلَارْ; Pulaar
Fula; Fulani
Gaelic, Scottish Gaelic gd gla gla Gàidhlig Scots Gaelic
Galician gl glg glg Galego Galego
Ganda lg lug lug Luganda Luganda
Georgian ka kat geo kat ქართული (Kharthuli)

German de deu ger deu Deutsch

Greek, Modern (1453–) el ell gre ell Νέα Ελληνικά; (Néa Ellêniká)
for Ancient Greek, use the ISO 639-3 code grc
Guarani gn grn grn + 5 Avañe'ẽ

Gujarati gu guj guj ગુજરાતી (Gujarātī)

Haitian, Haitian Creole ht hat hat Kreyòl ayisyen

Hausa ha hau hau هَرْشٜن هَوْس (halshen Hausa) Hausan
Hebrew he heb heb עברית‎ (Ivrit)
Modern Hebrew. Code changed in 1989 from original ISO 639:1988, iw.[3]
Herero hz her her Otjiherero Otjiherero
Hindi hi hin hin हिन्दी (Hindī)

Hiri Motu ho hmo hmo Hiri Motu Police Motu; Pidgin Motu
Hungarian hu hun hun Magyar nyelv Magyar
Icelandic is isl ice isl Íslenska

Ido io ido ido Ido

Igbo ig ibo ibo ásụ̀sụ́ Ìgbò

Indonesian id ind ind bahasa Indonesia
covered by macrolanguage ms/msa. Changed in 1989 from original ISO 639:1988, in.[3]
Interlingua (International Auxiliary Language Association) ia ina ina Interlingua

Interlingue, Occidental ie ile ile Interlingue; Occidental

Inuktitut iu iku iku + 2 ᐃᓄᒃᑎᑐᑦ (Inuktitut)

Inupiaq ik ipk ipk + 2 Iñupiaq Inupiat; Inupiatun
Irish ga gle gle Gaeilge Irish Gaelic
Italian it ita ita Italiano

Japanese ja jpn jpn 日本語 (Nihongo)

Javanese jv jav jav ꦧꦱꦗꦮ; basa Jawa

Kalaallisut, Greenlandic kl kal kal Kalaallisut

Kannada kn kan kan ಕನ್ನಡ (Kannađa) Kannadan; Canarese
Kanuri kr kau kau + 3 كَنُرِيِه; Kànùrí

Kashmiri ks kas kas कॉशुर; كأشُر (Kosher) Koshur
Kazakh kk kaz kaz Қазақша; Qazaqşa; قازاقشا Qazaq
Kikuyu, Gikuyu ki kik kik Gĩgĩkũyũ

Kinyarwanda rw kin kin Ikinyarwanda Rwandan; Rwanda; Ikinyarwanda
Komi kv kom kom + 2 Коми кыв Zyran; Zyrian; Komi-Zyryan
Kongo kg kon kon + 3 Kikongo Kikongo
Korean ko kor kor 한국어 (Hangugeo); 조선말 (Chosŏnmal)


Kuanyama, Kwanyama kj kua kua Oshikwanyama Cuanhama; Oshikwanyama
Kurdish ku kur kur + 3 کوردی; Kurdî

Kyrgyz, Kirghiz ky kir kir Кыргыз; قىرعىز


Lao lo lao lao ພາສາລາວ (phasa Lao) Laotian
Latin la lat lat Latinum
In use by several Christian churches and in the sciences
Latvian lv lav lav + 2 Latviski Lettish
Limburgan, Limburger, Limburgish li lim lim Lèmburgs

Lingala ln lin lin Lingála Ngala
Lithuanian lt lit lit Lietuvių

Luba-Katanga lu lub lub Kiluba Luba-Shaba
Luxembourgish, Letzeburgesch lb ltz ltz Lëtzebuergesch Luxembourgian
Macedonian mk mkd mac mkd Македонски (Makedonski)

Malagasy mg mlg mlg + 11 مَلَغَسِ; Malagasy

Malay ms msa may msa + 36 بهاس ملايو (bahasa Melayu)
Standard Malay is zsm, Indonesian is id/ind
Malayalam ml mal mal മലയാളം (Malayāļã)

Maltese mt mlt mlt Malti

Manx gv glv glv Gaelg; Gailck Manx Gaelic
Maori mi mri mao mri reo Māori

Marathi mr mar mar मराठी (Marāṭhī) Maharashtran
Marshallese mh mah mah kajin M̧ajel‌̧ Ebon
Mongolian mn mon mon + 2 ᠮᠣᠩᠭᠣᠯ
ᠬᠡᠯᠡ; Монгол хэл (Mongol xel)
Mongol
Nauru na nau nau dorerin Naoe Nauruan
Navajo, Navaho nv nav nav Diné bizaad; Naabeehó bizaad

Ndonga ng ndo ndo Ndonga Oshindonga
Nepali ne nep nep + 2 नेपाली भाषा (Nepālī bhāśā) Nepalese; Gorkhali
North Ndebele nd nde nde isiNdebele; saseNyakatho; Mthwakazi Ndebele Northern Ndebele
Northern Sami se sme sme Davvisámegiella North Sami
Norwegian no nor nor + 2 Norsk
Bokmål is nb/nob, Nynorsk is nn/nno
Norwegian Bokmål nb nob nob Norsk Bokmål
covered by macrolanguage no/nor
Norwegian Nynorsk nn nno nno Norsk Nynorsk
covered by macrolanguage no/nor
Occitan oc oci oci Occitan; Provençal Provential; Provencal
Ojibwa oj oji oji + 7 ᐊᓂᔑᓈᐯᒧᐎᓐ (Anishinaabemowin) Ojibwe; Ojibway; Otchipwe; Ojibwemowin
Oriya or ori ori + 2 ଓଡ଼ିଆ (Odia) Odian; Odishan; Orissan
Oromo om orm orm + 4 afaan Oromoo Oromoo
Ossetian, Ossetic os oss oss ирон Ӕвзаг (iron Ævzag) Ossete
Pali pi pli pli Pāli Pali-Magadhi
Pashto, Pushto ps pus pus + 3 پښتو (Pax̌tow)

Persian fa fas per fas + 2 فارسی (Fārsiy) Farsi
Polish pl pol pol Polski

Portuguese pt por por Português

Punjabi, Panjabi pa pan pan ਪੰਜਾਬੀ; پنجابی (Pãjābī)

Quechua qu que que + 43 Runa simi; kichwa simi; Nuna shimi Quechuan
Romanian, Moldavian, Moldovan ro ron rum ron Română; Ромынэ
the identifiers mo and mol for Moldavian are deprecated. They will not be assigned to different items, and recordings using these identifiers will not be invalid.
Romansh rm roh roh Rumantsch; Rumàntsch; Romauntsch; Romontsch Romansch
Rundi rn run run Ikirundi Kirundi
Russian ru rus rus Русский язык (Russkiĭ âzyk)

Samoan sm smo smo gagana Sāmoa

Sango sg sag sag yângâ tî Sängö Sangoic
Sanskrit sa san san + 2 संस्कृतम् (Saṃskṛtam)
In use by some Indian states for judicial purposes
Sardinian sc srd srd + 4 Sardu Sard
Serbian sr srp srp Српски (Srpski)
Member language of Serbo-Croatian with code sh deprecated in 2000, the ISO 639-2/T code srp deprecated the ISO 639-2/B code scc[4]
Shona sn sna sna chiShona

Sichuan Yi, Nuosu ii iii iii ꆈꌠꉙ (Nuosuhxop) Northern Yi; Liangshan Yi; Nosu standard form of the Yi languages
Sindhi sd snd snd سنڌي; सिन्धी (Sindhī)

Sinhala, Sinhalese si sin sin සිංහල (Siṁhala)

Slovak sk slk slo slk Slovenčina Slovakian
Slovenian sl slv slv Slovenščina Slovene
Somali so som som Soomaali; 𐒈𐒝𐒑𐒛𐒐𐒘; سٝومالِ Somalian
South Ndebele nr nbl nbl isiNdebele; sakwaNdzundza Southern Ndebele
Southern Sotho st sot sot Sesotho Sesotho; Sotho
Spanish, Castilian es spa spa Español; Castellano

Sundanese su sun sun basa Sunda; ᮘᮞ ᮞᮥᮔ᮪ᮓ; بَاسَا سُوْندَا

Swahili sw swa swa + 2 Kiswahili; كِسوَحِيلِ Kiswahili
Swati ss ssw ssw siSwati Swazi
Swedish sv swe swe Svenska

Tagalog tl tgl tgl Wikang Tagalog
note: Filipino (Pilipino) has the code fil
Tahitian ty tah tah reo Tahiti
One of the Reo Mā`ohi (languages of French Polynesia)[5]
Tajik tg tgk tgk Тоҷикӣ (Tojikī) Tajiki
Tamil ta tam tam தமிழ் (Tamiḻ) Thamizh
Tatar tt tat tat Татар теле;
Tatar tele; تاتار تئلئ‎


Telugu te tel tel తెలుగు (Telugu)

Thai th tha tha ภาษาไทย (Phasa Thai) Central Thai; Siamese
Tibetan bo bod tib bod བོད་སྐད་ (Bodskad);
ལྷ་སའི་སྐད་ (Lhas'iskad)
Standard Tibetan; Lhasa Tibetan
Tigrinya ti tir tir ትግርኛ (Təgrəñña) Tigrigna
Tonga (Tonga Islands) to ton ton lea faka-Tonga Tongan
Tsonga ts tso tso Xitsonga Xitsonga
Tswana tn tsn tsn Setswana Setswana; Sechuana
Turkish tr tur tur Türkçe

Turkmen tk tuk tuk Türkmençe; Түркменче; تۆرکمنچه


Twi tw twi twi Twi
covered by macrolanguage ak/aka
Uighur, Uyghur ug uig uig ئۇيغۇر تىلى; Уйғур тили; Uyƣur tili


Ukrainian uk ukr ukr Українська (Ukraїnska)

Urdu ur urd urd اُردُو (Urduw)

Uzbek uz uzb uzb + 2 Ózbekça; ўзбекча; ئوزبېچه


Venda ve ven ven Tshivenḓa Tshivenda
Vietnamese vi vie vie tiếng Việt

Volapük vo vol vol Volapük

Walloon wa wln wln Walon

Welsh cy cym wel cym Cymraeg

Western Frisian fy fry fry Frysk West Frisian; Frisian; Fries

Wolof wo wol wol وࣷلࣷفْ

Xhosa xh xho xho isiXhosa Xosa
Yiddish yi yid yid + 2 ייִדיש (Yidiš) Judeo-German Changed in 1989 from original ISO 639:1988, ji.[3]
Yoruba yo yor yor èdè Yorùbá

Zhuang, Chuang za zha zha + 16 話僮 (Vahcuengh)

Zulu zu zul zul isiZulu
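
For scripting, the code columns map naturally onto a small lookup. The sketch below copies a handful of rows from the table (it is not the full list), and the names `ISO_639`, `MACROLANGUAGE`, `to_set2`, and `macrolanguage_of` are hypothetical. It shows how a Set 1 code resolves to its Set 2/T or Set 2/B code, and how the "covered by macrolanguage" notes can be applied.

```python
# Set 1 code -> (Set 2/T, Set 2/B); a few rows copied from the table above.
ISO_639 = {
    "de": ("deu", "ger"),
    "en": ("eng", "eng"),
    "es": ("spa", "spa"),
    "fr": ("fra", "fre"),
    "ro": ("ron", "rum"),
    "sq": ("sqi", "alb"),
}

# Codes whose Notes column says they are covered by a macrolanguage.
MACROLANGUAGE = {
    "nb": "no",  # Norwegian Bokmål -> Norwegian
    "nn": "no",  # Norwegian Nynorsk -> Norwegian
    "id": "ms",  # Indonesian -> Malay
    "tw": "ak",  # Twi -> Akan
}


def to_set2(code, bibliographic=False):
    """Return the Set 2/T code, or the Set 2/B code if requested."""
    terminologic, biblio = ISO_639[code]
    return biblio if bibliographic else terminologic


def macrolanguage_of(code):
    """Return the covering macrolanguage, or the code itself if none is listed."""
    return MACROLANGUAGE.get(code, code)
```

Where the table shows only one Set 2 code, the T and B forms are the same, which is why `en` and `es` repeat their value in the tuple.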