Tuesday, March 6, 2012

Translating WordPress

Internationalization and localization are terms used to describe the effort to make WordPress (and other such projects) available in languages other than English, for people from different locales, who use different dialects and local preferences.
The process of localizing a program has two steps. The first step is when the program's developers provide a mechanism and method for the eventual translation of the program and its interface to suit local preferences and languages for users worldwide. WordPress developers have done this, so in theory, WordPress can be used in any language.
The second step is the actual localization, the process by which the text on the page and other settings are translated and adapted to another language and culture, using the framework prescribed by the developers of the software. WordPress has already been localized into many other languages (see WordPress in Your Language for more information).
This article explains how translators (bi- or multi-lingual WordPress users) can go about localizing WordPress to more languages.

Translating WordPress

Before you start translating WordPress, check WordPress in Your Language (and resources cited there) to see if a translation of WordPress into your language already exists. It is also possible that someone (or a team) is already working on translating WordPress into your language, but they haven't finished yet. To find out, subscribe to the polyglots' blog, introduce yourself, and ask if there's anyone translating into your language. There is also a list of localization teams and localization teams currently forming, which you can check to see if a translation is in progress.

Qualifications

Assuming that a WordPress translation into your language does not already exist and that no one is already working on one, you may want to volunteer to create a public translation of WordPress into your language. If so, here are the qualifications you will need:
  • You need to be truly bilingual -- fluent in both written English and the language(s) you will be translating into. Casual knowledge of either one will make translating difficult for you, or make the localization you create confusing to native speakers.
  • You need to be familiar with PHP, as you will sometimes need to read through the WordPress code to figure out the best way to translate messages.
  • You should be familiar with human language constructs: nouns, verbs, articles, etc., different types of each, and be able to identify variations of their contexts in English.

About Locales

A locale is a combination of language and regional dialect. Usually locales correspond to countries, as is the case with Portuguese (Portugal) and Portuguese (Brazil).
You can do a translation for any locale you wish, even other English locales such as Canadian English or Australian English, to adjust for regional spelling and idioms.
The default locale of WordPress is U.S. English.

Localization Technology

WordPress's developers chose to use the GNU gettext localization framework to provide localization infrastructure to WordPress. gettext is a mature, widely used framework for modular translation of software, and is the de facto standard for localization in the open source/free software realm.
gettext uses message-level translation — that is, every "message" displayed to users is translated individually, whether it be a paragraph or a single word. In WordPress, such "messages" are generated, translated, and used by the WordPress PHP files via two PHP functions. __() is used when the message is passed as an argument to another function; _e() is used to write the message directly to the page. More detail on these two functions:
__('message') 
Searches the localization module for the translation of 'message', and passes the translation to the PHP return statement. If no translation is found for 'message', it just returns 'message'.
_e('message') 
Searches the localization module for the translation of 'message', and passes the translation to the PHP echo statement. If no translation is found for 'message', it just echoes 'message'.
Note that if you are internationalizing a Theme or Plugin, you should use a "Text Domain". See Writing a Plugin for more information on how to do this for a plugin; themes are similar.
The gettext framework takes care of most of WordPress. However, there are a few places in the WordPress distribution where gettext cannot be used -- see Files For Direct Translation for more information on how to translate these spots.
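The lookup-with-fallback behaviour of __() and _e() described above can be sketched outside WordPress. The following is a minimal Python illustration, not WordPress code: the TRANSLATIONS dictionary and the translate() and echo_translation() helpers are hypothetical stand-ins for the loaded translation catalog, __() and _e() respectively.

```python
# Hypothetical stand-in for a loaded translation catalog (not WordPress code).
TRANSLATIONS = {"Register": "रजिस्टर"}

def translate(message):
    """Mimics __(): return the translation, or the message itself if none exists."""
    return TRANSLATIONS.get(message, message)

def echo_translation(message):
    """Mimics _e(): write the translation straight to the output."""
    print(translate(message))

echo_translation("Register")  # a translation exists, so it is printed
echo_translation("Post")      # no translation, so the English text is printed
```

The point of the fallback is that an incomplete translation never breaks the interface: untranslated messages simply appear in English.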

gettext files

There are three types of files used in the gettext translation framework. These files are used and/or generated by translation tools during the translation process, as follows:
POT (Portable Object Template) files 
The first step in the localization process is that a program is used to search through the WordPress source code and pick out every message passed into a __() or _e() function. This list of English-language messages is put into a specially-formatted template file (POT file) that forms the basis of all translations. Generally, you can download a POT file for WordPress, so you shouldn't have to generate your own. Separate POT files can also be made for themes and plugins, if the theme/plugin developer has enclosed all text in __() or _e() functions.
PO (Portable Object) files 
The second step in the localization process is that the translator translates all the messages from the POT file into the target language, and saves both English and translated messages in a PO file.
MO (Machine Object) files 
The final step in the localization process is that the PO file is run through a program that turns it into an optimized, machine-readable binary file (MO file). Compiling the translations into this binary format makes the localized program much faster at retrieving the translations while it is running.
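To make the idea of a compiled catalog more concrete, here is a minimal Python sketch that packs a dictionary of messages into the MO binary layout and reads it back with Python's standard gettext module. It is an illustration only: build_mo() is a hypothetical helper, it leaves out the optional hash table and plural forms, and in practice you would generate MO files with msgfmt or Poedit.

```python
import gettext
import io
import struct

def build_mo(catalog):
    """Pack a {msgid: msgstr} dict into the GNU MO layout (little-endian, no hash table)."""
    items = sorted(catalog.items())           # MO files list msgids in sorted order
    count = len(items)
    orig_table = 7 * 4                        # the header is seven 32-bit integers
    trans_table = orig_table + count * 8      # each table entry is (length, offset)
    data_start = trans_table + count * 8
    blob = b""
    entries = []
    # Originals first, then translations; every string is NUL-terminated.
    for text in [k for k, _ in items] + [v for _, v in items]:
        raw = text.encode("utf-8")
        entries.append((len(raw), data_start + len(blob)))
        blob += raw + b"\0"
    out = struct.pack("<7I", 0x950412DE, 0, count, orig_table, trans_table, 0, 0)
    for length, offset in entries:
        out += struct.pack("<2I", length, offset)
    return out + blob

catalog = {
    "": "Content-Type: text/plain; charset=UTF-8\n",  # header entry declares the charset
    "Register": "रजिस्टर",
}
translations = gettext.GNUTranslations(io.BytesIO(build_mo(catalog)))
print(translations.gettext("Register"))
```

Because the strings are stored with explicit lengths and offsets, a program can jump straight to a translation without parsing any text, which is where the speed advantage over a PO file comes from.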

Translation Tools

There are various tools available to aid in translating. You may use whichever you prefer.
GlotPress 
GlotPress will let you, or an entire team, translate your favourite software. It is web-based and open-source. It is also completely in sync with the main repositories and the preferred method for translating WordPress into your language.
Launchpad 
The Ubuntu Linux project has a web site that allows you to translate messages without even looking at a PO or POT file, and export directly to an MO file.
Note: many translators have found Launchpad's translation tool, Rosetta, to be a good starting point, but once it comes time to proofread the entire list of translations, many have opted to switch to hand-editing the PO file or using a program like Poedit or KBabel, since the Rosetta UI lacks a search feature and other things that become essential when proofreading and editing.
Pootle 
An open source web-based translation system. The server hosted at Locamotion.org currently has WordPress translation enabled on it.
Poedit 
An open source program for Windows, Mac OS X and UNIX/Linux which provides an easy-to-use GUI for editing PO files and generating MO files.
KBabel 
Another open source PO editing program, for the KDE desktop environment on Linux.
GNU Gettext 
The official Gettext tools package contains command-line tools for creating POTs, manipulating POs, and generating MOs. For those comfortable with a command shell.

Translating With GlotPress

Instructions for translating with GlotPress are in the Getting Started guide. If you don't see your language listed, please request its inclusion on the WP Polyglots blog.

Translating With Launchpad

We have a separate page with instructions for translating WordPress with Launchpad.

Translating With Pootle (at Locamotion.org)

  1. Register an account at the Pootle server, and send an e-mail to one of the admins to add your language
  2. Before trying to translate anything, remember to log in to Pootle. Content can sometimes be viewed and suggestions can sometimes be entered even if a visitor is not logged in, but one can only translate if logged in.
  3. Visit the WordPress page for your language. For example, the Afrikaans page is at pootle.locamotion.org/af/wordpress/ (remember the trailing slash).
  4. Click "Show Editing Functions".
  5. Click "Quick Translate" to edit only untranslated and fuzzy strings, or click "Translate All" to edit all strings.
For the purpose of translating WordPress at locamotion.org, the single wordpress.pot file has been split up into smaller logical units. The readme.html file is also available there, and so is a file containing all the strings that one would normally add to the PHP files manually.
Also take a look at the Decathlon wiki page for WordPress, here and here.
Merging your translations into wordpress.pot
Normally, with a Pootle server, the translator can download his chosen software's PO file at any time and submit it to his project. However, because the original source file has been split into smaller units at pootle.locamotion.org, translators must manually merge their translations back into the wordpress.pot file before submitting it to WordPress.
  1. Download the official WordPress POT file.
  2. Download the WordPress Continent POT file (Optional)
  3. Download and install the Translate Toolkit on your computer.
  4. Download the translated or partially translated PO files from the Pootle server. You can download them one by one or you can download them in a ZIP file (see options on the web site). Normally you don't need to be logged in at Pootle to download the PO files that translators in your language have translated.
  5. First, combine the PO files into a single translation memory (because it is easier to do subsequent steps with a single file than with several files), and execute the following from the command line: po2tmx -l xx -i pofiles -o xx.tmx where "xx" is your target language code. This will create a TMX translation memory file called xx.tmx.
  6. Second, pre-translate the WordPress POT file using the translation memory. To do this, execute the following from the command line: pot2po --tm=xx.tmx -i wordpress.pot -o wordpress_xx.po. This will create a PO file for your language called wordpress_xx.po.
  7. Lastly, do a word/string count on your PO file to see how much of it is translated, fuzzy and untranslated, using the following from the command line: pocount wordpress_xx.po
If all the PO files were 100% translated, the final wordpress_xx.po file will also be 100% translated. If some strings were not translated in the PO files, the pot2po command might cause some fuzzy translations in wordpress_xx.po (this is not a bad thing).

Translating With Poedit

  1. Download and install Poedit
  2. Download the official WordPress POT file
    The Poedit screen
  3. Open the file in Poedit.
  4. (See Image) The box labeled (1) is the original message (in English) from the POT file. The box labeled (2) is where you add your translation. Boxes labeled (3) and (4) are used for adding comments about the messages. These come in handy if you are working with a team of translators and would like to pass around ideas through the PO file.
  5. Go to File → Save as… to save your translations in a PO file.
  6. When you are finished translating, go to File → Save as… again to generate the MO file.
  7. Alternatively, set Poedit to compile an MO file whenever you save: click File → Preferences and, on the Editor tab, check the "Automatically compile .mo file on save" box.

Translating With KBabel

This section is incomplete.
  1. Download the official WordPress POT file
  2. Open the file in KBabel

Translating With Gettext Tools

  1. Download the official WordPress POT file
  2. Open the file in your favorite text editor
  3. Update the header information
  4. Translate the messages
  5. Save the file with a .po file extension
  6. Run msgfmt -o filename.mo filename.po to compile the PO file into an MO file

The PO File Header

At the beginning of the PO file is something called the header. This gives information about what package and version the translation is for, who the translator was, and when it was created. Certain portions of this header should be universal for all WordPress translations:
# LANGUAGE (LOCALE) translation for WordPress.
# Copyright (C) YEAR WordPress contributors.
# This file is distributed under the same license as the WordPress package.
# FIRST AUTHOR <EMAIL@ADDRESS>, YEAR.
#
#, fuzzy
msgid ""
msgstr ""
"Project-Id-Version: WordPress VERSION\n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2005-02-27 17:11-0600\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language-Team: LANGUAGE <LL@li.org>\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=CHARSET\n"
"Content-Transfer-Encoding: 8bit\n"
Fill in the rest of the capitalized text with the appropriate values.

Message Format

The remainder of the file will be in a format as follows:
#: wp-comments-post.php:13
msgid "Sorry, comments are closed for this item."
msgstr ""

#: wp-comments-post.php:29
msgid "Sorry, you must be logged in to post a comment."
msgstr ""

#: wp-comments-post.php:35
msgid "Error: please fill the required fields (name, email)."
msgstr ""
The first line of each message contains the location of the message in the WordPress code. In the case of these messages, they're all located in wp-comments-post.php, on lines 13, 29, and 35, respectively. Occasionally you will come across a message for which you will need to check its context; look at the appropriate line or lines in the WordPress core, and you should be able to figure out when and where the message is displayed, and even reproduce it yourself using your web browser. Some messages will also appear with the same text in multiple locations; in that case, there may be more than one line giving a file and line location.
The next line, msgid, is the source message. This is the string that WordPress passes to its __() or _e() functions, and the message you will need to translate.
The final line, msgstr, is a blank string where you will fill in your translation.
Here's how the same few lines would look after being translated, using the French (France) locale as an example:
#: wp-comments-post.php:13
msgid "Sorry, comments are closed for this item."
msgstr "L'ajout de commentaire n'est pas ou plus possible pour cet article."

#: wp-comments-post.php:29
msgid "Sorry, you must be logged in to post a comment."
msgstr "Vous devez être connecté pour rédiger un commentaire."

#: wp-comments-post.php:35
msgid "Error: please fill the required fields (name, email)."
msgstr "Erreur : veuillez remplir les champs obligatoires vides (nom, e-mail)."
Note: see Character encodings and HTML character entities below for notes on when to use HTML character entities in translation.
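Because the format is this regular, it is easy to check a PO file for messages that still need work. The following Python sketch is one way to list untranslated entries; the untranslated() helper and the ENTRY pattern are hypothetical, and the sketch only handles simple entries where msgid and msgstr each fit on one line (dedicated tools such as Poedit or the Gettext utilities handle the full format).

```python
import re

# Matches one simple entry: a msgid line followed directly by a msgstr line.
ENTRY = re.compile(r'^msgid "(.*)"\nmsgstr "(.*)"$', re.MULTILINE)

def untranslated(po_text):
    """Return the msgids whose msgstr is still empty (single-line entries only)."""
    return [msgid for msgid, msgstr in ENTRY.findall(po_text) if msgid and not msgstr]

SAMPLE = '''#: wp-comments-post.php:13
msgid "Sorry, comments are closed for this item."
msgstr "L'ajout de commentaire n'est pas ou plus possible pour cet article."

#: wp-comments-post.php:29
msgid "Sorry, you must be logged in to post a comment."
msgstr ""
'''

print(untranslated(SAMPLE))
```

The header entry, whose msgid is an empty string, is skipped on purpose so that it is not reported as a missing translation.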

Types of messages

Labels

Labels are often used in the context of HTML <label>, <legend>, <a>, or <select> tags. They are short and precise descriptors of the purpose of a UI element. These can be very difficult to translate at times, especially if they are single words, and if the word used in English can be interpreted as either a noun or imperative verb. With most labels you will need to do some searching through the code to find the context of its use before coming up with an appropriate translation.
Because so many of the messages are part of the WordPress administration interface, Labels are probably the most frequent type of message to translate.

Examples

msgid "Post"
msgstr "Artikkeli"
"Post" could be interpreted as an imperative verb, but in this context it's a noun. The noun form of "post" in English can be difficult to translate, and the most appropriate translation has been difficult for some teams to decide upon. Many translations use their language's equivalent to the English "Article," as this one does. (From the Finnish (Finland) translation.)
#: wp-login.php:79 wp-login.php:233 wp-register.php:166
#: wp-includes/template-functions-general.php:46
msgid "Register"
msgstr "रजिस्टर"
From the Hindi translation.
#: wp-admin/admin-functions.php:357
msgid "- Select -"
msgstr " - Dewis -"
Items like the surrounding dashes in this example can be eliminated or replaced if they might be confusing to users in your target locale, or if there are different established conventions for your locale. From the Welsh translation.

Informational Messages

Another frequent type of message, the informational message is usually composed of full sentences, and conveys information or requests an action of the user. Since these tend to be longer than labels, they tend to be slightly easier to translate. However, with the longer messages comes more variation in the level of formality (or informality), which is something translators need to be aware of.

Examples

#: wp-login.php:146
msgid "Your new password is in the mail."
msgstr "Вашата нова парола е в електронната ви поща."
This particular message contains a modified English formulaic expression ("the check/cheque is in the mail"), which contributes to its informality. (From the Bulgarian (Bulgaria) translation.)
#: wp-includes/functions.php:1636
msgid "<strong>Error</strong>: Incorrect password."
msgstr "<strong>FEL</strong>: Felaktigt lösenord."
Error messages tend to be more formal, simply because they're short and concise. (From the Swedish (Sweden) translation.)
#: wp-includes/functions-post.php:467
msgid "Sorry, you can only post a new comment once every 15 seconds. Slow down cowboy."
msgstr "Leider kannst du nur alle 15 Sekunden einen neuen Kommentar eingeben. Immer locker bleiben."
Of course, not all error messages are formal. (From the German (Germany) translation.)

Strings with description

If a string contains a vertical bar |, the part on the right of | is a description. Its purpose is to help you translate the string, placing it in certain context or providing additional information.

Examples

#: wp-includes/locale.php:186
msgid ""
"number_format_decimal_point|$dec_point argument for http://php.net/number_format, default is ."
msgstr ","
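Here the description points you to a web page (the PHP documentation for number_format) to explain what the string is for: the translation is not a rendering of the text itself but the decimal separator used in your locale, in this case a comma.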
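One consequence of this convention is that the description must never reach the reader: if such a string is left untranslated, the part after the bar presumably has to be cut off before display, while a translated string such as "," contains no bar and passes through unchanged. The Python helper below is a hypothetical sketch of that idea, not WordPress's own implementation.

```python
def strip_description(text):
    """Drop everything from the last vertical bar onward, if there is one."""
    head, bar, _ = text.rpartition("|")
    return head if bar else text

untranslated = "number_format_decimal_point|$dec_point argument for http://php.net/number_format, default is ."
print(strip_description(untranslated))
print(strip_description(","))
```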

Date and Time Locale Settings

Rather than using PHP's built-in locale switching features, which are not configured for very many languages on most hosts, WordPress uses the gettext translation module to accomplish date and time translations and formatting.
WordPress translates the following:

Month names

#: wp-includes/locale.php:42 wp-includes/locale.php:57
msgid "May"
msgstr "Květen"
(From the Czech (Czech Republic) translation.)

Month abbreviations

#: wp-includes/locale.php:57
msgid "May_May_abbreviation"
msgstr "Mag"
Note the unusual msgid. These messages should NOT be translated literally: they are a hack to get around the fact that in English, the full name and abbreviation for May are the same, which Gettext would erroneously combine into one entry. (From the Italian (Italy) translation.)

Weekday Names

#: wp-includes/locale.php:7
#: wp-includes/locale.php:18
#: wp-includes/locale.php:31
msgid "Tuesday"
msgstr "火曜日"
(From the Japanese (Japan) translation.)

Weekday Abbreviations

#: wp-includes/locale.php:31
msgid "Tue"
msgstr "Уто"
(From the Serbian (Serbia) translation.)

Weekday Initials

#: wp-includes/locale.php:18
msgid "T_Tuesday_initial"
msgstr "ti"
The weekday initials are for WordPress's calendar feature, and use the same hack as the month abbreviations to get around the fact that in English Tuesday and Thursday share the same first letter. Not all locales use single-letter abbreviations for all days: in this example, Norwegian Bokmål uses an extra letter to distinguish tirsdag (Tuesday) and torsdag (Thursday). (From the Norwegian Bokmål (Norway) translation.)

Date Formatting Strings

These are PHP date() formatting strings, and they allow you to change the formatting of the date and time for your locale.
WordPress uses the translations elsewhere in the localization file for month names, weekday names, etc. This special string is for the selection of which elements to include in the date & time, as well as the order in which they're presented.
Take this msgid from the theme.pot file:
#: archive.php:40 search.php:19 single.php:22
msgid "l, F jS, Y"
msgstr ""
In English, this gets formatted as:
Sunday, February 27th, 2005
However, different locales format their dates differently. In Danish, for example, dates are written:
søndag, 27. februar 2005
To accomplish this, the msgid above would be translated to:
#: archive.php:40 search.php:19 single.php:22
msgid "l, F jS, Y"
msgstr "l, j. F Y"
To use another example, one way to format dates in Chinese and Japanese is as follows:
2005年2月27日
This would be accomplished in the translation like this:
#: archive.php:40 search.php:19 single.php:22
msgid "l, F jS, Y"
msgstr "Y年n月j日"
Lastly, if you need to include literal alphabetic characters in your date format, as sometimes occurs in Spanish, you can backslash them:
#: archive.php:40 search.php:19 single.php:22
msgid "l, F jS, Y"
msgstr "l j \d\e F \d\e Y"
This would output:
domingo 27 de febrero de 2005
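To see how a translated format string and the translated month and weekday names work together, here is a small Python sketch. It is only an illustration under stated assumptions: format_date() is a hypothetical helper that understands a handful of PHP date() characters (l, F, j, n, Y and backslash escapes), whereas the real date() function supports many more, and the Danish name lists are supplied by hand.

```python
from datetime import date

DANISH_MONTHS = ["januar", "februar", "marts", "april", "maj", "juni", "juli",
                 "august", "september", "oktober", "november", "december"]
DANISH_WEEKDAYS = ["mandag", "tirsdag", "onsdag", "torsdag", "fredag",
                   "lørdag", "søndag"]  # Monday first, matching date.weekday()

def format_date(fmt, d, months, weekdays):
    """Expand a small subset of PHP date() characters in fmt for the date d."""
    out = []
    i = 0
    while i < len(fmt):
        ch = fmt[i]
        if ch == "\\" and i + 1 < len(fmt):
            out.append(fmt[i + 1])  # a backslashed character is copied literally
            i += 2
            continue
        if ch == "l":
            out.append(weekdays[d.weekday()])
        elif ch == "F":
            out.append(months[d.month - 1])
        elif ch == "j":
            out.append(str(d.day))
        elif ch == "n":
            out.append(str(d.month))
        elif ch == "Y":
            out.append(str(d.year))
        else:
            out.append(ch)  # in this sketch, anything else passes through unchanged
        i += 1
    return "".join(out)

day = date(2005, 2, 27)
print(format_date("l, j. F Y", day, DANISH_MONTHS, DANISH_WEEKDAYS))
print(format_date("Y年n月j日", day, DANISH_MONTHS, DANISH_WEEKDAYS))
```

The format string decides which elements appear and in what order, while the names themselves come from separate translated messages, which is why both need to be translated for dates to read naturally.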

Translation via WordPress-PHP

To translate a date, e.g. inside your plugin, use mysql2date() or date_i18n(). Your date will be returned in localized format, based on the timestamp.

Messages With Placeholders

Many messages contain special PHP formatting placeholders, which allow the insertion of untranslatable dynamic content into the message after it is translated. The PHP placeholders come in two different formats:
%s 
When only one placeholder is present, this marker is used.
%1$s, %2$s, %3$s, … 
Numbered placeholders, which allow translations to rearrange the order of the placeholders in the string while maintaining the information each is replaced with.

Examples

#: wp-login.php:116
msgid "The e-mail was sent successfully to %s's e-mail address."
msgstr "El e-mail fue enviado satisfactoriamente a la dirección e-mail de %s"
This message inserts the username of the user to whom an email has been sent. (From the Spanish (Spain) translation.)
#: wp-admin/upload.php:96
#, php-format
msgid "File %1$s of type %2$s is not allowed."
msgstr "类型为%2$s的文件%1$s不允许被上传。"
This message reverses the order in which the file name and type are used in the translation. (From the Chinese (China) translation.)
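The effect of numbered placeholders can be demonstrated with a short Python sketch. Python's own % operator does not understand the %1$s syntax, so the hypothetical fill() helper below imitates the relevant part of PHP's sprintf() for string placeholders only; the file name and type passed in are made-up sample values.

```python
import re

PLACEHOLDER = re.compile(r"%%|%(?:(\d+)\$)?s")

def fill(template, *args):
    """Substitute %s and %N$s placeholders, PHP sprintf() style (strings only)."""
    positions = iter(range(len(args)))

    def replace(match):
        if match.group(0) == "%%":
            return "%"                                 # literal percent sign
        if match.group(1):
            return str(args[int(match.group(1)) - 1])  # numbered: 1-based index
        return str(args[next(positions)])              # plain %s: next argument in order

    return PLACEHOLDER.sub(replace, template)

print(fill("File %1$s of type %2$s is not allowed.", "notes.exe", "exe"))
print(fill("类型为%2$s的文件%1$s不允许被上传。", "notes.exe", "exe"))
```

Both calls pass the arguments in the same order; only the translated string decides where each one lands, which is what lets a translation reorder the sentence freely.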

Tips for Good Translations

Don't translate literally, translate organically 
Being bi- or multi-lingual, you undoubtedly know that the languages you speak have different structures, rhythms, tones, and inflections. Translated messages don't need to be structured the same way as the English ones: take the ideas that are presented and come up with a message that expresses the same thing in a natural way for the target language. It's the difference between creating an equal message and an equivalent message: don't replicate, replace. Even with more structural items in messages, you have creative license to adapt and change if you feel it will be more logical for, or better adapted to, your target audience.
Try to keep the same level of formality (or informality) 
Each message has a different level of formality or informality. Exactly what level of formality or informality to use for each message in your target language is something you'll have to figure out on your own (or with your team), but WordPress messages (informational messages in particular) tend to have a politely informal tone in English. Try to accomplish the equivalent in the target language, within your cultural context.
Don't use slang or audience-specific terms 
Some amount of terminology is to be expected in a blog, but refrain from using colloquialisms that only the "in" crowd will get. If the uninitiated blogger were to install WordPress in your language, would they know what the term means? Words like pingback, trackback, and feed are exceptions to this rule; they are terms that are typically difficult to translate, and that many translators choose to leave in English.
Read other software's localizations in your language 
If you get stuck or need direction, try reading through the translations of other popular software tools to get a feel for what terms are commonly used, how formality is addressed, etc. Of course, WordPress has its own tone and feel, so keep that in mind when you're reading other localizations, but feel free to dig up UI terms and the like to maintain consistency with other software in your language.

WordPress Localization Repository

The WordPress Localization Repository at http://i18n.svn.wordpress.org/ is a Subversion repository where official WordPress translations are maintained. Various teams collaborate on translations for their native language, and team maintainers commit updates and changes to the repository.

Participating

Participation in the repository is open to anyone. Simply visit the WP Polyglots Blog, introduce yourself, and let everyone know what translation you'd like to work on. If there is already a team for your language and locale, they'll let you know and you can join them. If not, you can either volunteer to be a maintainer for your language and locale, or simply contribute your localization and the repository maintainers will add it.

Guidelines and requirements

Note: these guidelines are subject to change as the system evolves; repository maintainers will be happy to assist you in updating the files you maintain in the repository should these guidelines change.

Character Encodings

All localizations should have at least a UTF-8 version, but may optionally add versions in other character encodings popular for that locale.
Current PHP versions don't handle byte order marks (BOMs), so be sure the UTF-8 encoded files you contribute do not have them.
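If you are unsure whether your editor added a BOM, it is easy to check, because a UTF-8 BOM is always the same three bytes at the very start of the file. The following Python sketch shows one way to detect and remove it; strip_utf8_bom() is a hypothetical helper name, and many text editors offer an equivalent "save without BOM" option.

```python
import codecs

def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 byte order mark, if one is present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data

with_bom = codecs.BOM_UTF8 + 'msgid "Post"\n'.encode("utf-8")
print(strip_utf8_bom(with_bom))
```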

HTML Character Entities

With a few exceptions (noted below), all translations should be written literally, rather than escaping accented and special characters with HTML character entities.
Some characters must be escaped to avoid conflict with XHTML markup: angle brackets (&lt; and &gt;), and ampersands (&amp;). In addition, there are a few other characters better used escaped, such as non-breaking spaces (&nbsp;), angle quotes (&laquo; and &raquo;), curly apostrophes (&#8217;) and curly quotes.
For more information about best practices involving character encodings and character entities, see the W3C's guidance on these topics.

Repository File Structure

The repository contains directories for each locale, which are named as follows:
Within each locale's directory are the regular Subversion versioning directories: branches/, tags/, and trunk/.
Inside the appropriate versioning directory are the following subdirectories:
messages/
  • messages/
    • kubrick

This directory contains the Gettext MO and PO files for the locale. Message files are named after the locale name.
In the kubrick folder you should put the translation (using exactly the same PO/MO filename as above) of the i18n-ed default theme, residing at the wordpress-i18n svn repository. There is also another way of translating the default theme, described under theme/ below.
dist/
This directory contains all files in the WordPress distribution that cannot be Gettexted, which have been translated into the target locale.
If the locale has only a UTF-8 translation of the files, the dist/ directory may be populated with them directly, and the structure within dist should mirror the structure of the WordPress root directory:
  • dist/
    • license.txt
    • readme.html
    • wp-config-sample.php
    • wp-admin/
theme/
It is better to translate the i18n-ed kubrick (see the messages/ part above), instead of using theme/.
Similarly to the dist/ dir, theme/ contains hard-translated theme files. If only a UTF-8 translation is present, the directory can be populated with subdirectories for each theme translated. These subdirectories contain all of the same files as the original theme (except that they're translated), and are named the same as the original theme:
  • theme/
    • default/
      • 404.php
      • index.php
      • sidebar.php
      • images/
Just as with the dist/ directory, if there are multiple character encodings represented, theme/ should contain a subdirectory for each character encoding, which in turn would contain subdirectories for each theme translated.

Using Localizations

In order to localize your installation of WordPress, create a directory named languages inside of wp-includes, if it does not already exist. Then grab the appropriate localization files from the Subversion Repository as described above. The main .mo file and the continent .mo file for the language should go inside the languages directory. Set WPLANG inside of wp-config.php to your chosen language. For example, if you wanted to use French, you would have:
define ('WPLANG', 'fr_FR');

Troubleshooting

Rosetta won't export my translation as an MO file. It just says, "A system error occurred." 
There is a syntax error in your translation that is preventing it from compiling to an MO. Download the PO instead and try compiling it manually with msgfmt. This will tell you which lines the errors are on so you can correct them by hand. If you don't have the GNU Gettext package installed, you can try opening the PO file in Poedit or KBabel to see if they will help you correct the errors, or you can email the wp-polyglots mailing list and ask for someone to debug it for you.

Source: http://codex.wordpress.org