Using software originally developed in the 1980s by
researchers at IBM, Google has created an automatic translation tool
that is unlike all others. It is not based on the intellectual
presuppositions of early machine translation efforts – it isn't an
algorithm designed only to extract the meaning of an expression from its
syntax and vocabulary.
In fact, at bottom, it doesn't deal with meaning at all. Instead of taking a
linguistic expression as something that requires decoding, Google Translate
(GT) takes it as something that has probably been said before.
It uses vast computing power to scour the internet in the blink of an eye,
looking for the expression in some text that exists alongside its paired
translation.
The corpus it can scan includes all the paper put out since 1957 by the EU in
two dozen languages, everything the UN and its agencies have ever done in
writing in six official languages, and huge amounts of other material, from
the records of international tribunals to company reports and all the
articles and books in bilingual form that have been put up on the web by
individuals, libraries, booksellers, authors and academic departments.
Drawing on the already established patterns of matches between these millions
of paired documents, Google Translate uses statistical methods to pick out
the most probable acceptable version of what's been submitted to it.
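To make the idea concrete, here is a minimal sketch of "pick the most probable version" using nothing more than counting. It is not Google's actual system, which the text does not describe in detail. The toy corpus, the names `PAIRED_TEXTS`, `build_table` and `most_probable`, and the use of simple relative frequency are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

# Toy parallel corpus of (source expression, human translation) pairs.
# The pairs and their frequencies are invented for illustration only.
PAIRED_TEXTS = [
    ("good morning", "bonjour"),
    ("good morning", "bonjour"),
    ("good morning", "bon matin"),
    ("thank you", "merci"),
]


def build_table(pairs):
    """Count how often each translation has been paired with each source expression."""
    table = defaultdict(Counter)
    for source, target in pairs:
        table[source][target] += 1
    return table


def most_probable(table, expression):
    """Return (translation, relative frequency) for the commonest pairing, or None if unseen."""
    counts = table.get(expression)
    if not counts:
        return None
    target, n = counts.most_common(1)[0]
    return target, n / sum(counts.values())


table = build_table(PAIRED_TEXTS)
print(most_probable(table, "good morning"))
print(most_probable(table, "never said before"))
```

In this sketch an expression that has never been paired with a translation simply has no answer, which is the point the passage makes: the method depends on the expression having been said, and translated, before.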
Much of the time, it works. It's quite stunning. And it is largely responsible
for the new mood of optimism about the prospects for "fully automated
high-quality machine translation".
Google Translate could not work without a very large pre-existing corpus of
translations. It is built upon the millions of hours of labour of human
translators who produced the texts that GT scours.
Google's own promotional video doesn't dwell on this at all. At present GT
offers two-way translation between 58 languages, that is, 3,306 separate
translation services, more than have ever existed in all human history to
date.
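The figure of 3,306 follows from simple counting: each of the 58 languages can be translated into each of the other 57, and each direction counts as a separate service. A small check, with the function name `translation_directions` chosen here purely for illustration:

```python
def translation_directions(n_languages: int) -> int:
    """Number of ordered (source, target) pairs among n distinct languages."""
    return n_languages * (n_languages - 1)


print(translation_directions(58))  # 3306
```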
Most of these translation relations – Icelandic to Farsi, Yiddish to
Vietnamese, and dozens more – are the newborn offspring of Google Translate:
there is no history of translation between them, and therefore no paired
texts, on the web or anywhere else. Google's presentation of its service
points out that given the huge variations between languages in the amount of
material its program can scan to find solutions, translation quality varies
according to the language pair involved.
What it does not highlight is that GT is as much the prisoner of global flows
in translation as we all are. Its admirably smart probabilistic
computational system can only offer 3,306 translation directions by using
the same device as has always assisted intercultural communication: pivots,
or intermediary languages.
It's not because Google is based in California that English is the main pivot.
If you use statistical methods to compute the most likely match between
languages that have never been matched directly before, you must use the
pivot that can provide matches with both target and source.
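The pivot idea can be sketched as chaining two sets of matches through the intermediary language. What follows is an assumption-laden toy, not a description of GT's internals: the two phrase tables, their probabilities, the romanised Farsi spellings and the function `pivot_translate` are all invented for illustration.

```python
from collections import defaultdict

# Hypothetical phrase tables with invented probabilities.
# Source -> pivot: Icelandic to English.
IS_TO_EN = {
    "takk": {"thanks": 0.7, "thank you": 0.3},
}
# Pivot -> target: English to Farsi (romanised here for readability).
EN_TO_FA = {
    "thanks": {"mersi": 0.6, "mamnoon": 0.4},
    "thank you": {"mamnoon": 0.8, "mersi": 0.2},
}


def pivot_translate(expression, source_to_pivot, pivot_to_target):
    """Score each target by summing, over every pivot rendering,
    P(pivot | source) * P(target | pivot); return the best (target, score)."""
    scores = defaultdict(float)
    for pivot, p_pivot in source_to_pivot.get(expression, {}).items():
        for target, p_target in pivot_to_target.get(pivot, {}).items():
            scores[target] += p_pivot * p_target
    if not scores:
        return None  # no path through the pivot language
    return max(scores.items(), key=lambda item: item[1])


best = pivot_translate("takk", IS_TO_EN, EN_TO_FA)
print(best)
```

With these made-up numbers the winner is "mamnoon", even though the single most likely English rendering ("thanks") prefers "mersi". Summing over all the pivot renderings matters, and the result can only be as good as the material available on both sides of the pivot.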
A good number of English-language detective novels, for example, have probably
been translated into both Icelandic and Farsi. They thus provide ample
material for finding matches between sentences in the two foreign languages;
whereas Persian classics translated into Icelandic are surely far fewer,
even including those works that have themselves made the journey by way of a
pivot such as French or German. This means that John Grisham makes a bigger
contribution to the quality of GT's Icelandic-Farsi translation device than
Rumi or Halldór Laxness ever will. And the real wizardry of Harry Potter may
well lie in his hidden power to support translation from Hebrew into
Chinese.
GT-generated translations themselves go up on the web and become
part of the corpus that GT scans, producing a feedback loop that reinforces
the probability that the original GT translation was acceptable. But it also
feeds on human translators, since it always asks users to suggest a better
translation than the one it provides – a loop pulling in the opposite
direction, towards greater refinement.
It's an extraordinarily clever
device. I've used it myself to check I had understood a Swedish sentence
more or less correctly, for example, and it is used automatically as a
webpage translator whenever you use a search engine.
Of course, it may also produce nonsense. However, the kind of nonsense a
translation machine produces is usually less dangerous than human-sourced
bloopers. You can usually see instantly when GT has failed to get it right,
because the output makes no sense, and so you disregard it. (This is why you
should never use GT to translate into a language you do not know very well.
Use it only to translate into a language in which you are sure you can
recognise nonsense.)
Human translators, on the other hand, produce characteristically fluent and
meaningful output, and you really can't tell if they are wrong unless you
also understand the source – in which case you don't need the translation at
all.
If you remain attached to the idea that a language really does consist of
words and rules and that meaning has a computable relationship to them (a
fantasy that many philosophers still cling to), then GT is not a translation
device. It's just a trick performed by an electronic bulldozer allowed to
steal other people's work. But if you have a more open mind, GT suggests
something else.
Conference interpreters can often guess ahead of what a speaker is saying
because speakers at international conferences repeatedly use the same
formulaic expressions. Similarly, an experienced translator working in a
familiar domain knows without thinking that certain chunks of text have
standard translations that he or she can slot in.
Translators don't reinvent hot water every day. They behave more like GT –
scanning their own memories in double-quick time for the most probable
solution to the issue at hand. GT's basic mode of operation is much more
like professional translation than is the slow descent into the "great
basement" of pure meaning that early mechanical translation developers
imagined.
GT is also a splendidly cheeky response to one of the great myths of modern
language studies. It was claimed, and for decades it was barely disputed,
that what was so special about a natural language was that its underlying
structure allowed an infinite number of different sentences to be generated
by a finite set of words and rules.
A few wits pointed out that this was no different from a British motor car
plant, capable of producing an infinite number of vehicles each one of which
had something different wrong with it – but the objection didn't make much
impact outside Oxford.
GT deals with translation on the basis not that every sentence is different,
but that anything submitted to it has probably been said before. Whatever a
language may be in principle, in practice it is used most commonly to say
the same things over and over again. There is a good reason for that. In the
great basement that is the foundation of all human activities, including
language behaviour, we find not anything as abstract as "pure meaning",
but common human needs and desires.
All languages serve those same needs, and serve them equally well. If we do
say the same things over and over again, it is because we encounter the same
needs, feel the same fears, desires and sensations at every turn. The skills
of translators and the basic design of GT are, in their different ways,
parallel reflections of our common humanity.
This is an extract from 'Is That a Fish in Your Ear? Translation and the
Meaning of Everything' by David Bellos.