Machine translation: Getting better, but not perfect

The advantage of translation systems: Large volumes of text can be translated more quickly. | Photo: pachig © 123RF

In this increasingly globalized world, ever-larger volumes of text need to be translated for international documentation. Computers are making advances in this area of work and translation programs are producing some astoundingly good results. Still, they aren't completely effective without human assistance.

It wasn't long ago that the free online translation programs from Yahoo! and Google began spitting out some pretty horrendous texts. Things have changed a bit since then, and though Google's version of the sentence “Rental prices have skyrocketed in Berlin this year” is still rather disappointing (“Mietpreise haben in Berlin schnellte in diesem Jahr”), the online translator from Lingenio, for example, hits the nail on the head: “Mietpreise sind dieses Jahr in Berlin emporgeschnellt.”

Who uses it?

“With individual sentences, translation systems still have trouble due to the lack of subject matter and context,” says computational linguist Kurt Eberle, professor at the University of Heidelberg and director of the company Lingenio, a provider of professional translation software. “In addition,” continues Eberle, “private individuals are not really the target group for high-end translation systems. We focus on professional translators and international companies that need regular translations done, in particular from German into English.” About 90 percent of their users employ translation aids for this language pair, estimates Eberle. He sees translation systems as “tools of the trade” that mostly provide support. The advantage is that you can translate large volumes of text more quickly. However, the result is only a rough version that invariably needs to be edited and revised by a human.

Grammar rules or statistical computations

Today's statistical translation programs compare large amounts of text for the language pair in question and count how many times certain words land next to each other, such as “I” and “go”. They store not only individual word groups but also entire sentences. With every newly translated section, the program captures new material that it can then access for future texts. The more it translates, the better the system gets.
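The counting step described above can be illustrated with a minimal Python sketch. The function name `count_adjacent_pairs` and the three-sentence corpus are invented for this illustration; real systems work on far larger bodies of text and on both languages of the pair at once.

```python
from collections import Counter


def count_adjacent_pairs(sentences):
    """Count how often each pair of words appears side by side."""
    pairs = Counter()
    for sentence in sentences:
        words = sentence.lower().split()
        # zip(words, words[1:]) yields every word together with its right-hand neighbor
        pairs.update(zip(words, words[1:]))
    return pairs


# Hypothetical toy corpus, for illustration only
corpus = [
    "I go to work",
    "I go home",
    "They go home",
]

pair_counts = count_adjacent_pairs(corpus)
print(pair_counts[("i", "go")])     # how often "I" is directly followed by "go"
print(pair_counts.most_common(3))   # the most frequent neighboring pairs
```

Calling `count_adjacent_pairs` again on newly translated material and adding the result to the existing counts mirrors the idea that the system has more to draw on with every text it processes.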

The translation systems that produced all of that gibberish in the 1990s worked differently. They relied on the structure of a language: programmers fed the computers grammar rules and sentence structures along with vocabulary. Because the programs then translated word for word, they often came up short when handling words with more than one meaning. “Grave” can mean serious, or it can be a place where one is buried. The programs could not recognize this kind of contextual nuance.

The statistical method solved that problem. If the word “grave” is accompanied in a sentence by “death”, then it must be “Grab” in German. If it is associated with a “situation”, it likely means “gravierend”. In order to produce reliable translations, the program needs a lot of text, and if a word does not appear often enough within the texts, it will often be incorrectly translated. “We need a balanced body of text from different areas,” says Eberle. But this is difficult. “There are always meanings that are missing.”
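The “grave” example can be sketched in a few lines of Python. This is a toy illustration, not the method of any particular product: the training sentences, the function names `train` and `translate_grave`, and the simple vote counting are all assumptions made for the example. Production systems use probabilistic models trained on very large collections of translated text.

```python
from collections import Counter, defaultdict

TARGET = "grave"

# Hypothetical training data: an English sentence and the German word
# a human translator chose for "grave" in that sentence.
training = [
    ("his death led him to an early grave", "Grab"),
    ("flowers lay on the grave after the death", "Grab"),
    ("the situation is grave", "gravierend"),
    ("a grave situation for the company", "gravierend"),
]


def train(examples):
    """For each context word, count which translation it was seen with."""
    counts = defaultdict(Counter)
    for sentence, translation in examples:
        for word in set(sentence.lower().split()):
            if word != TARGET:
                counts[word][translation] += 1
    return counts


def translate_grave(sentence, counts):
    """Pick the translation that the surrounding words vote for most often."""
    votes = Counter()
    for word in set(sentence.lower().split()):
        if word != TARGET:
            votes.update(counts.get(word, {}))
    if not votes:
        return None  # no known context word: the system has nothing to go on
    return votes.most_common(1)[0][0]


counts = train(training)
print(translate_grave("death brought him to the grave", counts))
print(translate_grave("the situation looks grave", counts))
print(translate_grave("grave", counts))
```

The last call returns `None` because no context word is available, which reflects the point made above: without enough surrounding text, or enough examples of a word in the training material, the statistical approach has no basis for a decision.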

More text, better translations

According to Martin Volk, professor of computational linguistics at the University of Zurich, translation programs will improve in the coming years. “The Internet is gathering more and more digitized translations from humans,” he says. This is perfect “training” material for statistical translation systems, particularly when it involves websites that are published in multiple languages.

In the very near future, says Volk, the focus will be on building systems for specific application areas such as law, medicine or the automobile industry. The reason is that increased specialization means increased translation quality. If the systems are based on materials from a limited range of subject matter, the results will be better than with Google Translate, which is fed by texts from many different areas.

Subtitles, yes – literature, no

Martin Volk's special area is subtitles. For Swedish TV, he and his colleagues developed their own system that “became a great success”, he says, in a genre where “the sentences are short and concise.” Of course, someone has to review these raw translations, “but the translator is still 20 to 30 percent faster than if he/she were not working with the software.” Even idiomatic expressions that perplex structure-based systems can be mastered with the statistical method. “He is out of his mind” is no longer translated as “Er ist außerhalb seines Geistes”. It is correctly translated as “Er ist durchgeknallt”, but only because a human had to correct that sentence numerous times so that it now appears in the software's memory.

Many of these success stories obviously occur in technical documentation, whereas literary works are still the domain of humans. “They represent an art form. We don't touch that stuff,” says Volk, who is sure that even in 25 years a book like Harry Potter will still not be translated by a machine. Authors simply use too many uncommon words in uncommon combinations, and the statistical systems don't have enough samples to work with. “The rules systems do a better job here,” says Eberle. “But only when the vocabulary is well maintained.” To expect a perfect translation from a machine is still very much a fantasy, according to the professor. “A translator who is not an expert in a subject needs to first acquaint himself with a legal text and refer to dictionaries to do so – just like the software.”