The Quality of Machine Translation
Post-Human Literary Translation? A Kafka(esque) Example

Do machines know what they are doing? | Philippos Vassiliades | CC-BY-SA

Literary translators have long used computers for basic assistance, for example in the form of online dictionaries and corpora, but they have also long been resistant to the idea that machine translation (MT) – or even computer-assisted translation tools such as translation memory – can have any significant role to play in literary translation. With the rapid development of neural machine translation, literary translation scholars (and, to a lesser extent, literary translators themselves) are increasingly acknowledging that this position is untenable, and joining commercial and technical translators in anticipating that what the future holds for literary translators, too, is a role as post-editors of machine-translated output.  Some kinds of literary text in plainer prose style can already be translated moderately well (for the commonest language pairs) by computers, and that trend is only set to intensify.

We underestimate at our peril just how good computers have already become, and how quickly they are progressing. But they still have a long way to go, so now is a good juncture to survey the scene. Just how good are computers at literary translation at the moment? To begin to answer that question, I tasked today’s leading MT systems with producing an English version of one of the most famous sentences in the German language, the opening of Kafka’s story “Die Verwandlung” (1915):

Als Gregor Samsa eines Morgens aus unruhigen Träumen erwachte, fand er sich in seinem Bett zu einem ungeheueren Ungeziefer verwandelt.

Franz Kafka: "The Metamorphosis" (Original: "Die Verwandlung", 1915)

The story that this opening sentence is telling is, of course, deeply odd, but from a linguistic point of view the sentence is quite unexceptional. It is grammatically well formed, not particularly long or complex, and there are few snares to trap the unwary MT engine: it is unlikely that “Gregor Samsa” will fail to be recognised as a name, “eines Morgens” is a perfectly standard genitive adverbial time phrase, and so on. For the most part it is quite clear what this sentence means, although one would expect variation and uncertainty over the exact nature of the “ungeheueren Ungeziefer” that Gregor is transformed into, since Kafka keeps it deliberately vague.
If we run this sentence through the available free online MT systems, a surprising degree of consensus emerges. Here is the market leader, Google Translate:

Google Translate's version of Franz Kafka's "The Metamorphosis" (Original: "Die Verwandlung") | © Google Translate
This translation is identical to the output from Google Translate’s closest rival, Microsoft (Bing) Translator, and from another of the big players, Yandex Translate. Allowing for minor changes in word order, it is also the same as the output from DeepL Translator and Reverso; PONS makes a word order change and substitutes “troubled” for “restless”, but is otherwise nearly identical. It should be emphasised that these results (checked between June and September 2020) are not definitive, because the systems are constantly evolving thanks to programming upgrades and the input of new training examples, not least from end users. What’s more, an apparently insignificant “tweak” to the input, such as changing Kafka’s spelling “ungeheueren” to the contemporary norm “ungeheuren”, or even just omitting the full stop at the end of the sentence, can make a significant difference to the output.

Some of the lesser-used engines introduce a few interesting lexical variations (SYSTRAN Translate turns Gregor into a “monstrous pest”, LingvaNex Translator transforms him into a “tremendous vermin”), and there are still a few egregious outliers that give the whole process a bad name. PROMT makes a hash of the second half of the sentence:
PROMT Translation of Franz Kafka's "The Metamorphosis" (Original: "Die Verwandlung") | © PROMT
The wooden spoon goes to the IBM Watson Language Translator demo:
IBM Watson Language Translator's Translation of Franz Kafka's "The Metamorphosis" (Original: "Die Verwandlung") | © IBM Watson Language Translator
We have no particular reason to expect these engines to do well on literary examples, since they have not been trained on this kind of material, but thankfully such disasters are few and far between nowadays. Generally speaking, these MT engines converge on a single solution, and it is an acceptable one.

Vermin vs. Insects: The Struggle Is Real

Any sensitive reader of Kafka’s German might object that the machines have all failed to render the sentence’s primary literary effect, the triple alliteration of the negative prefixes “unruhigen … ungeheueren … Ungeziefer” – but then no human translator (HT) has ever managed to convey this effect, either.  There’s just too much else going on in that first sentence, and this effect has to be sacrificed. What about “a vermin”?  You might think, as I did on first seeing this output, that it is a tell-tale grammatical mistake which reveals the computers’ imperfect command of the target language because “vermin” is not a count noun. But “a vermin” turns out to be no shibboleth, for it is precisely the solution offered by the American translator Stanley Corngold in his bestselling 1972 version:

When Gregor Samsa woke up one morning from unsettling dreams, he found himself changed in his bed into a monstrous vermin.

Corngold is followed in this rendering by another US-based human translator, Joachim Neugroschel. If this indicates anything, it would appear to be a US/UK English divide, since UK-based Kafka translators who are tempted by “vermin” as a translation of “Ungeziefer” have tended to feel awkward about treating it as a count noun and prefer to qualify it, as in Joyce Crick’s “some kind of monstrous vermin” or John R. Williams’s “a huge verminous insect”.
It would hardly be a surprise to find that MT engines have a bias in favour of US English, but a brief review of the many human-produced English translations of this story, published on both sides of the Atlantic, shows a much greater variety of solutions among human translators than among computer systems. The first translators, the Scottish couple Edwin and Willa Muir, rendered “ungeheueres Ungeziefer” as “a gigantic insect”, and more recent versions include “a monstrous insect” (Malcolm Pasley), “a monstrous cockroach” (Michael Hofmann), “an enormous bedbug” (Christopher Moncrieff) and “some sort of monstrous insect” (Susan Bernofsky). These are all acceptable translations, of course, and their variety reflects the wide range of critical speculation about what Kafka actually means to convey here (Baddiel 2015; Gooderham 2015).

Factoring in Variety, Accuracy & Complexity

The nature of the creature that Gregor Samsa turns into is a notorious interpretative crux in this story, but it is not the only point of divergence: comparing human translations reveals a much greater variety of renderings for each key term in Kafka’s opening sentence than MT produces. Are Gregor’s “unruhigen” dreams “uneasy” (Muirs, Crick, Williams), “unsettling” (Corngold), “troubled” (Pasley, David Wyllie, Hofmann, Bernofsky), “agitated” (Neugroschel), “anxious” (Ian Johnston) or “fitful” (Moncrieff)? Each choice is grammatically sustainable, each conveys “the meaning” (a meaning) of “unruhigen”, but each also conveys a slightly different nuance. The most valuable thing about multiple retranslations of a classic work is precisely this kind of variety: it is the reason why we keep buying and reading new translations, and why publishers keep commissioning them. For the moment, at least, such variety appears to be short-circuited by MT systems that aim for a certain kind of acceptable accuracy, rely on a shared, ultimately still statistical model, and so converge on a single “safe” solution, as in my small example.
We can hardly blame MT systems for prioritising the accurate transfer of source content, for that is what they are designed to do, but information alone is not what makes literature what it is. Indeed, Walter Benjamin notoriously argues that the task of the (literary) translator is not primarily to convey information at all (Benjamin 2012, 75). The good news is that MT has put its “wild west” days behind it, and accuracy levels have been rapidly increasing in recent years, leading MT developers to claim that in certain restricted domains and for certain language pairs, MT systems have achieved parity with human translators (Linn 2018) or even exceeded them (Popel et al. 2020).  But we need to keep such claims in perspective, and recognise that “state-of-the-art systems lag significantly behind human performance in all but the most specific translation tasks” (Caswell & Liang 2020).

And when it comes to literary translation, there is still a very long way to go (Toral & Way 2018; Matusov 2019; Fonteyne, Tezcan & Macken 2020; Mohar, Orthaber & Onič 2020): indeed, “human parity” remains a distant dream. MT did well on my sample sentence, but recent studies have shown that with more complex literary structures even the latest MT software soon flounders (Läubli, Sennrich & Volk 2018). Key aspects of literary style, such as narrative voice, pose severe challenges for MT (Taivalkoski-Shilov 2019a; Taivalkoski-Shilov 2019b; Kenny & Winters 2020). As Douglas Hofstadter (2018) points out, computer translators do not actually understand anything about the text they manipulate: they lack real-world knowledge and the ability to harness and interpret it (i.e. to perform localisation rather than just translation). Moreover, we are more demanding of literary translations: we want to read something that is more than just “acceptable” or “good enough”.

The Future of Translation

The consensus among commentators, then, is that “the human is likely to remain the most critical component in the translation pipeline for many years to come” (Lumeras & Way 2017, 21), and that “when it comes to literature […,] computers will have a hard time keeping up with humans for at least a little while longer” (Polizzotti 2018, 46). In the meantime, human literary translators can be expected to make ever greater use of computer assistance, as their commercial and technical cousins have already been doing for some time (Youdale 2019). “The future of translation is part human, part machine” (Screen 2017), with the balance shifting, even in literary translation, towards humans post-editing machine-translated output. For a long time now in the world of human translation, publishers have commissioned professional literary translators to produce “literal” versions of literary works so that better-known writers who lack the relevant foreign language can work them up into publishable new versions. A recent example is Geoffrey Hill’s Penguin translation of Ibsen’s Peer Gynt and Brand (Ibsen 2017), which was based on literal annotated versions of the plays by, respectively, Janet Garton and Inga-Stina Ewbank. At the very least, before too long MT can aspire to take over and automate this kind of “ghost translator” role.

The jury is still out on whether computers can ultimately be more creative than this. On the one hand, Hofstadter warns against what he calls the “ELIZA effect”, whereby computers appear more human-like than they actually are (Hofstadter 2018); on the other, Mark O’Thomas is much more sanguine about a trans-human future for literary translation:

The role of the literary translator might eventually come within the domain of a software application that has adopted the translation memories of a particular individual. Even within our present knowledge and experience of technology, it is not difficult to imagine software that can assimilate and adopt the preferred lexical set of an individual and their adopted usage in more than one language. By mapping these across into a translation memory, the software might then go on to produce translations akin to those a translator might have written as well as affording the possibility of creating translations by a particular translator post-mortem.

Mark O’Thomas (2017)

Beyond such crystal-ball gazing, it seems to me that a more interesting question is this: what difference would it make to human literary translators if it did become possible to publish MT-generated literary translations with only the kind of light post-editing that most human-produced translations currently require (which is the only relevant definition of “human parity” in this context)? Irrespective of how soon this might become a reality (indeed, whether it might become a reality at all), it pays to consider what the consequences of “the rise of the machines” might be, and to grant that computers may at some point pass a literary translation “Turing Test”.

In the post-human age of literary translation, what might become of the human literary translator?

From the point of view of those who commission and pay for literary translations (publishers), fully automatic high-quality machine translation of literary texts might seem like a kind of utopia. But it is at least questionable whether, by the time that stage is reached, the cutting-edge software required to cut out the “middle (wo)man” translators will be available at no cost. And it is difficult to imagine the role of the human translator as gatekeeper ever becoming completely redundant when the track record of MT algorithms in avoiding unwelcome biases is so poor (Marasligil 2016; Taivalkoski-Shilov 2019a). From the point of view of consumers of literary translations (readers), it must be a boon when vast quantities of literature become available in (automated) translation for the first time, but again the HT gatekeeper/post-editor role is unlikely to disappear entirely, for such translations could do more harm than good if they are not of high enough quality to be pleasurable to read. From the point of view of translators themselves, there is much talk of computer-induced redundancy, but is the literary translator really going to go the way of the medieval scribe? The role of the human literary translator in its current form seems set to be largely phased out, but that doesn’t automatically mean mass unemployment for translators. After all, in a parallel case, the rise of information technology saw librarians extend their skill set and morph into more generic information professionals, and literary translators could well do the same.
There are plenty of time-consuming human activities that we are now spared by computers, but I do not think that we will want literary translation to be one of them. We may enjoy post-human literary copying (the drudgery of the medieval scriptorium is defunct in the age of the printing press, the photocopier and the scanned PDF file), but I would argue that we don’t actually want post-human literary translation. For one thing, new technologies rarely supplant the old ways entirely. The invention of television didn’t mean the death of radio; the invention of the CD didn’t sound the death knell for vinyl. We might think of the future role of the literary translator as similar to that of the chess grandmaster in the age of the chess supercomputer. In many ways chess has been effectively “solved” by computers: it’s nearly a quarter of a century since IBM’s Deep Blue first beat reigning world champion Garry Kasparov in a game in 1996 (going on to win a full match against him in 1997), and since then the gap between computers and humans has widened remorselessly. Yet world championships for human chess players continue to be held (you can still make a living out of chess), while millions of amateur chess players across the world continue to enjoy the game – and improve by playing against computers.
Even if/when literary translation is “solved” by computers, then, I fully expect that humans will want to continue the practice – just as, presumably, there will be those who continue to enjoy driving in the age of the self-driving car, and so on.  Human translators will continue to enjoy the process as well as the product, and there will be readers who enjoy the human-translated product, too – which will command premium prices, like line-caught fish in the age of the factory trawler, or a hand-built Aston Martin in the age of the production-line robot. Literary translation will continue to provide an outlet for the creative human spirit (Large 2018, 92–94) – and if that ever becomes an aspect of human translation that machines also strive to emulate, then all the better for everyone.


Baddiel, David (2015), “The Entomology of Gregor Samsa”.
Benjamin, Walter (2012), “The Translator’s Task”, trans. Steven Rendall, in Lawrence Venuti (ed.), The Translation Studies Reader, 3rd edn (London and New York: Routledge), 75–83.
Caswell, Isaac & Bowen Liang (2020), “Recent Advances in Google Translate”.
Fonteyne, Margot, Arda Tezcan & Lieve Macken (2020), “Literary Machine Translation under the Magnifying Glass: Assessing the Quality of an NMT-Translated Detective Novel on Document Level”, in Proceedings of the 12th Conference on Language Resources and Evaluation (LREC 2020), 3783–3791.
Gooderham, W.B. (2015), “Kafka’s Metamorphosis and its Mutations in Translation”, The Guardian, 13 May.
Hofstadter, Douglas (2018), “The Shallowness of Google Translate”, The Atlantic, 30 January.
Ibsen, Henrik (2017), Peer Gynt and Brand, trans. Geoffrey Hill (London: Penguin).
Kenny, Dorothy, & Marion Winters (2020), “Machine Translation, Ethics and the Literary Translator’s Voice”, Translation Spaces, 9/1 (August), 123–149.
Läubli, Samuel, Rico Sennrich & Martin Volk (2018), “Has Machine Translation Achieved Human Parity? A Case for Document-Level Evaluation”, in Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing (Brussels: Association for Computational Linguistics), 4791–4796.
Large, Duncan (2018), “Could Google Translate Shakespeare?”, In Other Words, 52 (Winter 2018/19), 79–94.
Lawson, Richard H. (1960), “Ungeheueres Ungeziefer in Kafka’s ‘Die Verwandlung’”, German Quarterly, 33/3 (May), 216–219.
Linn, Allison (2018), “Microsoft Reaches a Historic Milestone, Using AI to Match Human Performance in Translating News from Chinese to English”, 14 March.
Lumeras, Maite Aragonés, & Andy Way (2017), “On the Complementarity between Human Translators and Machine Translation”, Hermes, 56, 21–42.
Marasligil, Canan (2016), “Literary Translation Beyond Automation”, International Literature Showcase.
Matusov, Evgeny (2019), “The Challenges of Using Neural Machine Translation for Literature”, in Proceedings of The Qualities of Literary Machine Translation, Dublin, 19 August (European Association for Machine Translation), 10–19.
Mohar, Tjaša, Sara Orthaber & Tomaž Onič (2020), “Machine Translated Atwood: Utopia or Dystopia?”, ELOPE: English Language Overseas Perspectives and Enquiries, 17/1, 125–141.
O’Thomas, Mark (2017), “Humanum ex machina: Translation in the Post-Global, Posthuman World”, Target, 29/2 (January), 284–300.
Polizzotti, Mark (2018), Sympathy for the Traitor: A Translation Manifesto (Cambridge, MA and London: MIT Press).
Popel, Martin et al. (2020), “Transforming Machine Translation: A Deep Learning System Reaches News Translation Quality Comparable to Human Professionals”, Nature Communications, 11, article 4381.
Screen, Ben (2017), “The Future of Translation is Part Human, Part Machine”, The Conversation, 11 July.
Taivalkoski-Shilov, Kristiina (2019a), “Ethical Issues Regarding Machine(-Assisted) Translation of Literary Texts”, Perspectives, 27/5, 689–703.
Taivalkoski-Shilov, Kristiina (2019b), “Free Indirect Discourse: An Insurmountable Challenge for Literary MT Systems?”, in Proceedings of The Qualities of Literary Machine Translation, Dublin, 19 August (European Association for Machine Translation), 35–39.
Toral, Antonio, & Andy Way (2018), “What Level of Quality Can Neural Machine Translation Attain on Literary Text?”, in Joss Moorkens et al. (eds), Translation Quality Assessment: From Principles to Practice (Berlin and Heidelberg: Springer), 263–287.
Youdale, Roy (2019), “Computer-Aided Literary Translation: An Opportunity, Not a Threat”, In Other Words, 53 (Summer), 45–51.