The power of words in machine translation

Language is central to communication, the arts, culture, and social structures. Over time, individual words can also come to wield an incredible amount of power.

Language is not just words, but a vehicle of power. It often becomes a tool for keeping oppressive systems in place, and through the media, social hierarchies are subtly revealed and maintained. Boris Johnson’s use of warlike language over Brexit has been criticised for fuelling violence in the UK. In the Netherlands, meanwhile, the Covid-19 lockdown was described as an “intelligent” one, framing the country as “superior”. These are just two examples of how language and media framing can lead to both domestic and international divides.
When it comes to identity traits, similarly divisive and oppressive framing through language can be observed. Why is a woman labelled feisty, bossy or a ‘mother-of’ for the same qualities for which her male counterpart might be called ambitious and managerial, when ‘father-of’ is hardly ever treated as a man’s defining identity? Why are the words used to describe white perpetrators of public shootings often more lenient than those used to describe Black or Hispanic perpetrators? The language used by the media, and by each of us as individuals, has the power to perpetuate dangerous stereotypes.

Artificially Correct
Links between historic injustice and the language used today must be addressed to prevent bias from being further embedded in society. This also holds for translation, where context is as important as content: it takes a keen eye for a translator to notice the social and historical connotations of the words chosen. The same issue emerges when artificial intelligence (AI) is used for machine translation.
Machine translation is a useful tool and is becoming more so every day, but it often makes mistakes when it comes to gender. Danielle Saunders, research scientist at RWS, an international provider of technology-enabled language services, said that machine translation makes these mistakes, even in unambiguous circumstances, for two reasons:
Algorithmic choices: for example, the algorithm may have an incentive to produce translations with more common vocabulary.
Data choices: for example, masculine vocabulary is more common in the datasets available for training.
For instance, if an AI is trained on speeches from the European Parliament, as it often is, it will reflect the bias within those speeches in its own translations, for example by reaffirming job-role bias and choosing a male doctor over a female one. This is further skewed by the linguistic convention of using the masculine form as the default.
Because AI is a relatively blank canvas, there is an opportunity to correct these initial biases. The most obvious port of call is perhaps to add more balance to the training materials, but this is expensive, since it requires generating new training datasets, and it carries the added risk of introducing new biases. Then there are non-binary and alternative pronouns, which machine translation often fails to handle, misgendering the people who use them. Practical support for non-binary, alternative and neopronouns needs to grow alongside the evolving cultural discussion around them.
Adding to the complexity, “machine translation learns fast and forgets fast”, says Saunders: when a model has to learn new information, it can lose what it had learned before, a phenomenon termed “catastrophic forgetting”. Accepting that risk is one option, but retraining on new material is likely to degrade the overall quality of the translations. A more viable option could be to treat gender like a spell check, offering the user a choice of forms; that choice could include non-binary pronouns, such as the German Xier pronouns.
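To make the two failure modes and the spell-check idea more concrete, here is a toy Python sketch. It is not how RWS or any real machine translation system works: the word lists, the invented corpus counts and the function names (`frequency_default`, `suggest_alternatives`, `translate_with_gender_check`) are all hypothetical, chosen only to mimic a masculine-skewed training set and a system that surfaces alternatives instead of silently picking one.

```python
from collections import Counter

# Hypothetical candidate German translations of the English noun "doctor".
DOCTOR_FORMS = {"masculine": "Arzt", "feminine": "Ärztin"}

# Hypothetical candidates for a singular English "they", including the
# German neopronoun "xier".
PRONOUN_FORMS = {"masculine": "er", "feminine": "sie", "non-binary": "xier"}

# Invented counts standing in for a training corpus skewed towards
# masculine forms; these are NOT real corpus statistics.
TOY_COUNTS = Counter({"Arzt": 80, "Ärztin": 20, "er": 60, "sie": 40})


def frequency_default(forms, counts):
    """Pick the form seen most often in the data, ignoring context.

    This mimics an incentive to prefer common vocabulary. Forms absent
    from the counts (such as "xier") score zero, so they lose to any
    form that was seen at least once.
    """
    return max(forms.values(), key=lambda word: counts[word])


def suggest_alternatives(chosen, forms):
    """Spell-check style: list every other form so the user can switch."""
    return [word for word in forms.values() if word != chosen]


def translate_with_gender_check(forms, counts):
    """Return the frequency-based default plus user-selectable alternatives."""
    default = frequency_default(forms, counts)
    return {"default": default, "alternatives": suggest_alternatives(default, forms)}


if __name__ == "__main__":
    print(translate_with_gender_check(DOCTOR_FORMS, TOY_COUNTS))
    print(translate_with_gender_check(PRONOUN_FORMS, TOY_COUNTS))
```

With these invented counts, both calls return a masculine default ("Arzt", "er"), while the feminine form and, for the pronoun, "xier" are offered as alternatives. One appeal of this kind of design is that the model itself is left untouched: the default is only made visible and overridable, so no retraining, and hence no risk of catastrophic forgetting, is involved.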

“Re-establishing truths”
There is a need to “re-establish truths” and to move away from oppressive systems in a truthful way, said Emilia Roig, founder and director of the Center for Intersectional Justice. By deconstructing words, as well as the contexts we find uncomfortable, translators can help find the right words. For example, in Portuguese, “escravizado” (enslaved) is now favoured over the original “escravo” (slave), because it describes the position someone was forced into. The words we choose can reveal the afflictions of our society; they can also help establish better constructs, orient our conversations and give us the tools to create an equitable discourse. In translation, whether human or machine, and in everyday life, we must ask ourselves: what are the right words, and what power do they hold?
A final layer of complexity is added when we consider that the language we use today will not be the language we use in 20, 50 or 100 years’ time. Language cannot be fixed prescriptively; it is constantly evolving. This ever-changing nature must be reflected in the design of artificial intelligence, so that systems can adapt quickly and easily to changes in language in the years to come.

A text by Alana Cullen and Priyanka Dasgupta, written in the context of the workshop on AI, bias and machine translation (23–24 April 2021).