How Does AI Ethics Impact Translation?


AI is becoming embedded in most of the latest technologies, and much of the tech stack in today’s translation activity will be infused with it. This means using machine learning algorithms and data to augment human productivity. It also means that everyone involved increasingly needs to be aware of the ethical implications of this technology.

So where exactly is the ethical problem in the translation process? AI’s job is to enrich (clients’) content by making it understandable to more people, so that it can ideally make a positive impact, either as relevant information or as an economic good. If it fails to do this legally and ethically, then in the long run no one will use it. That’s the basic business deal!

Ethical and legal issues around AI in content processing, also touched upon in the Moral Machines blog article, usually focus on:

  • The (mis)use of personal data inside a translation,
  • The unacknowledged or illegal use of another owner’s language data,
  • The large carbon footprint of compute, and
  • Biases in data selections concerning personal health, finance, hiring, and other sensitive domains where systems interface with people.

Note that in none of these cases is the act or fact of translation qua translation a critical component. These aspects of the data business therefore have no direct bearing on moral issues around AI and translation. But we live in a suspicious environment, and translation has now been dissected into distinct tasks, so it is important to recognize potential problems and apply relevant responses.

First, what is so special about translation as an action that could make it less problematic in an AI context?

Here’s one obvious answer: Translation involves the semantic transfer of a piece of content into another human language (i.e. a code switch in a data stream), so the process itself neither adds to nor subtracts from the existing content. The meaning and associated implications and inferences (and of course the social/moral biases!) of the source remain “as is” in the target. The “content” you see in language A is what you get in language B.

If this code switch produces inaccuracies, they will be handled by well-known editing procedures, in the same way as a spelling or attribution error in any written or transcribed content. Translation mistakes are, therefore, not moral or legal failings. Let the public have fun with absurd MT errors, but let’s not mix our messages!

Content can contain moral monstrosities

In fact, the ethical tradition in translation can be summed up by the famous “don’t shoot, we’re only the messengers”. Translators plead non-responsibility once they have accurately and appropriately translated or post-edited a target text, however intolerable or fake its content may be. The fact that content can contain moral monstrosities (witness the personal suffering interpreters experienced at the Nuremberg trials or at the hearings of South Africa’s Truth and Reconciliation Commission) is not the translator’s fault or responsibility.

But fidelity to what is said or written is by definition part of that mission. Hence the outcry at the horrific murder in 1991 of the Japanese literary translator of an English novel that referred to a damaging myth in the history of Islam. He reproduced the truth of the text in his role as an intermediary on behalf of a teller of tales. In principle, a translation will always give access to the truth of what has been said or written; in this way, translators help enrich the human story for everyone.

This is heavy stuff. As a playful alternative, do check out the wonderfully subversive tale by the Hungarian writer Dezső Kosztolányi in Kornél Esti, in which a kleptomaniac translator “steals” objects and money from the fictional characters in the works he translates by changing the amounts mentioned in the original! It is a philosophical joke, but we need a little humor these days.

AI for identifying social bias

As translation becomes more deeply embedded in processes driven by language data, we shall increasingly see both source and target texts analyzed by intelligent tools capable of identifying “moral issues” and alerting stakeholders, for example by using word spotting to address the points raised in the EU’s Ethics Guidelines for Trustworthy AI. Both humans and bots could then be tasked with seeking out well-known biases in the linguistic expression of socially, medically or politically sensitive questions, such as failures of racial and gender inclusiveness. These tools could also suggest corrections to dangerously ambiguous language, or collect data on signals relevant to improving quality and work evaluation. Like all tools, they will inevitably miss some issues.
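In its simplest form, word spotting means matching a source or target text against a list of terms of concern and flagging each hit for a human reviewer. The Python sketch below illustrates that idea only. The watch-list, its categories and the function name `spot_terms` are hypothetical placeholders, not taken from any published guideline or real tool.

```python
import re
from dataclasses import dataclass

# Hypothetical watch-list: term -> category of concern.
# Placeholder entries only; a real list would come from guidelines and domain experts.
WATCH_LIST = {
    "chairman": "gender inclusiveness",
    "manpower": "gender inclusiveness",
    "crazy": "health-related stigma",
}


@dataclass
class Flag:
    term: str      # the matched text as it appears in the document
    category: str  # why the term is on the watch-list
    start: int     # character offset of the match


def spot_terms(text: str, watch_list: dict[str, str]) -> list[Flag]:
    """Return every whole-word, case-insensitive match of a watch-list term."""
    flags = []
    for term, category in watch_list.items():
        # \b anchors ensure "chairman" does not match inside "chairmanship".
        pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        for match in pattern.finditer(text):
            flags.append(Flag(match.group(0), category, match.start()))
    # Sort by position so a reviewer sees the flags in reading order.
    return sorted(flags, key=lambda f: f.start)


if __name__ == "__main__":
    target = "The chairman said the plan was crazy and needed more manpower."
    for flag in spot_terms(target, WATCH_LIST):
        print(f"{flag.start}: '{flag.term}' -> {flag.category}")
```

Exact matching of this kind cannot see bias carried by context, tone or phrasing that is not on the list, so its flags are best treated as prompts for human review rather than verdicts.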

MT engines can already be selected on the basis of their ability to handle gender issues in translation accurately. In general, then, systemic bias seems unlikely to become a major translation problem, partly because there are humans in the loop, and partly in spite of that!

The problem of user-generated social media content

One obvious hotspot to monitor is user-generated content from social media and online commentary, whose input quality cannot always be controlled upstream by translation buyers. Once again, a translated text could be automatically scanned for word or phrase signals of dangerous social bias, fake content, and so on. However, this “biased” content is often precisely the data some translation buyers need! They might use these findings to gauge the reception habits and expectations of their readers, so they can adapt their own messaging in future communications.

AI to monitor content consumption

Perhaps the most important development to recognize going forward is that AI will not just be augmenting bits and pieces of the translation chain; it will also be used to monitor the way translated content is consumed. This does not (yet) raise major ethical issues for the industry unless privacy is breached, but it could generate end-customer concerns about the “rhetorical adequacy” (the social, semantic, and pragmatic fit) of the language used.

The problem is that linguistic editing driven by such monitoring could penalize translations that fail to match the “signal generation quotient” (the number of hits or reactions) achieved by the source. Communication surveillance is therefore likely to be competitively tied to highly concrete, measurable results such as sales conversions, sentiment responses, and other data points. (See this blog for some background on the role of data and signals in the translation business.) “Ethical” checks will doubtless be built into these algorithms.

So when managing teams of data annotators and translators, it will always be wise to inquire into target-language sensibilities about content, just in case unexpected issues arise in local societies or target reader communities. It is surely the human value of the work done, and the resulting wider access to information and knowledge for other people, that makes translation worthwhile in the first place.

The risk is, of course, that the emerging interest in ethical AI, and in data sources that represent the wealth of human experience and individual differences, will ultimately become just another opportunity for using machine learning to drive content. Our job will always be to value the power of human language, not the imitation machine that pretends to be our digital twin!


Long-time European language technology journalist, consultant, analyst and adviser.
