How should the translation industry engage with the current conversation about ethical concerns in technology use? Here are some preliminary notes for an answer.
Apart from genuine eco-concerns about the carbon footprint generated by machine learning applications, there seems to be little need to focus on AI explainability as part of a major CSR shift in corporate agendas. And any pre-AI ethical problems about translation practice itself have been largely addressed over the years. Yes, we believe in encouraging “moral machines” just as we believe in supporting moral actors in business generally. But is it necessary to introduce specific best practices to formalize this commitment?
On climate change, it is clear that machine learning for language-based tasks is particularly power-greedy. There has been widespread commentary on this issue, and recent Transformer models used to generate text from data have been singled out for their high energy cost. A further effect of building such models has been to restrict useful research on massive language data applications to the very largest tech firms, which can afford such energy bills. This tends to exclude much-needed yet poorly funded academic research.
At the same time, though, projects have been announced that aim to reduce the “learn” time, and hence the cost of operating these big language models. In the translation domain, could the emergence of smaller datasets also impact energy spend?
The compute industry is now aware of the carbon cost of GPU/TPU usage. The longer-term question for us, though, will be how to address the global cost of rapid growth in machine use worldwide if we scale up to another 50 to 500 language pairs over the coming decades. Or start using less climate-friendly “massively multilingual” billion-word datasets to drive one-shot translation jobs!
As Systran’s Jean Senellart said in a recent TAUS webinar, training one NMT model is equivalent to burning down a large tree. We should evolve towards more qualitative (and not quantitative) breakthroughs with the technology, and work together to share models instead of running the same processes and pairs again and again to achieve a minuscule advance in BLEU scores. We should also encourage LSPs to systematically choose to work with eco-friendly tech suppliers, and measure and raise awareness internally about carbon counts where necessary.
As to fears about the ethical problem of technologically driven bias or even “fake” experiences in translation, the response can be more sharply etched. The practice of AI-driven translation cannot by itself lead to “social” bias or intentional fakes, only to accurate or inaccurate outputs. Any bias will, therefore, be carried over from the source text, and it is exclusively the use made of a translation that can render its truth value “fake” or “genuine.” The whole industry is organized precisely to prevent clients from shooting the messengers, however biased the messages they agree to process!
That said, service suppliers might still be concerned about potential bias within the datasets used to build automated translation solutions. The response could be that (post)editing is basically tasked with removing any traces of unwanted “bias” generated by an unthinking machine. It would, of course, be interesting to know whether we can teach the technology to automatically distinguish potential bias (in the “social” sense) from semantic error in the industry sense of mistranslation. Or, more subtly, could an accurate translation unwittingly give a particular native speaker an impression of bias? Going forward, the pursuit of translation accuracy may in certain cases require social inclusiveness, to address the emerging norms of new language user communities.
Similarly, given the text-generating models mentioned above, is there a risk of automatically generated source text (e.g. via GPT-2-type solutions) entering the translation circuit unchecked, thereby increasing the likelihood of built-in bias? Before we answer these questions, we will need a clearer technical grasp of the details, plus better examples of high-risk cases.
Finally, it might be considered an ethical practice for translation suppliers to prioritize the adoption of open-source technology solutions. This would mean preferring the services of tech suppliers that espouse open source, to ensure that technology choice is not restricted to the products of just a few big players, and that further research and innovation are supported. And, as Jean Senellart also suggested, this could potentially encourage the monetization of data and models across an industry marketplace.
There are other non-tech moral issues that the industry could raise in its conversations with both clients and end-users. It is surely preferable, for example, that a supplier should tell a client when leaving a given language out of a content package targeting a country or region could come across as anti-inclusive or non-empathetic, e.g. when the client fails to order a local-language version of important medical or legal information for a small population.
Ultimately, this sort of activism could evolve into a much broader “ethics & education” agenda, whereby suppliers try to ensure that translation-focused AI solutions are systematically adapted to under-served populations in general. However, this would mean a proactive step into unknown territory for the industry.
All these topics clearly need richer, deeper and more informed debate. Tell us what you think!
Long-time European language technology journalist, consultant, analyst and adviser.