2021 According to TAUS
With data taking the center stage, our message for 2021 is: be wary of inequality in our ecosystem as a result of this fantastic AI revolution. Pay your translators and the data keepers who keep your data in optima forma fairly.

NO MORE VISION. NO MORE PREDICTIONS. The future is now. No need anymore to predict machine translation. It’s here and it’s working. Our message for the new year this time is therefore very pragmatic and single-minded: fair pay for the translators and data-keepers!

Be careful what you wish for

TAUS started life in 2005 as a think tank, boldly predicting a future of Machine Translation. In 2008 we launched an industry-shared repository of language data - the TAUS Data Cloud - helping the early adopters of Statistical MT systems squeeze better performance out of their engines. Then came Neural MT, which boosted the quality of automatic translation further. COVID-19 and the resulting global crisis finished the job, leaving no doubt that technology will change our work and lives for good. And now that we are in the midst of these turbulent times, we realize that to become fully reliant on the machines, we need ever more data and always better data. Data that is most of the time linked to the human instrument of knowledge and understanding: language. Hence the emergence of a new sector: Language Data for AI.


They say: be careful what you wish for. It’s true, we wished for MT technology to work so well. We envisioned how this revolutionary technology could open knowledge to all citizens in our global society and how it could contribute to our evolution as a human community. And now, we have to be wary of the consequences.

A world of inequalities

Revolutionary technology breakthroughs often lead to a rethinking of our ethical principles and a shake-up of our economic models. While we are getting closer to unlocking knowledge and sharing information with practically everyone around the globe, we realize that the fundamental ideal of equal access causes other inequalities to grow. One is misinformation and mistranslation caused by bias in the data that’s used to train the models. How reliable is the information and knowledge we are being offered? How colored is the content by whoever controls the data and the algorithms? How complex it is to crack this ethical inequality problem in AI-driven translation in the long run is the subject of another TAUS article on Multilingual Morals, which will be published early in the New Year.

Here we zoom in on the more immediate economic consequences of the AI revolution for the translation profession as a business sector. If everything is pretty much automated, we have to ask ourselves the basic economic questions of where cost is incurred and where value is added. As we argued in our World-Readiness and Translation Economics article last year, we are heading towards a free machines model, where besides the cost of maintaining the IT infrastructure there is almost no variable cost involved for the enterprise buyers of translation. Of course, that model only exists in an ideal world where the machines do a perfect job. The reality is that human intelligence - linguistic and cultural interpretation - remains indispensable for the machines to constantly learn to do a better job.

Fair pay for the translators and data keepers

2020-12-22

While job opportunities for professional translators may be shrinking as a result of the success of automated translation, the need for a new kind of worker is growing explosively. In the LD4AI report we refer to this trend as the rise of the global cultural professional. It’s hard to frame the profile of this new worker. Unlike the professional translator, this worker does not need to be linguistically trained or experienced. The basic requirement is that they are deeply rooted in their local culture. They engage with their work givers through crowdsourcing platforms. They log in on the platforms to claim simple tasks such as transcribing a text, interpreting an image, recording a script and dozens of other human intelligence tasks. Millions of people around the world are joining the new workforce of crowdworkers, whom we like to refer to as the data keepers. Professional translators running low on their regular jobs may also be taking jobs as data keepers from time to time.

Rates for translation have always been under pressure, but in the crowdsourcing world market dynamics drive the rates further down. One dollar for a task that can easily take fifteen minutes of somebody’s time. Is that fair? The workers have become anonymous and the competition is severe. Our economic model may just not be fit for this new age of AI. (See also this article in MIT Technology Review: AI needs to face up to its invisible-worker problem.)

So here is where our message for 2021 comes in: be wary of inequality in our ecosystem as a result of this fantastic AI revolution. Pay your translators and the data keepers who keep your data in optima forma fairly.

TAUS Fair Cooperation Principle

Full disclosure: TAUS is also in the business of creating and annotating data for AI. We are advocates of the Data-First Paradigm, which means that we believe that it makes total sense to develop and optimize your data for translation first (before doing actual translation production). We use NLP technology to clean data and to cluster and tune corpora for domain adaptation and customization of MT systems. And through our own Human Language Project platform we work with data keepers around the world to create new datasets for low-resource languages and domains, and we engage them to enrich and annotate the data. Customers hiring us for data services can trust that TAUS always pays the Human Language Project workers above the minimum wage thresholds in the respective countries and will never go to the bottom of the ‘market’. We call this the Fair Cooperation Principle.

Opening the black box with the Data Marketplace

A further step that TAUS takes towards a reform of the translation ecosystem in 2021 is launching the Data Marketplace. The Data Marketplace allows everyone who has invested in good quality translations for years to sell their data directly to the technology companies and enterprises that develop and train MT systems. This means that the players in our ecosystem that have put in the hard work now have an open channel to reap the monetary benefits from it. The translation industry has, over the years, been referred to by many as a black box because of its inherent lack of transparency. You dump in a project, a document to translate, but you have no idea what it takes and who is working on it. On the Data Marketplace we are opening the black box. We put the data keepers in the spotlight. After all, it is they who do the hard work and create the value.

See the success stories here from Adéṣinà Ayẹni in Nigeria about his efforts to put his language Yorùbá onto the world stage of languages, and from TransLink in Russia, who see their early participation in the Data Marketplace as a great head start over other LSPs.

TAUS Content Branches Out 

And so, with the beginning of the twenties, TAUS as a think tank branches out into topics covering the language data for AI sector. The TAUS writers team is expanding to bring you lots more good content and food for thought to broaden your perspectives on the language data for AI industry, data applications and best practices.

TAUS wishes you all an equally healthy and prosperous 2021!


Jaap van der Meer founded TAUS in 2004. He is a language industry pioneer and visionary, who started his first translation company, INK, in The Netherlands in 1980. Jaap is a regular speaker at conferences and author of many articles about technologies, translation and globalization trends.
