World-Readiness: Towards a New Translate Benchmark


In the digital translate era, it's vital to learn how to use our resources to ensure we are ready to deliver truly global solutions and experiences! We call this vital new capability “world-readiness”.

We’re entering a digital ‘translate’ era in which people and machines are learning to cooperate more effectively to deliver rich services and language experiences at scale. The resulting synergies will underpin all new industry strategies in the coming years. As current S-curve wisdom has it, things will evolve slowly at first and then accelerate rapidly. So it’s time to look up from our shiny new instrument panels and scan the road ahead for emerging opportunities.

There are two stand-out challenges for the industry today:

  • one is completing our digital transformation by building out, democratizing and making optimal use of the technology solutions we have been developing for the last decade;
  • the other is negotiating the tectonic shifts in global economic, political and cultural power patterns that are inevitably impacting language marketplaces.

Put these together and you have one overarching agenda item: learning how to use our resources to ensure we are ready to deliver truly global solutions and experiences! Call this vital new capability “world-readiness” (WR).

The geopolitical challenge to WR also includes two new pressure points for the translate industry. Rising migration triggered by socio-political and climatic change is modifying the linguistic (and hence speaker and communication) mix in many large cities around the globe. And our use of power-greedy GPUs to drive translate will start ringing alarm bells about our energy footprint.

At the same time, and more central to our business, new consumer tongues are emerging on content dashboards aligned to India, Southeast Asia, and Africa. Language revitalization has received a tremendous awareness boost from this year’s U.N.-level foregrounding. And even the EU may get two new official languages if Albania and North Macedonia join the party. This means we shall soon need to factor new language experiences into our growth strategies and build appropriate supply chains involving the right human, technical and managerial resources.

TAUS, therefore, suggests that planning for World-Readiness (i.e. not just reacting to today’s requests) will become a vital strategy for ambitious service suppliers. World-Readiness for the language industry means being able to reach the last customer on the planet - in their language and in a sustainable way wherever they may be.

To achieve WR, we shall need to build out our existing pipelines and imagine more inclusive solutions for at least three new targets: the new highly-multilingual mix; seamless global linkage for all of our human resources, tools and data; and benchmarking progress towards excellence by inventing richer metrics for new language experiences. WR could then pave the way for a truly transformative stage in the evolution of our mindsets, practices, and business offerings. Let’s drill down into these targets.

a) Scaling up Language/Domain Coverage

The human glottosphere is pretty crowded: seven billion people (soon to reach ten) speak a very uneven mix of some seven thousand tongues. The industry currently tends to handle just a few dozen of these. What is stopping us from planning competitively to reach out and energize hundreds of tongues as part of the world’s ongoing digital transformation? This approach to WR should also encourage translate users to update their own content development ambitions.

Obviously, our central goal is not simply boosting the number of languages in play. It’s about ensuring inclusiveness through language diversification. In other words, it means serving more people and more commercial/institutional needs more adequately by fine-tuning content to a much broader population of vernacular mindsets. This, in turn, can contribute to both preserving existing languages and bringing the world closer together in new ways. We have struggled along with a tacit policy of language hegemony for too long. We all need to look more imaginatively at the power of digital transformation to reboot the right to “speak my language” as well.

An initial five-year action plan for World-Readiness could mean setting targets for delivering ‘translate’ in up to 200 languages or so. This would require a quantum shift towards “multi-pole” language/culture exchanges. It would entail broader and deeper coverage of multi-domain and multi-media content across the globe and would mobilize new jobs and talents. As a result, it would transform supply chains by requiring closer business partnerships between LSPs, tech suppliers, and language specialists, together with a far richer mutualization of resources and technologies.

To achieve any of this, however, we will also need to upgrade today’s informal use of subject matter “domains” in translation practice, and jointly establish rich, dynamic standards for the domain characterization of digital text/speech content. It should be feasible to train algorithms to discover such optimum domains and topics from and for our language data. If so, language-resource sharing for translate pipelines could then be qualified, targeted and automated far more efficiently, and tuned far more finely, than it is today. Who knows what other market discoveries and data handling solutions might emerge down the road under the pressure of this “massively multilingual” agenda?
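As a rough illustration of what discovering domains from language data could look like, here is a minimal Python sketch. It is not a TAUS method or any production system: the function names, the bag-of-words representation and the 0.3 similarity threshold are all assumptions chosen for brevity. Each segment joins the most similar existing cluster, or starts a new one if nothing is similar enough.

```python
import math
from collections import Counter


def vectorize(text):
    """Lowercase bag-of-words counts for one segment (naive whitespace split)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def discover_domains(segments, threshold=0.3):
    """Greedy single-pass clustering (hypothetical helper).

    Each segment joins the most similar existing cluster if the similarity
    reaches `threshold`; otherwise it starts a new cluster.
    """
    clusters = []  # each cluster: {"centroid": Counter, "segments": [str]}
    for seg in segments:
        vec = vectorize(seg)
        best, best_sim = None, threshold
        for cluster in clusters:
            sim = cosine(vec, cluster["centroid"])
            if sim >= best_sim:
                best, best_sim = cluster, sim
        if best is None:
            clusters.append({"centroid": Counter(vec), "segments": [seg]})
        else:
            best["centroid"].update(vec)  # accumulate word counts
            best["segments"].append(seg)
    return [c["segments"] for c in clusters]


if __name__ == "__main__":
    sample = [
        "take one tablet daily with water",
        "the contract is governed by dutch law",
        "take two tablet doses with food",
        "this contract is void under dutch law",
    ]
    for i, group in enumerate(discover_domains(sample), start=1):
        print(f"domain {i}: {group}")
```

A real system would need proper tokenization for each language, far richer representations than word counts, and agreed domain labels, which is exactly where shared standards would come in.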

b) Boosting Interoperability

If we want to go WR, we will need a world-class (and worldwide) network of digital language operations to help LSPs and others deliver on their multilingual promise. With hundreds of languages in play, hundreds of thousands of translators, editors, designers, terminologists, respeakers, annotators and transcreators, as well as engineers and technologists, will need instant access to data to do their jobs. A precondition for WR, therefore, is full interoperability for technology stacks and data circuits.

This can be ensured by industry-wide agreements to use standards at every gateway. These, in turn, should all anticipate a continuous stream of new developments in different types of data resources, tools, and tech platforms that mix together media (speech, text, video, AR/VR) and language tools, such as checkers of all kinds, editors, knowledgebase search, and so on.

End-to-end interoperability will also foster new types of jobs and activities in the field of language management generally. As well as teams of multi-talented translators, we will need design engineers from speech communities to help adapt numerous languages to their digital conversational destinies. Content annotators will be required for every language in the marketplace to ensure machine traction for translate and other tasks in both text and speech media. Planning ahead for full WR interoperability will, therefore, be an important entry ticket, enabling widely scattered teams to all parse the same strings at critical junctures.

By adding the “world” to industry-scale interoperability, therefore, we are automatically anticipating that both R&D and deployment can flourish locally across the planet, stimulating more innovation, keener competition, and competitive pricing throughout the pipeline, thereby lowering entry costs, all in a virtuous cycle.

c) Measuring Success Unobtrusively

Much time has been spent recently on designing QE systems that respond to the current wave of translate automation. WR, of course, will require highly fluid workflows handling many languages in different domains through complex pipelines. Evaluating their quality and fitness-for-purpose will demand a further turn of the QE screw.

Basically, we will need to move beyond local “translation quality” checks, and start measuring WR for everything passing through the pipeline so as to achieve systemic quality. Machine learning can then be used where necessary to maximize the identification and repair of non-quality, while minimally interfering in human operations. One longer-term problem with this, however, will be the cost of processing masses of data with power-hungry GPUs: the industry will in due course need to look seriously at its compute/translate environmental cost, and factor this into its overall quality profile.
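To make the compute/translate cost concrete, one possible back-of-the-envelope metric is energy per million translated words. The sketch below is purely illustrative: the function name, its parameters and the optional data-centre overhead factor (often called PUE) are assumptions, and the figures in the usage example are placeholders, not measurements.

```python
def energy_per_million_words(device_count, watts_per_device, hours,
                             words_translated, overhead_factor=1.0):
    """Estimate kWh consumed per million translated words (hypothetical metric).

    overhead_factor scales device energy up to account for cooling and other
    facility overhead; 1.0 means no overhead is counted.
    """
    if words_translated <= 0:
        raise ValueError("words_translated must be positive")
    # Watts * hours = watt-hours; divide by 1000 for kilowatt-hours.
    kwh = device_count * watts_per_device * hours / 1000.0 * overhead_factor
    return kwh * 1_000_000 / words_translated


if __name__ == "__main__":
    # Placeholder figures for illustration only.
    cost = energy_per_million_words(
        device_count=4, watts_per_device=250, hours=10,
        words_translated=5_000_000,
    )
    print(f"{cost:.2f} kWh per million words")
```

Tracked alongside conventional quality scores, a figure like this would let a pipeline owner see at a glance whether a quality gain was bought with a disproportionate energy bill.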

Finally, these QE data should be leveraged to better adapt workflows to WR criteria. For example, they could help answer queries about the economics and logistics of bringing on board 20 new languages; identify the quality of the data used in a given pre-translation/translation or by a given client; estimate the quality and quantity of human and data resources needed to improve a given task; or provide teams of language specialists with a data dashboard giving insight into their individual contributions. By setting measurable WR as a new standard, the industry should be able to better understand the value of both interoperability and quality measurement for its entire business strategy.

TAUS will be raising awareness about this World-Readiness agenda by presenting a first WR Award at its Global Content Conference & Exhibits in San Jose on March 10-11, 2020.


Long-time European language technology journalist, consultant, analyst and adviser.
