Access the world of language data with TAUS
Sharing insights, ideas and knowledge
7 Oct 2022

In recent years, NMT systems have been getting steadily better, and some are even claimed to reach human parity. If systems on a par with human translators could really be deployed, that would fulfill the “no human in the loop” dream that the industry seems to indulge in more and more frequently.

25 Oct 2022

After three years of travel restrictions, the TAUS Massively Multilingual Conference made its way back to San Jose, California, on 11-13 October 2022. TAUS Founder and CEO Jaap van der Meer gave the opening keynote, addressing the changes the world has gone through in the last few years: from major geolinguistic shifts, the fragmentation of globalization, and rising populist movements around the world, to more migration than ever. Referencing the debate between Nicholas Ostler and Lane Greene at the 2013 TAUS event about whether English or MT was the new lingua franca, he then asked where English stands as a language today, given all these changes. Without a doubt, the world is massively multilingual today, and technology should focus on new ways of supporting this new world. Jaap asked, “How do we support a world with total diversity and inclusion?” and answered, “We need a lot of data and humans in the loop.”

16 Aug 2022

It’s been almost three years since we last came together regularly around the world to discuss the state of the industry. In our 15-year history of TAUS events, we’ve discussed many topics, from localization workflows, pricing models, and the adoption of quality evaluation metrics in translation workflows, to quality standards and the many translation technologies that help translators become more efficient, productive and creative. We brainstormed solutions to the challenges we came across, and in turn, TAUS started to create its own suite of products and services to address some of them. The world has changed over the past years, and TAUS has changed as well. We’re so pleased to get together again and present the new TAUS, along with the exciting new subsector Language Data for AI, at our upcoming Massively Multilingual Conference & Expo in San Jose, CA, on 11, 12 and 13 October 2022.

30 Jun 2022

MT has come a long way. After seventy years of research, the technology is now used in production. And yet, we are missing out on its full potential, because developers are preoccupied with the idea that massive models will magically solve the remaining problems, and because operators in the translation industry are slow to develop new MT-centric translation strategies. This article is an appeal to everyone involved in the translation ecosystem to come off the fence and realize the full benefits of MT. We can do better!

Speech recognition is a complex mélange of linguistics, mathematics and statistics. Also known as speech-to-text, it identifies spoken words and converts human speech into written form. To do so as naturally and precisely as possible, it uses AI and ML to take into account the grammar, syntax, structure, and composition of audio and voice signals.

8 Jun 2022

Amsterdam, June 8, 2022 - TAUS, a leading provider of human-powered language data for AI solutions, has published the TAUS DeMT™ Evaluation Report 2022. The report compares the performance of TAUS DeMT™ against major available machine translation engines in 8 language pairs for the eCommerce domain, 18 language pairs for the Medical/Pharma domain, and 4 language pairs for the Financial domain. The report’s findings are based on an independent analysis by Polyglot Technology LLC benchmarking the machine translation quality of DeMT™ against other major online machine translation providers.

8 Jun 2022

Amsterdam, June 8, 2022 - TAUS, a leading provider of human-powered language data for AI solutions, announced today that it has been listed as a Representative Vendor in the Gartner Market Guide for AI-Enabled Translation Services 2022 report.

Audio transcription is a service that has seen growing demand in recent years as businesses use it to communicate with their stakeholders. This demand stems from the shift from written content to other media such as video and audio, which are increasingly common in day-to-day business activities. However, written reports are still necessary to convey information easily.

8 Apr 2022

It doesn’t happen very often nowadays, but every now and then I still find in my inbox a great example of what is becoming a relic of the past: a spam email with a cringy translation. Like everyone else, I’m certainly not too fond of spam, but the ones with horrendous translations do get my attention. The word-by-word translation is like a puzzle to me: I want to know if I can ‘reverse-translate’ it to its original phrasing.

16 Feb 2022

Amsterdam, February 16, 2022 - TAUS, the one-stop language data shop, is pleased to announce that TAUS data products and data services are now available to users of Amazon Translate on the AWS Marketplace. This first step marks the beginning of an ongoing collaboration between TAUS and Amazon Translate. As of today, AWS customers can review and buy 30 bilingual corpora in the eCommerce, Medical/Pharmaceutical and Finance domains. The languages available are as follows:

19 Jan 2022

Online machine translation engines provide easy access to high-quality machine translations. They are optimized for content like news articles and social media posts that users of online platforms frequently translate.

19 Jan 2022

Amsterdam, January 19, 2022 - TAUS, the one-stop language data shop established through deep knowledge of the language industry, globally sourced community talent, and in-house NLP expertise, launches a new service on its Data Marketplace: Data-Enhanced Machine Translation (DEMT).

3 Jan 2022

Bilingual, NLP-driven word clouds are now available in TAUS Data Marketplace. In this article, we discuss what word clouds are and what they can tell us about the contents of a document containing bilingual text data.

2 Dec 2021

This is the third article in my series on Translation Economics of the 2020s. In the first article, published in Multilingual, I sketched the evolution of the translation industry driven by technological breakthroughs from an economic perspective. In the second article, Reconfiguring the Translation Ecosystem, I laid out the emerging new business models and ended with the observation that new, smarter models still need to be invented. This is where I will now pick up the thread and introduce you to the next logical translation solution. I call it: Data-Enhanced Machine Translation.

1 Dec 2021

Technologies such as Natural Language Processing (NLP), deep learning and computer vision have been thriving since data science became well established as a field of study and expertise. These developments have paved the way for machine learning (ML) to rise as a means of achieving artificial intelligence (AI). The transformative effects of these technologies continue to show up in our daily lives at an increasing pace as we move into 2022.

18 Nov 2021

A Google Research team recently published a paper titled Data Cascades in High-Stakes AI. Its six authors, Nithya Sambasivan, Shivani Kapania, Hannah Highfill, Diana Akrong, Praveen Paritosh, and Lora Aroyo, shed light on a profound pattern of data undervaluation in high-stakes fields where AI models are critical and prevalent. They conclude that although there is great interest in creating MT and ML models, there is less interest in doing the actual data work.

11 Nov 2021

Amsterdam, 11 November 2021 - TAUS launches a new course on language data management on its eLearning Platform. The course gives a comprehensive overview of the Language Data for AI sector. It comprises five modules covering the various kinds of language data for AI applications and language data services, the language data market and pricing, data bias and ethics, and privacy and copyright concerns.