Automating Translation Quality Review: How Is That Possible?

ChatGPT is causing a revolution in the translation industry. Anyone who has taken a wait-and-see position on MT over the last five years is now very likely to give up resisting automatic translation. It’s too good to ignore, and everyone uses it. New Large Language Models come out every week and are reportedly improving rapidly. But how do you know whether the translation quality is good? How do you find the infamous hallucinations, the funny or embarrassing errors typical of machines that can’t really think like humans but do an amazing job of generating fluent content and translations?

Second pair of eyes

Quality is a big selling point for every Language Service Provider. The way to guarantee the best translation quality to customers has always been a thorough quality review by a second person. Post-editing MT is essentially a modern version of this ‘second-pair-of-eyes’ kind of process: quality review and correction of machine-translated output.

There are two problems with this traditional approach to translation quality review. The first is that it is predominantly an informal and ad hoc process that does nothing to prevent the same mistakes from happening again. The second is that it is time-consuming, costly, and not scalable. In other words, we are not getting better over time, and cost and time increase linearly with the volumes we handle.

Learning from mistakes

In 2018, TAUS introduced the DQF/MQM metric as a systematic new approach to managing translation quality. The idea was to move away from the informal and ad hoc quality process in the translation industry and replace it with a more data-driven approach. Systematically logging error types and severity levels helps us learn, give useful feedback to translators, and retrain MT models.
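To make the idea of logging error types and severity levels concrete, here is a minimal sketch. The category names, the severity weights, and the per-1,000-words normalization are illustrative assumptions, not values taken from the DQF/MQM specification.

```python
from collections import Counter
from dataclasses import dataclass

# Illustrative severity weights (an assumption, not the official DQF/MQM values).
SEVERITY_WEIGHTS = {"minor": 1, "major": 5, "critical": 10}


@dataclass(frozen=True)
class ErrorAnnotation:
    segment_id: int
    category: str  # e.g. "accuracy", "fluency", "terminology"
    severity: str  # must be a key of SEVERITY_WEIGHTS


def penalty_per_1000_words(errors, word_count):
    """Sum the weighted penalties and normalize by the reviewed volume."""
    total = sum(SEVERITY_WEIGHTS[e.severity] for e in errors)
    return 1000 * total / word_count


def errors_by_category(errors):
    """Count errors per category to see where mistakes keep recurring."""
    return Counter(e.category for e in errors)


# Hypothetical review log for an 800-word job.
errors = [
    ErrorAnnotation(1, "accuracy", "major"),
    ErrorAnnotation(4, "terminology", "minor"),
    ErrorAnnotation(7, "accuracy", "critical"),
]
penalty = penalty_per_1000_words(errors, word_count=800)
by_category = errors_by_category(errors)
print(penalty, by_category.most_common(1))
```

Because every error is stored with a category and a severity, the same log can feed translator feedback (which categories recur) as well as a single comparable score per job.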

DQF/MQM is now widely adopted as a translation quality standard. One problem solved, perhaps. But as translation volumes keep rising and the use of MT becomes the rule rather than the exception, it becomes practically impossible to run a human review on all translations.

Every word is a number

So if machines can translate for us, can we not also trust them to review the quality of the translations? The question has to be asked, because there is no other way to solve the problem of scale. Research in this direction started some ten years ago and is referred to as Quality Estimation. Quality Estimation should not be confused with Quality Evaluation metrics such as BLEU, which is often used to grade MT engines. The problem with BLEU and other reference-based evaluation metrics is that they are not scalable either, because every measurement requires a human reference translation.
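The dependency on a reference is easy to see in code. The sketch below is not BLEU itself; it is a deliberately simplified clipped unigram precision (one of the ingredients BLEU builds on), and the function name is made up for illustration. The point is the signature: without a `reference` argument there is nothing to measure against.

```python
from collections import Counter


def clipped_unigram_precision(candidate: str, reference: str) -> float:
    """Share of candidate words that also occur in the reference.

    Each word is credited at most as often as it appears in the reference.
    """
    cand_tokens = candidate.lower().split()
    if not cand_tokens:
        return 0.0
    cand_counts = Counter(cand_tokens)
    ref_counts = Counter(reference.lower().split())
    overlap = sum(min(n, ref_counts[word]) for word, n in cand_counts.items())
    return overlap / len(cand_tokens)


# The human reference has to be produced for every sentence that is scored.
score = clipped_unigram_precision(
    candidate="the cat sat on the mat",
    reference="the cat is on the mat",
)
print(round(score, 3))
```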

Quality Estimation, on the other hand, can be applied to any translated text without reference translations and without humans in the loop. It does the job fully automatically, blind you could say, and is therefore also known as Quality Prediction. How is this possible? Thanks to a breakthrough in NLP that converts words and sentences into numerical representations, also known as embeddings. Unlike text, these numerical representations can easily be compared and analyzed, which explains to a large extent the breakthroughs in AI and machine learning, such as ChatGPT and the Large Language Models.
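As a rough intuition for why numbers are easier to compare than text, the sketch below computes cosine similarity between toy vectors. The three-dimensional vectors are invented for illustration; real embeddings have hundreds of dimensions and come from a trained multilingual model, and a production Quality Estimation model does considerably more than a single similarity comparison.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: close to 1.0 means similar direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy embeddings (assumed values, not produced by any real model).
source_vec = [0.2, 0.7, 0.1]
faithful_translation_vec = [0.25, 0.65, 0.05]
unrelated_translation_vec = [0.9, 0.05, 0.4]

sim_faithful = cosine_similarity(source_vec, faithful_translation_vec)
sim_unrelated = cosine_similarity(source_vec, unrelated_translation_vec)
print(sim_faithful > sim_unrelated)
```

No reference translation appears anywhere in this comparison: only the source and the candidate translation are needed.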

TAUS EPIC API

TAUS is one of the first, and one of the few, companies to have productized Quality Estimation. It was a natural step for the TAUS team. First, we have a solid history with translation quality evaluation through our work on DQF/MQM over the past five years. Second, we have been aggregating translation memory data since 2008 and now have one of the largest repositories of language data, which most MT developers have used to train their engines over the past decade. In 2021, the TAUS NLP team started converting all of this language data into embeddings. So we were well prepared to build a Quality Prediction model when one of our customers, a big tech company, asked us to do so in 2022.

The TAUS EPIC API is trained on the rich repository of TAUS data. Customer-specific models can be trained with reference data supplied by the users of the EPIC API. The goal for each model and each customer is to evaluate the quality of translations with an accuracy level of at least 85%. The TAUS EPIC API can easily be integrated into customer workflows and into TMS and CAT environments. Integrations are already available for memoQ and Blackbird.

What are the gains?

Quality Estimation or Prediction is the second hottest product (after Machine Translation) in the translation ecosystem these days. As more and more companies are relying on MT for the translation of most of their content and are even adopting an MT-First strategy, they will also have to rely on more efficient ways of managing the quality review process. The use cases and associated gains are different for different customers.

Risk management

The first obvious use case is risk management: identify and filter out errors and low-quality machine translations before they get published. Quality Estimation in this scenario helps enterprises to avoid errors and damage to the brand image of the company.

Productivity and efficiency gains

For LSPs and enterprises with in-house translation services, a big motivation for implementing Quality Estimation is the gain in efficiency and productivity. Today, translation operators pass all the MT output on to translators and post-editors for review and correction. Fifty percent or more of the MT output may actually be good enough to be published, but how would you know which segments are good to go and which ones are not? Quality Estimation answers that question per segment and so reduces the workload of the translator workforce. This can easily ‘translate’ into savings of 20% or more.
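A minimal sketch of this kind of routing, assuming each segment already carries a predicted quality score between 0 and 1. The `ScoredSegment` type, the scores, and the 0.9 threshold are hypothetical; they do not describe the EPIC API's actual output format or any recommended cut-off.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoredSegment:
    text: str
    qe_score: float  # hypothetical predicted quality in [0, 1]


def route(segments, threshold=0.9):
    """Split segments into 'publish as is' and 'send to post-editing'."""
    publish = [s for s in segments if s.qe_score >= threshold]
    review = [s for s in segments if s.qe_score < threshold]
    return publish, review


def share_of_words_skipping_review(publish, review):
    """Fraction of the total word volume that no longer needs human review."""
    def word_count(segs):
        return sum(len(s.text.split()) for s in segs)

    total = word_count(publish) + word_count(review)
    return word_count(publish) / total if total else 0.0


segments = [
    ScoredSegment("Click Save to continue.", 0.96),
    ScoredSegment("The file was deleted permanently.", 0.91),
    ScoredSegment("Battery the charge before first use.", 0.42),
    ScoredSegment("Contact support if the problem persists.", 0.88),
]
publish, review = route(segments)
saved = share_of_words_skipping_review(publish, review)
print(len(publish), len(review), round(saved, 2))
```

The threshold is the business decision: raising it sends more segments to human review and lowers risk, while lowering it increases the savings.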

MT engine selection

Quality Estimation can also be used to compare the output of different MT engines and to automatically select the best translation on a segment-by-segment basis. This saves post-editing cost and time and leads to better overall translation quality.
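Segment-level selection can be sketched as follows, assuming every engine's output for a segment has already been given a predicted quality score. The engine names, translations, and scores are invented for illustration.

```python
def pick_best_per_segment(candidates):
    """For each segment, keep the engine output with the highest predicted score.

    `candidates` is a list with one dict per segment, mapping
    engine name -> (translation, predicted_score).
    """
    chosen = []
    for options in candidates:
        engine, (translation, score) = max(
            options.items(), key=lambda item: item[1][1]
        )
        chosen.append((engine, translation, score))
    return chosen


# Hypothetical outputs of two engines for two source segments.
candidates = [
    {
        "engine_a": ("Cliquez sur Enregistrer.", 0.93),
        "engine_b": ("Clic sauver.", 0.41),
    },
    {
        "engine_a": ("Le fichier a supprimé.", 0.55),
        "engine_b": ("Le fichier a été supprimé.", 0.90),
    },
]
best = pick_best_per_segment(candidates)
for engine, translation, score in best:
    print(engine, translation, score)
```

In this toy example neither engine wins on every segment, which is exactly the case where per-segment selection beats picking a single engine for the whole job.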

Data cleaning

Many translation operators will discover that the translation memories they have been accumulating over the past decade or longer are not in the right shape or form for current best practices in machine learning and NLP. The TAUS team can help by converting the legacy data to vectors and then applying the EPIC API to clean the data automatically. The benefit is that translation operators get the maximum value out of their own data for the customization of their MT engines and for Quality Estimation.
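The cleaning step can be pictured as a filter over translation units. In the sketch below, `score_fn` stands in for whatever quality predictor is used; the `length_ratio_score` function is only a toy placeholder based on word counts so that the example runs on its own, and the 0.5 cut-off is an arbitrary assumption.

```python
def length_ratio_score(source: str, target: str) -> float:
    """Toy stand-in for a real quality predictor: ratio of the word counts."""
    src_len, tgt_len = len(source.split()), len(target.split())
    if src_len == 0 or tgt_len == 0:
        return 0.0
    return min(src_len, tgt_len) / max(src_len, tgt_len)


def clean_translation_memory(units, score_fn, min_score=0.5):
    """Keep translation units that are non-empty and score at least min_score."""
    kept, dropped = [], []
    for source, target in units:
        is_empty = not source.strip() or not target.strip()
        if is_empty or score_fn(source, target) < min_score:
            dropped.append((source, target))
        else:
            kept.append((source, target))
    return kept, dropped


# Hypothetical legacy translation memory entries (English -> French).
units = [
    ("Press the power button.", "Appuyez sur le bouton d'alimentation."),
    ("Save", "Enregistrer les modifications apportées au document en cours"),
    ("Open the settings menu.", ""),
]
kept, dropped = clean_translation_memory(units, length_ratio_score)
print(len(kept), len(dropped))
```

Keeping the dropped units in a separate list, rather than deleting them, makes it possible to spot-check what the filter removed before the cleaned data is used for MT customization.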

To learn more, don't miss the TAUS webinar about Automating Translation Quality Review with EPIC API featuring use case presentations by Uber, Yamagata, and MotionPoint.