Case Study

DeMT™ Estimate API enables saving up to 76% in lead time and costs

Yamagata Europe, a leading language service provider, partnered with TAUS to streamline their translation process for a major automotive client. By implementing a customized Machine Translation Quality Estimation (MTQE) model, they reduced post-editing (PE) effort, and with it lead time and costs, by up to 76%, and gained valuable insight into translation quality.
The Client

Yamagata Europe

Yamagata Europe is a language service provider based in Belgium. It is part of the Yamagata Group, which was established in 1906 and was the first Japanese printing company to focus on non-Japanese print. Yamagata Europe is a team of over 50 content specialists who help global organizations connect with the end users of their products, turning complex enterprise content needs into accessible, understandable solutions.
The Challenge
Yamagata Europe works with a large automotive company to translate logged warranty claims from nine European languages into English. The existing workflow consists of cleaning the source strings, machine-translating them, and then refining and post-editing the translated content. For this pilot project, Yamagata Europe selected two of the nine language pairs: FR>EN and DE>EN. For this customer, they process on average 742 words (249 segments) per day for DE>EN and 1,165 words (166 segments) per day for FR>EN. Especially for FR>EN, Yamagata spends a lot of time cleaning up after the MT round. Reliable MTQE scores attached to the machine-translated segments would help Yamagata reduce the time and money spent on this post-MT cleaning and editing round.
The Solution

Yamagata provided TAUS with in-domain datasets and translation memories, as well as around 200 annotated segments (examples of good and bad translations) in both language combinations. Creating the custom model involved the following steps:

- Cleaning of the provided datasets to identify high-quality translation segments.

- Generation of additional synthetic data: TAUS created paraphrases (similar sentences that should score close to the original examples) and perturbations (variants in which specific parts of the sentence are altered, yielding examples that should score low). The team also assigned a score to every example, interpolating the scores for paraphrases and perturbations, so that all of them could be used for training.

- Experiments with different portions of the training dataset to fine-tune the model and achieve the lowest possible error rate on the test set.
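The case study does not describe how TAUS generated or scored its synthetic examples, so the following Python sketch only illustrates the general idea under invented assumptions. Paraphrases are made by a hypothetical synonym swap and keep most of the original score; perturbations delete a share of tokens and pull the score toward 0. In both cases the new score is a linear interpolation weighted by how much of the sentence changed. The names `Example`, `paraphrase`, `perturb` and `interpolate`, the 0.9 and 0.0 interpolation targets, and the sample sentence are all hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Example:
    source: str
    target: str   # machine-translated sentence
    score: float  # quality score in [0, 1]


def interpolate(start: float, end: float, weight: float) -> float:
    """Linear interpolation: weight 0 returns start, weight 1 returns end."""
    return (1.0 - weight) * start + weight * end


def paraphrase(ex: Example, synonyms: dict[str, str]) -> Example:
    """Swap words for synonyms; the score stays close to the original (assumed rule)."""
    tokens = ex.target.split()
    if not tokens:
        return ex
    new_tokens = [synonyms.get(t, t) for t in tokens]
    changed = sum(a != b for a, b in zip(tokens, new_tokens)) / len(tokens)
    # Assumption: a fully rewritten paraphrase keeps 90% of the original score.
    return Example(ex.source, " ".join(new_tokens),
                   interpolate(ex.score, 0.9 * ex.score, changed))


def perturb(ex: Example, rng: random.Random, fraction: float = 0.3) -> Example:
    """Delete a share of target tokens; the score falls toward 0 (assumed rule)."""
    tokens = ex.target.split()
    if not tokens:
        return ex
    n_drop = min(len(tokens), max(1, round(len(tokens) * fraction)))
    dropped = set(rng.sample(range(len(tokens)), n_drop))
    kept = [t for i, t in enumerate(tokens) if i not in dropped]
    # Assumption: deleting every token would drive the score to 0.
    return Example(ex.source, " ".join(kept),
                   interpolate(ex.score, 0.0, n_drop / len(tokens)))


original = Example("Bremse quietscht beim Anfahren", "Brake squeals when pulling away", 0.9)
similar = paraphrase(original, {"squeals": "squeaks"})
broken = perturb(original, random.Random(0), fraction=0.4)
print(similar)
print(broken)
```

Interpolating by the share of changed tokens is just one simple way to make small edits cost little and large edits cost a lot; a production pipeline would more likely rely on human or model-based judgments.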

Yamagata required the MTQE model to classify translations into two categories: Good (no post-editing required) and Bad (light post-editing required). The customized MTQE model uses a distinct score threshold for each language pair, chosen to minimize the classification error rate. As a result, a score of 0.75 or above is considered 'Good' for DE>EN, whereas FR>EN requires a score of 0.85 or above.
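Applying the per-pair thresholds, and picking such a threshold from annotated data, can be sketched in Python. Only the 0.75 and 0.85 values come from this case study. The tuning procedure is an assumption: the hypothetical `tune_threshold` simply tries every observed score as a cut-off and keeps the one with the lowest error rate on labeled examples (preferring the stricter threshold on ties), which is not necessarily how TAUS tuned its model.

```python
# Per-pair thresholds reported in this case study.
THRESHOLDS = {"DE>EN": 0.75, "FR>EN": 0.85}


def classify(score: float, pair: str) -> str:
    """Label a segment 'Good' (skip post-editing) or 'Bad' (light post-editing)."""
    return "Good" if score >= THRESHOLDS[pair] else "Bad"


def error_rate(threshold: float, scores: list[float], labels: list[bool]) -> float:
    """Fraction of annotated segments misclassified at this threshold (label True = Good)."""
    mistakes = sum((s >= threshold) != good for s, good in zip(scores, labels))
    return mistakes / len(scores)


def tune_threshold(scores: list[float], labels: list[bool]) -> float:
    """Try each observed score as a cut-off; keep the lowest error, stricter on ties."""
    return min(set(scores), key=lambda t: (error_rate(t, scores, labels), -t))


print(classify(0.80, "DE>EN"), classify(0.80, "FR>EN"))  # Good Bad
```

The same score can therefore lead to different decisions depending on the language pair, which is why a single global threshold would misclassify more segments in one of the two pairs.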

The Results
Yamagata tested the MTQE model by scoring batches of machine-translated strings. In both language pairs, the majority of the strings were scored as Good (see the table below), which meant these strings could be finalized without further revision.
By skipping the PE step for strings classified as Good, the pilot project demonstrated staggering savings in lead time and cost: 65% for the FR>EN language pair and 76% for the DE>EN language pair. The scores also gave Yamagata more insight into which segments need extra attention and which can go straight to publication.
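The routing step can be sketched in Python as a hypothetical illustration. The case study does not say how the 65% and 76% figures were calculated, so `estimated_saving` assumes that post-editing effort scales with the number of words routed to PE. The `Segment` type, the function names and the example segments and scores are invented; only the 0.75 and 0.85 thresholds come from the text above.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    text: str     # machine-translated segment
    score: float  # MTQE score in [0, 1]


def route(segments: list[Segment], threshold: float) -> tuple[list[Segment], list[Segment]]:
    """Split segments into (publish directly, send to light post-editing)."""
    publish = [s for s in segments if s.score >= threshold]
    post_edit = [s for s in segments if s.score < threshold]
    return publish, post_edit


def estimated_saving(segments: list[Segment], threshold: float) -> float:
    """Share of words that skip post-editing, assuming PE effort scales with word count."""
    publish, _ = route(segments, threshold)
    total = sum(len(s.text.split()) for s in segments)
    skipped = sum(len(s.text.split()) for s in publish)
    return skipped / total if total else 0.0


batch = [
    Segment("Brake squeals when pulling away", 0.91),
    Segment("Engine light on", 0.62),
    Segment("Customer reports noise from rear axle", 0.80),
]
print(f"DE>EN threshold: {estimated_saving(batch, 0.75):.0%}")  # 11 of 14 words skip PE
print(f"FR>EN threshold: {estimated_saving(batch, 0.85):.0%}")  # 5 of 14 words skip PE
```

Weighting by words rather than segments matters here because segment lengths differ considerably between the two pairs (roughly 3 words per segment for DE>EN versus 7 for FR>EN, based on the daily averages quoted earlier).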
During this pilot, measuring the exact accuracy of the MTQE scoring wasn’t Yamagata’s primary goal; by and large, however, the accuracy was considered high enough to identify the strings that don’t require further revision. Jourik Ciesielski, Chief Technology Officer at Yamagata Europe, says: “We estimate that 30% of the translations labeled as ‘Bad’ by the TAUS QE model are good enough and can be excluded from light post-editing too. There can be several reasons for this, some strings are for example too short for the model to produce good scores. On the other hand, since we’re working with user-generated content, we often receive heavily polluted strings the model hasn’t seen yet. Nonetheless, we expect an increase in the average accuracy of the QE scoring through further fine-tuning of the model.”
