Case Study

Customizing MT in a Narrow Domain with 19% Quality Improvement

TAUS provided Custom MT with 172,980 segments of French-German training data from a very specific area of the broadly legal domain. Custom MT is one of the newest leading MT services companies, delivering affordable machine translation engine training, evaluation, and integration. Using this data, Custom MT measured a 19% increase in BLEU score (+7.23 BLEU points) for the French-German language pair.
The Client


A machine translation training and implementation agency
Specializes in dataset acquisition and MT engine training on third-party technologies
The Challenge
Our client, Custom MT, is a machine translation services company that acquires datasets and trains MT engines on third-party technologies. One of Custom MT's clients did not have enough training material, and its translation domain was highly specific and narrow, so finding a sufficient amount of training data covering that exact domain was a serious challenge.
Although the broad domain could be classified as legal, Custom MT’s task was to provide its client with a machine translation engine trained for that specific, narrow domain.
The success of the training depends directly on the quality of the dataset used for it. The golden rule is: better data in, better data out. Finding good-quality data in a broad domain such as legal is challenging enough; finding it for a narrower subdomain within that category is harder still.
The Solution
Custom MT partnered with TAUS because of TAUS's proven expertise in collecting and creating high-quality, domain-specific training data.
"The main goal was to increase the productivity of the translators by giving them a custom machine translation tool. We saw remarkable results after we trained the preferred engine with TAUS data. The main challenge was the domain, since the translations were from a very specific legal area. Despite the rarity of the domain, TAUS ran their query and was able to produce an extensive training corpus in the required area. Linguists approved the data, and we were able to successfully finish the project and increase the productivity of the translators by giving them the right tool."

Anastasia Lisina, Head of Production at Custom MT

The Results

19% increase in BLEU score

+7.23 BLEU points
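The two headline figures measure different things. The 19% is a relative gain over the baseline engine's score, and the +7.23 is an absolute gain in BLEU points. The case study does not report the underlying scores, but dividing one figure by the other gives a rough estimate of them. The sketch below assumes that the 19% is relative to the baseline and rounded to the nearest percent. The function name `implied_bleu_scores` is hypothetical.

```python
from typing import Tuple


def implied_bleu_scores(abs_gain: float, rel_gain: float) -> Tuple[float, float]:
    """Back out (baseline, customized) BLEU from an absolute gain in BLEU
    points and a relative gain expressed as a fraction of the baseline.
    Hypothetical helper: assumes rel_gain = abs_gain / baseline."""
    baseline = abs_gain / rel_gain
    return baseline, baseline + abs_gain


# Reported figures: +7.23 BLEU points, described as a 19% increase.
baseline, custom = implied_bleu_scores(7.23, 0.19)
print(f"baseline ≈ {baseline:.1f}, customized ≈ {custom:.1f}")
# prints: baseline ≈ 38.1, customized ≈ 45.3

# 19% is presumably rounded, so bracket it with 19.5% and 18.5%.
low, _ = implied_bleu_scores(7.23, 0.195)
high, _ = implied_bleu_scores(7.23, 0.185)
```

Under these assumptions, the baseline engine would have scored roughly 37 to 39 BLEU and the customized engine roughly 44 to 46. These are estimates derived from the two published numbers, not reported results.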

Using the training data provided by TAUS, Custom MT trained a domain-specific MT engine for its client. To make sure the datasets were the right fit for the specific domain, linguists examined them first. After training, Custom MT ran a blind test and calculated the edit distance of the trained engine's output. Custom MT measured a 19% increase (+7.23 BLEU points) in output quality for the French-German language pair.
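The case study does not say which edit-distance measure Custom MT used. As an illustration only, the sketch below assumes a word-level Levenshtein distance between each machine-translated segment and its reference translation in the blind test set. The distance is normalized by total reference length, which is similar in spirit to TER but omits TER's block shifts. The function names are hypothetical.

```python
from typing import List


def word_edit_distance(hypothesis: str, reference: str) -> int:
    """Minimum number of word insertions, deletions and substitutions that
    turn `hypothesis` into `reference` (Levenshtein over whitespace tokens)."""
    hyp = hypothesis.split()
    ref = reference.split()
    # prev[j] = distance between the first i-1 hypothesis words and the first j reference words
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        curr = [i] + [0] * len(ref)
        for j, r in enumerate(ref, start=1):
            cost = 0 if h == r else 1
            curr[j] = min(
                prev[j] + 1,         # delete hypothesis word h
                curr[j - 1] + 1,     # insert reference word r
                prev[j - 1] + cost,  # substitute (or match when cost == 0)
            )
        prev = curr
    return prev[-1]


def normalized_edit_distance(hypotheses: List[str], references: List[str]) -> float:
    """Corpus-level score: total word edits divided by total reference words."""
    if len(hypotheses) != len(references):
        raise ValueError("hypotheses and references must be aligned one-to-one")
    total_edits = sum(word_edit_distance(h, r) for h, r in zip(hypotheses, references))
    total_ref_words = sum(len(r.split()) for r in references)
    return total_edits / total_ref_words if total_ref_words else 0.0
```

For example, `word_edit_distance("Der Vertrag ist nichtig", "Der Vertrag ist unwirksam")` returns 1, a single substitution. A lower normalized score means post-editors would need fewer changes to reach the reference.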
Custom MT uses an evaluation checklist to assess the health of training datasets. According to its internal assessment, the overall health of the TAUS training datasets scored 95.85%.

Evaluation Checklist of TM Input Quality

Customer: TAUS

Language pair: FR_DE

TM volume after cleanup: 165,819 segments

Overall TM health: 95.85% good
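Custom MT's checklist is internal, and this case study does not describe its criteria. The sketch below shows a few generic TM cleanup filters of the kind such a checklist might include: empty sides, untranslated copies, extreme length ratios, and exact duplicates. These filters and the `max_len_ratio` threshold are assumptions and should not be read as Custom MT's method. If the 172,980 supplied segments are the pre-cleanup volume, cleanup removed 7,161 segments, or about 4.1%.

```python
from typing import Iterable, List, Set, Tuple

Segment = Tuple[str, str]  # (French source, German target)


def clean_tm(segments: Iterable[Segment], max_len_ratio: float = 3.0) -> List[Segment]:
    """Apply a few common, illustrative TM cleanup filters.
    Hypothetical: not Custom MT's actual checklist."""
    seen: Set[Segment] = set()
    kept: List[Segment] = []
    for src, tgt in segments:
        src, tgt = src.strip(), tgt.strip()
        if not src or not tgt:
            continue  # one side is empty
        if src == tgt:
            continue  # likely an untranslated copy of the source
        ratio = max(len(src), len(tgt)) / min(len(src), len(tgt))
        if ratio > max_len_ratio:
            continue  # suspicious length mismatch, possibly misaligned
        if (src, tgt) in seen:
            continue  # exact duplicate pair
        seen.add((src, tgt))
        kept.append((src, tgt))
    return kept


def removal_rate(before: int, after: int) -> float:
    """Share of segments removed during cleanup."""
    return (before - after) / before


print(f"{removal_rate(172_980, 165_819):.1%}")  # prints: 4.1%
```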

Let's connect

Talk to our Data Experts to help you find the right type of data for your next project. Niche domains or rare languages? We have a large suite of services to generate your dataset.

Discover more Case Studies

TAUS Estimate API as the Ultimate Risk Management Solution for a Global Technology Corporation

Based on example texts from one of the largest technology companies in the world, TAUS generated a large dataset and customized a quality prediction model. The model achieved an accuracy rate of 85%.


Domain-Specific Training Data Generation for SYSTRAN

After training with TAUS datasets in the pandemic domain, the SYSTRAN engines improved by 18% on average across all twelve language pairs compared with the baseline engines.

Customization of Amazon Active Custom Translate with TAUS Data

Customizing Amazon Translate with TAUS data improved the BLEU score on every test set, by more than 6 BLEU points on average and by at least 2 BLEU points.