Customization of Amazon Active Custom Translate with TAUS Data
19 Jan 2022
15 minute read
An independent BLEU score analysis of customizing Amazon Translate Active Custom Translation with domain-specific TAUS datasets.

Online machine translation engines provide easy access to high-quality machine translations. They are optimized for content like news articles and social media posts that users of online platforms frequently translate.

Businesses often want to translate text with a different style and a specific topic. For enterprise use, online machine translation engines offer customization via sets of pre-existing translations that reflect the desired style and topic. This data is often called “parallel data”. TAUS makes such customization data available through TAUS Data Marketplace and provides all relevant data processing services.
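To make “parallel data” concrete, here is a minimal sketch of how aligned source and target segments might be serialized as tab-separated text. The function name `build_parallel_tsv`, the header row of language codes, and the sample segments are all assumptions made for illustration; the exact file layout a given translation service accepts should be checked against that service's documentation.

```python
import csv
import io


def build_parallel_tsv(pairs, source_lang="en", target_lang="de"):
    """Serialize (source, target) segment pairs as tab-separated text.

    The first row holds the language codes (an assumed layout); each
    following row holds one aligned segment pair.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow([source_lang, target_lang])
    for source, target in pairs:
        # Skip pairs where either side is empty after trimming whitespace.
        if source.strip() and target.strip():
            writer.writerow([source.strip(), target.strip()])
    return buffer.getvalue()


# Hypothetical Ecommerce-style segments, invented for illustration only.
example_pairs = [
    ("Add to cart", "In den Warenkorb"),
    ("Free shipping on orders over 50 euros", "Kostenloser Versand ab 50 Euro"),
    ("   ", "Leerer Eintrag"),  # dropped: the source side is blank
]

tsv_text = build_parallel_tsv(example_pairs)
print(tsv_text)
```

Each row pairs one source segment with its existing translation, which is what lets an engine pick up the style and terminology of the domain.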

Polyglot Technology LLC independently evaluated the quality of machine translation output from Amazon Translate customized with TAUS Data (using Amazon Translate Active Custom Translation) compared to non-customized Amazon Translate.

Customizing Amazon Translate with TAUS Data improved the BLEU score on every test set: by more than 6 BLEU points on average and by at least 2 BLEU points.
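For readers unfamiliar with how a BLEU comparison of this kind works, the following is a simplified sketch, not the tooling used in the evaluation (the source does not state it). It assumes whitespace tokenization, one reference per segment and no smoothing; the function `corpus_bleu` and all sentences are invented for illustration. Standardized tools such as sacreBLEU handle tokenization more carefully and are the better choice for reportable scores.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Simplified corpus-level BLEU on a 0-100 scale.

    Assumes whitespace tokenization and one reference per segment.
    Returns 0.0 if any n-gram order has no matches (no smoothing).
    """
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = 0
    ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_tokens = hyp.split()
        ref_tokens = ref.split()
        hyp_len += len(hyp_tokens)
        ref_len += len(ref_tokens)
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp_tokens, n)
            ref_counts = ngrams(ref_tokens, n)
            # Clipped matches: an n-gram counts at most as often as in the reference.
            matches[n - 1] += sum((hyp_counts & ref_counts).values())
            totals[n - 1] += sum(hyp_counts.values())
    if hyp_len == 0 or min(matches) == 0:
        return 0.0
    # Geometric mean of the n-gram precisions, computed in log space.
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty punishes output that is shorter than the reference.
    brevity_penalty = 1.0 if hyp_len >= ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity_penalty * math.exp(log_precision)


# Hypothetical test set: reference translations plus two system outputs.
references = [
    "the patient should take one tablet twice daily",
    "store the medicine below 25 degrees",
]
baseline_output = [
    "the patient should take a pill two times daily",
    "keep the medicine below 25 degrees",
]
customized_output = [
    "the patient should take one tablet twice daily",
    "store the medicine under 25 degrees",
]

baseline_bleu = corpus_bleu(baseline_output, references)
customized_bleu = corpus_bleu(customized_output, references)
print(f"baseline:   {baseline_bleu:.1f}")
print(f"customized: {customized_bleu:.1f}")
print(f"gain:       {customized_bleu - baseline_bleu:.1f} BLEU points")
```

The gain printed on the last line is the kind of absolute difference in BLEU points that the results above report per test set.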

These are significant improvements that demonstrate the superiority of Amazon Translate Active Custom Translation customized with TAUS Data over non-customized Amazon Translate in the Ecommerce, Medical/Pharma and Financial domains.

The evaluated datasets are also used as part of the TAUS Data-Enhanced Machine Translation (DEMT) service, which offers an end-to-end solution to those who wish to produce customized MT output for their specific domains without the hassle of going through the actual MT training process. With TAUS DEMT, BLEU scores are proven to increase by 15.3% on average in the Ecommerce, Medical and Financial domains. Try TAUS DEMT now!

 

Author
Achim Ruopp

After graduating in computer science, Achim Ruopp worked as a translator, which opened his eyes to the myriad possibilities of using computers in human language translation. He has been involved in enabling computers to process different languages and in the translation business ever since. After deepening his knowledge with a master’s in computational linguistics, he participated in a wide range of projects in machine translation research and in the practical integration of machine translation into the human translation process. Achim's goal in sharing his knowledge, experience and the latest developments in the field of machine translation is to break down barriers in cross-language communication. Polyglot Technology LLC helps customers succeed with machine translation by enabling them to make the best use of the data available to them, by assessing machine translation quality independently of MT vendors and by advising customers on how best to integrate the technology with people and processes.
