TAUS MT Training Datasets Now Available on AWS Marketplace


TAUS lists datasets for MT customization and offers for data creation, data annotation, and relevant NLP services for AWS customers through the AWS Marketplace.

Amsterdam, February 16, 2022 - TAUS, the one-stop language data shop, is pleased to announce that TAUS data products and data services are now available to users of Amazon Translate on the AWS Marketplace. This first step marks the beginning of continuous collaboration between TAUS and Amazon Translate. As of today, AWS customers can review and buy 30 bilingual corpora in the Ecommerce, Medical/pharmaceutical and Finance domains. The languages available are as follows:

  • 9 datasets in the Retail & Wholesale Distribution/E-Commerce domain in the following language pairs: English (US) to Danish, Dutch, French, Finnish, German, Italian, Polish, Spanish, and Swedish. 
  • 17 datasets in the Pharmaceuticals & Biotechnology domain in the following language pairs: English (US) to Bulgarian, Czech, Danish, German, Greek, Spanish, Estonian, Finnish, French, Hungarian, Italian, Latvian, Dutch, Norwegian, Slovenian, and Swedish.
  • 4 datasets in the Financial Services domain in the following language pairs: English (US) to Czech, Hungarian, Dutch and Romanian.  

Visit the TAUS kiosk on AWS Marketplace. TAUS has also listed offers for data creation, data annotation, and relevant NLP services for AWS customers through the AWS Marketplace.

Domain-specific datasets are valuable for companies that need to customize MT engines. All of these new corpora have been evaluated by Polyglot Technology LLC, an objective third-party MT training expert. “The customization of Amazon Translate with TAUS Data improved the BLEU score measured on the test sets by more than 6 BLEU points on average and 2 BLEU points at a minimum. These are significant improvements that demonstrate the superiority of this customized Amazon Translation Active Custom Translation for the Ecommerce, Medical/Pharma and Financial domain over non-customized Amazon Translate,” says Achim Ruopp, Owner at Polyglot Technology LLC.

The full report on the customization of Amazon Translate with the TAUS datasets can be accessed here.

TAUS has also completed the integration between Amazon Translate and the TAUS Data Marketplace. TAUS now provides the Data Enhanced MT (DEMT) service, a new layer on top of Amazon Translate that enables a customized MT service offered on a usage basis through the TAUS Data Marketplace.

“As AI-enabled translation becomes more and more mainstream, the quality of the language data powering the MT models takes on high importance,” says Jaap van der Meer, CEO at TAUS. “This collaboration with AWS allows TAUS to reach a much bigger audience with our data and data services.”

About TAUS

TAUS was founded in 2005 as a think tank with a mission to automate and innovate translation. Ideas transformed into actions. TAUS has become the one-stop language data shop, established through deep knowledge of the language industry, globally sourced community talent, and in-house NLP expertise. We create and enhance language data for the training of better, human-informed AI services.

Our mission today is to empower global enterprises and their service and technology providers with data solutions that help them to communicate in all languages, faster, better, and more efficiently.

For more information, visit https://www.taus.net/


Şölen is the Head of Digital Marketing at TAUS, where she leads digital growth strategies with a focus on search engine optimization, effective inbound content, and social media. She has over seven years of experience in related fields. She holds BAs in Translation Studies and Brand Communication from Istanbul University in addition to an MA in European Studies: Identity and Integration from the University of Amsterdam. After gaining experience as a transcreator for marketing content, she worked in business development for a mobile app and content marketing before joining TAUS in 2017. She believes in keeping up with modern digital trends and the power of engaging content. She also writes regularly for the TAUS Blog/Reports and manages several social media accounts, with over 100K followers, that she created on topics of personal interest.
