TAUS provided 172,980 segments of training data in the French-German language pair, covering a very specific area of the broadly legal domain, to Custom MT, one of the newest and leading MT services companies delivering affordable machine translation engine training, evaluation, and integration.
Online machine translation engines provide easy access to high-quality machine translations. They are optimized for content like news articles and social media posts that users of online platforms frequently translate.
Finding high-quality data for MT training has always been a challenge on the path to generating high-performing MT output. The challenge increases when the language pairs are rare or when training data in a lesser-known domain is needed.
Data annotation is the categorization and labeling of data to be used in the training of AI applications. Training datasets must be carefully categorized and annotated for each specific use case. High-quality, human-powered data annotation allows companies to build and improve AI implementations, which results in enhanced customer experience solutions such as product recommendations, relevant search engine results, computer vision, speech recognition, chatbots, and more.
When the pandemic hit in 2020, TAUS created a starter kit in several languages to train high-quality translation models customized for the pandemic domain. SYSTRAN, a leading AI-based translation technology company, partnered with TAUS to use these datasets to produce twelve translation models for English to/from French, Spanish, German, Italian, Chinese, and Russian, and to make them available on SYSTRAN Marketplace, which offers NMT models to a network of language experts who can train models in any language pair and domain.
To help battle the corona crisis from a language and information access perspective, TAUS coordinated an industry collaboration to gather translation memories covering this domain.