TAUS Estimate API as the Ultimate Risk Management Solution for a Global Technology Corporation
Our client is one of the largest technology companies in the world. Based on sample texts provided by the client, TAUS generated a large dataset and customized a quality prediction model, achieving an accuracy rate of 85%.
Speech Data Collection to Increase Performance & Diversity in Voice-based AI Systems
For a multinational technology corporation, TAUS curated a diverse team of workers who recorded over 1,400 hours of English (GB) speech data across nine specific dialects, with no repeat submissions from any single contributor. The quality of speech data depends closely on the diversity of accents and demographics in the community that provides it. That is where the TAUS Human Language Project Platform can help.
Enabling a 15% Increase in the Number of Perfect Translations for ING Hubs Poland
Our client is ING Hubs Poland, a leading multinational banking and financial services corporation. The TAUS datasets increased the number of translations rated perfect by human testers by 15%. In the Anti Money Laundering (AML) and Human Resources (HR) domains, the output of the engine trained with TAUS datasets is expected to be better than that of the untrained engine 95% of the time.
Customizing MT in a Narrow Domain with 19% Quality Improvement
TAUS provided Custom MT, one of the newest and leading MT service companies, with 172,980 segments of French-German (FR-DE) training data from a highly specific area within the broader legal domain. Custom MT measured a 19% improvement (+7.23 BLEU points) in output quality for the French-German language pair.
Customization of Amazon Translate Active Custom Translation with TAUS Data
Polyglot Technology LLC independently evaluated the quality of machine translation output from Amazon Translate customized with TAUS Data, compared with non-customized output. Customizing Amazon Translate with TAUS Data improved the BLEU score on every test set, by more than 6 BLEU points on average and by at least 2 BLEU points.
Improving Adaptive MT Output BLEU Scores by 22% Across Five Languages
TAUS provided Pangeanic with 1.8 million words of MT training data for the English to Spanish, German, Polish, Russian, and Chinese language pairs. Using this data, Pangeanic built COVID-19 domain-specific NMT models, achieving an average BLEU score improvement of 22%, with a 50% increase for the English-Russian language pair.
Data Annotation to Optimize Searchability in E-Commerce
For our client, a multinational e-commerce corporation, TAUS formed a community of 200+ contributors, selected for their affinity with product categories ranging from make-up to collectible coins, to annotate data in several European languages. The annotated data was intended for training the client's ML system to optimize webshop functionalities such as searchability.
Domain-Specific Training Data Generation for SYSTRAN
SYSTRAN, a leading AI-based translation technology company, partnered with TAUS to produce twelve translation models using the TAUS Corona datasets. After training with these datasets, the SYSTRAN engines improved by 18% on average across all twelve language pairs compared with the SYSTRAN baseline engines.
Corona Datasets Used by Google, Naver Labs and University of Catalonia
Google, Naver Labs and the University of Catalonia used the corona-specific datasets provided by TAUS to build MT models. TAUS provided six datasets, totaling 3,403,681 segments, as part of an industry collaboration effort it initiated.