Corona Datasets Used by Google, Naver Labs and University of Catalonia
12 Feb 2021
10 minute read
Learn how the TAUS corona crisis datasets helped three distinct organizations overcome the challenge of finding sufficient, domain-specific data in a time of global crisis.

To help battle the corona crisis from a language and information access perspective, TAUS coordinated an industry collaboration to gather translation memories covering this domain.

The result was six datasets containing a total of 3,403,681 segments in the following language pairs: English-French, English-German, English-Spanish, English-Italian, English-Russian, and English-Chinese.

To address information availability in a time of crisis with the help of sufficient, in-domain language data, we present three use cases from Google, Naver Labs, and the University of Catalonia. Each organization ran its own study based on the TAUS Corona Datasets and shares its detailed results.



Şölen is the Head of Digital Marketing at TAUS, where she leads digital growth strategies focused on search engine optimization, effective inbound content, and social media. She has over seven years of experience in related fields. She holds BAs in Translation Studies and Brand Communication from Istanbul University, in addition to an MA in European Studies: Identity and Integration from the University of Amsterdam. After gaining experience as a transcreator for marketing content, she worked in business development for a mobile app and in content marketing before joining TAUS in 2017. She believes in keeping up with modern digital trends and in the power of engaging content. She also writes regularly for the TAUS Blog/Reports and manages several social media accounts she created on topics of personal interest, with over 100K followers.
