2nd July, 2012, Amsterdam – TAUS, the translation innovation think tank and interoperability watchdog, is pleased to announce that it has increased the volume of industry-shared translation memories from 4 billion words in 380 language pairs to 50 billion words in 2,000 language pairs.
TAUS Data enables training and customization of automatic translation engines and it provides a free and fast reference for look-up of terms and phrases for professional and non-professional language workers.
“Automated translation is quickly becoming a standard requirement for buyers and providers of translation”, says Jaap van der Meer, director of TAUS. “Translation data are the fuel for every machine translation engine. Professionals in the translation industry need quality translation data to customize engines that translate much better than Google Translate. The more data, the better. TAUS provides for this need. It is the only industry-sanctioned repository of shared translation memories.”
The vast increase in volume and number of language pairs has been made possible by applying a matrix feature to the data repository. For each upload of translation memories from the same data owner, the matrix feature looks for exact matches between segments in the source or target language and derives new language combinations from them.
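The matching step can be illustrated with a minimal Python sketch. It is not the TAUS implementation: the function name `expand_matrix`, the data layout and the sample segments are all assumptions made for illustration, and the sketch only pivots on exact matches of the source segment, not the target segment.

```python
from collections import defaultdict
from itertools import permutations


def expand_matrix(memories):
    """Derive new language pairs from memories that share exact source segments.

    `memories` maps (source_lang, target_lang) to a list of
    (source_text, target_text) units. This layout is hypothetical.
    """
    # Collect every known rendering of a segment, keyed by its exact source text.
    by_source = defaultdict(dict)
    for (src_lang, tgt_lang), units in memories.items():
        for src_text, tgt_text in units:
            renderings = by_source[(src_lang, src_text)]
            renderings[src_lang] = src_text
            renderings[tgt_lang] = tgt_text

    # Pair up all renderings of the same segment, skipping uploaded pairs.
    derived = defaultdict(list)
    for renderings in by_source.values():
        for a, b in permutations(sorted(renderings), 2):
            if (a, b) not in memories:
                derived[(a, b)].append((renderings[a], renderings[b]))
    return dict(derived)


# Invented sample data: two uploads from the same owner, both out of English.
memories = {
    ("en", "fr"): [("Save file", "Enregistrer le fichier"), ("Open", "Ouvrir")],
    ("en", "de"): [("Save file", "Datei speichern")],
}
derived = expand_matrix(memories)
```

In this sample, "Save file" exists in French and German, so the sketch derives German-French and French-German units as well as the reverse directions into English. "Open" has no German rendering, so it contributes only a French-English unit.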
“It is challenging to find enough good translation data in less common non-English language pairs”, says Rahzeb Choudhury, Operations Director of TAUS. “This is where the new matrix feature comes in, offering access to hundreds of millions of words of human-quality translation data in language directions such as Chinese into and from Japanese, Russian, French, Brazilian-Portuguese and 1,700-plus other language pairs that until now have not been catered for.”
The matrix feature is automatically applied to all translation memories uploaded to the TAUS Data platform. If an organization uploads translation memories from English into, for instance, French, German, Italian and Chinese (1 million words into each of the four languages), TAUS Data will then create all the other possible language combinations (16 in this example) and increase the volume to 20 million words. The benefits are that terms and phrases can now be looked up across all the different language combinations, and data sets can be downloaded to train and customize machine translation engines for language pairs that were not available before.
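The figures in this example can be checked with a short Python calculation. It assumes that every segment is available in all five languages and that each language direction counts as 1 million words. The language codes and variable names are illustrative only.

```python
from itertools import permutations

# English plus the four target languages from the example.
languages = ["en", "fr", "de", "it", "zh"]

# The four language directions actually uploaded: English into each target.
uploaded = {("en", target) for target in languages[1:]}

# Every ordered (source, target) combination of the five languages.
all_pairs = set(permutations(languages, 2))

# Combinations the matrix feature would have to create.
new_pairs = all_pairs - uploaded

# Assumption: each direction carries the same 1 million words.
words_per_pair = 1_000_000
total_words = len(all_pairs) * words_per_pair
```

Five languages give 5 × 4 = 20 directed pairs. Four of them were uploaded, which leaves the 16 new combinations, and 20 pairs at 1 million words each give the 20 million words quoted.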
Since its foundation in 2008 TAUS Data has helped many companies to improve the quality of their machine translation engines. Founding members include global IT companies, as well as large and small providers of translation services and technologies. Members agree to share their translation memories on an industry-wide scale to enhance translation efficiency and quality while retaining the full IP rights to their data. The data are free to use for anyone who is willing and able to share their own translation memories in return. Organizations that have no translation memories that they can share can buy data. The TAUS Data platform is funded through member fees and industry sponsorships.
TAUS is an innovation think tank and interoperability watchdog for the translation industry.
TAUS helps entrepreneurs and principals in the translation industry to share and define new strategies through a comprehensive range of events, publications and knowledge tools.