Translation Data
No data, or not the right data?
We collect & generate the right training datasets for you.

Are you building MT engines and looking to expand into more languages or improve quality in specific domains? You will need more translation data.

Does your legacy data match the style of your automatic multilingual chatbots? If not, you will need to source new translation data that better fits their more informal tone.

Domain-specific datasets
TAUS offers a large stock of domain-specific translation datasets in hundreds of language pairs to get you started right away with the customization and training of your engines. And if we don’t have it, we’ll create it for you.
Low-resource languages
If you need translation data in under-resourced languages, such as Bashkir, Xhosa, and many more, we can help you. TAUS has communities in almost every country, set up on our Human Language Project platform, to create new translation datasets.
New domains
If you are expanding into new domains or don’t have relevant content, TAUS can provide the source content for the creation of new translation data.
Virtual assistants
If you are building new applications such as chatbots and digital assistants, translation memory data may not be the best training data. TAUS can help you with instructions-based data creation.