TAUS Data Cloud is a neutral and secure repository platform for sharing, pooling and leveraging language data. It is a data cloud for the global translation community, supporting translation automation efforts, improving translation quality and fueling business innovation. The Data Cloud addresses the industry's shortage of available in-domain parallel data. The TAUS Data Cloud team continuously evolves its industry-shared services to help TAUS members and users gain competitive advantage.

TAUS's vision for language and translation data is the Human Language Project, inspired by the Human Genome Project, a cross-continental collaborative effort between business, government and academia aimed at decoding human DNA. Through this concerted effort, scientists completed the sequencing of all 3 billion base pairs that make up the human genome in 2003, exactly 50 years after James Watson and Francis Crick first described the double-helix structure of DNA. Success was made possible only by the sharing of data and discoveries, which began in the 1990s and accelerated after agreement on the Bermuda Principles for data sharing in 1996. The Human Genome Project has already fueled the discovery of more than 1,800 disease genes, and the affordable sequencing of an individual’s DNA for medical purposes is within sight.

TAUS Data Cloud envisages the potential for a similar level of discovery and impact: the potential to dramatically reduce the extent to which language is a barrier to communication. We envision that the Human Language Project will have an exponential effect on the language services sector, and on trade and business in general, while advancing human civilization to a much higher level of understanding, education and discovery.

The Human Language Project will be an open platform of language resources and tools, consisting of at least the following:

  1. Fearless sharing of language and translation data (speech and text) in all languages and language pairs, not hindered by outdated copyright law. European legislators must modernize copyright regulations on translation data.
  2. A library of translation, language and reordering models covering all languages and a wide scope of domains to help fast-track and fine-tune the development and customization of machine translation engines.
  3. A translation quality evaluation platform to help assess, benchmark and predict the right translation quality for different content types and different purposes of communication.
  4. A library of language tools – such as parsers, chunkers, lemmatizers, taggers – to assist service and technology providers to improve and customize their solutions.
  5. Common translation web services APIs to ensure that all services and technologies work seamlessly together.

The Human Language Project is intended to be a global collaboration between business, government, academia and individuals with the goal of making language data and technology accessible to all stakeholders in the world. It will be instrumental and crucial for:

  • The economic growth of nations and communities with under-resourced languages
  • Preserving cultural heritage of under-resourced language communities
  • The international trade and growth of the world economy
  • Supporting many UN and NGO programs and institutions in securing and protecting health, peace and welfare around the world
  • Growing the global translation industry

We invite all stakeholders to join TAUS Data Cloud for the benefit of their business, their customers, commerce and society.  

Relevant links to TAUS resources about the Human Language Project:

Session #11: Human Language Project, TAUS Annual Conference 2014 “Together we know more”

The Call for the Human Language Project (2014) 

It’s Time for a Big Idea: the Human Language Project (2013)

The Human Language Project: Inventing the Future of Translation Data (2012)

Founding Members


The digital age has led to insatiable demand for translation services that cannot be met with existing proprietary business models and the capacity of around 300,000 professional translators worldwide. In 2010, Google’s computers produced ten times more translated words than the entire professional translation workforce worldwide. By 2020 there will probably be more text written by machines than by humans.

Early in 2007, the Translation Automation User Society (TAUS), a global community of 100 organizations, began to determine the industry-wide tools needed to meet ever-increasing commercial and societal demand for translation. The TAUS community soon concluded that a shared industry repository of translation data was needed as a fundamental building block to support future growth and innovation.

In the summer of 2008, the TAUS Data Association (TDA) was founded with the backing of 45 international companies to provide just such a large multilingual database for the benefit of all. Prior to launch, over 200 companies and organizations across industry sectors provided guidance on the business model through TAUS meetings, surveys and one-to-one consultations. A Global Steering Committee, with 18 representatives from multinationals and public bodies, agreed on the final operating model, mapped the benefits to clients, providers, end users and technology firms, and undertook ROI calculations.

Since 2008, the TAUS Data repository platform has grown immensely in volume and in the number of language pairs. To our knowledge, it is the largest industry-shared data repository platform available. TAUS data is structured into 17 industry categories (domains) and 9 content types. New releases offer improved, state-of-the-art data functionality. Since the September 2015 release, TAUS Data has been called TAUS Data Cloud.

The TAUS Data Cloud team is committed to continuous improvement to help members gain competitive advantage, improve quality of service and re-engineer the industry's business models to meet demand.