In 2013 TAUS published its “Translation Technology Landscape Report” to help beginners understand the uses of different types of translation technology and make informed decisions. The present report on translation data is a natural sequel to that technology-focused report.

Translation automation is fueled by data – language data in a generic sense, but essentially the bilingual data that we refer to here as translation data. In keeping with the standard Information Technology (IT) paradigm of data and algorithms, translation automation uses algorithms (e.g. Statistical Machine Translation (SMT) systems) that draw on existing translations and other textual content (the data) to train machine translation engines that output new translated material.

To give an idea of the economic weight of this industry, the global market for outsourced language services and technology is estimated to reach US$38.16 billion in 2015, according to the Common Sense Advisory “The Language Services Market: 2015 Report”, and demand for language services and supporting technologies is growing at an annual rate of 6.46%. According to the 2014 “TAUS Machine Translation Market Report”, the Machine Translation (MT) market alone was worth an estimated €250 million. This market is built largely around technology and services capable of repurposing and reusing existing translation data.

This report attempts to describe the current state of affairs in how translation data is used, and to identify the opportunities and challenges facing a data “marketplace” over the next five years or so. Now that data has taken on such a strategic role in driving innovation in translation automation, a flurry of new practical questions will arise about the link between translation data and translation (and related) practices:

  • Who are the producers and consumers of translation data? How are they changing?
  • Is there a viable “market” for translation data, beyond the current informal sharing or web-scraping model?
  • What can we do to overcome the legal/technical issues and concerns regarding translation data sharing?
  • How could translation data sharing as a natural practice integrate with the European Digital Single Market program?
  • Which models of translation data circulation work best? For how long? What could disrupt them?

In this report we attempt to address these questions by drawing on input from practitioners around the world on how they conceive of the immediate future of translation data as an operational asset.

Authors: Andrew Joscelyne and Anna Samiotou
