Data or Engines? Battling for Dominance in the Emerging Translation Ecosystem

The translation automation landscape is changing fast. In this series, we'll ask machine translation experts for their opinions on challenging questions about the current and future status of the translation ecosystem. This time we investigate who dominates: Data or MT Engines?

Welcome to a new series of TAUS Automation Landscape blogs. It will feature our takes on the rapidly evolving world of technology developments through interviews with leading translation tech suppliers and buyers.

In last year’s Nunc est Tempus briefing, we identified how the language industry was entering a period of exponential rather than linear growth. This is driven by a number of digital market forces and largely enabled by the rapid uptake of neural MT solutions.

Today, business models built around tech-centric translation platforms, a constant flow of disruptive MT research results, and changing ambitions among big tech companies all raise the question of how incumbents will need to adapt their strategies for the next wave of industry automation.

To address these questions in the coming months, TAUS will focus on issues affecting the core of translation organization, productivity and quality. Here are some of the key questions we shall be covering for you.

Data or Algorithms?

Both of these vital, interrelated components are constantly improving in quality and value. But which will play the more important role in the business plans of industry players?

Algorithms are now being designed, circulated and tested at an extraordinary speed, offering a variety of machine learning products to drive better automation. But at the same time, the availability and quality of data still depend on multiple ownership concerns, restricting the free flow of automation’s key ingredient.

Will data gradually be unbundled from algorithms, creating separate markets for MT engines and translation data? And if so, could there be a new role for a neutral quality evaluation service dedicated to MT engine metrics?

Consolidation or Fragmentation in MT Supply?

We shall be examining whether data ownership contributes to a more fragmented or a more consolidated industry. For example, LSPs who have already built up good data resources and have now developed skills in algorithms may well begin to encroach on the current MT supplier market and offer new services that encourage greater competitiveness. Alternatively, there may be other mechanisms at work that will tend to consolidate industry performance by more cost-efficient data ownership strategies.

Invaders or Insiders?

A second theme emerging from the rapid development of neural MT is how to make sense of the different types of MT provider now operating in the marketplace. Google, Microsoft and Amazon, for example, are moving into the translation service niche, inevitably competing against pure-play companies such as Systran, KantanMT, Iconic and DeepL.

How will clients differentiate the offerings of these invaders from their insider suppliers? And how will the insiders differentiate themselves? Will pure-plays manage to compete due to their capacity to customize and personalize engines for their clients? Or will the invaders take over this role as well?

CAT or MT?

As these big players are themselves platforms, what is the special value of big translation platforms in an industry still packed with thousands of individual LSPs fighting to protect their own client bases? And how will these same LSPs evolve now that their arrays of legacy CAT tools finally become redundant? Will the “invader” MT engines work to provide complete technology solutions across almost every market segment?

What kind of MT Innovations?

Finally, if one key effect of across-the-board automation is to sharpen competition by lowering production costs or accelerating access to the best algorithms and data, where can we expect to see the next round of innovation in an industry that has already seen rapid change driven by the arrival of machine learning in the last five years? One thing we can be sure of is that, as we said last year, it will almost certainly be “not business as usual” once again!

Intento: the First MT Evaluation API Service

We kick off this series with an interview with what may be the first in a new breed of technology service suppliers for the industry: the Silicon Valley-based firm Intento and its API platform for MT engine evaluation and procurement.

Founded in 2016 by Konstantin Savenkov and Grigory Sapunov, Intento specializes in a new breed of “cognitive AI services”, and began offering MT middleware solutions to the translation industry in 2017. CEO Savenkov talked to us about the company’s vision for the industry.

The driving idea was to simplify the process of choosing and managing MT engines for user clients. “What makes the situation so complex for end-users is that there are so many possible language pairs across many domains and different use cases. We noted the friction these users experienced both when choosing engines and data, and also in the very wide range of prices demanded by different MT providers. This inevitably caused difficulties in how to evaluate vendor offerings. And increasingly, there were difficulties resulting from the many changes that vendors constantly introduced into their APIs over time.”

The VC-funded company therefore began to monitor MT engines and test the language pairs they offered. Between July 2018 and January 2019, for example, the company found that the “best” vendor changed for 20 out of the 48 language pairs tracked by Intento. “We realized that there is no best vendor or model. Even just in the general domain using public data sets, there is a huge difference in quality across all combinations of language pairs.” Today the company provides access to 21 different MT engines via its APIs.
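To make the idea of a changing “best” vendor concrete, here is a minimal Python sketch of how two evaluation snapshots could be compared. It is not Intento's method: the engine names, scores and language pairs are hypothetical toy data, and the scoring metric is left unspecified. Only the final ratio (20 of 48) comes from the figures quoted above.

```python
from typing import Dict, List, Tuple

# (source, target) -> {engine name: quality score}; higher is assumed to be better.
Scores = Dict[Tuple[str, str], Dict[str, float]]


def best_engine(scores_for_pair: Dict[str, float]) -> str:
    """Return the engine with the highest score for one language pair."""
    return max(scores_for_pair, key=scores_for_pair.get)


def changed_leaders(before: Scores, after: Scores) -> List[Tuple[str, str]]:
    """List the language pairs whose top-scoring engine differs between snapshots."""
    shared_pairs = before.keys() & after.keys()
    return sorted(
        pair
        for pair in shared_pairs
        if best_engine(before[pair]) != best_engine(after[pair])
    )


# Hypothetical toy snapshots, invented purely for illustration.
july = {
    ("en", "de"): {"engine_a": 0.71, "engine_b": 0.69},
    ("en", "fr"): {"engine_a": 0.65, "engine_b": 0.70},
}
january = {
    ("en", "de"): {"engine_a": 0.72, "engine_b": 0.75},
    ("en", "fr"): {"engine_a": 0.66, "engine_b": 0.73},
}

changed = changed_leaders(july, january)

# Share of tracked pairs whose leader changed, using the figures quoted in the text.
share_changed = 20 / 48
```

In the toy data the leader changes only for English to German. With the quoted figures, roughly 42% of tracked pairs changed their top vendor within six months.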

The company’s core product is MT middleware - a single API that can talk to all MT APIs. Intento also takes care of data conversion, error reporting, retries, and the different limitations of specific MT engines. It also partners with all MT vendors, and will on occasion resell their services. On top of that, it uses its middleware engineering to build universal connectors: “If you have five inhouse systems and need five different MT engines, you will need 25 connectors. This is the sort of cumbersome problem we help solve.”
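The connector arithmetic in that quote can be sketched in a few lines of Python. The sketch assumes that a middleware layer needs one adapter per in-house system plus one per engine. That hub count is our illustration, not a figure from Intento, and the `Middleware` class and its method names are hypothetical rather than Intento's actual API.

```python
from typing import Callable, Dict

# A hypothetical engine interface: (text, source_lang, target_lang) -> translation.
Engine = Callable[[str, str, str], str]


def point_to_point_connectors(systems: int, engines: int) -> int:
    """Every system is wired directly to every engine."""
    return systems * engines


def hub_connectors(systems: int, engines: int) -> int:
    """Assumption: each system and each engine needs a single adapter to the hub."""
    return systems + engines


class Middleware:
    """Toy single entry point that routes requests and retries on failure."""

    def __init__(self) -> None:
        self._engines: Dict[str, Engine] = {}

    def register(self, name: str, engine: Engine) -> None:
        self._engines[name] = engine

    def translate(self, text: str, source: str, target: str,
                  engine: str, retries: int = 2) -> str:
        call = self._engines[engine]
        last_error = None
        # One initial attempt plus `retries` further attempts.
        for _ in range(retries + 1):
            try:
                return call(text, source, target)
            except ConnectionError as exc:
                last_error = exc
        raise RuntimeError(f"engine {engine!r} failed after retries") from last_error


# Usage with a stand-in engine that fails once, then succeeds.
attempts = {"count": 0}


def flaky_engine(text: str, source: str, target: str) -> str:
    attempts["count"] += 1
    if attempts["count"] == 1:
        raise ConnectionError("temporary outage")
    return f"[{source}->{target}] {text}"


hub = Middleware()
hub.register("flaky", flaky_engine)
result = hub.translate("hello", "en", "de", engine="flaky")
```

Under these assumptions, five systems and five engines need 25 direct connectors but only 10 adapters through a hub, and the gap widens as either side grows.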

“In future,” says Savenkov, “tech owners will be providing entire platforms and not just models, so we’ll be seeing thousands of models appear for every possible language pair. This means that our technology for evaluating and using model portfolios will become even more useful.”

Currently, the 12-person company has around 30 clients, both tech-savvy LSPs in the language market and other MT-user enterprises. It works with all kinds of connectors and use cases, and is now moving towards managing language models, for example when a company simultaneously uses Google and Microsoft and another service, all of which tend to create multiple hurdles.

In addition to language pair testing, Intento also provides a data-cleaning service. Savenkov notes that some LSPs use the MT platforms offered by large technology firms simply to do their data cleaning, without actually training any engines.

“This is useful as translation memories tend to lose around 50% of their content, so cleaning can gain considerable time in the data preparation process. Instead of depending on a team of linguists and data scientists to do the cleaning, you simply upload your content, and the training model is already fully adapted to your specific profile.”

Is there a risk of fragmentation in this market as different domains or sectors break out into specific MT solutions? Intento thinks that every market will have to undergo this kind of development curve, starting by fragmenting into different domains and later consolidating around a few players. “Data is of course invaluable in this entire process, but in the case of MT the data market grows fragmented due to privacy concerns,” says Savenkov. He observes that there is no significant leader that owns the largest number of “best” language pairs for MT, and he expects many more domain-specific engines to emerge.

“Most traditional MT vendors do not pay attention to domain behavior in translation. Today an engine will improve overall, but as the data changes, MT training does not tend to factor in domains. So in the mid-term perspective there will be more fragmentation and unbundling into niches in this industry, which is typical of growing markets. The question is, beyond fragmentation, will model training for many different customers eventually lead to some form of consolidation?”

Looking forward, Intento is now expanding beyond the translation market with the launch of its AI Gateway - a universal platform to help a broader range of companies discover, evaluate and implement AI solutions for content processing. As Savenkov says, “we began with traditional text-to-text translation, and now we’re starting to work with OCR and transcription. We are seeing an interest in these tools from DTP or media companies who work with other types of non-text data.” A sign of the times for the global content industry as a whole?


Long-time European language technology journalist, consultant, analyst and adviser.
