Want to ride the machine translation tidal wave?

Machine translation has arrived for good in the language industry; the influential Global Watchtower blog even called it a tidal wave earlier this year. TAUS’ recent article on Facebook, Google, IBM and Microsoft ended by promising to look at the issues highlighted by the team’s analysis. Their assessment is that 21st-century translation needs shared services and resources if it is to grow from a $15 billion industry to a $70 billion one. I pick up the baton by proposing that companies combine their investments to make effective use of open source SMT for localization.

While research in machine translation has been going on for over 50 years, recently adopted statistical methods make it easier to create customized MT systems for specific language pairs and domains. By leveraging existing translation assets, translation providers and buyers can build well-performing MT systems that, combined with post-editing, are more productive than traditional translation methods alone. This is beginning to lead to widespread adoption of statistical MT (SMT) systems throughout the industry.

History Repeated?

Just over a decade ago, the localization industry faced a similar technological shift: the adoption of translation memory (TM), translation management systems (TMS) and globalization management systems (GMS). To project the adoption of SMT, it is instructive to look at the adoption path of TM/TMS/GMS systems.

TM/TMS/GMS systems have now been adopted in all scenarios where they can practically be used, and their adoption has yielded significant economic benefits. This adoption has not been without obstacles – many TM/TMS/GMS systems are offered by larger translation providers instead of independent technology providers. This has led to market fragmentation and even compelled some competitors to build their own in-house solutions.

Technology acquisitions by well-capitalized translation providers and the move to cloud-based solutions, which offer providers more opportunities for control, maintain this uneasy balance. Only recently have open source TM/TMS/GMS solutions emerged that can be considered serious contenders for widespread use. Will this adoption pattern be repeated for SMT systems?

Thanks to academic research in this area, there are already various open source MT systems available. Currently many commercial MT system providers are still independent, but consolidation is already starting to happen as exhibited by the recent acquisition of Language Weaver by SDL and the partnership announcement by Acrolinx, Asia Online, Clay Tablet and Milengo.

Partnerships and straight-out mergers between MT and localization providers provide the market with end-to-end MT solutions, often integrating available open source MT systems. End-to-end MT solutions like these will undoubtedly address a large part of the market.

But some users will require the independence and flexibility to build custom MT systems, integrate the systems tightly into their processes and generally adapt systems to their needs. This flexibility is especially important given the wide variety of TM/TMS/GMS systems available. Open source MT systems like the popular SMT system Moses provide a good basis for this adaptation.

Gaps in Open Source SMT

Due to its origin in the academic world, the Moses MT system has some gaps that need to be addressed to meet the needs of the language industry. Companies that are already using Moses have all duplicated investments to overcome the same issues.

Installing and operating Moses is currently a decidedly manual process that requires in-depth knowledge of its different components and work from the command line. Apart from these operational issues, which knowledgeable staff can overcome, the biggest gap is the integration of Moses into existing localization workflows. This includes the handling of inline formatting, support for industry standards like TMX/SRX/XLIFF, and integration into TM/TMS/GMS workflows.
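To make the inline formatting gap concrete, here is a minimal Python sketch of one common workaround: replace inline tags with placeholder tokens before a segment goes to the MT engine, then put the original tags back into the translated output. It is only an illustration under stated assumptions. The function names, the `__TAG0__` placeholder format and the simple tag pattern are hypothetical and are not part of Moses or of the Moses for Localization tools, and the sketch assumes the MT engine passes the placeholders through unchanged.

```python
import re

# Assumption: inline formatting appears as simple XML/HTML-like tags.
TAG_PATTERN = re.compile(r"<[^>]+>")
# Assumption: hypothetical placeholder format, chosen for this sketch only.
PLACEHOLDER_PATTERN = re.compile(r"__TAG(\d+)__")


def mask_inline_tags(segment):
    """Replace each inline tag with a numbered placeholder.

    Returns the masked segment and the list of original tags.
    """
    tags = []

    def _replace(match):
        tags.append(match.group(0))
        return f"__TAG{len(tags) - 1}__"

    return TAG_PATTERN.sub(_replace, segment), tags


def restore_inline_tags(translated, tags):
    """Put the original tags back in place of their placeholders."""

    def _restore(match):
        index = int(match.group(1))
        # Leave unknown placeholders untouched instead of failing.
        return tags[index] if index < len(tags) else match.group(0)

    return PLACEHOLDER_PATTERN.sub(_restore, translated)


source = "Click <b>Save</b> to continue."
masked, tags = mask_inline_tags(source)
# masked is what would be sent to the MT engine; the translation step
# itself is outside the scope of this sketch.
restored = restore_inline_tags(masked, tags)
```

A production integration would also have to deal with placeholders that the engine reorders, drops or splits during tokenization, which is part of the work such a workflow integration needs to cover.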


It is important to note that in comparison to developing a whole SMT system from scratch, these gaps are relatively small and could be addressed with open source code as well.

The Moses for Localization Project

A collaborative effort to fill the gaps with open source code has several significant advantages for the industry. In addition to the desired flexibility for integration and interoperability, open source solutions provide vendor independence, lower costs than in-house development, and a community for continued support and development. Even if an MT user decides to purchase a proprietary MT solution, open source can offer a second source alternative.

The immediate need of integrating Moses into localization workflows and the benefits of an open source solution prompted Digital Silk Road to start the Moses for Localization open source effort. We are starting out with a small collection of useful tools and a discussion forum to build a community open to all. After tackling the integration into localization workflows it is up to the community to decide the further development direction, with improving usability and supporting additional languages as possible next steps.

In the spirit of a shared community and to enable commercial use, the project uses the Apache License, Version 2.0 for all software code and documentation. This license creates a truly level playing field for all participants and allows unencumbered use of the code for any purpose.

Moses for Localization is intended to supplement rather than supplant efforts coming out of the academic community, to ensure that continued academic innovation can be used by the localization community. The focus of the project is the Moses MT component itself, not the steps in an MT tool chain that come before it (like corpus cleaning) or after it (like post-editing).

There are opportunities for everyone to contribute: individuals and large organizations, and participants with a technical, linguistic or business focus. Options for participation range from reporting bugs and providing language expertise, through writing documentation and helping other community members, to contributing code and providing funding. Help to take the Moses MT system from the academic high ground to the commercial real world. Sign up at Moses for Localization to join the conversation.

Related resources

How to implement open source machine translation solutions
