The open source Moses SMT decoder, developed and maintained by universities, has grown into an elaborate ecosystem far beyond its academic origins. It’s now firmly rooted in the language industry and is branching out in original directions. Will this ecosystem develop sustainably into the future? Will there be a thousand MT systems based on Moses? In this article we examine the latest developments and future directions.
A few months ago I reported on some of the wasteful development duplication now taking place in the market. Some of this should be alleviated when phase one of the Moses for Localization (M4Loc) project goes live next month, providing the TMX and XLIFF support needed to integrate Moses into translation workflows, and ensuring we don’t all have to replicate the same developments.
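To make the TMX side of that integration concrete, here is a minimal sketch of the kind of conversion involved: pulling aligned segments out of a TMX file into the two line-aligned lists that SMT training expects. This is not the M4Loc code. The function name `tmx_to_parallel` and the sample data are my own illustrative assumptions, and the sample is a simplified TMX fragment rather than a fully valid file.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under the XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Simplified, hypothetical TMX fragment used only for illustration.
SAMPLE_TMX = """<tmx version="1.4">
  <body>
    <tu>
      <tuv xml:lang="en"><seg>Hello world.</seg></tuv>
      <tuv xml:lang="de"><seg>Hallo Welt.</seg></tuv>
    </tu>
    <tu>
      <tuv xml:lang="en"><seg>Unpaired segment.</seg></tuv>
    </tu>
    <tu>
      <tuv xml:lang="en"><seg>Save   the
        file.</seg></tuv>
      <tuv xml:lang="de"><seg>Datei speichern.</seg></tuv>
    </tu>
  </body>
</tmx>"""


def tmx_to_parallel(tmx_text, src_lang, tgt_lang):
    """Return (source_lines, target_lines), keeping only units with both languages."""
    root = ET.fromstring(tmx_text)
    src_lines, tgt_lines = [], []
    for tu in root.iter("tu"):
        segs = {}
        for tuv in tu.findall("tuv"):
            # Older TMX versions use a plain "lang" attribute instead of xml:lang.
            lang = (tuv.get(XML_LANG) or tuv.get("lang") or "").lower()
            seg = tuv.find("seg")
            if seg is not None:
                # itertext() concatenates all text inside <seg>, including text
                # nested in inline elements; split/join collapses whitespace so
                # each segment ends up on a single line.
                segs[lang] = " ".join("".join(seg.itertext()).split())
        # Language codes are matched exactly (after lowercasing), so "en-US"
        # would not match "en" in this sketch.
        if segs.get(src_lang) and segs.get(tgt_lang):
            src_lines.append(segs[src_lang])
            tgt_lines.append(segs[tgt_lang])
    return src_lines, tgt_lines


source, target = tmx_to_parallel(SAMPLE_TMX, "en", "de")
```

Writing `source` and `target` to two files, one segment per line, gives the parallel corpus format that training pipelines usually start from. A production converter would also need to deal with inline formatting tags, which this sketch simply flattens.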
We know Moses works best in scenarios where TM systems work best – large existing translation resources, mainstream language pairs and narrow domains. Combined with postediting, it has proven to be more productive than TM alone. You can see a few use case examples via this link.
Translation tool providers have begun to adapt Moses for use in the language industry in a variety of ways. Moses is behind end-to-end Software as a Service (SaaS) solutions from providers like Pangeanic, Applied Language Solutions and Logrus (and quite a few more that don’t advertise the use of Moses). These SaaS solutions let users translate large volumes of text across many language pairs, with somewhat limited customizability and per-usage fees. The recent TAUS report “Managers Guide to Implementing Open Source MT” described this as the “mass publication use case” and likened it to Kinko’s or AlphaGraphics in the printing world.
There is also the “group use case”, which the TAUS report likens to an internal printing department: large companies, small enterprises or LSPs that integrate document preparation, workflow, translation and document reconstruction in-house, in small and medium volumes, for one or more languages.
One noteworthy offering, because it potentially makes this group use case easier to undertake, is the open source DoMY Community Edition from Precision Translation Tools. It provides easy installation of Moses and all associated components, including CorpusFilterGraph, a framework to automate training and translation tasks.
The development of the DoMY and M4Loc projects is supported by standard open source business models of services, support and premium versions.
There’s also a trend towards CAT and TMS tools being adapted to enable postediting. Here, machine translations are retrieved from MT systems via APIs and the translation environment is enhanced with tools that support the postediting task. We can expect a few Moses-reliant or Moses-friendly offerings to be available soon.
The Moses-for-Mere-Mortals project should also be mentioned here. It started as an ambitious initiative to provide an easily installable version of Moses for individual users, tying together data preparation on Windows with translation on a Linux machine. After an initial release, the code was transferred into the main Moses code repository, but active development has been suspended. Users will have to wait a while longer for an easy-to-install-and-use version of Moses on Windows. Once that becomes possible, we can expect what the TAUS report describes as the MT “individual” use case (by analogy, the desktop printer version) to start taking off.
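As a rough sketch of what "retrieved via APIs" can look like, the snippet below pre-translates a batch of segments and marks each one for postediting. The function names `pretranslate` and `moses_translate_fn` are hypothetical. The sketch also assumes a Moses server reachable over XML-RPC at `http://localhost:8080/RPC2`, exposing a `translate` method that takes and returns a struct with a `text` field. Check your own server setup before relying on any of that.

```python
import xmlrpc.client


def moses_translate_fn(url="http://localhost:8080/RPC2"):
    """Build a translate(text) callable for an assumed Moses XML-RPC endpoint."""
    proxy = xmlrpc.client.ServerProxy(url)  # no connection is made until a call

    def translate(text):
        # Assumption: the server offers translate({"text": ...}) -> {"text": ...}.
        return proxy.translate({"text": text})["text"]

    return translate


def pretranslate(segments, translate_fn):
    """Attach an MT suggestion and a postediting status to each source segment."""
    suggestions = []
    for source in segments:
        try:
            mt = translate_fn(source).strip()
        except (OSError, xmlrpc.client.Error):
            # If the MT system is unreachable or returns a fault, the translator
            # simply starts from an empty target segment.
            mt = None
        status = "needs-postediting" if mt else "untranslated"
        suggestions.append({"source": source, "mt": mt, "status": status})
    return suggestions
```

Because the MT call is passed in as a function, the same workflow code can be pointed at a different engine, or at a stub during testing, without changes. Note that a Moses engine typically expects input that is tokenized and cased the same way as its training data, which this sketch leaves out.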
Get more from Moses
The joint TAUS and EuroMatrixPlus Moses Users Survey is still running. This research allows users to share their experiences, suggest solutions and contribute new ideas to the EuroMatrixPlus consortium, which is responsible for developing and supporting Moses. If you are a Moses user and haven’t already responded, I recommend you do, as it is a rare opportunity for industry users to collectively guide development. This is the survey link.
In its academic role, Moses serves as a sharing platform for a wide variety of MT research. MT research typically starts from one of a few broad approaches: rule-based (RBMT), example-based (EBMT) or statistical (SMT). The chosen approach is then adapted to specific language pairs, linguistic requirements and different sets of data in order to address its specific weaknesses. Moses is the sharing platform for SMT on which this research is built.
One recent development in SMT has been tree-based models, which aim to improve translation quality by modeling the overall structure of the sentence. Tree-based models were added to Moses in 2009. Fortunately, Moses serves not only as a pool for sharing academic ideas but also as a home for more practical improvements in usage and performance. The biggest improvement for the language industry has been the addition of the new KenLM language model, which is fast, memory-efficient and, above all, allows the use of multi-core processors under the open source license.
In addition to the code, the Moses team provides great technical documentation for the use of the new features and excellent support through the Moses mailing list. With its open community approach and its broad basis of support, Moses is likely to stay vital as a sharing platform for conceptual and practical innovations in the SMT area.
By funding academia, government organizations have long been contributors to Moses development. With the rising needs of multi-lingual societies, governments have also recognized Moses as a tool that can serve these needs more directly. This is especially true in the European Union, where Moses is part of numerous initiatives that address the needs of multi-lingual societies. The advantage of government initiatives is that they often include local languages that are not a focus of broader academic or industry interest. In terms of growing the open source MT ecosystem, it is encouraging to see that resources and tools developed by government initiatives are often released under open source licenses.
Taken together, developments in industry, academia and government point to a very positive direction for Moses over the next couple of years. If the Moses ecosystem partners work together and each contributes what it does best – academia innovating to improve MT quality, governments funding initiatives for smaller languages and industry integrating the different components – Moses will soon produce faster MT and better MT, for more language pairs, and with better integration.
Widespread adoption of open source SMT is too big a task for any single organization to foster on its own – only the collaboration of many contributors can ultimately make it happen. The challenge that remains is to foster and coordinate these efforts world-wide, not just in the academic community, where this is already happening, but also in industry and in emerging markets. Over time, I’m hopeful this will happen. I’d love for a thousand Moses systems to bloom, and am quite confident that, with the various use case scenarios becoming easier over time, many more than a thousand will blossom.