Manager’s Guide to Implementing Open Source SMT

Ten years ago, the turn of the millennium brought powerful computers at affordable prices and advances in statistical pattern-matching algorithms. These foundation technologies heralded a new age for statistical machine translation (SMT). In this new era, translation users could expect more natural machine translation output than from the soon-to-be-legacy rule-based machine translation (RbMT) technology, with less overhead to maintain the overall machine translation cycle.

During the last five years, translation services sector analysts have been abuzz about the potential virtues of automating translation with SMT. To date, their reports have focused on SMT technology, translation quality requirements, and the threat to the human translator workforce.

In the last year, there has been a perceptible, though difficult-to-quantify, shift. The localization community is no longer rejecting MT solutions sight unseen. From the top down, vendors are quietly but strategically integrating MT features into mainstay translation management systems. From the bottom up, professional translators are using free SMT systems tactically to supplement their individual work.

So, behind the smokescreen of analytical rhetoric and posturing, translation players at all levels are increasing their exploration, if not their actual use, of MT systems. More users have built their own solutions that address the problems of technology, quality and workforce. Through this experience, fear of the unknown is giving way to an understanding of the appropriate uses and limitations of the new technology.

As SMT matures from a bleeding-edge technology into a fundamental tool, new questions are emerging. What constitutes an SMT solution? How does an organization implement one? What are the resource requirements and milestones? What return on investment can a company expect?

This Manager’s Guide walks through scenarios designed to answer these questions. It presents an SMT implementation as an exercise in Business Process Optimization (BPO) with two components that work together to optimize the translation business process: the technical component (SMT) and the automated workflow component, which includes post-editing (PE).

Please note that this guide uses the Moses Decoder open source project as an example because it is the most mature and widely used system to date. Other SMT projects, such as the Joshua project from Johns Hopkins University and the cdec project from the University of Maryland, are not far behind Moses.

This guide looks at SMT deployment in terms of the phases of a traditional project cycle. The Requirements Definition phase lists operational objectives that have been successfully addressed with machine translation deployments. The Requirements Analysis phase validates the requirements and identifies points to consider when analyzing the feasibility of implementing machine translation. The System Design phase identifies resources and milestones in the form of a work breakdown structure. The Modeling (Feasibility Study) phase builds a small-scale system and business processes to determine feasibility. The Execution phase deploys a full-scale system. The Monitoring and Optimization phase implements full-scale post-editing to optimize overall system quality and productivity.

Project Cycle

Requirements Definition: Identify the problems that need solutions; define overall goals.
Requirements Analysis: Validate needs; analyze feasibility.
System Design: Plan and design the system and processes to satisfy the requirements.
Feasibility Study: Conduct a feasibility study with a scaled model system.
Execution: Deploy a full-scale system.
Monitoring and Optimization: Post-edit and make quality improvements.


A TAUS Best Practice Report - Manager’s Guide to Implementing Open Source SMT
Author: Tom Hoar
