Neural Machine Translation: the New Kid on the Block

Nowadays, in one way or another, machine translation (MT) is part of our everyday lives. Most likely Google made that happen, about a decade ago, by launching Google Translate, a free instant online general-purpose translator allowing users to translate any text (words, phrases, documents, web pages) in different language directions.

Although the translations of general-purpose MT systems (as opposed to more specialized, domain-specific ones) may not always be of good quality, especially when morphologically rich or ideographic languages are involved, users can still get, for free, the gist of a text or web page written in a language they don’t understand, or have texts translated that would otherwise never be translated. I think that this is already great. But there is currently so much more happening in the machine translation field!

MT Technology Evolution

Over the years, technologies in the field of MT have evolved from rule-based MT (RBMT) to statistical MT (SMT) and its most widely used variant, phrase-based MT (PBMT), and recently to the newcomer, neural MT (NMT). These technologies can be used alone or in combination (e.g. the combination of rules and statistics forms a hybrid MT system).

Each technology is based on different principles and methods. RBMT gives the system explicit sets of rules for translating a text from the source to the target language. Statistical MT approaches are applications of machine learning (ML): rather than being given sets of rules and instructions, an SMT system is trained on data and learns from it how to perform translations. NMT is a deep learning (DL) approach based on neural networks which, put simply, learn from patterns. The NMT system is fed with the training data and the chosen parameter settings, but the iterative training process itself is a black box: it has to be stopped in order to view the results, and then resumed if needed.
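To make the contrast between hand-written rules and learning from data a little more concrete, here is a minimal sketch in Python. It is not how a real RBMT or SMT system works: the four-sentence corpus is invented for illustration, and the learning step uses a simple Dice co-occurrence score rather than a proper statistical alignment model. All names in it (HAND_WRITTEN_LEXICON, PARALLEL_CORPUS, learn_lexicon) are hypothetical.

```python
from collections import Counter
from itertools import product

# Rule-based style: a linguist writes the lexicon by hand.
HAND_WRITTEN_LEXICON = {
    "the": "la", "house": "casa", "white": "blanca", "table": "mesa",
}

# Data-driven style: the same knowledge is estimated from example translations.
# This toy English-Spanish corpus is an assumption made for the illustration.
PARALLEL_CORPUS = [
    ("the house", "la casa"),
    ("the white house", "la casa blanca"),
    ("the white table", "la mesa blanca"),
    ("the table", "la mesa"),
]

def learn_lexicon(corpus):
    """Pick, for each source word, the target word it co-occurs with most
    consistently, scored with the Dice coefficient."""
    src_count, tgt_count, pair_count = Counter(), Counter(), Counter()
    for src, tgt in corpus:
        src_words, tgt_words = set(src.split()), set(tgt.split())
        src_count.update(src_words)
        tgt_count.update(tgt_words)
        pair_count.update(product(src_words, tgt_words))

    lexicon = {}
    for s in src_count:
        def dice(t):
            return 2 * pair_count[(s, t)] / (src_count[s] + tgt_count[t])
        lexicon[s] = max(tgt_count, key=dice)
    return lexicon

LEARNED_LEXICON = learn_lexicon(PARALLEL_CORPUS)
print(LEARNED_LEXICON)
```

On this toy corpus the learned word pairs coincide with the hand-written ones, although nobody told the program which word translates which. With real data the estimates would be noisier, which is one reason why statistical systems need large corpora.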

There are advantages and disadvantages for each technology or technology combination. Performance often depends on the languages involved, the translation direction, the domain, and the availability of language resources. For example, RBMT requires a lot of linguistic (lexical and grammar) resources for the languages involved, and manual work (e.g. writing rules); it does not need bilingual texts (which is convenient for under-resourced languages) and it produces linguistically correct translations due to the rules, although not always fluent.

SMT requires large amounts of language resources, both monolingual and bilingual (which can be a drawback for under-resourced languages), as well as processing resources; it is usually not tailored to specific language pairs, and it produces more fluent translations thanks to the language model. It splits the input sentence into smaller units (words and phrases) to be translated separately, while NMT regards the input sentence as a single unit to be translated (so it can translate more in-context). NMT is computationally expensive but it can improve translation performance with less data.
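The idea of splitting a sentence into units that are translated separately can be sketched as follows. This is only an illustration under simplifying assumptions: the phrase table is invented, segmentation is a greedy longest match, and there is no scoring, reordering or language model as in a real phrase-based decoder. The names PHRASE_TABLE, segment and translate are hypothetical.

```python
# Toy phrase table (assumed for this example): source phrase -> translation.
PHRASE_TABLE = {
    ("the", "white", "house"): "la casa blanca",
    ("the",): "la",
    ("house",): "casa",
    ("is",): "es",
    ("big",): "grande",
}
MAX_PHRASE_LEN = max(len(phrase) for phrase in PHRASE_TABLE)

def segment(words):
    """Greedily split the word list into the longest known phrases."""
    segments, i = [], 0
    while i < len(words):
        for n in range(min(MAX_PHRASE_LEN, len(words) - i), 0, -1):
            phrase = tuple(words[i:i + n])
            if phrase in PHRASE_TABLE:
                segments.append(phrase)
                i += n
                break
        else:
            # Unknown word: keep it as a one-word unit of its own.
            segments.append((words[i],))
            i += 1
    return segments

def translate(sentence):
    """Translate each segment on its own and join the pieces in order."""
    segments = segment(sentence.split())
    return " ".join(PHRASE_TABLE.get(seg, " ".join(seg)) for seg in segments)

print(segment("the white house is big".split()))
print(translate("the white house is big"))
print(translate("the blue house"))
```

The last call shows the limitation: because "blue" is unknown, the sentence falls apart into single words, each unit is handled without regard to its neighbours, and the output keeps the English word order.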

More on Deep Learning

As mentioned before, NMT uses deep learning methods inspired by the human brain’s ability to learn. DL methods are based on multi-layered neural networks, helping computers to make sense of vast amounts of data in the form of images, sound and text. The more layers there are, the more complex the concepts and situations the machine can learn to deal with, and the more processing power is needed to train it. Neurons do not really understand anything, for example what “this is a cake” means or looks like; a neuron only transmits a signal, which is received by another neuron which, in turn, transmits another signal, and so on. DL uses this process and information to build a hierarchical representation that leads to the final result, be it a translation or, as in our example, an image.
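The picture of neurons that merely pass signals on to the next layer can be written down in a few lines of Python. This is a minimal sketch, not a translation model: the weights are arbitrary numbers chosen for the illustration (a real network would learn them), and the names neuron, layer, forward and NETWORK are hypothetical.

```python
import math

def neuron(inputs, weights, bias):
    """Weigh the incoming signals, add a bias, squash the sum into (0, 1)."""
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-total))

def layer(inputs, weight_rows, biases):
    """A layer is a group of neurons that all receive the same inputs."""
    return [neuron(inputs, w, b) for w, b in zip(weight_rows, biases)]

def forward(inputs, layers):
    """Pass the signal through the layers, one after the other."""
    signal = inputs
    for weight_rows, biases in layers:
        signal = layer(signal, weight_rows, biases)
    return signal

# Two layers with made-up weights: 2 inputs -> 3 hidden neurons -> 1 output.
NETWORK = [
    ([[0.5, -0.4], [0.3, 0.8], [-0.6, 0.1]], [0.0, 0.1, -0.1]),
    ([[0.7, -0.2, 0.4]], [0.05]),
]

OUTPUT = forward([1.0, 0.0], NETWORK)
print(OUTPUT)
```

No single neuron here knows what the input means; each one only turns the numbers it receives into another number. Adding more entries to NETWORK adds more layers, and with them more computation per input.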

Experts say that DL will impact, and be applied to, a lot more applications in the coming years. Businesses are already delivering new products and services based on this new way of thinking about data and technology. Examples include the automotive industry with self-driving cars, the robotics sector with autonomous learning robots, the medical industry with medical diagnoses, as well as image and speech recognition and Natural Language Processing (NLP) with NMT.

The Bottom Line

In DL and, therefore, in NMT, programs are not written to solve a problem but to make the machine learn how to solve the problem from examples. It is this process of learning, rather than a program written as a specific set of instructions, that allows computers to improve over time and become smarter.
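Learning from examples rather than from instructions can be illustrated with a deliberately tiny case. In the sketch below, the program is never told the rule y = 2x + 1; it only sees input-output pairs and adjusts two numbers by gradient descent until its predictions fit them. The data, the learning rate and the number of steps are assumptions chosen for the illustration, and a real NMT system learns millions of parameters in the same spirit.

```python
# Examples produced by a rule the learner never sees: y = 2x + 1.
EXAMPLES = [(x, 2.0 * x + 1.0) for x in range(5)]

def train(examples, learning_rate=0.05, steps=2000):
    """Fit y = w * x + b to the examples by gradient descent on the
    mean squared error."""
    w, b = 0.0, 0.0
    n = len(examples)
    for _ in range(steps):
        # How the error changes when w or b changes slightly.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in examples) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in examples) / n
        # Move both parameters a small step in the direction that reduces it.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

W, B = train(EXAMPLES)
print(round(W, 3), round(B, 3))
```

After training, W and B end up very close to 2 and 1: the rule has been recovered from the examples alone.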

A highlight of DL is that it turns problems that used to require a large amount of domain knowledge into engineering problems. Computers are thus fed fewer and fewer instructions (e.g. rules and linguistic information) and increasingly learn to recognize patterns in huge data repositories, and to keep learning and absorbing knowledge to apply it further, in a similar way to humans.

NMT at TAUS events

At the TAUS Translation Technology Showcase in Dublin (June 2016), moderated by Anne-Maj van der Meer (TAUS), in their presentation Beyond the Hype of Neural Machine Translation, Diego Bartolome (tauyou) and Gema Ramirez (Prompsit) compared statistical MT to NMT and gave insights into the way NMT works and how to implement this new technology in real translation and localization processes. Bartolome and Ramirez explained that now is the right moment for NMT because the computational power, deep learning algorithms and training data are available.

During the TAUS session at LocWorld in Dublin (June 2016), New Frontiers in MT and the Value of Big Data, moderated by Jaap Van der Meer (TAUS), Professor Josef van Genabith (DFKI) talked about statistical MT and NMT. He argued that SMT is the tested method that works and yields good results in many use cases. However, it does not usually perform well with morphologically rich languages or languages with complex word order (like Russian or German): no matter how large the training data is, it is still hard to cover all the examples of these languages. NMT seems able, among other things, to bridge the quality gap for morphologically rich and syntactically complex languages.

What’s next?

There is no question that we are experiencing exciting times in machine translation. Language technology research groups are currently experimenting with NMT. Global players like Google, Microsoft, Facebook, Baidu and SYSTRAN are working on and launching NMT solutions. In August, SYSTRAN announced the launch of its Pure Neural Machine Translation (PNMT) engine, claiming that it provides higher translation quality than the current state of the art and, in some ways, than human translation. A few days ago, Google announced the Google Neural Machine Translation system (GNMT), which is now in production for Chinese-to-English translation. Google claims that the GNMT system outperforms the existing Google Translate system and that, in some cases, NMT translations almost match human translations.

In light of the current scientific and technical developments and progress, it seems that NMT is here to stay, whether as a pure solution or combined with other MT technologies. It is a promising technology with the potential to significantly boost machine translation performance, to overcome many shortcomings of other MT technologies, and to compete with human translation.

Join us in Portland, OR

TAUS is actively following advances in NMT technology and fosters discussions with experts in the field. Join us in Portland for the TAUS Annual Conference 2016 in October and at the sessions What’s next in MT: the N-factor and Zen and the Art of Robot Maintenance. Or stay in touch and look for the Keynotes eBook on all that happened during this event, which will be published by the end of the year!