Philipp Koehn

Statistical machine translation (SMT) researchers in Europe will be having a field day in May, with the 2nd Machine Translation Marathon in Berlin lasting ten whole days (May 10 to 20). The event is being organized by EuroMatrix, a publicly funded research network project devoted to advancing SMT development and evaluation across many pairs of European languages. At the end of the event, an invitation-only 2-day Translingual Europe conference is due to inform industry and commerce, among other audiences, about the "opportunities and challenges" for EU research in MT.

Masterminding the Marathon is Philipp Koehn, a machine translation researcher at the University of Edinburgh and also Chief Scientist with Asia Online. (See story on Asia Online). TAUS recently talked to Dr. Koehn about his work.

What are the main technical challenges facing SMT research today?
Statistical MT started off in the 1990s with word tables and then shifted up to phrase tables. Today we are looking at how to integrate linguistic knowledge into the data units. SMT has so far been completely language independent - the properties of individual languages were not factored into the algorithms. The idea now is for SMT to make use of the linguistic tree structure underlying a sentence. Another angle is to integrate richer annotation into the representations, in an approach we call factored translation models. So the trend is towards linguistically sophisticated models. This has been motivated by the problems faced when translating between languages of different syntactic structure, and from languages with a low degree of morphology - such as English - into morphologically complex ones such as German or Czech. On the whole, French and Spanish to/from English are pretty good using current methods.
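The phrase tables mentioned above can be illustrated with a toy sketch. This is not Koehn's actual system: the table entries and the greedy monotone decoder below are invented for illustration, whereas real phrase-based systems learn millions of entries from aligned parallel text and search over segmentations, reorderings, and language-model scores.

```python
# Hypothetical toy phrase table: source phrases mapped to
# (target phrase, probability) pairs. Real SMT tables are learned
# automatically from aligned parallel corpora.
PHRASE_TABLE = {
    ("das", "haus"): ("the house", 0.8),
    ("das",): ("the", 0.6),
    ("haus",): ("house", 0.7),
    ("ist", "klein"): ("is small", 0.9),
    ("ist",): ("is", 0.8),
    ("klein",): ("small", 0.7),
}

def translate(sentence, max_phrase_len=3):
    """Greedy monotone decoding: at each position, consume the longest
    source phrase found in the table. A real decoder instead searches
    over many segmentations and reorderings, scoring each hypothesis."""
    tokens = sentence.lower().split()
    out, i = [], 0
    while i < len(tokens):
        for n in range(min(max_phrase_len, len(tokens) - i), 0, -1):
            phrase = tuple(tokens[i:i + n])
            if phrase in PHRASE_TABLE:
                out.append(PHRASE_TABLE[phrase][0])
                i += n
                break
        else:
            out.append(tokens[i])  # pass unknown words through unchanged
            i += 1
    return " ".join(out)

print(translate("das Haus ist klein"))  # → "the house is small"
```

The advantage over word tables is visible even here: "das Haus" is translated as a unit, so local context (article choice, agreement) comes for free instead of being decided word by word.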

Why do you think there is so much buzz about SMT today?
As a research field, SMT is a very open community, with lots of idea sharing and published papers. One key reason for this is the annual evaluation campaigns, of which the US-organized NIST campaign is the best known, pitting MT systems against one another in producing English translations of Arabic and Chinese texts. If your system does well, it shows in the competitions, so people pay close attention not simply to ideas and theories, but to what actually works. The result is that people borrow good ideas and rapidly apply them to solving problems. So the field advances into a new cycle of research more quickly than in the past. In the USA, the focus is on translating into understandable English for the defense community, driven by DARPA funding. In the EU, there is a much greater need to publish in lots of different languages. This is why I have been involved in EuroMatrix, a project to build a continent-wide infrastructure to support research and make SMT systems more available more quickly. In this project, I have developed MT systems for 110 different language pairs.

Are there any useful advances in SMT quality evaluation?
In the R&D community, the idea is to continually test a new system by making a small change and testing it, and then another and testing it, and so on. The BLEU metric was developed to simplify and speed up this process of constant re-evaluation of a given system or language pair. However, it still depends on paying people to prepare test sets with human translations that then form the basis of the score. You cannot use BLEU or similar automated metrics to compare two different types of systems. In the end, users will have to rely on human judgments, probably in a task-driven context, by having people say whether or not a given translation allowed them to complete a task. But this takes a lot of time and effort. There is no easily automated QA system for MT output.
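The automatic metric described above can be sketched in a few lines. This is a simplified, sentence-level version of BLEU for illustration only: the standard metric is computed over a whole corpus, usually against multiple references, and the add-one smoothing used here is one common variant, not part of the original definition.

```python
import math
from collections import Counter

def bleu(candidate, reference, max_n=4):
    """Simplified sentence-level BLEU: geometric mean of modified
    n-gram precisions (n = 1..max_n) times a brevity penalty."""
    def ngrams(tokens, n):
        return Counter(tuple(tokens[i:i + n])
                       for i in range(len(tokens) - n + 1))

    cand, ref = candidate.split(), reference.split()
    log_prec_sum = 0.0
    for n in range(1, max_n + 1):
        c_ngrams, r_ngrams = ngrams(cand, n), ngrams(ref, n)
        # Clipped counts: a candidate n-gram is credited at most as
        # often as it occurs in the reference, so repeating a correct
        # word does not inflate the score.
        overlap = sum(min(count, r_ngrams[g]) for g, count in c_ngrams.items())
        total = max(sum(c_ngrams.values()), 1)
        # Add-one smoothing so one empty n-gram order does not zero everything.
        log_prec_sum += math.log((overlap + 1) / (total + 1))
    # Brevity penalty punishes candidates shorter than the reference.
    bp = min(1.0, math.exp(1 - len(ref) / max(len(cand), 1)))
    return bp * math.exp(log_prec_sum / max_n)

print(bleu("the cat sat on the mat", "the cat sat on the mat"))  # → 1.0
```

This also makes the limitation from the interview concrete: the score is just n-gram overlap with a human reference, so it is useful for tracking small changes to one system but says nothing absolute when comparing systems built on different principles.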

Something to look forward to?
Firstly, Asia Online, which interests me because it fits into my research agenda of developing SMT systems for lots of different language pairs. Asia Online is focusing on Asian languages, of course. The main quality issue is managing people's expectations for the application in question. People are clearly using MT a lot around the world, so it is obviously serving some purpose. We hope to expand the range of available languages for this. Secondly, since we will not see fully automated high-quality translation in the near future, the main task is to help human translators do their job better and more quickly. We need to go beyond translation memories, but there are still a lot of possibilities for technology to assist in the high-quality output area. Statistical MT is powerful when good language data become widely available and the world starts embracing the technology.