Machine Translation (MT) is increasingly considered another asset in the translation industry, alongside translation memories (TM) and terminologies. Language Services Providers (LSPs) are gradually weighing the benefits of MT in their projects and either creating a dedicated MT department, partnering with major MT engine providers, or relying on their clients’ MT results. The usage of MT is thus a global fact, but its application in particular contexts still has to be explored.

To gather data on real-life practices, UAB PhD researcher María do Campo Bayón launched a survey addressed to Language Services Providers to understand how much data is needed to create and train an MT engine. The survey invited descriptive answers about the LSPs’ use and/or development of MT. The call for participation was made in collaboration with TAUS.

These are some of the insights:

Motivation to invest in MT

LSPs reported that their decision to make such an investment is usually justified or driven by the following criteria:

Client’s demands (31%)
Time (15%)
Translation fee (8%)
Amount of legacy data (7%)
Reports of QA Department (8%)
Combination of all factors (31%)

Training Data

Once the motivation is established, it is important to have a clear idea of what you want to achieve with the MT engine. Based on that, you need to determine the minimum amount of data needed to train the MT engine. 55.6% of respondents answered that they have determined the minimum amount of data needed to train one or more language pairs, whereas the remaining 44.4% have not yet established such a figure.

Among those who have already established minimum figures, the indicated volumes vary, although all of them are relatively large. They range from around 80,000 segment pairs at the low end to 10 - 15 million at the high end. Different language pairs and content types will demand more or less data, but in general, we can establish a minimum span of around 500,000 - 1,000,000 segment pairs.

As for where that data comes from, respondents pointed to the following sources:

Client’s TM (50%)
Corpus from specific content type (17%)
Client-based terminology (8%)
General corpus (8%)
All of the above (17%)

Testing phase

After the training, you need to set up a testing phase that matches your project’s pipeline and workflow. The answers revealed that, for testing, LSPs follow a mix of automatic and human evaluations. Most participants combine automatic metrics with tests involving linguists (post-editing tasks, manual scoring tasks, etc.). A third of the companies rely on human evaluations alone, such as editing by a native translator, human revision, or outsourcing to machine learning specialists. Some LSPs reported that they use only automatic metrics, such as scripts for BLEU and TER and comparison reports. Other companies define a specific process based on their pipeline and their type of projects and clients. In one specific example, the LSP first contrasts automatic metrics with the human translation. It then takes a test set, extracting 1,000 words from a representative text, and runs automatic and human evaluations to compare specific vs. generic engines. These tests are repeated until acceptable metrics and scores are achieved.
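The specific vs. generic comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the sample sentences, and the metric itself are hypothetical and not taken from the survey. The metric is a simplified word-level edit rate in the spirit of TER (it counts insertions, deletions, and substitutions but no shifts), so its values will not match an official TER script.

```python
from statistics import mean


def word_edit_distance(hyp, ref):
    """Levenshtein distance over word tokens (insert, delete, substitute)."""
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        curr = [i]
        for j, r in enumerate(ref, start=1):
            cost = 0 if h == r else 1
            curr.append(min(prev[j] + 1,          # delete a hypothesis word
                            curr[j - 1] + 1,      # insert a reference word
                            prev[j - 1] + cost))  # keep or substitute
        prev = curr
    return prev[-1]


def edit_rate(hypothesis, reference):
    """Word edits divided by reference length (simplified, no shifts)."""
    hyp, ref = hypothesis.split(), reference.split()
    if not ref:
        return 0.0 if not hyp else 1.0
    return word_edit_distance(hyp, ref) / len(ref)


def compare_engines(references, outputs_by_engine):
    """Average edit rate per engine over the same test set (lower is better)."""
    return {
        name: mean(edit_rate(h, r) for h, r in zip(outputs, references))
        for name, outputs in outputs_by_engine.items()
    }


# Hypothetical test set: human reference translations and two engines' raw output.
references = [
    "press the red button to stop the machine",
    "close the valve before cleaning",
]
outputs_by_engine = {
    "specific": [
        "press the red button to stop the machine",
        "close the valve before the cleaning",
    ],
    "generic": [
        "push the red key to stop the engine",
        "shut the tap before cleaning",
    ],
}

scores = compare_engines(references, outputs_by_engine)
for name, score in scores.items():
    print(f"{name}: average edit rate {score:.3f}")
```

In practice, the automatic score would be only one side of the test: the respondents pair it with human evaluation and repeat the cycle until both are acceptable.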

When asked about the kind of indicators used, companies reported using all available ones: human evaluation, automatic evaluation, edit distance, post-editing effort, and productivity tests (only carried out in long-term projects).

Regarding the kind of comparisons they do, these are the results:

Specific vs. generic MT (same type of engine) (28%)
Raw MT vs. Human Translation (22%)
Specific vs. specific MT engine (different engine provider) (22%)
Generic vs. generic MT engine (different engine provider) (11%)
Neural vs. phrase-based (PB) MT engine (11%)
None (6%)

Minimal threshold

It is difficult to establish clear rules or guidelines for approving the use of an MT engine. That is why the survey also asked for minimal thresholds in two scenarios: low and high impact/visibility projects.

First, we asked respondents for a minimum threshold in low impact/visibility technical documentation projects. These are the common answers:

Good automatic scores
Higher productivity and lower price of MTPE vs. human translation
Edit distance below 35%, offering full PE, light PE, or raw MT depending on quality expectations
No threshold at all as it depends on the budget and the deadline

Then, we asked the same question for a high impact/visibility project. The answers are different:

Passed QA check
Higher productivity and lower price of MTPE vs. human translation
Edit distance below 25%, offering full PE and, sometimes, some level of review as well, mainly focusing on style and fluency
30 HTER (Human-targeted Translation Error Rate)
No threshold at all as it depends on the budget and the deadline
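The two edit-distance thresholds mentioned by respondents can be expressed as a simple check. The sketch below is hypothetical: the function and constant names are invented for illustration, the 35% and 25% values are just the figures some respondents reported (not an industry standard), and how the edit-distance percentage is measured is left to the caller.

```python
# Maximum edit distance (in percent) reported by some respondents,
# keyed by project impact/visibility. Illustrative values, not a standard.
MAX_EDIT_DISTANCE_PCT = {"low": 35.0, "high": 25.0}


def meets_threshold(edit_distance_pct, impact):
    """Return True if the measured edit distance is below the threshold."""
    if impact not in MAX_EDIT_DISTANCE_PCT:
        raise ValueError(f"unknown impact level: {impact!r}")
    if not 0.0 <= edit_distance_pct <= 100.0:
        raise ValueError("edit_distance_pct must be between 0 and 100")
    return edit_distance_pct < MAX_EDIT_DISTANCE_PCT[impact]


# The same engine output can pass for one project type and fail for another.
low_impact_ok = meets_threshold(30.0, "low")
high_impact_ok = meets_threshold(30.0, "high")
print(f"low impact: {low_impact_ok}, high impact: {high_impact_ok}")
```

A check like this covers only one of the criteria listed above. Respondents who tie the decision to budget and deadline would not use a fixed number at all.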

Adoption or rejection phase

All answers have one thing in common: they all mention quality. As quality is a notoriously controversial term in translation theory, it is important to determine what counts as MT output quality. Half of the respondents (50%) refer to engine output quality in terms of productivity gain: for example, whether post-editing is faster than human translation, whether the edit distance is lower, or whether the project is delivered earlier. Others (17%) consider an engine good enough to use if the style is acceptable. Finally, a large group of participants (33%) relies on consistency across the trials, on acceptable metrics and A/F scores, and on the “green light” from human evaluators.
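Productivity gain, the criterion most respondents cite, reduces to simple arithmetic once throughput has been measured. The sketch below assumes throughput is tracked in words per hour. The function name and the sample figures are hypothetical and do not come from the survey.

```python
def productivity_gain(pe_words_per_hour, ht_words_per_hour):
    """Relative throughput gain of post-editing (PE) over human translation (HT)."""
    if ht_words_per_hour <= 0:
        raise ValueError("ht_words_per_hour must be positive")
    return (pe_words_per_hour - ht_words_per_hour) / ht_words_per_hour


# Hypothetical measurements from a pilot project.
gain = productivity_gain(pe_words_per_hour=700, ht_words_per_hour=500)
print(f"Productivity gain: {gain:.0%}")
```

A positive value means post-editing was faster than translating from scratch. A negative value suggests the engine output is slowing linguists down.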

Improvement phase

Even if LSPs are already working with a tested and evaluated MT engine, they may consider making improvements to the engine in the future. When it comes to improving the engines, all survey participants agree on two main reasons for adjustments: post-editors’ performance (speed and quality) and clients’ needs. So, it is advisable to ask for feedback from both post-editors and clients and to continue reporting on MT productivity and quality. In addition, 11.1% of companies report that cost is also a factor to consider.

Final recommendations

Before the training phase:

Determine your business case
Establish your goals
Decide on a type of engine
Gather a minimal amount of data from all your available resources

After the training phase:

Design a quality procedure combining automatic and human evaluations
Establish your minimal threshold depending on the type of project
Repeat the training & testing cycle until expectations are achieved

When the engine is in a production environment:

Keep performing quality checks
Ask for post-editors’ and clients’ feedback
Correct errors as needed

ABOUT THE AUTHORS

María do Campo Bayón studied Translation Studies at the University of Vigo and completed her master's degree in Tradumatics: Translation Technologies at the Autonomous University of Barcelona. Previously, she worked in various positions as a translator, post-editor, project manager and localization engineer. Currently, she is combining her professional career as Deputy Technical Manager at CPSL Language Solutions with a PhD in machine translation and low-resource languages.

Pilar Sanchez-Gijon is a senior lecturer in translation technologies at the Department of Translation and Interpreting at the Universitat Autonoma de Barcelona, where she teaches subjects related to CAT tools, corpus linguistics, machine translation and terminology. Since 1999, she has been a member of the Tradumàtica research group, which focuses on translation and technologies. Her research focuses on translation technologies, localization, machine translation, and post-editing.