First published: October 2013

Pricing Machine Translation Post-Editing Guidelines

Aims

These guidelines aim to help you understand how to arrive at a suitable pricing model for machine translation post-editing (MTPE). Unfortunately, there is currently no single method to determine MTPE pricing. Instead, a combination of approaches is needed to calculate post-editing effort and set pricing accordingly. Pricing may be set on a per-hour or per-unit basis.

Caveat Emptor

Be warned: this document assumes a level of knowledge that is not widespread in the industry, and the subject matter is complex. If you are new to the area, these guidelines will serve as a resource and access point to help you develop your own acid test for determining pricing. Note also that this is an evolving space and best practices may look different a year from now. Our aim, as always in such situations, is to capture current collective wisdom and improve on it over time.

Guiding Principles

Whatever combination of approaches you decide to use, your model should be:

  • Predictive. A model for pricing MT Post-editing helps to predict the cost.
    • Your model should help establish pricing up-front. Therefore, a model should either allow for extrapolation or be able to calculate the cost of a particular volume of text instantly. Remember, pricing may change each time you evaluate and deploy a new version of an engine.
  • Fair. A model for pricing MT Post-editing provides buyers, language service providers and translators with a reliable measurement of the required work.
    • All parties involved in the translation process, for example translators, language service providers and buyers should be involved in establishing your approach.
    • All parties should agree that the pricing model reflects the effort involved.
    • Translators should be provided with training on post-editing and realistic expectations must be set. See TAUS Machine Translation Post-editing Guidelines for more detailed information on quality expectations.
    • It can be difficult to demonstrate that you are always being fair, because circumstances will sometimes undermine the assumptions behind your model. We ask that you share those experiences with us so that we can build a public knowledge base over time.
  • Appropriate. A model for pricing MT Post-editing considers content characteristics.
    • Content type. MT output quality might greatly vary depending on content type. Similarly, different content types might require differing post-editing levels.
    • The language pair involved in the translation process will affect the quality of MT output.
    • Hence pricing may differ by language pair and content type.
    • When you undertake evaluations to help establish pricing make sure:
      • you test the model on representative test-data, i.e. the quality of the test-data has the same characteristics as that used in the real setting;
      • you use a representative volume of test-data to allow for a comprehensive study; and
      • the content-type in the test-data matches that of the real setting.

MT quality should be established in one comprehensive evaluation early in the MT adoption/implementation process for EACH engine and content type, and not continuously, unless you are able to establish a cost-effective operating model to capture the right data points on an ongoing basis to set pricing.

In a post-editing scenario, spot checks to monitor quality are then advised, and feedback from post-editors should be collected – keeping the dialogue open, and acknowledging and acting on feedback where possible.

Your method for assessing quality and establishing pricing should be transparent.

Approaches

This section introduces the approaches that should be used. Links later in the document guide you to more detailed information.

You will need to combine a number of approaches to achieve a predictive, fair and appropriate model. This may involve combining automated and human evaluation, and undertaking a productivity evaluation. Productivity assessment should always be used.

A combination of these three approaches is recommended:

  • Automated quality score (GTM, TER, BLEU, MT Reversed Analysis); a scoring sketch follows this list
  • Human quality review
  • Productivity assessment (post-editing speed)
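As a concrete illustration of the first item, the sketch below scores raw MT output against human reference translations with BLEU, assuming the third-party sacrebleu package is installed. The segments and the threshold value are invented for the example and are not recommendations.

    # Hedged sketch: score raw MT output against human reference translations
    # with an off-the-shelf automated metric (BLEU via sacrebleu).
    import sacrebleu

    mt_output = [
        "The printer does not respond to the print command.",
        "Restart the device and try again.",
    ]
    references = [
        "The printer is not responding to the print command.",
        "Restart the device and try again.",
    ]

    # corpus_bleu expects a list of hypotheses and a list of reference lists.
    bleu = sacrebleu.corpus_bleu(mt_output, [references])
    print(f"BLEU: {bleu.score:.1f}")

    # A score threshold, agreed with all parties, can gate whether an engine's
    # output is considered suitable for post-editing at all.
    POST_EDITABLE_THRESHOLD = 40.0  # illustrative value, not a recommendation
    print("Suitable for post-editing:", bleu.score >= POST_EDITABLE_THRESHOLD)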

Automated Metrics

We outline two automated metrics only. There are many others.

GTM and MT Reversed Analysis can be used in combination with productivity assessments and human review to help set pricing. Both require human reference translations.

GTM

GTM (General Text Matching) measures the similarities between the MT output and the human reference translations by calculating editing distance.
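To make this concrete, here is a minimal sketch of a word-level edit-distance similarity between an MT segment and a human reference. It only illustrates the principle described above and is not the GTM tool itself.

    # Hedged sketch: word-level edit-distance similarity between MT output and
    # a human reference (illustration only, not the official GTM implementation).
    def edit_distance(a, b):
        """Word-level Levenshtein distance between two token sequences."""
        prev = list(range(len(b) + 1))
        for i, tok_a in enumerate(a, start=1):
            curr = [i]
            for j, tok_b in enumerate(b, start=1):
                cost = 0 if tok_a == tok_b else 1
                curr.append(min(prev[j] + 1,          # deletion
                                curr[j - 1] + 1,      # insertion
                                prev[j - 1] + cost))  # substitution
            prev = curr
        return prev[-1]

    def similarity(mt, reference):
        """Normalise edit distance to a 0-1 similarity (1 = identical)."""
        mt_tokens, ref_tokens = mt.split(), reference.split()
        longest = max(len(mt_tokens), len(ref_tokens), 1)
        return 1.0 - edit_distance(mt_tokens, ref_tokens) / longest

    print(similarity("Restart the device and try again .",
                     "Restart the machine and try again ."))  # ~0.86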

MT Reversed Analysis

This approach aims to correlate MT output quality with fuzzy-match bands. It calculates the fuzzy-match level of raw MT segments with respect to their post-edited segments. The approach relies on a well-established pricing model for TM-aided translation. The process runs as follows:

  • Post-edit the raw MT output.
  • Apply a fuzzy-match model to the raw MT and post-edited pairs, as is done in TMs.
  • Assuming that a particular engine will behave similarly in comparable scenarios (content type, language pair, external resources), establish expected fuzzy-match band proportions and rates for each band.
  • To calculate cost savings, you can compare: (1) the hypothetical price for the source and the final translation (the post-edited version of the source) obtained through a fuzzy-match pricing model, and (2) the cost of post-editing the raw MT output, established through a productivity assessment to test the results and refine assumptions. A sketch of the banding step follows this list.
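The sketch below illustrates the banding step under invented assumptions: Python's difflib stands in for a real fuzzy-match engine, and the bands and per-word rates are placeholders, not recommended values.

    # Hedged sketch: bucket each raw-MT/post-edited pair into a fuzzy-match band
    # and price it with that band's per-word rate.
    from difflib import SequenceMatcher

    # Each entry is (minimum match ratio, per-word rate in EUR), checked top down.
    FUZZY_BANDS = [(0.95, 0.03), (0.85, 0.05), (0.75, 0.07), (0.0, 0.10)]

    def band_rate(match):
        for threshold, rate in FUZZY_BANDS:
            if match >= threshold:
                return rate
        return FUZZY_BANDS[-1][1]

    segment_pairs = [
        # (raw MT output, post-edited version)
        ("Restart the device and try again.", "Restart the device and try again."),
        ("Press button power for five second.", "Press the power button for five seconds."),
    ]

    total_cost = 0.0
    for raw_mt, post_edited in segment_pairs:
        match = SequenceMatcher(None, raw_mt, post_edited).ratio()  # crude fuzzy score
        words = len(post_edited.split())
        rate = band_rate(match)
        total_cost += words * rate
        print(f"match={match:.2f}  band_rate={rate:.2f}  words={words}")

    print(f"Hypothetical post-editing cost: EUR {total_cost:.2f}")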

Human Quality Review

Human review can be used to assess the quality of MT output, to validate the mapping of MT output quality to translation memory match rates, and to check whether the final quality of post-edited content reaches the desired level.

Post-editing Productivity Assessment

This approach calculates the difference in speed between translating from scratch and post-editing MT output. The results may vary each time you create a new engine. Therefore, in order to be predictive, fair and appropriate, you will need to rerun productivity evaluations each time you create a new ‘production-ready’ engine. Depending on how you combine methods to establish pricing, you may undertake small-scale short-term productivity tests or larger-scale longer-term assessments. A link is provided to Best Practice Guidelines for Productivity Evaluations later in this document.
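For illustration, the sketch below turns measured throughputs from a productivity assessment into a per-word post-editing rate. All figures are invented placeholders, and the rate-scaling convention shown is only one of several that the parties involved might agree on.

    # Hedged sketch: derive a per-word post-editing rate from measured speeds.
    translation_words_per_hour = 350   # measured: translating from scratch
    post_editing_words_per_hour = 600  # measured: post-editing raw MT output
    baseline_rate_per_word = 0.12      # agreed per-word rate for translation from scratch (EUR)

    # Productivity gain: how much faster post-editing is than translating from scratch.
    gain = post_editing_words_per_hour / translation_words_per_hour

    # One possible convention: scale the per-word rate down so that hourly
    # earnings stay roughly constant for the post-editor.
    post_editing_rate_per_word = baseline_rate_per_word / gain

    print(f"Productivity gain: {gain:.2f}x")
    print(f"Suggested post-editing rate: EUR {post_editing_rate_per_word:.3f}/word")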

Examples of how to Combine Approaches

  1. Determine a threshold for automated scores above which a minimum acceptable level of, or improvement in, quality for post-editing has been achieved. Or undertake human review to determine that a minimum level of, or improvement in, quality has been achieved. Undertake a productivity assessment to determine the (added) speed from post-editing and determine a pricing model. You will need to be familiar with the nuances of automated metrics. You will need to undertake the productivity assessment over a period of weeks to establish a predictive, fair and appropriate pricing model. A sketch of this combination follows the list.
  2. Determine a threshold for automated scores above which a minimum acceptable level of, or improvement in, quality for post-editing has been achieved. Or undertake human review to determine that a minimum acceptable level of, or improvement in, quality has been achieved. Post-edit a sample of representative content. Undertake a reversed analysis of the post-edited content to map to fuzzy-match price band rates. Undertake a small-scale productivity assessment and human review to validate and refine the conclusions. The errors produced by MT are different from those found in fuzzy matches, hence productivity tests and human review are necessary. Combining these approaches each time you have a new engine should ensure your pricing model is predictive, fair and appropriate.
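The sketch below illustrates the first combination under invented assumptions: an automated-score threshold acts as a quality gate, and the measured post-editing speed-up scales the agreed per-word rate.

    # Hedged sketch: gate on an automated quality score, then use the measured
    # post-editing speed-up to set a per-word price. All numbers are placeholders.
    from typing import Optional

    def post_editing_price(bleu_score: float,
                           speed_up: float,
                           full_rate_per_word: float,
                           min_bleu: float = 40.0) -> Optional[float]:
        """Return a per-word post-editing rate, or None if the engine fails the quality gate."""
        if bleu_score < min_bleu:
            return None                       # below threshold: do not offer MTPE pricing
        if speed_up <= 1.0:
            return full_rate_per_word         # no measured gain: keep the full rate
        return full_rate_per_word / speed_up  # scale the rate by the measured speed-up

    rate = post_editing_price(bleu_score=47.3, speed_up=1.6, full_rate_per_word=0.12)
    print(f"{rate:.3f}")  # 0.075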

How Best Practices Could Evolve

Area of Research

Source and translation quality estimation

Translation quality estimation is a hot research topic that aims to predict translation quality based on a number of source and target text features. It is still far from being a well-established technique ready for implementation.

In the future, however, it could provide us with a diagnostic model to predict post-editing costs. The model would automatically learn from the following (a toy sketch follows the list):

  • Source text features
  • Translated/post-edited text features
  • Post-editing process information (speed, changes)
  • Human evaluation of the post-edited text
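As a toy illustration of the kind of model described, the sketch below fits a simple regression (assuming the third-party scikit-learn package) to predict post-editing speed from a handful of invented features. Real quality-estimation systems use far richer feature sets and models.

    # Hedged sketch: predict post-editing speed from a few source/target/process
    # features with a simple regression. Data and features are invented.
    from sklearn.linear_model import LinearRegression

    # Each row: [source length (words), raw-MT/post-edit similarity, reviewer score 1-5]
    X = [
        [12, 0.91, 4],
        [25, 0.55, 2],
        [18, 0.78, 3],
        [9,  0.95, 5],
    ]
    # Target: measured post-editing speed for that segment's batch (words per hour).
    y = [700, 310, 520, 820]

    model = LinearRegression().fit(X, y)

    # Predict post-editing speed (and hence cost) for unseen content.
    print(model.predict([[15, 0.80, 3]]))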

See the video on how the DQF Tools aim to support academic research in this area.

Our thanks to:

Nora Aranberri (TAUS Labs) and Katrin Drescher (Symantec) for drafting these guidelines.

The following organizations for reviewing and refining the Guidelines at the TAUS Quality Evaluation Summit, 15 March 2013, Dublin:

ABBYY Language Services, Amesto, Capita Translation and Interpreting, Concorde, Crestec, EMC, Google, Intel, Jensen Localization, Lingo24, McAfee, Microsoft, Moravia, Pactera, R.R. Donnelley, Sajan, STP Nordic, Vistatec, Welocalize and Yamagata Europe.

Consultation and Publication

A public consultation was undertaken between 23 June and 29 August 2013. The guidelines were published on 8 October 2013.

Feedback

To give feedback on improving the guidelines, please write to us.
