Produced in partnership with CNGL

First published: November 2010

MT Post-editing Guidelines



Objectives and scope

These guidelines are aimed at helping customers and service providers set clear expectations and can be used as a basis on which to instruct post-editors.

Each company’s postediting guidelines are likely to vary depending on a range of parameters, so it is not practical to present a set of guidelines that covers all scenarios. We expect that organisations will use these baseline guidelines and tailor them as required for their own purposes. Generally, these guidelines assume bilingual (not monolingual) postediting that is ideally carried out by a paid translator, but that might in some scenarios be carried out by bilingual domain experts or volunteers. The guidelines are not system- or language-specific.

 

Recommendations

To reduce the level of postediting required (regardless of language pair, direction, system type or domain), we recommend the following:

  • Tune your system appropriately, i.e. ensure a high level of dictionary and linguistic coding for RBMT systems, or training with clean, high-quality, domain-specific data for data-driven or hybrid systems.
  • Ensure the source text is well written (i.e. correctly spelled and punctuated, and unambiguous) and, if possible, tuned for translation by MT (i.e. by using specific authoring rules that suit the MT system in question).
  • Integrate terminology management across source text authoring, MT and TM systems.
  • Train post-editors in advance.
  • Examine the raw MT output quality before negotiating throughput and price and set reasonable expectations.
  • Agree a definition for the final quality of the information to be post-edited, based on user type and levels of acceptance.
  • Pay post-editors to give structured feedback on common MT errors (and, if necessary, guide them in how to do this) so the system can be improved over time.

Postediting Guidelines

Assuming the recommendations above are implemented, we suggest some basic guidelines for postediting. The effort involved in postediting will be determined by two main criteria:

  1. The quality of the MT raw output.
  2. The expected end quality of the content.

To reach quality similar to “high-quality human translation and revision” (a.k.a. “publishable quality”), full postediting is usually recommended. For quality of a lower standard, often referred to as “good enough” or “fit for purpose”, light postediting is usually recommended. However, light postediting of really poor MT output may not bring the output up to publishable quality standards. On the other hand, if the raw MT output is of good quality, then a light rather than a full post-edit may be all that is needed to achieve publishable quality. So, instead of differentiating between guidelines for light and full postediting, we differentiate here between two levels of expected quality. Other levels could be defined, but we keep to two here for simplicity. The guidelines proposed below are conceived as a set from which individual guidelines can be selected, depending on the needs of the customer and the raw MT quality.
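The reasoning above can be sketched as a small decision function. This is only an illustration: the two-value scales for raw MT quality and expected quality, and the names RawQuality, TargetQuality and suggested_effort, are assumptions made for the example and are not part of the guidelines. In practice raw MT quality is a matter of degree and needs to be examined case by case.

```python
from enum import Enum


class RawQuality(Enum):
    """Assumed two-step scale for the raw MT output."""
    POOR = "poor"
    GOOD = "good"


class TargetQuality(Enum):
    """The two levels of expected end quality."""
    GOOD_ENOUGH = "good enough"
    PUBLISHABLE = "publishable"


def suggested_effort(raw: RawQuality, target: TargetQuality) -> str:
    """Return 'light' or 'full' as a starting point for negotiation."""
    if target is TargetQuality.GOOD_ENOUGH:
        # A lower expected standard usually calls for light postediting.
        return "light"
    if raw is RawQuality.GOOD:
        # Good raw output may need only a light post-edit to be publishable.
        return "light"
    # Poor raw output and publishable target: full postediting.
    return "full"


print(suggested_effort(RawQuality.POOR, TargetQuality.PUBLISHABLE))
```

The function only returns a suggestion; the expected quality still has to be agreed with the customer, as recommended earlier.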

Guidelines for achieving “good enough” quality

“Good enough” is defined as comprehensible (i.e. you can understand the main content of the message) and accurate (i.e. it communicates the same meaning as the source text), but not stylistically compelling. The text may sound like it was generated by a computer, syntax might be somewhat unusual and grammar may not be perfect, but the message is accurate.

  • Aim for semantically correct translation.
  • Ensure that no information has been accidentally added or omitted.
  • Edit any offensive, inappropriate or culturally unacceptable content.
  • Use as much of the raw MT output as possible.
  • Basic rules regarding spelling apply.
  • No need to implement corrections that are of a stylistic nature only.
  • No need to restructure sentences solely to improve the natural flow of the text.

Guidelines for achieving quality similar or equal to human translation

This level of quality is generally defined as being comprehensible (i.e. an end user perfectly understands the content of the message), accurate (i.e. it communicates the same meaning as the source text) and stylistically fine, though the style may not be as good as that achieved by a native-speaker human translator. Syntax is normal, and grammar and punctuation are correct.

  • Aim for grammatically, syntactically and semantically correct translation.
  • Ensure that key terminology is correctly translated and that untranslated terms belong to the client’s list of “Do Not Translate” terms.
  • Ensure that no information has been accidentally added or omitted.
  • Edit any offensive, inappropriate or culturally unacceptable content.
  • Use as much of the raw MT output as possible.
  • Basic rules regarding spelling, punctuation and hyphenation apply.
  • Ensure that formatting is correct.
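The terminology guideline in this list is one of the few that lends itself to a simple automated check. The following is a minimal sketch, assuming the client supplies a list of key source terms and a “Do Not Translate” list as plain strings. The function names, the sample sentences and the product name AcmeSync are hypothetical, and whole-word, case-insensitive matching is a simplification that will not suit every language.

```python
import re


def untranslated_terms(source, target, key_terms):
    """Return key terms that occur verbatim in both source and target."""
    found = []
    for term in key_terms:
        # Whole-word, case-insensitive match (a simplifying assumption).
        pattern = r"\b" + re.escape(term) + r"\b"
        in_source = re.search(pattern, source, re.IGNORECASE)
        in_target = re.search(pattern, target, re.IGNORECASE)
        if in_source and in_target:
            found.append(term)
    return found


def dnt_violations(source, target, key_terms, do_not_translate):
    """Return untranslated key terms that are not on the DNT list."""
    allowed = {term.lower() for term in do_not_translate}
    return [
        term
        for term in untranslated_terms(source, target, key_terms)
        if term.lower() not in allowed
    ]


# Hypothetical English-to-German segment.
source = "Open the AcmeSync dashboard and click Scan."
target = "Öffnen Sie das AcmeSync dashboard und klicken Sie auf Scannen."
flagged = dnt_violations(
    source,
    target,
    key_terms=["AcmeSync", "dashboard", "Scan"],
    do_not_translate=["AcmeSync"],
)
print(flagged)  # ['dashboard']
```

A check like this only flags candidates for the post-editor to review. It cannot tell whether a translated term is the correct one, so it supports rather than replaces the terminology check.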

Thanks to everyone who has helped to put these guidelines together. We were very fortunate to have the help of TAUS Members, governmental institutions and translator organizations. Details about the project team and process for arriving at these guidelines can be found here.

Special thanks to Sharon O'Brien, Dublin City University and CNGL, and Fred Hollowood, Symantec and TAUS Advisory Board for their dedication and support in putting these guidelines together.
