Measuring Content Quality with Error Typology: Step-by-Step Guide

Error Typology is a venerable method for evaluating content quality that’s very common in the modern Translation & Localization industry. Although it was popularized for translated multilingual content, it applies just as well to single-language content with only minor changes. Here’s how to use it.

1. Preparation

Take the initial version of your content (in one language, or in a pair of languages if you’re analyzing a translation)
Take an updated (e.g. revised or final) version of your content in the same language (or language pair)
Do a sentence-by-sentence comparison of the initial version and the updated version
1. Any modern CMS or TMS typically stores multiple revisions of each content asset automatically. This means you already have a wealth of information for potential analysis at your fingertips! The main challenge is picking the right revisions to compare, since comparing every revision against every other one is VERY time-consuming.
2. If you don’t have another revision yet, consider revising the content yourself, or ask your peers, subject matter experts, editors, or in-country reviewers to do it for you (whatever works best for your content production process). Note down which errors to correct and which improvements to make so that this piece of content better matches your requirements.
3. If you store these lists of corrections separately from the changed content itself, you will have a “virtual” revision (a quality evaluation) that can be applied to the content later (e.g. by your writers or translators in your CMS or TMS). This makes no difference for Error Typology analysis, since in either case you still have 2 revisions of content to compare.
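The sentence-by-sentence comparison can be partly automated. The following Python sketch assumes both revisions are plain text in the same language and uses a deliberately naive regex sentence splitter; `split_sentences` and `changed_sentence_pairs` are hypothetical helper names, and the alignment comes from the standard library’s `difflib.SequenceMatcher`, not from any CMS or TMS feature.

```python
import difflib
import re


def split_sentences(text: str) -> list[str]:
    # Naive splitter: breaks after ".", "!" or "?" followed by whitespace.
    # Real content (abbreviations, decimals, CJK text) needs a proper segmenter.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]


def changed_sentence_pairs(initial: str, updated: str) -> list[tuple[str, str]]:
    """Align the sentences of two revisions and return (before, after) pairs that differ.

    Inserted sentences come back as ("", new) and deleted ones as (old, "").
    """
    before = split_sentences(initial)
    after = split_sentences(updated)
    matcher = difflib.SequenceMatcher(a=before, b=after, autojunk=False)
    pairs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue  # unchanged sentences need no classification
        old_block, new_block = before[i1:i2], after[j1:j2]
        # Pair sentences by position inside a changed block; pad the shorter side.
        for k in range(max(len(old_block), len(new_block))):
            old = old_block[k] if k < len(old_block) else ""
            new = new_block[k] if k < len(new_block) else ""
            pairs.append((old, new))
    return pairs
```

Each returned pair is one candidate for classification in the next step; unchanged sentences are skipped.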

2. Classification

For each individual change that was made to the content between revisions, assign a category (type) to it
1. Spelling & grammar, style, and terminology are some of the most commonly used categories. Frameworks like MQM and TAUS DQF offer extensive multi-level lists of categories that you can pick & choose from.
2. You can use as few as 2 categories or as many as 20 or more. We suggest tailoring the exact categories and subcategories to your content types, audience profiles, and content production processes.
3. If a single sentence contains multiple changes, classify each change separately. Otherwise, you risk skewing your analysis.
Optionally, also assign a severity (importance) to each change, to reflect that not all changes or issues will have equal impact on the reader
1. Severities might range from preferential (often not errors or issues per se, but subtle suggestions or improvements) all the way up to critical (issues that jeopardize the ability of this piece of content to fulfill its intended purpose).
2. Using 4 severity levels (preferential, minor, major, critical) is a common best practice, with preferential issues usually excluded from the overall metrics (they are not counted).
3. An optional positive severity (also known as “kudos” in DQF) can be introduced to reward content professionals for a particularly slick or impressive choice of words. In other words, “kudos” are a form of praise that strongly suggests keeping a particular piece of text intact (as opposed to making changes).
Repeat the above steps for each revision, each piece of content, and each language (or language pair) being analyzed
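To keep each change classified separately, it helps to record one entry per change rather than one per sentence. The sketch below is a hypothetical record layout (the `Annotation` class, its fields, and the example categories are illustrations, not the MQM or DQF data formats) that then counts entries per category and severity for the scoring step.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    # Four severity levels plus the optional positive "kudos".
    PREFERENTIAL = "preferential"
    MINOR = "minor"
    MAJOR = "major"
    CRITICAL = "critical"
    KUDOS = "kudos"


@dataclass(frozen=True)
class Annotation:
    """One classified change (hypothetical layout)."""
    segment_id: int   # which sentence the change belongs to
    category: str     # e.g. "terminology", "style", "spelling & grammar"
    severity: Severity
    note: str = ""


def tally(annotations: list[Annotation]) -> Counter:
    """Count annotations per (category, severity) pair."""
    return Counter((a.category, a.severity) for a in annotations)


# Sentence 3 had two separate changes, so it gets two annotations.
annotations = [
    Annotation(3, "terminology", Severity.MAJOR, "wrong term for 'dial'"),
    Annotation(3, "spelling & grammar", Severity.MINOR, "missing article"),
    Annotation(7, "style", Severity.PREFERENTIAL, "smoother word order"),
    Annotation(9, "style", Severity.KUDOS, "great headline"),
]
counts = tally(annotations)
```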

Classification is most frequently performed by human experts, but it can also be produced by automatic tools. These tools “read” your content and detect various types of quality issues using algorithms (including Natural Language Processing, Artificial Intelligence, and Machine Learning). Since automatic tools sometimes produce false positives (flagged issues that are not really issues), it’s usually advisable to remove those first if you need an accurate Error Typology analysis. However, even the raw output is sometimes enough to quickly gauge certain aspects of quality and guide further decisions.
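One simple way to handle false positives is to have a reviewer confirm or reject each flagged issue and count only the confirmed ones, while still keeping the raw counts for a quick first look. This sketch assumes a hypothetical `FlaggedIssue` structure with a reviewer-set `confirmed` flag; real tools export their findings in their own formats.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class FlaggedIssue:
    """An issue reported by an automatic checker (hypothetical structure)."""
    segment_id: int
    category: str
    message: str
    # Set by a human reviewer: True = real issue, False = false positive,
    # None = not reviewed yet.
    confirmed: Optional[bool] = None


def raw_counts(issues):
    # Everything the tool flagged, for a quick first impression.
    return Counter(i.category for i in issues)


def confirmed_counts(issues):
    # Only reviewer-confirmed issues; false positives and unreviewed ones are dropped.
    return Counter(i.category for i in issues if i.confirmed is True)


issues = [
    FlaggedIssue(1, "spelling", "'colour' vs 'color'", confirmed=False),
    FlaggedIssue(2, "terminology", "non-approved term 'knob'", confirmed=True),
    FlaggedIssue(4, "spelling", "'recieve'", confirmed=True),
    FlaggedIssue(5, "style", "sentence over 30 words"),
]
```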

Classification can either be performed at the same time as the actual revision or done separately at a later stage (potentially by another party). Essentially, any document revision can be turned into a quality evaluation at any time! This can be very useful for post-project analysis since it allows your global content teams to focus on producing top-notch content first and analyze their work later.

3. Scoring

Note: we describe just one possible way to score content quality, based on MQM recommendations. Many alternatives exist!

Assign relative weights to each category
1. The weights you pick depend on how serious you believe each type of issue to be for your content.
2. The exact numeric values matter less than their relative sizes, as long as they make sense to your team.
3. Examples:
  1. Medical device user instructions must precisely describe how the device is actually operated, so accuracy is key and should likely have a higher weight than any other category you use.
  2. Marketing brochures often benefit from creative, well-written copy that drives engagement, so style is key and should likely have a higher weight than any other category you use.
If you’re using severities, also assign relative weights to each severity
1. For medical content, incorrectly instructing the user to push a button that turns off monitoring of a patient’s life support might be a recall-class error. However, telling the user to adjust a brightness dial at the wrong point in time might still be acceptable.
2. For marketing content, spelling the brand name incorrectly might be a recipe for disaster. At the same time, a slightly overused cliché might only be a minor detriment to style.
3. As an example, MQM recommends weights of 1 for minor issues, 10 for major issues, and 100 for critical issues. This also implies a weight of 0 (zero) for preferential issues.
4. If you use “kudos”, you can give them a negative weight (e.g. -1), so that each kudo lowers the total penalty and therefore raises the final score.
Do the maths (e.g. in an Excel spreadsheet, or through dedicated QA features of your favorite tool)
1. For each category, count the number of issues within each severity level (plus kudos, if you’re using those):
  1. Minor Issue Count, Major Issue Count, Critical Issue Count, Kudos Count
2. Multiply each count by the respective severity weight and add the results up (kudos enter with their negative weight, so they reduce the penalty):
  1. Penalty by Category = Minor Issue Count * Minor Weight + Major Issue Count * Major Weight + Critical Issue Count * Critical Weight + Kudos Count * Kudos Weight
3. Multiply each category’s penalty by that category’s weight, then add up the results across all categories:
  1. Penalty Total = SUM(Category Weight * Penalty by Category)
4. Calculate the number of words in your original content revision
  1. Original Word Count
5. Divide the total penalty by the word count and express the result as a percentage:
  1. Penalty Total per Word % = Penalty Total / Original Word Count * 100%
6. Subtract this number from 100% to get the quality score (TQ):
  1. TQ = 100% – Penalty Total per Word %
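The scoring steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: severity weights follow the MQM example (0/1/10/100) with kudos at -1, kudos are added with their negative weight so they lower the penalty, and each category’s penalty is multiplied by its category weight. The values in `CATEGORY_WEIGHTS` are made up for a hypothetical marketing text.

```python
# Severity weights from the MQM example; kudos get a negative weight.
SEVERITY_WEIGHTS = {
    "preferential": 0,
    "minor": 1,
    "major": 10,
    "critical": 100,
    "kudos": -1,
}

# Hypothetical category weights for a marketing text, where style matters most.
CATEGORY_WEIGHTS = {
    "style": 2.0,
    "terminology": 1.0,
    "spelling & grammar": 1.0,
}


def quality_score(counts, original_word_count,
                  category_weights=CATEGORY_WEIGHTS,
                  severity_weights=SEVERITY_WEIGHTS):
    """Return the quality score (TQ) in percent.

    `counts` maps category -> {severity -> number of issues (or kudos)}.
    """
    if original_word_count <= 0:
        raise ValueError("original_word_count must be positive")

    penalty_total = 0.0
    for category, by_severity in counts.items():
        # Penalty by Category: each count times its severity weight, summed.
        # Kudos carry a negative weight, so they reduce the penalty.
        penalty_by_category = sum(
            n * severity_weights[severity] for severity, n in by_severity.items()
        )
        # Scale by how much this category matters for your content.
        penalty_total += category_weights[category] * penalty_by_category

    penalty_per_word_pct = penalty_total / original_word_count * 100
    return 100 - penalty_per_word_pct
```

For example, 3 minor style issues, 1 style kudo, and 1 major terminology issue in a 500-word text give a total penalty of 2 × (3 − 1) + 10 = 14, i.e. 2.8% per word and a score of about 97.2%. With weights like 1/10/100, a short text containing a critical issue can score below 0%, so you may want to clamp or flag such cases.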

Now you have a simple, single-number representation of how different aspects of quality played out in your content according to your requirements: a content quality score. This score can easily be stored over time in large quantities (e.g. in an Excel spreadsheet, in a database, or even in a dedicated content quality management system that directly connects all types of quality evaluations to specific content items). It also lends itself extremely well to all sorts of quantitative analysis techniques. We’ll talk about those in a later post.
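Because the score is a single number, keeping a history of it takes very little. Here is a minimal sketch using Python’s built-in `sqlite3` module; the table name `quality_scores`, its columns, and the per-language average are hypothetical choices for illustration, not a prescribed schema or analysis.

```python
import sqlite3


def open_score_store(path=":memory:"):
    """Open (or create) a small score history table. ':memory:' is for demos."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS quality_scores (
               content_id TEXT NOT NULL,
               language   TEXT NOT NULL,
               evaluated  TEXT NOT NULL,  -- ISO 8601 date of the evaluation
               tq         REAL NOT NULL   -- content quality score in percent
           )"""
    )
    return conn


def record_score(conn, content_id, language, evaluated, tq):
    # Parameterized insert; one row per evaluation of a content item.
    conn.execute(
        "INSERT INTO quality_scores VALUES (?, ?, ?, ?)",
        (content_id, language, evaluated, tq),
    )
    conn.commit()


def average_by_language(conn):
    # One simple aggregate: mean score per language across all evaluations.
    rows = conn.execute(
        "SELECT language, AVG(tq) FROM quality_scores "
        "GROUP BY language ORDER BY language"
    ).fetchall()
    return dict(rows)
```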

4. Limitations

Error Typology analysis is rather time-consuming and requires well-trained, well-instructed content professionals (writers and translators) to do it consistently right. That’s why, in practice, companies usually apply it to subsets (samples) of data in statistically sound ways that allow drawing conclusions about a larger whole (e.g. a set of documents) from one of its parts (e.g. a chapter). However, the level of detail you get from this analysis and the resulting learning potential for your global content teams are unparalleled.
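A basic starting point for sampling is a simple random sample of segments. The sketch below draws a reproducible random sample; the `fraction` and `minimum` defaults are hypothetical, and choosing a sample size that supports the confidence level you need (or stratifying by document or content type) is up to your own sampling plan.

```python
import random


def sample_segments(segment_ids, fraction=0.1, minimum=50, seed=None):
    """Return a sorted simple random sample of segment IDs to evaluate.

    Takes `fraction` of the segments but never fewer than `minimum`
    (or all of them, if there are fewer than that). Both defaults are
    hypothetical, not an industry standard.
    """
    ids = list(segment_ids)
    size = min(len(ids), max(minimum, round(len(ids) * fraction)))
    rng = random.Random(seed)  # a fixed seed makes the sample reproducible
    return sorted(rng.sample(ids, size))
```

For example, `sample_segments(range(1000), seed=42)` returns 100 distinct segment IDs, and calling it again with the same seed returns the same sample, which makes the evaluation easy to audit.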

While Error Typology is very useful for detailed internal analysis, it is an atomistic, expert-based quality evaluation method. Thus, it doesn’t accurately predict the holistic perception of content by the reader, and might not be a good leading metric for content performance in many cases. For a true 360-degree view of quality, Error Typology should be paired with holistic quality evaluation methods and content performance metrics.

This blog post originally appeared on ContentQuo.

ContentQuo is an early-stage tech startup, founded by TAUS representative Kirill Soloviev, building software solutions that help translation & localization departments and language service providers deliver better multilingual content at lower cost through quality management methods and tools.