Translation Productivity Revisited

Once upon a time in the Land of Translations...

... we wanted to know how many words we could produce per month, per day, per hour. We wanted to know how much time we needed to craft a human-quality translation or to post-edit machine-translated segments. And we wanted to track the edit distance. Why on Earth?! To find ways to profile translators and post-editors, to set prices, to compare vendors, to categorize content, to evaluate MT engine performance... The list is endless, but are we doing it right?

Productivity tells you how fast a translation was completed. Due to the many variables, however, it will never be a reliable measurement when it comes to profiling post-editors and translators, comparing vendors or evaluating MT output. And by the way, will it ever give us valid insights into the quality and difficulty of the specific content we receive from our customers?

Productivity defined

According to Wikipedia, "productivity is an average measure of the efficiency of production. It can be expressed as the ratio of output to inputs used in the production process, i.e. output per unit of input." This formula works well when all the variables on the input and output sides are listed, well defined and measured consistently. Problems arise when only a limited number of variables is taken into account. Unfortunately, that's exactly what is happening in the translation industry today: we take time as the only input and words as the only output. As a result, the more words produced in a shorter amount of time, the higher the productivity will be. This is just too simplistic, if you ask me, and I wonder how our industry has gotten away with it for so long!

There is much more to productivity than the number of words per hour. Why not also take into account the number of (final) edits per hour? A single score that combines the total number of words a translator processes in an hour with the number of final edits made in the whole process of producing the translation (calculated from the character-based edit distance) gives a more reliable productivity measure. It is easier to translate fast when the translation memory offers many exact or context matches and the MT engine is in top shape, so that hardly anything needs to be translated from scratch, than when no resources are available or the available ones are of very poor quality. For this reason, including the number of edits per hour in the productivity score is a good step forward (I will come back to this later), but one should also take into account the following variables:

difficulty of the source content (using some measurement independent of language).
quality of the source content (based on human assessment by the translator or the reviewer).
available resources (also called translation process): whether the translator did or didn't use an MT engine, a translation memory, glossary etc.
quality of these resources (using fuzzy match and MT confidence information combined with edit distance).
number of corrections applied by the reviewer(s).
number of errors, weights and penalties applied by the reviewer(s) in the review cycle(s).

Now, I don’t say this is all easy to measure, keep track of or aggregate in one single score. But still, let’s try and see what happens!

TAUS Efficiency Score

And that’s what one of the developers (Nikos Argyropoulos) thought when he came up with a new metric to measure productivity called the TAUS Efficiency Score. This score replaces traditional productivity measurement as it can be applied to every form of translation: translation from scratch, translation with translation memory, PEMT or a mix of these three. More and more translation jobs have a mixed nature: one can post-edit MT suggestions, insert TM matches or translate segments from scratch in the very same translation job. There is no hard divide anymore between MT, TM and human translation. This should be reflected in a new metric measuring productivity. In the TAUS Efficiency Score, time is measured for producing (and, if needed, updating) each segment regardless of the segment origin (MT, PE, glossary, scratch, etc.). The Efficiency Score is flexible: the number of variables used to calculate it, and the way the different measurements are taken into account, vary with user requirements and available data. The score is also relative, because it is calculated from the data present in the underlying database at the moment of calculation.

Variables

The variables involved in producing the Efficiency Score are, in the first place, the two obligatory variables (core variables) and any additional variables (optional variables) that are added to the calculation to increase precision and credibility. The score can be calculated to measure translator efficiency but the focus can also be on CAT/TMS efficiency or MT engine efficiency.

While edit distance and edits per hour are calculated in many translation tools, this measurement tends to be applied only to evaluate MT engines and less so to evaluate post-editing productivity, simply because no one has come up with a method that combines a productivity score with edit distance information and normalizes the score in a dynamic way. This is exactly what the TAUS Efficiency Score does when it is based on the core variables.

In order to unify the two measurements (processed words per hour and final edits per hour, the latter based on edit distance), both need to be converted to a common scale. The Efficiency Score is calculated on an ongoing basis using data from the DQF database, which is fed with data from real-life projects. The score is intended to be displayed in the TAUS Quality Dashboard. The more data is used to calculate the score, and the more homogeneous that data is, the more precise and meaningful the score will be.

The Efficiency Score is not yet implemented in the TAUS Quality Dashboard.

Use case

The Efficiency Score based on core variables is calculated using the following data:

The number of words that a translator processed and the time spent on each segment. (Note: each time a translator returns to a segment, the extra time is added to that segment.)
The edit distance, calculated using the Wagner-Fischer algorithm after the translation process.
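
To make the second data point more tangible, here is a minimal Python sketch of a character-based edit distance computed with the Wagner-Fischer dynamic program. The function name edit_distance and the unit cost of 1 for every insertion, deletion and substitution are my assumptions for illustration; the costs actually used for the Efficiency Score are not spelled out here.

```python
def edit_distance(source: str, target: str) -> int:
    """Character-based edit distance via the Wagner-Fischer dynamic program.

    Assumption: insertions, deletions and substitutions all cost 1.
    """
    # previous[j] = distance between the source prefix handled so far
    # and the first j characters of target (row 0: empty source prefix).
    previous = list(range(len(target) + 1))
    for i, s_char in enumerate(source, start=1):
        # Turning the first i source characters into an empty string
        # takes i deletions.
        current = [i]
        for j, t_char in enumerate(target, start=1):
            substitution = previous[j - 1] + (s_char != t_char)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


# Hypothetical segment: the suggestion a translator started from
# versus the final, edited translation.
suggestion = "The printer are ready"
final = "The printer is ready"
edits = edit_distance(suggestion, final)
```

Summing this value over all segments of a job gives the aggregated edit distance used further on.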

In the example below, four translators have been involved in similar translation projects. The table offers information on the actual number of words processed, the actual time spent, the speed expressed in words per hour, the aggregated edit distance over all segments, and that distance normalized to the number of edits per hour.
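
How the two per-hour rates are derived from the raw measurements can be sketched in a few lines of Python. The helper per_hour and the figures below are hypothetical and only illustrate the arithmetic; they are not the values from the table.

```python
def per_hour(count: float, seconds: float) -> float:
    """Convert a raw count and a duration in seconds into a per-hour rate."""
    if seconds <= 0:
        raise ValueError("time spent must be positive")
    return count * 3600.0 / seconds


# Hypothetical job: 1,200 words processed in 2.5 hours,
# with an aggregated edit distance of 3,000 over all segments.
time_spent_seconds = 2.5 * 3600
words_per_hour = per_hour(1200, time_spent_seconds)
edits_per_hour = per_hour(3000, time_spent_seconds)
```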

All variables are normalized using Min-Max normalization because it is simple, preserves the relationships among the original data values, and provides an easy way to compare values that are measured on different scales.
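
For readers who want to see the mechanics, Min-Max normalization maps each value v to (v - min) / (max - min). Below is a minimal Python sketch; the function name, the sample speeds and the choice to return 0.0 for everyone when all values are identical are my assumptions, not part of the published method.

```python
def min_max_normalize(values):
    """Rescale values linearly so the smallest becomes 0.0 and the largest 1.0."""
    lowest, highest = min(values), max(values)
    if highest == lowest:
        # Assumption: with no spread in the data, everyone gets 0.0.
        return [0.0 for _ in values]
    return [(v - lowest) / (highest - lowest) for v in values]


# Hypothetical words-per-hour figures for four translators.
words_per_hour = [400.0, 500.0, 600.0, 800.0]
normalized = min_max_normalize(words_per_hour)
```

Note that the result depends on the minimum and maximum present in the data set at that moment, which is why the score is relative.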

Using the Min-Max normalization the following scores will be obtained:

With these results, it becomes clear where each translator sits in the distribution for the words per hour and edit distance measurements, and the difference between them can be seen on a scale of [0.0, 1.0]. Both measurements have an equal share in assessing translators.

Based on the normalized scores above, the Efficiency Score can be calculated: it is the sum of the two normalized scores divided by 2.
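
Putting the pieces together, the calculation described above can be sketched as follows. This is an illustration under my own assumptions, not the TAUS implementation: the function names and the figures for the four translators are hypothetical, and the two normalized scores are simply averaged, as the text describes.

```python
def min_max_normalize(values):
    """Rescale values linearly to the range [0.0, 1.0]."""
    lowest, highest = min(values), max(values)
    if highest == lowest:
        # Assumption: with no spread in the data, everyone gets 0.0.
        return [0.0 for _ in values]
    return [(v - lowest) / (highest - lowest) for v in values]


def efficiency_scores(words_per_hour, edits_per_hour):
    """Average the two normalized measurements with equal weight."""
    norm_words = min_max_normalize(words_per_hour)
    norm_edits = min_max_normalize(edits_per_hour)
    return [(w + e) / 2 for w, e in zip(norm_words, norm_edits)]


# Hypothetical figures for four translators on similar projects.
words_per_hour = [400.0, 500.0, 600.0, 800.0]
edits_per_hour = [1000.0, 2000.0, 1500.0, 3000.0]
scores = efficiency_scores(words_per_hour, edits_per_hour)
```

In this made-up data set the second and third translators end up with the same score: one is faster in words per hour, the other in edits per hour, and the equal weighting balances the two.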

Summary

For the Efficiency Score based on the core variables, we measure time for processing segments while tracking the segment origin. Next, we measure the edit distance and calculate the edit distance per segment (minimum number of edits needed to get from A to B) and produce the number of edits per hour. Finally, we normalize and unify the two measurements. For more precision and credibility, we can base our calculation of the score on additional (optional) features.

There are a number of reasons for developing a composite indicator for productivity based on the words per hour measurement and the edit-distance scores:

It can offer a rounded assessment of performance.
It presents the ‘big picture’ and can be more easily understood than trying to find an answer in the two (or more) separate measurements.
It can support the implementation of better analytical methods and better quality data.

The two data points are used to generate a numerical score that shows the efficiency of a translator relative to other translators who worked on similar projects (technology, process and content). As I mentioned earlier, you can also use the score to compare technologies, processes etc. Before calculating the Efficiency Score, the data needs to be preprocessed and transformed to fall within a smaller, common range for all the metrics, such as [0.0, 1.0]. This way the two data points carry equal weight.

Future work will involve adding the Efficiency Score to the TAUS Quality Dashboard. Initially this score will be calculated based on the core variables. In a later phase, the possibility of adding quality and content difficulty scores is envisioned.

Let’s see whether this will reform the way we look at translation productivity and determine our prices. In any case, one thing is for sure: the traditional way of measuring productivity is dead.