Quality Estimation Guide
Whether you're new to Quality Estimation (QE) or looking to deepen your understanding, this guide answers the most common questions. Can't find the answer you need, or want to dive deeper into any of the topics?
Contact us
Quality Estimation: An Introduction
What is Quality Estimation?
Quality Estimation (QE) is the process of assessing the accuracy and reliability of content, most often in translation workflows. It uses machine learning models to predict the quality of machine-generated content.
What are the Benefits of Quality Estimation?
Risk Management

Quality Estimation serves as a powerful tool for risk management in translation workflows. By providing accurate predictions of the quality of machine-generated content, it enables proactive identification of potential issues before they impact the final output. This proactive approach allows for timely adjustments and corrections, mitigating the risk of delivering subpar or inaccurate translations to clients or end-users.

Moreover, QE contributes to better resource allocation by identifying segments or areas of content that may require additional attention. This foresight aids in optimizing workflows, ensuring that human resources are directed where they are most needed. Ultimately, the risk management aspect of QE enhances the overall reliability and reputation of translation services.

Reducing PE Efforts

One of the key benefits of Quality Estimation is its significant impact on reducing post-editing efforts. By accurately predicting the quality of machine-generated content, QE helps identify segments that are likely to require post-editing intervention. This targeted approach minimizes the need for extensive post-editing across the entire document, saving time and resources.

Reduced post-editing efforts lead to increased efficiency in translation workflows. Translators can focus their efforts on refining specific areas flagged by QE, ensuring a more streamlined and effective post-editing process. This benefit is particularly valuable in scenarios where time is of the essence, enabling quicker delivery of high-quality translations to clients.

Benchmark MT Engines

Quality Estimation facilitates the benchmarking of Machine Translation (MT) engines, providing valuable insights into their performance. By evaluating and comparing multiple MT engines, QE enables organizations to make informed decisions about the most suitable engine for specific projects or domains.

Benchmarking with QE helps organizations identify the strengths and weaknesses of different MT engines, allowing for data-driven decisions in selecting the most reliable and accurate solution. This benefit is crucial for companies operating in diverse industries with varying language requirements, ensuring that the chosen MT engine aligns with the specific needs and expectations of each project.
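
As a rough illustration of the benchmarking idea, the sketch below ranks MT engines by their mean QE score on the same set of segments. The engine names and scores are made up for the example, and comparing plain averages is an assumption: a real benchmark would typically also look at score distributions per domain and language pair.

```python
from statistics import mean


def rank_engines(scores_by_engine):
    """Return (engine, mean QE score) pairs, best engine first.

    scores_by_engine maps an engine name to the QE scores of its
    translations for the same source segments. Engines with no scores
    are skipped.
    """
    averages = {
        engine: mean(scores)
        for engine, scores in scores_by_engine.items()
        if scores
    }
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)


# Hypothetical engines and scores, for illustration only.
sample = {
    "engine_a": [0.91, 0.88, 0.79],
    "engine_b": [0.93, 0.90, 0.86],
}

ranking = rank_engines(sample)
for engine, average in ranking:
    print(f"{engine}: {average:.3f}")
```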

What is the difference between the generic model and a custom model?

TAUS has trained a generic model for quality estimation based on the data in our Data Repository. The generic model is trained on 100+ languages and, as the name suggests, a more “generic” domain. Its performance differs per language and domain. The model is not set to a specific quality standard, so users need to do some exploration to find the right threshold for their type of content and use case. As a starting point, we recommend treating any score below 0.85 as bad quality, while scores above 0.85 range from acceptable to good and best.
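
A minimal sketch of how the recommended threshold might be applied to generic-model scores. The segment identifiers and scores are hypothetical, and treating a score of exactly 0.85 as acceptable is an assumption. The threshold is a parameter so it can be adjusted while exploring what works for a given content type.

```python
# Starting point recommended in this guide for the generic model.
BAD_QUALITY_THRESHOLD = 0.85


def split_by_threshold(scored_segments, threshold=BAD_QUALITY_THRESHOLD):
    """Split (segment_id, score) pairs into two lists.

    Scores below the threshold go to needs_review; everything else
    goes to acceptable.
    """
    needs_review = [(sid, score) for sid, score in scored_segments if score < threshold]
    acceptable = [(sid, score) for sid, score in scored_segments if score >= threshold]
    return needs_review, acceptable


# Hypothetical scores, for illustration only.
sample = [("seg-1", 0.92), ("seg-2", 0.71), ("seg-3", 0.85)]
needs_review, acceptable = split_by_threshold(sample)

print("Needs review:", [sid for sid, _ in needs_review])
print("Acceptable:", [sid for sid, _ in acceptable])
```

Re-running the split with a few different threshold values on a sample that has already been reviewed by humans is one simple way to do the exploration described above.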

Custom models are trained on demand. For a custom model, TAUS works closely together with the client to train the model on their unique type of content (keeping in mind certain jargon and brand names) as well as on the specific quality expectations they may have. The custom model will be able to put out a custom score, set on the specified parameters. This can be a label (“good”, “bad”) or a number.

How is Quality Estimation priced?

At TAUS, we maintain volume-based pricing for our Estimate API. Users can purchase a credit bundle that contains a number of characters (starting at 2 million). Every segment sent through the API is counted, both source and target, and the number of characters is then subtracted from the credits available in the bundle. Once a bundle is depleted, users can easily purchase a new one.
A credit bundle can be used both on generic and custom models.
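
The arithmetic behind credit usage can be sketched as follows. This is only an estimate under stated assumptions: it counts characters with Python's `len()` (Unicode code points, whitespace included), which may not match exactly how the service counts, and the example segments are made up.

```python
def characters_billed(segments):
    """Count the characters of both source and target for each (source, target) pair."""
    return sum(len(source) + len(target) for source, target in segments)


def remaining_credits(bundle_size, segments):
    """Return the credits left in a bundle after sending the given segments."""
    return bundle_size - characters_billed(segments)


BUNDLE_SIZE = 2_000_000  # smallest bundle size mentioned in this guide

# Hypothetical batch of (source, target) segments.
batch = [
    ("Hello world", "Hallo wereld"),
    ("Thank you", "Dank je"),
]

used = characters_billed(batch)
left = remaining_credits(BUNDLE_SIZE, batch)
print(f"Characters used: {used}, credits left: {left}")
```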

For more information, see our Pricing page.
How do you integrate QE into your existing workflows?

The Estimate API, like other API-based services, is designed for easy integration into other applications. TAUS offers developer support and resources to assist with integration, further simplifying the process. We also have integrations with memoQ and Blackbird.

How scalable is quality estimation?

Deployed as a cloud-based service, QE can be highly scalable, allowing it to handle varying workloads and accommodate growing demands by provisioning additional resources dynamically. Additionally, we apply advanced machine learning techniques, such as distributed training and inference, to further enhance scalability by enabling efficient processing of large datasets and rapid response times.

Understanding QE Metrics
What kind of QE metrics are available?
We offer three kinds of metrics through the Estimate API: the TAUS QE Score, the COMET Score, and custom scores. Custom scores are only available for custom models and can include any type of label or scoring system.
How do you interpret QE scores?

The reliability of QE scores varies depending on the context and model used. QE scores are approximations, derived from mathematical representations of sentences, aiming to indicate similarity in meaning between translations. Generic models trained on vast multilingual data provide a broad understanding but may require human interpretation to correlate scores with human judgment. Customization of QE models offers flexibility, allowing tailoring to specific domains and language pairs, improving adaptability and certainty in score interpretation. Options for QE score categorization range from discrete labels like "poor", "average", "good", to continuous values, with custom models offering finer control over categorization based on labeled training data.

Training QE Models
What kind of data is needed to train custom QE models?

Training custom QE models typically requires labeled data that consists of pairs of source and target sentences and their corresponding quality scores or labels. These quality scores can be human judgments indicating the perceived quality or fluency of translations. If labeled data is not available, we apply synthetic data generation techniques to augment the available data, either provided by the customer, or taken from the TAUS Data Repository.

What is the impact of a custom model?

A customized QE model tailored to a specific domain or dataset often yields more accurate predictions compared to a generic model. This is because it can leverage domain-specific features, nuances, and patterns that may not be captured effectively by a generic model. Consequently, the customized model can provide more relevant and precise insights, leading to improved decision-making and performance within its designated domain or context.

What does the customization process look like?
The customization process is as follows:

Data analysis and cleaning: if the customer has provided a training dataset, we analyze it to identify any inconsistencies, missing data, or other issues that need to be resolved before training.


Synthetic data generation: we generate synthetic data to augment the existing dataset and generate negative examples that are essential for optimal model performance.


Training: we fine-tune the QE model using the cleaned dataset.


Testing: we test the model's performance on a held-out test set to evaluate its accuracy and identify any areas for improvement.

The custom model creation process may also include an annotation step where human reviewers evaluate each segment and give it a quality score, which we then use in the testing phase to correlate the QE model scoring against human judgment.
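
To illustrate what correlating QE scores against human judgment can look like, the sketch below computes a Pearson correlation coefficient between model scores and reviewer scores for the same segments. The choice of Pearson correlation, the 1 to 5 reviewer scale, and the sample values are assumptions made for the example, not a description of the exact evaluation TAUS runs.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least two values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when one sequence is constant")
    return covariance / sqrt(spread_x * spread_y)


# Hypothetical held-out segments: QE model scores and human scores (1 to 5).
qe_scores = [0.95, 0.90, 0.80, 0.60]
human_scores = [5, 4, 4, 2]

r = pearson(qe_scores, human_scores)
print(f"Correlation with human judgment: {r:.2f}")
```

A value close to 1 suggests the model ranks segments much as the reviewers do; a low value points to segments worth inspecting before the model is used in production.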
Can I have a single custom model in multiple language combinations?
Yes, a single custom model can be created for multiple language pairs. This approach is often referred to as multilingual or cross-lingual training.
Is retraining of the models needed?
The QE model can be retrained on a periodic basis to improve its accuracy for a specific use case and content type. To achieve the best results, the customer should share their feedback on the model quality, providing specific examples of poor performance.
Privacy and Security Concerns
How does TAUS handle my data?

TAUS does not store any data that is sent through the API. Metadata, such as language combinations and quality scores, is stored so that users can gain insights into their quality levels over time, per language pair, per model, etc. through the Reports section of their TAUS account.

What kind of privacy measures does TAUS have in place?

We have a full legal framework in place that can be found here.