Explainable AI (XAI): NLP Edition


Explaining what Explainable AI (XAI) entails and diving into five major XAI techniques for Natural Language Processing (NLP).

What is Explainable AI? 

As AI becomes more prominent in high-stakes industries such as healthcare, education, construction, the environment, autonomous machines, and law enforcement, there is a growing need to be able to trust its decision-making process. Predictions in these areas often need to be extremely accurate, for example in life-or-death situations in healthcare. Because AI has such a direct impact on our day-to-day lives, decision-makers need more insight into the mechanics of AI systems and into how they arrive at their predictions. At present, often only technical experts such as data scientists or engineers understand the underlying processes and algorithms, such as highly complex deep neural networks. This lack of interpretability has proven to be a source of disconnect between technical and non-technical practitioners. The field of Explainable AI (XAI) emerged as an effort to make these AI systems more transparent.

Explainable AI (XAI) is an emerging subfield of AI that focuses on the interpretability of machine learning (ML) models. XAI tools help you understand and interpret a model's predictions, reducing complexity and allowing stakeholders and practitioners without technical training to better understand the modeling process. At its core, XAI aims to open up black-box decision-making in AI. XAI can answer questions such as “Why was this prediction made?”, “How much confidence do I have in this prediction?”, or “Why did this system fail?”


Natural Language Processing (NLP) is a subfield of AI and ML that aims to make sense of human language. Using text data, NLP performs tasks such as topic classification, translation, sentiment analysis, and predictive spelling. NLP has historically relied on interpretable models, referred to as white-box techniques, including decision trees, rule-based models, Markov models, and logistic regression. In recent years, however, the field has shifted toward black-box techniques, such as deep learning approaches and language embedding features, which are far less interpretable. Reduced interpretability brings reduced trust, especially in human-computer interaction (HCI) settings such as chatbots.

A Survey by IBM - XAI for NLP 

A group of researchers at IBM conducted a survey titled “A Survey of the State of Explainable AI for Natural Language Processing.” As one of the few works at the intersection of NLP and XAI, the survey aims to describe the current state of XAI for NLP, explain the techniques currently available, and draw the research community’s attention to existing gaps. The survey categorizes explanations along two dimensions. The first is whether an explanation covers a single prediction (local) or the model’s prediction process as a whole (global). The second is whether the explanation is produced as part of the prediction process itself (self-explaining) or requires post-processing after the model has made its prediction (post-hoc). The authors further introduce additional aspects of explainability, including the techniques used to derive an explanation and the way it is visualized for the end-user.
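To make the local/global distinction concrete, here is a minimal sketch built on a toy linear bag-of-words sentiment model. The vocabulary and weights are invented for illustration and do not come from the survey. For a linear model, the per-word contributions to one input's score form a local explanation, while the weight table itself is a global one; because both are read directly off the model, this is also a (very simple) self-explaining model.

```python
# Toy linear bag-of-words sentiment model. The weights are hypothetical;
# a real model would learn them from labeled data.
WEIGHTS: dict[str, float] = {
    "great": 2.0,
    "love": 1.5,
    "boring": -1.8,
    "terrible": -2.5,
    "plot": 0.1,
}


def score(text: str) -> float:
    """Sum the weights of known words; a positive score means positive sentiment."""
    return sum(WEIGHTS.get(tok, 0.0) for tok in text.lower().split())


def local_explanation(text: str) -> dict[str, float]:
    """Local: how much each known word in THIS input contributed to its score."""
    contributions: dict[str, float] = {}
    for tok in text.lower().split():
        if tok in WEIGHTS:
            contributions[tok] = contributions.get(tok, 0.0) + WEIGHTS[tok]
    return contributions


def global_explanation(top_k: int = 3) -> list[tuple[str, float]]:
    """Global: the words with the largest absolute weights, regardless of input."""
    return sorted(WEIGHTS.items(), key=lambda kv: abs(kv[1]), reverse=True)[:top_k]


review = "great cast but boring plot"
print(score(review))               # close to 0.3 (2.0 - 1.8 + 0.1)
print(local_explanation(review))   # {'great': 2.0, 'boring': -1.8, 'plot': 0.1}
print(global_explanation())        # [('terrible', -2.5), ('great', 2.0), ('boring', -1.8)]
```

The local explanation changes with every input, while the global one describes the model as a whole; for non-linear black-box models, neither can simply be read off the parameters, which is where post-hoc techniques come in.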

5 Major Explainability Techniques

The survey identifies five major explainability techniques in NLP. Each turns raw technical components of the model into the final explanation presented to the end-user:

  1. Feature importance: derives the explanation from importance scores assigned to the features (for example, words or tokens) that the model uses to produce its final prediction. Because text-based features are relatively intuitive for humans to interpret, they lend themselves well to feature importance-based explanations.
  2. Surrogate model: uses a second, proxy model to explain the original model's predictions. Surrogate models can provide either local or global explanations. One drawback of this method is that the learned surrogate model and the original model may arrive at their predictions in different ways.
  3. Example-driven: explains a given prediction with examples, usually from labeled data, that are semantically similar to the input. A common example-driven approach is nearest-neighbor retrieval, which has been used in areas such as text classification and question answering.
  4. Provenance-based: illustrates some or all of the process by which the prediction was derived. This is often an effective technique when the prediction is the outcome of a series of decision rules or reasoning steps.
  5. Declarative induction: induces human-readable representations, such as rules and programs, that serve as explanations.
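Two of these techniques, feature importance and example-driven explanation, can be sketched in a few lines. The sketch below assumes a toy keyword scorer standing in for a black-box classifier; the keyword lists and training sentences are hypothetical. Feature importance is computed here by leave-one-out occlusion (remove a token, measure how much the output changes), which is one common choice rather than the only one. The example-driven part retrieves the nearest labeled neighbor by cosine similarity over word counts, a simplifying assumption; practical systems usually compare learned embeddings.

```python
import math
from collections import Counter

# Stand-in "black box": callers only see the output probability.
# The keyword lists are hypothetical and exist only for this demo.
_POS = {"excellent", "love", "great"}
_NEG = {"awful", "hate", "boring"}


def black_box(tokens: list[str]) -> float:
    """Return P(positive) for a token list; treat the internals as unknown."""
    s = sum(t in _POS for t in tokens) - sum(t in _NEG for t in tokens)
    return 1.0 / (1.0 + math.exp(-s))


def occlusion_importance(tokens: list[str]) -> list[tuple[str, float]]:
    """Feature importance: drop each token and measure the change in P(positive)."""
    base = black_box(tokens)
    scores = []
    for i, tok in enumerate(tokens):
        without = tokens[:i] + tokens[i + 1:]
        scores.append((tok, base - black_box(without)))
    return scores


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def nearest_example(
    tokens: list[str], labeled: list[tuple[str, str]]
) -> tuple[str, str]:
    """Example-driven: return the most similar labeled example as the explanation."""
    query = Counter(tokens)
    return max(labeled, key=lambda ex: cosine(query, Counter(ex[0].split())))


text = "i love this excellent film".split()
print(occlusion_importance(text))  # "love" and "excellent" get positive scores

train = [
    ("a great and excellent film", "positive"),
    ("boring and awful acting", "negative"),
]
print(nearest_example(text, train))  # ('a great and excellent film', 'positive')
```

Occlusion treats the model purely as a black box, so it is a post-hoc technique; the retrieved neighbor explains the prediction by analogy ("this input resembles a known positive review").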

Visualization Techniques 

XAI may be presented to users in different ways, depending on the model's complexity and the explainability technique used. The choice of visualization strongly affects the success of an XAI approach in NLP. Consider the attention mechanism commonly used in NLP, which learns weights (importance scores) for a given set of features. Attention weights are often visualized as raw scores or as a saliency heatmap. Figure 1 shows an example of a saliency heatmap of attention weights.
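Attention weights are typically computed by normalizing similarity scores with a softmax, so each row of a heatmap like Figure 1 sums to 1. The sketch below assumes scaled dot-product attention (one common variant; the model behind the figure may use another) and uses made-up two-dimensional vectors for the French words "le chat noir". It computes one row of weights and prints them both as raw scores and as a crude text heatmap.

```python
import math


def softmax(xs: list[float]) -> list[float]:
    """Numerically stable softmax: outputs are positive and sum to 1."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(query: list[float], keys: list[list[float]]) -> list[float]:
    """Scaled dot-product attention weights of one query over all keys."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    return softmax(scores)


def heatmap_row(tokens: list[str], weights: list[float]) -> str:
    """Render one row of raw scores plus a crude text shade (darker = higher)."""
    shades = " .:*#"  # from lowest to highest, relative to the largest weight
    top = max(weights)
    cells = []
    for tok, w in zip(tokens, weights):
        level = round(w / top * (len(shades) - 1))
        cells.append(f"{tok}={w:.2f}[{shades[level]}]")
    return "  ".join(cells)


# Hypothetical 2-d key vectors for the source words, and a hypothetical
# decoder query for the step that produces the English word "cat".
source = ["le", "chat", "noir"]
keys = [[0.1, 0.0], [1.0, 0.9], [0.2, 1.0]]
query = [1.0, 1.0]

weights = attention_weights(query, keys)
print(heatmap_row(source, weights))  # "chat" receives the largest weight
```

Repeating this for every target word and stacking the rows gives the full matrix that a heatmap such as Figure 1 displays.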


Figure 1 - Weights assigned at each step of translation

Saliency-based visualizations focus on making the most important attributes or factors more visible to the end-user. In XAI, saliency is often used to depict the importance scores of different elements of an AI system. Examples of saliency-based visualizations include highlighting important words in a text and heatmaps.
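Word highlighting of this kind takes little code. The sketch below assumes that per-token importance scores have already been computed (the scores shown are made up) and renders them two ways: as plain-text markers around the most salient words, and as HTML spans whose background opacity is proportional to each score's magnitude, the same idea as a heatmap. The color and markup choices are arbitrary.

```python
import html


def highlight_text(tokens: list[str], scores: list[float], top_k: int = 2) -> str:
    """Plain-text saliency: wrap the top_k tokens by |score| in [[...]]."""
    ranked = sorted(range(len(tokens)), key=lambda i: abs(scores[i]), reverse=True)
    salient = set(ranked[:top_k])
    return " ".join(f"[[{t}]]" if i in salient else t for i, t in enumerate(tokens))


def highlight_html(tokens: list[str], scores: list[float]) -> str:
    """HTML saliency map: background opacity is proportional to |score|."""
    top = max(abs(s) for s in scores) or 1.0  # avoid dividing by zero
    spans = []
    for tok, s in zip(tokens, scores):
        alpha = abs(s) / top
        spans.append(
            f'<span style="background: rgba(255, 200, 0, {alpha:.2f})">'
            f"{html.escape(tok)}</span>"
        )
    return " ".join(spans)


# Hypothetical importance scores, e.g. produced by a feature importance method.
tokens = "the acting was awful but the ending was great".split()
scores = [0.0, 0.05, 0.0, -0.4, 0.02, 0.0, 0.1, 0.0, 0.35]

print(highlight_text(tokens, scores))
# the acting was [[awful]] but the ending was [[great]]
print(highlight_html(tokens, scores))
```

Using the absolute value treats strongly negative evidence ("awful") as just as salient as strongly positive evidence; a two-color scheme could show the sign as well.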

Other visualization techniques of XAI for NLP include raw declarative representations and natural language explanations. Raw declarative representations assume that end-users are more advanced and can understand learned declarative representations, such as logic rules, trees, and programs. Natural language explanations express the explanation in human-comprehensible natural language. They can be generated using a simple template-based approach or a more complex deep generative model. At their core, they turn rules and programs into human-readable language.
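A template-based generator can be very small. The sketch below assumes two hypothetical kinds of input: a rule expressed as (word, must-be-present) conditions, of the sort a declarative induction method might produce, and a dictionary of per-word contributions, of the sort a feature importance method might produce. Each is slotted into a fixed sentence template; a deep generative model would instead learn to generate the wording.

```python
def rule_to_sentence(conditions: list[tuple[str, bool]], label: str) -> str:
    """Verbalize a rule made of (word, must_be_present) conditions."""
    parts = [
        f'contains "{word}"' if present else f'does not contain "{word}"'
        for word, present in conditions
    ]
    return f'The text is labeled "{label}" because it ' + " and ".join(parts) + "."


def importance_to_sentence(
    label: str, confidence: float, contributions: dict[str, float], top_k: int = 2
) -> str:
    """Fill a fixed template with the words that most support the prediction."""
    top = sorted(contributions, key=lambda w: contributions[w], reverse=True)[:top_k]
    words = " and ".join(f'"{w}"' for w in top)
    return (
        f'The model predicted "{label}" ({confidence:.0%} confidence), '
        f"mainly because of {words}."
    )


print(rule_to_sentence([("refund", True), ("thanks", False)], "complaint"))
# The text is labeled "complaint" because it contains "refund" and does not contain "thanks".
print(importance_to_sentence("positive", 0.85, {"love": 1.5, "great": 2.0, "boring": -1.8}))
# The model predicted "positive" (85% confidence), mainly because of "great" and "love".
```

Templates are predictable and faithful to their inputs but read stiffly; generative models produce more fluent text at the cost of possibly describing something the model did not actually do.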


The survey highlights the connection between XAI and NLP, specifically how XAI can be applied to NLP-based systems. The field of XAI aims to add explainability as a much-desired feature of ML models, complementing a model's prediction quality with interpretability. Explanations can be categorized along different dimensions, such as local versus global and self-explaining versus post-hoc, and presented through the visualization techniques described above. Because NLP is so widespread (chatbots, predictive typing, auto-correct, and machine translation), it is important for end-users, especially in NLP-based organizations, to understand what a model does behind the scenes. XAI helps end-users gain trust in the NLP applications they use, which can create a positive feedback loop that ultimately improves the algorithms. As XAI is still a growing field, there is plenty of room for innovation in improving the explainability of NLP systems.

TAUS provides professional language data services to some of the world’s largest technology companies. Our data collection, annotation, processing, and NLP capabilities, together with our global data contributors at scale, remain a source of competitive advantage for leaders in the artificial intelligence (AI) and machine learning (ML) space.


Husna is a data scientist who studied Mathematical Sciences at the University of California, Santa Barbara. She also holds a master’s degree in Engineering, Data Science from the University of California, Riverside. She has experience in machine learning, data analytics, statistics, and big data. She enjoys technical writing when she is not working and is currently responsible for data science-related content at TAUS.
