Explainable AI – what is it and why should you care?

In most cases, a model is treated as a black box: data goes in, a prediction comes out. The inner workings are left to machine learning researchers in academic circles, and even they don't know exactly what a network has learned; they can only propose model architectures and training methods that will probably lead to a successful learning outcome.

If you ask a machine learning practitioner to explain why the model made exactly this prediction for a given input, they are likely to respond with a question of their own: “Why do you care? It works.”

Even though they are partially right, being able to articulate the reasoning behind your model's decisions matters enormously. This need gave rise to a new field of AI called “Explainable AI”.

Explainable AI is the promise of uncovering the exact set of human-understandable reasons behind each output of an AI model.

The reasons for making a decision will soon become as important as the decision itself. If you are still not persuaded, check out Google’s dedicated cloud service for model explainability.

Why should you care?

To paint a picture of the AI explainability landscape, I will try to answer three basic questions:

  1. What are the advantages of explainable AI?
  2. How is explainable AI currently being used?
  3. How will explainable AI be used in the future?

I will focus on the less popular and more thought-provoking reasons that other online resources might leave out.

What are the advantages of explainable AI?

  • It helps identify vulnerability to adversarial examples before deploying models to production. Adversarial examples are inputs altered so slightly that a human notices no difference, yet the model’s output changes drastically. For example, a highly accurate classification model might identify a picture of a dog correctly, but if an attacker slightly changes the intensity of a carefully chosen set of pixels, the network will classify the altered picture as a car, even though there is no apparent visual difference and a human would have no problem whatsoever in classifying it as a dog. In the paper “Adversarial attacks against a medical deep learning system”, researchers successfully altered the behaviour of three different healthcare ML classifiers (both glass-box and black-box). “Our experiments indicate that adversarial attacks are likely to be feasible even for extremely accurate medical classifiers, regardless of whether prospective attackers have direct access to the model or require their attacks to be human imperceptible”, they concluded. Such adversarial attacks pose a serious security threat once models are deployed to production. Imagine the implications of a similar attack on a self-driving car, tricking it into thinking that a crossroad or a pedestrian is not there simply by sticking a carefully chosen visual pattern on a traffic sign. In these cases, an explainable model would expose the very wrong, non-human set of rules it uses to determine its output, so ML practitioners would be forced to keep iterating until a model with a human-like, logical set of reasons is trained. Such a model is less likely to be vulnerable to adversarial attacks. A minimal sketch of how such a perturbation is computed appears right after this list.

A human body detector rendered unusable when the subject prints out a carefully designed rectangular pattern and sticks it to their shirt.



An adversarial attack on a cute cat. Cats don’t deserve this kind of attack.


  • Increased chances of adoption for regulated/critical applications (security, healthcare, finance). Unlike traditional software components (apps, APIs, databases), it is extremely difficult to guarantee the normal operation of machine learning models in production because of their black-box nature. “It works on our dataset” is simply not enough for regulatory agencies. A detailed description of how the system works helps rule out this kind of unreliable operation and, as a result, helps obtain regulatory approval to use a model for high-risk applications such as autonomous driving, medical diagnostics and finance, to name a few. It also helps gain the trust of business users and upper management.
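
To make the adversarial-example point above concrete, here is a minimal sketch of the classic fast gradient sign method (FGSM) in PyTorch. The names model, image and label are placeholders of mine, not details from the cited paper, and real-world attacks are usually far more sophisticated.

```python
# A minimal FGSM (fast gradient sign method) sketch in PyTorch, just to show
# how tiny, targeted pixel changes are computed. `model`, `image` and `label`
# are placeholders, not details from the paper cited above.
import torch
import torch.nn.functional as F

def fgsm_attack(model, image, label, epsilon=0.01):
    image = image.clone().requires_grad_(True)
    loss = F.cross_entropy(model(image), label)   # how wrong the model currently is
    loss.backward()
    # Nudge every pixel a tiny step in the direction that increases the loss.
    perturbed = image + epsilon * image.grad.sign()
    return perturbed.clamp(0, 1).detach()         # keep pixel values in a valid range
```

The perturbation is bounded by epsilon, which is exactly why the altered image looks identical to a human while the model’s prediction can flip entirely.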

How is explainable AI currently being used?

Currently, there are two major approaches to explaining AI models: either build them to be explainable from the start (so-called “glass box” models) or apply dedicated techniques to already trained, non-explainable black-box models in order to explain them. Here are the main methods used in each approach:

Glass box models      | Black box explainers
Explainable Boosting  | SHAP Kernel Explainer
Decision Trees        | LIME
Decision Rule List    | Morris Sensitivity Analysis
Linear Regression     | Partial Dependence
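
Here is a minimal sketch of both routes from the table. The dataset and the specific libraries (InterpretML for the glass-box model, SHAP for the black-box explainer, scikit-learn for the opaque model) are illustrative assumptions of mine, not a prescription.

```python
# A minimal sketch of the two routes from the table above. Dataset and
# library choices (InterpretML, shap, scikit-learn) are illustrative.
from interpret.glassbox import ExplainableBoostingClassifier
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
import shap

X, y = load_breast_cancer(return_X_y=True)

# Route 1: a glass-box model, interpretable by construction.
ebm = ExplainableBoostingClassifier().fit(X, y)
global_explanation = ebm.explain_global()   # per-feature contribution curves

# Route 2: train an opaque model, then explain it after the fact.
black_box = RandomForestClassifier(n_estimators=100).fit(X, y)
explainer = shap.KernelExplainer(black_box.predict_proba, shap.sample(X, 50))
shap_values = explainer.shap_values(X[:5])  # per-feature contributions for 5 samples
```

The trade-off is typical: the glass box explains itself directly, while the black-box explainer approximates the opaque model’s behaviour from the outside.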

No matter which of these methods is applied, there are two basic ways explainable AI models are used today:

  1. As a model debugging technique. Currently, ML practitioners rely on years of experience and intuition when they see suboptimal results after running an experiment. They see bad predictions, look at some graphs and numbers, and make an intuition-based decision about what is worth trying in the next iteration, i.e. which data to add, which augmentation techniques to use, which training parameters to change, which structural changes to apply to the model, etc. Having an intuitive and transparent way of introspecting why a model made those wrong predictions helps eliminate excessive experimentation by narrowing down the reason they happened in the first place.
  2. Feature attribution/importance/selection. Contrary to the popular belief that any data is better than no data, I prefer no data over useless data. Very often the datasets used for training contain recorded features which are not useful for prediction, so the question is: which features are relevant? Explaining a trained model allows irrelevant features to be identified by their weight: anything close to zero contributes almost nothing and can safely be removed from the input. The advantage of using explainable AI rather than simple statistical techniques (correlation, LDA, etc.) is that AI models can capture more complex, nonlinear dependencies than legacy ML techniques. Currently, the majority of these techniques have an intuitive implementation (even built-in API support) for random forests and other classical ML models, while deep neural networks usually rely on layer activations or various gradient-based methods to determine feature importance. Of course, the useful compositional features (edges, corners, shapes) only emerge in the later stages of signal propagation, i.e. the deeper layers, so the price we pay for higher model expressivity is features, and feature importances, that are harder to define. A short feature-pruning sketch follows this list.
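
As a rough illustration of the feature-selection point, here is a sketch using scikit-learn’s permutation importance. The dataset, the random forest and the 0.001 threshold are assumptions of mine, chosen only to show the mechanics.

```python
# A rough sketch of pruning near-zero-importance features with scikit-learn.
# Dataset, model and threshold are illustrative assumptions.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)

# Shuffle one feature at a time and measure how much the test score drops.
result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)

# Features whose importance is indistinguishable from zero are removal candidates.
irrelevant = X.columns[result.importances_mean < 0.001]
print("candidates for removal:", list(irrelevant))
```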

How will explainable AI be used in the future?

  • Transparent model design. Forcing trained models to be explainable will effectively create “knowledge bottlenecks” which can be used to inject knowledge from the outside by tuning those decision points manually. For example, an explainable model predicting the beauty of a human face might discover symmetry, skin color, age, hair color, hair style and eyeball size as some explainable characteristics of a human face. When predicting beauty, it would try to fit these parameters to the training dataset by assigning a weight to each of them that describes how it contributes to the overall beauty: large positive numbers mean the parameter correlates with beauty, and negative numbers mean it has an inverse correlation (reduces beauty). If, by pure chance, our dataset was unbalanced and the red-headed people in it happened to be objectively beautiful, the model might deduce that all red-headed people are beautiful. If we wanted to tune the model to, say, ignore hair color, we could manually set the weight of the hair color feature to zero, which effectively lets us help the learning algorithm design the final model. A toy sketch of this kind of manual intervention appears at the end of this section.

  • The explanation of the model becomes the final product. In the usual AI flow, models are the product: they are used as a pipeline element that ingests some data and spits out some other data. Explainable AI flips this flow on its head by allowing the explanation of the trained model to be used as a data analysis technique. Training a model essentially means finding and capturing the complex, nonlinear correlations between the input data and the desired output. Being able to explain those correlations means we are able to statistically explain and analyse our dataset, which ultimately allows us to make informed decisions. A simple example would be a model for determining bank loan risk, trained on a dataset of loan applicants and the results of their risk assessments. Training an explainable AI model would allow applicants to see not only why they were rejected, but also the quickest way to clear the bar for acceptance (increase their income by 1K, or show a stable income for one more year).

  • Improving and objectifying opinions on controversial topics. The MIT experiment Moral Machine (https://www.moralmachine.net/) poses an interesting question: what is morality, and what does it mean to make ethically correct decisions? The website collects series of human answers to tricky questions about catastrophic autonomous driving scenarios, where an autonomous car must decide which of two clearly defined catastrophic outcomes to pick. This forces the human answering the question to assign a relative value to people of various genders, ethnicities, ages and professions. In a nutshell, it challenges the human understanding of ethics and morality. As soon as the last technical obstacle to autonomous driving is resolved, the next question becomes liability and morality: how do we teach an algorithm to do the right thing if we, humans, cannot objectively decide what is right and what is wrong? So far, the answers have reportedly correlated clearly with the culture of the person providing them, which is expected in one way and discouraging in another. If we set that aside and assume a single, accurate model of the human understanding of morality and ethics can be built from the collected question-and-answer dataset, the question is how we use it. If it is a black-box model, we can apply it to any scenario and get a prediction, which might be useful in court trials, insurance companies, healthcare and other areas. But a glass-box model is far more valuable as a factual explainer of the mental model humans apply to the problem.

The Moral Machine
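
To make the “knowledge bottleneck” idea from the transparent-model-design bullet tangible, here is a toy sketch in which a linear glass-box model learns a weight for a hypothetical hair-color feature from biased data, and we then zero that weight by hand. The feature names and data are entirely made up.

```python
# A toy sketch of the "knowledge bottleneck" idea: a linear glass-box model
# picks up a spurious hair-color weight from biased data, and we zero it by
# hand. Feature names and data are entirely hypothetical.
import numpy as np
from sklearn.linear_model import Ridge

features = ["symmetry", "skin_color", "age", "hair_color", "hair_style", "eyeball_size"]
rng = np.random.default_rng(0)
X = rng.normal(size=(500, len(features)))           # fake face descriptors
y = 2.0 * X[:, 0] - 0.5 * X[:, 2] + 1.5 * X[:, 3]   # biased data: hair color "matters"

model = Ridge().fit(X, y)
print(dict(zip(features, model.coef_.round(2))))    # hair_color gets a large weight

# Inject outside knowledge: force the model to ignore hair color entirely.
model.coef_[features.index("hair_color")] = 0.0
```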

Final thoughts

The field of explainable AI is still relatively new, and it is approached differently for traditional machine learning and for gradient-based deep learning. Classical machine learning explainers usually try to correlate each feature with the output by observing the effect that varying that feature has on the output. This set of simple techniques works well for structured, tabular datasets but won’t do much for images, video and NLP problems. Explainability for these data modalities is mostly based on visualising the attention of the model while it makes predictions, which is nothing more than highlighting the areas of the input that contribute most to the output. Some applications visualise carefully chosen layer activations in a clever way to explain what the model “focused on” the most while making a prediction.
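
As a rough illustration of the gradient-based visualisation mentioned above, here is a minimal saliency-map sketch in PyTorch. The names model and image are placeholders for a pretrained classifier and a preprocessed input; real tools such as Grad-CAM are considerably more refined.

```python
# A minimal gradient-saliency sketch (PyTorch). `model` and `image` are
# placeholders: any differentiable image classifier and an input tensor of
# shape [1, 3, H, W] would do.
import torch

def saliency_map(model, image):
    model.eval()
    image = image.clone().requires_grad_(True)    # track gradients w.r.t. pixels
    scores = model(image)                         # shape [1, num_classes]
    top_class = scores.argmax(dim=1).item()
    scores[0, top_class].backward()               # d(winning score) / d(pixels)
    # Pixel importance = gradient magnitude, max over colour channels.
    return image.grad.abs().max(dim=1).values.squeeze(0)
```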

As you may rightly assume, all of these approaches are still very fragmented and limited either to particular types of models or to certain data modalities, which makes this a very active area of AI research. The importance of explainability in AI rises each day as models move from research labs into the real world and gain an increasing amount of influence over our lives.
