



Until recently, explainability was largely seen as an important but narrowly scoped requirement toward the end of the AI model development process. It is now regarded as a multi-layered requirement that provides value throughout the machine learning life cycle.

Furthermore, in addition to providing fundamental transparency into how machine learning models make decisions, explainability toolkits now also support broader assessments of model quality, such as robustness, fairness, conceptual soundness, and stability.

Given the increased importance of explainability, organizations hoping to adopt machine learning at scale, especially those with high-stakes or regulated use cases, must pay greater attention to the quality of their explainability approaches and solutions.

There are many open source options available to address specific aspects of the explainability problem. However, it is hard to stitch these tools together into a coherent, enterprise-grade solution that is robust, internally consistent, and performs well across models and development platforms.

An enterprise-grade explainability solution must meet four key tests:

Does it explain the outcomes that matter?
Is it internally consistent?
Can it perform reliably at scale?
Can it meet rapidly evolving expectations?

Does it explain the outcomes that matter?

Machine learning models are increasingly used to influence or determine high-stakes outcomes in people's lives, such as loan approvals, job applications, and school admissions. It is therefore essential that explainability approaches provide reliable and trustworthy explanations of how models arrive at their decisions.

Explaining a classification decision (a yes/no decision) is often vastly different from explaining a probability result or a model risk score. "Why was Jane denied a loan?" is a fundamentally different question from "Why did Jane receive a risk score of 0.63?"

Conditional methods like TreeSHAP are accurate for model scores, but they can be very inaccurate for classification outcomes. As a result, while they can be handy for basic model debugging, they cannot explain the human-understandable consequences of a model score, such as classification decisions.

Instead of TreeSHAP, consider Quantitative Input Influence (QII). QII simulates breaking the correlations between model features in order to measure the resulting changes in model output. This technique is accurate for a broader range of results, covering not only model scores and probabilities but also the more impactful classification outcomes.
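The intervention idea can be illustrated with a minimal sketch. It is not the QII implementation: the two-feature model, the threshold, the applicant data, and every function name below are hypothetical, chosen only to show that the same intervention can be measured against a score or against a yes/no decision, and that the two answers need not agree.

```python
from typing import Callable, List, Sequence, Tuple

Row = Tuple[float, float]  # (income, debt): hypothetical features scaled to [0, 1]


def score(row: Row) -> float:
    """Hypothetical model score; higher means more creditworthy."""
    income, debt = row
    return income - 0.5 * debt


def decision(row: Row) -> float:
    """Hypothetical classification outcome: 1.0 = approve, 0.0 = deny."""
    return 1.0 if score(row) >= 0.3 else 0.0


def unary_influence(quantity: Callable[[Row], float], x: Row,
                    data: Sequence[Row], feature: int) -> float:
    """quantity(x) minus its average after x[feature] is replaced by each
    value of that feature observed in `data` (breaking its link to the
    other features of x)."""
    total = 0.0
    for other in data:
        intervened = list(x)
        intervened[feature] = other[feature]
        total += quantity((intervened[0], intervened[1]))
    return quantity(x) - total / len(data)


# Made-up applicant pool and a made-up applicant who is denied.
applicants: List[Row] = [(0.9, 0.2), (0.8, 0.6), (0.5, 0.1), (0.3, 0.8)]
jane: Row = (0.5, 0.8)

score_influence = [unary_influence(score, jane, applicants, i) for i in range(2)]
decision_influence = [unary_influence(decision, jane, applicants, i) for i in range(2)]

print("influence on score   :", score_influence)
print("influence on decision:", decision_influence)
```

With these invented numbers, debt has the larger influence on Jane's score, while income and debt have equal influence on the denial itself. A negative value means the applicant's actual feature value lowered the quantity relative to the pool.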

Outcome-driven explanations are very important for questions surrounding unjust bias. For example, if a model is truly unbiased, the answer to the question "Why was Jane denied a loan compared to all approved women?" should not differ from the answer to "Why was Jane denied a loan compared to all approved men?"

Is it internally consistent?

Open source offerings for AI explainability are often restricted in scope. For example, the Alibi library builds directly on top of SHAP and is therefore automatically limited to model scores and probabilities. In search of a broader solution, some organizations have cobbled together a patchwork of narrow open source techniques. However, this approach can produce inconsistent tooling and contradictory answers to the same question.

A coherent explainability approach must be consistent along three dimensions:

Explanation scope (local vs. global): Deep model evaluation and debugging capabilities are critical to deploying trustworthy machine learning. To perform root cause analysis, explanations must rest on a consistent, well-founded foundation. If different techniques are used to generate local and global explanations, unexpected explanation behavior cannot be traced back to the root cause of the problem, and the opportunity to fix it is lost.

Underlying model type (traditional models vs. neural networks): A good explanation framework should ideally work across machine learning model types: not just decision trees and forests, logistic regression models, and gradient-boosted trees, but also neural networks (RNNs, CNNs, transformers).

Stage of the machine learning life cycle (development, validation, and ongoing monitoring): Explanations need not be relegated to the last step of the machine learning life cycle. They can act as the backbone of model quality checks during development and validation, and can then be used to continuously monitor models in production. For example, seeing how model explanations shift over time can indicate whether the model is operating on new and potentially out-of-distribution samples. For this reason, it is essential to have an explanation toolkit that can be applied consistently throughout the machine learning life cycle.

Can it perform reliably at scale?

Explanations, particularly those that estimate Shapley values, such as SHAP and QII, are always approximations. All explanations (short of replicating the model itself) incur some loss of fidelity. All else being equal, faster explanation calculations enable more rapid development and deployment of a model.
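To see why such estimates are approximate, the sketch below compares exact Shapley values, computed by enumerating every feature ordering, with a sampled estimate. It is a simplification under stated assumptions: the three-feature model is a toy, each "absent" feature is set to a single fixed baseline value rather than drawn from a data distribution, and all function names are hypothetical rather than taken from SHAP or QII.

```python
import itertools
import random
from typing import Callable, List, Sequence


def model(row: Sequence[float]) -> float:
    """Toy model with one additive term and one interaction term."""
    a, b, c = row
    return 2.0 * a + b * c


def marginal_contributions(f: Callable[[Sequence[float]], float],
                           x: Sequence[float], baseline: Sequence[float],
                           order: Sequence[int]) -> List[float]:
    """Walk one feature ordering, switching features from the baseline
    value to the value in x one at a time, and record each output change."""
    current = list(baseline)
    previous = f(current)
    contrib = [0.0] * len(x)
    for i in order:
        current[i] = x[i]
        value = f(current)
        contrib[i] = value - previous
        previous = value
    return contrib


def exact_shapley(f, x, baseline) -> List[float]:
    """Average marginal contributions over all n! orderings (small n only)."""
    n = len(x)
    totals = [0.0] * n
    count = 0
    for order in itertools.permutations(range(n)):
        for i, c in enumerate(marginal_contributions(f, x, baseline, order)):
            totals[i] += c
        count += 1
    return [t / count for t in totals]


def sampled_shapley(f, x, baseline, samples: int, seed: int = 0) -> List[float]:
    """Monte Carlo estimate: average over randomly shuffled orderings."""
    rng = random.Random(seed)
    n = len(x)
    totals = [0.0] * n
    order = list(range(n))
    for _ in range(samples):
        rng.shuffle(order)
        for i, c in enumerate(marginal_contributions(f, x, baseline, order)):
            totals[i] += c
    return [t / samples for t in totals]


x = (1.0, 1.0, 1.0)
baseline = (0.0, 0.0, 0.0)
exact = exact_shapley(model, x, baseline)
approx = sampled_shapley(model, x, baseline, samples=200)

print("exact  :", exact)
print("sampled:", approx)
```

Exact enumeration grows factorially with the number of features, which is why practical toolkits sample. The sampled attributions still sum to the difference between the model output at x and at the baseline, but the individual values for the interacting features carry sampling error.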

The QII framework can provably (and practically) deliver accurate explanations while still adhering to the principles of a good explanation framework. However, scaling these computations across different forms of hardware and model frameworks requires significant infrastructure support.

Even when explanations are computed via Shapley values, implementing them correctly and scalably can be a significant challenge. Common implementation issues include how correlated features are treated, how missing values are handled, and how the comparison group is selected. Subtle errors along these dimensions can lead to significantly different local or global explanations.
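The sensitivity to the comparison group can be shown with a small sketch. Everything here is an illustrative assumption: the two-feature approval rule, the applicant data, and the function names are invented, and the influence measure is a simple intervention-style calculation rather than any particular library's method.

```python
from typing import List, Sequence, Tuple

Row = Tuple[float, float]  # (income, debt): hypothetical features scaled to [0, 1]


def approve(row: Row) -> int:
    """Hypothetical approval rule: 1 = approve, 0 = deny."""
    income, debt = row
    return 1 if income - 0.5 * debt >= 0.3 else 0


def outcome_influence(x: Row, group: Sequence[Row], feature: int) -> float:
    """Outcome for x minus the approval rate obtained when x[feature] is
    replaced by each value of that feature found in the comparison group."""
    approvals = 0
    for other in group:
        intervened = list(x)
        intervened[feature] = other[feature]
        approvals += approve((intervened[0], intervened[1]))
    return approve(x) - approvals / len(group)


# Made-up applicant pool; the second group keeps only approved applicants.
applicants: List[Row] = [(0.9, 0.7), (0.8, 0.6), (0.5, 0.1), (0.3, 0.2)]
approved_only: List[Row] = [r for r in applicants if approve(r) == 1]
jane: Row = (0.5, 0.8)

vs_everyone = [outcome_influence(jane, applicants, i) for i in range(2)]
vs_approved = [outcome_influence(jane, approved_only, i) for i in range(2)]

print("compared with all applicants     :", vs_everyone)
print("compared with approved applicants:", vs_approved)
```

In this toy data, income and debt look equally responsible for the denial when Jane is compared with all applicants, but income dominates when she is compared only with approved applicants. Neither answer is wrong; they answer different questions, which is why the comparison group has to be chosen deliberately and reported alongside the explanation.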

Can it meet rapidly evolving expectations?

The question of what constitutes a good explanation is evolving rapidly. On the one hand, the science of explaining machine learning models (and of making reliable assessments of model quality, such as bias, stability, and conceptual soundness) is still developing. On the other hand, regulators around the world are framing their expectations for minimum standards of explainability and model quality. Expectations for explanations also change as machine learning models are rolled out in new industries and use cases.

Given this shifting baseline, it is essential that the explainability toolkit a company uses remains dynamic. It is important to have a dedicated R&D capability that understands evolving needs and can tailor or enhance the toolkit to meet them.

Explainability is central to building trust in machine learning models and ensuring their adoption at scale. Achieving it with a medley of open source options may seem attractive, but stitching them into a coherent, consistent, and fit-for-purpose framework remains difficult. Companies looking to adopt machine learning at scale need to spend the time and effort required to find the right option for their needs.

Shayak Sen is the chief technology officer and co-founder of Truera. Sen began building production-grade machine learning models over a decade ago and has conducted key research on making machine learning systems more explainable, privacy compliant, and fair. He holds a PhD in computer science from Carnegie Mellon University and a BTech in computer science from the Indian Institute of Technology, Delhi.

Anupam Datta, a professor of electrical and computer engineering at Carnegie Mellon University and Principal Scientist at Truera, and Divya Gopinath, a research engineer at Truera, contributed to this article.

