Artificial intelligence (AI) is a branch of computer science concerned with the simulation of human intelligence. The field draws on several disciplines, including psychology, neurobiology, behavioral science, and engineering. While the objective of constructing an intelligent agent that exhibits broad, creative problem-solving capabilities comparable to humans appears out of reach for the foreseeable future, AI applications are already part of our everyday life. Popular examples include, but are not limited to, intelligent assistants (e.g., Apple’s Siri or Amazon’s Alexa), object recognition (e.g., Instagram’s automated photo descriptions), and intelligent recommendations (e.g., Netflix’s movie recommendations).
At their core, the most powerful AI applications, such as deep convolutional neural networks or recurrent neural networks, use large amounts of complex training data to recognize hidden patterns and ultimately make highly accurate predictions about uncertain (future) states. The high predictive performance of state-of-the-art machine learning models frequently comes at the expense of transparency and interpretability, as machines cannot convey human-interpretable information about why they arrive at specific outcomes. For this reason, machine learning applications are often labeled black boxes whose inner workings are entirely understood neither by expert designers nor by human users. This lack of interpretability can be concerning for several reasons.
First, the opacity of machine-generated outputs creates broad accountability, responsibility, and auditing problems. It impedes the detection of biased or discriminatory outcomes and renders questions about liability difficult to navigate. Second, when human developers and users do not receive explanations of the inner reasoning of AI applications, they are deprived of the opportunity to improve the system’s design, but also to learn new insights from the machine that could improve human problem-solving capabilities. The latter aspect, in particular, is a substantial obstacle to AI’s ability to enhance economic efficiency and human welfare by revealing new domain knowledge hidden in complex Big Data. Third, the black-box nature of machine learning applications can undermine people’s trust in their performance, eventually hampering acceptance.
The objective of eXplainable Artificial Intelligence (XAI) is to mitigate the outlined problems associated with the black-box nature of AI by explaining the processing steps between input and output in a way that is comprehensible to humans. There are several approaches to cracking open the black box. On a high level, one can distinguish between intrinsic explanatory methods and post-hoc explanatory methods.¹
Intrinsic methods are models that are inherently self-explanatory and provide an immediate, human-readable interpretation of how they transform certain data inputs into outputs. In other words, these are relatively simple models whose inner structure humans can comprehend without additional transformations. Post-hoc methods, by contrast, achieve the interpretability of a given complex machine learning model through the construction of a second, simpler model (a so-called surrogate model) that approximates the behavior of the complex model but remains interpretable to humans.
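The distinction between the two approaches can be sketched in code. The following is a minimal illustration, assuming scikit-learn and synthetic data (both are assumptions introduced here, not part of the text): a logistic regression stands in for an intrinsically interpretable model, and a shallow decision tree serves as a post-hoc surrogate for an opaque random forest.

```python
# Minimal sketch of intrinsic vs. post-hoc explainability, assuming
# scikit-learn. Models and data are illustrative choices, not prescribed.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic data standing in for complex training data.
X, y = make_classification(n_samples=1000, n_features=5, random_state=0)

# Intrinsic method: a logistic regression is self-explanatory -- its
# coefficients directly show how each input feature influences the output.
intrinsic = LogisticRegression(max_iter=1000).fit(X, y)
print("Feature weights:", intrinsic.coef_[0])

# Post-hoc method: first train an opaque "black box" ...
black_box = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# ... then fit a simple surrogate on the black box's *predictions* (not the
# true labels), so the surrogate approximates the black box's behavior.
surrogate = DecisionTreeClassifier(max_depth=3, random_state=0)
surrogate.fit(X, black_box.predict(X))

# Fidelity: how closely the interpretable surrogate mimics the black box.
fidelity = accuracy_score(black_box.predict(X), surrogate.predict(X))
print(f"Surrogate fidelity: {fidelity:.2f}")
print(export_text(surrogate))  # human-readable decision rules
```

Note that the surrogate is evaluated on its fidelity to the black box’s predictions rather than on accuracy against the true labels: a surrogate explains the model, not the world.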
Considering that AI applications are typically characterized by high scalability, it is encouraging that researchers, policy-makers, and practitioners alike are increasingly calling for standards regarding explanations of how and why an AI application produces a specific output. A growing number of regulatory efforts, such as Europe’s General Data Protection Regulation or the Free Flow of Non-Personal Data Regulation, advocate that people interacting with AI applications, and especially those affected by them, have a right to an explanation.
While the move toward fostering the interpretability of AI applications is arguably desirable from various points of view, there are also potential downsides. Rendering systems human-interpretable may not always be possible without considerable performance losses. In situations where the high accuracy of AI predictions is more important than high transparency (e.g., the correct detection of cancer), making AI systems interpretable may therefore be undesirable. Another issue is privacy protection: making systems more interpretable may reveal sensitive (personal) data, which certain stakeholders may strictly refuse or which may even be legally prohibited. It is also important to consider that explanations are not always correct and can be (intentionally) misleading.
Misleading explanations may cause users and targets to be more willing to rely on and follow AI outputs even when those outputs are incorrect. Similarly, the observers of explanations may infer insights about the relation between the AI system’s inputs and outputs that allow them to game the system (e.g., understanding how to avoid detection when committing tax fraud) or to adapt their perceptions in undesirable ways (e.g., coming to perceive gender as a determinant of a person’s propensity to work hard).
Finally, as with many methodologies and requirements, one size will most likely not fit all when it comes to these techniques. Different stakeholders require different explanations. Developers and individuals responsible for maintaining AI applications, for instance, require a more detailed explanation of the specific inner mathematical computations than end-users, who need a high-level explanation of the most relevant determinants of an output.
¹For more information, see: Bauer, K., Hinz, O., van der Aalst, W., & Weinhardt, C. (2021). Expl(AI)n It to Me – Explainable AI and Information Systems Research.