Compare-xAI: Toward Unifying Functional Testing Methods for Post-hoc XAI Algorithms into a Multi-dimensional Benchmark

Belaid, Mohamed Karim; Bornemann, Richard; Rabus, Maximilian; Krestel, Ralf and Hüllermeier, Eyke ORCID: https://orcid.org/0000-0002-9944-4108 (July 2023): Compare-xAI: Toward Unifying Functional Testing Methods for Post-hoc XAI Algorithms into a Multi-dimensional Benchmark. World Conference on Explainable Artificial Intelligence (xAI 2023), Lisboa, Portugal, 26-28 July 2023. Longo, Luca (ed.): Cham: Springer Nature Switzerland. pp. 88-109

Creative Commons: Attribution 4.0 (CC-BY)

Draft

DOI: 10.1007/978-3-031-44067-0_5

Abstract

In recent years, Explainable AI (xAI) has attracted a lot of attention as various countries have made explanations a legal right. xAI algorithms enable humans to understand the underlying models and explain their behavior. The resulting insights allow models to be analyzed and improved beyond the accuracy metric, e.g., by debugging learned patterns and reducing unwanted biases. However, the widespread use of xAI and the rapidly growing body of published research in xAI have brought new challenges. The large number of xAI algorithms can be overwhelming, making it difficult for practitioners to choose the right xAI algorithm for their specific use case. This problem is further exacerbated by the different approaches used to assess novel xAI algorithms, which make it difficult to compare them with existing methods. To address this problem, we introduce Compare-xAI, a benchmark that allows a direct comparison of popular xAI algorithms across a variety of use cases. We propose a scoring protocol employing a range of functional tests from the literature, each targeting a specific end-user requirement in explaining a model. To make the benchmark results easily accessible, we group the tests into four categories (fidelity, fragility, stability, and stress tests). We present results for 13 xAI algorithms based on 11 functional tests. After analyzing the findings, we derive potential workarounds that data science practitioners can use to address the practical limitations we found. Finally, Compare-xAI is an attempt to unify systematic evaluation and comparison methods for xAI algorithms with a focus on the end-user's requirements. The code is made available at:

Document type:	Conference contribution (paper)
Faculty:	Mathematics, Computer Science and Statistics > Computer Science > Artificial Intelligence and Machine Learning
Subject areas:	000 Computer science, information and general works > 000 Computer science, knowledge and systems
URN:	urn:nbn:de:bvb:19-epub-107572-6
Place:	Cham
Document ID:	107572
Date published on Open Access LMU:	13 Dec 2023 14:44
Last modified:	22 Nov 2024 10:27

Dokument bearbeiten