The Intriguing Relation Between Counterfactual Explanations and Adversarial Examples

Freiesleben, Timo (2021): The Intriguing Relation Between Counterfactual Explanations and Adversarial Examples. In: Minds and Machines, Vol. 32: pp. 77-109

Creative Commons: Attribution 4.0 (CC BY)

DOI: 10.1007/s11023-021-09580-9

Abstract

The same method that creates adversarial examples (AEs) to fool image classifiers can be used to generate counterfactual explanations (CEs) that explain algorithmic decisions. This observation has led researchers to consider CEs as AEs by another name. We argue that the relationship to the true label and the tolerance with respect to proximity are two properties that formally distinguish CEs and AEs. Based on these arguments, we introduce CEs, AEs, and related concepts mathematically in a common framework. Furthermore, we show connections between current methods for generating CEs and AEs, and expect that the fields will increasingly merge as the number of common use cases grows.
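The abstract's starting point is that a single search procedure can produce both AEs and CEs: find a nearby input for which the classifier's decision changes. As a minimal sketch of such a shared procedure, assuming a linear binary classifier (the functions `predict` and `minimal_flip` and the `margin` parameter are illustrative assumptions, not taken from the paper), the smallest L2 change that flips the prediction has a closed form: project the input onto the decision hyperplane and step slightly past it. This is a generic textbook construction, not the common framework introduced in the paper.

```python
from typing import List


def predict(w: List[float], b: float, x: List[float]) -> int:
    """Hypothetical linear binary classifier: 1 if w.x + b > 0, else 0."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score > 0 else 0


def minimal_flip(w: List[float], b: float, x: List[float],
                 margin: float = 1e-3) -> List[float]:
    """Smallest L2 change to x that flips the prediction of predict().

    Moves x along w so that the new score equals -margin * sign(score),
    i.e. just past the hyperplane w.x + b = 0. `margin` is an assumed
    illustrative parameter ensuring the label actually changes.
    """
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    norm_sq = sum(wi * wi for wi in w)
    # Scores <= 0 are class 0, so they must be pushed to a positive value.
    sign = 1.0 if score > 0 else -1.0
    t = (score + sign * margin) / norm_sq
    return [xi - t * wi for xi, wi in zip(x, w)]


if __name__ == "__main__":
    w, b = [1.0, 2.0], -3.0
    x = [2.0, 2.0]
    x_new = minimal_flip(w, b, x)
    print(predict(w, b, x), predict(w, b, x_new), x_new)
```

Whether a point found this way is read as a counterfactual explanation or as an adversarial example is, according to the abstract, a matter of its relationship to the true label and of how much deviation in proximity is tolerated; the search code itself cannot make that distinction.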

Document type:	Journal article
Faculty:	Philosophy, Philosophy of Science and Study of Religion > Munich Center for Mathematical Philosophy (MCMP)
Subject areas:	100 Philosophy and psychology > 100 Philosophy
URN:	urn:nbn:de:bvb:19-epub-90948-4
ISSN:	0924-6495
Language:	English
Document ID:	90948
Date published on Open Access LMU:	09 Feb 2022 08:50
Last modified:	11 Jan 2023 15:20

Dokument bearbeiten