Memorization-Dilation: Modeling Neural Collapse Under Noise

Nguyen, Duc Anh; Levie, Ron; Lienen, Julian ORCID: https://orcid.org/0000-0003-2162-8107; Hüllermeier, Eyke ORCID: https://orcid.org/0000-0002-9944-4108 and Kutyniok, Gitta ORCID: https://orcid.org/0000-0001-9738-2487 (May 2023): Memorization-Dilation: Modeling Neural Collapse Under Noise. The Eleventh International Conference on Learning Representations, Kigali, Rwanda, 1-5 May 2023. [PDF, 510 kB]

Published version: Creative Commons Attribution 4.0 (CC-BY)

Supplementary material: Creative Commons Attribution 4.0 (CC-BY)

Supplementary material: Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 (CC-BY-NC-SA)

External full text: https://openreview.net/forum?id=cJWxqmmDL2b

Abstract

The notion of neural collapse refers to several emergent phenomena that have been empirically observed across various canonical classification problems. During the terminal phase of training a deep neural network, the feature embeddings of all examples of the same class tend to collapse to a single representation, and the features of different classes tend to separate as much as possible. Neural collapse is often studied through a simplified model, called the layer-peeled model, in which the network is assumed to have "infinite expressivity" and can map each data point to any arbitrary representation. In this work, we study a more realistic variant of the layer-peeled model, which takes the positivity of the features into account. Furthermore, we extend this model to also incorporate the limited expressivity of the network. Empirical evidence suggests that the memorization of noisy data points leads to a degradation (dilation) of neural collapse. Using a model of the memorization-dilation (M-D) phenomenon, we show one mechanism by which different losses lead to different performances of the trained network on noisy data. Our proofs reveal why label smoothing, a modification of cross-entropy empirically observed to produce a regularization effect, leads to improved generalization in classification tasks.
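As a rough illustration of two notions in the abstract, the following sketch (assuming NumPy) implements label smoothing in its common textbook form, with targets (1 - alpha) * one_hot + alpha / K. It also computes a simplified ratio of within-class to between-class feature scatter as a proxy for how "collapsed" or "dilated" class features are. The function names, the parameter alpha, and the scatter ratio are hypothetical illustrations. They are not the paper's definitions, its dilation measure, or its code.

```python
import numpy as np


def smoothed_targets(labels: np.ndarray, num_classes: int, alpha: float) -> np.ndarray:
    """Common label-smoothing targets: (1 - alpha) * one_hot + alpha / K."""
    one_hot = np.eye(num_classes)[labels]
    return (1.0 - alpha) * one_hot + alpha / num_classes


def cross_entropy(logits: np.ndarray, targets: np.ndarray) -> float:
    """Mean cross-entropy between (possibly soft) targets and softmax(logits)."""
    # Numerically stable log-softmax: subtract each row's maximum first.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-(targets * log_probs).sum(axis=1).mean())


def within_between_ratio(features: np.ndarray, labels: np.ndarray) -> float:
    """Hypothetical collapse proxy: summed within-class / between-class scatter.

    0 means every sample sits exactly on its class mean (full collapse);
    larger values mean a more spread-out ("dilated") class geometry.
    Assumes at least two distinct class means, so the denominator is nonzero.
    """
    global_mean = features.mean(axis=0)
    within, between = 0.0, 0.0
    for c in np.unique(labels):
        class_feats = features[labels == c]
        class_mean = class_feats.mean(axis=0)
        # Mean squared distance of this class's samples to their class mean.
        within += float(((class_feats - class_mean) ** 2).sum(axis=1).mean())
        # Squared distance of the class mean to the global mean.
        between += float(((class_mean - global_mean) ** 2).sum())
    return within / between


if __name__ == "__main__":
    labels = np.array([0, 1, 2])
    for gap in (2.0, np.log(28.0), 6.0):
        # Logits with margin `gap` on the correct class (K = 3 classes).
        logits = gap * np.eye(3)[labels]
        plain = cross_entropy(logits, smoothed_targets(labels, 3, 0.0))
        smooth = cross_entropy(logits, smoothed_targets(labels, 3, 0.1))
        print(f"gap={gap:.3f}  plain CE={plain:.4f}  smoothed CE={smooth:.4f}")
```

With alpha = 0.1 and K = 3, the smoothed loss along this one-parameter family of logits is minimized at the finite margin log((1 - alpha + alpha/K) / (alpha/K)) = log 28. Plain cross-entropy, by contrast, keeps decreasing as the margin grows. This bounded optimum is a well-known property of label smoothing and is offered only as intuition, not as the argument made in the paper.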

Document type:	Conference paper
Faculty:	Mathematics, Computer Science and Statistics > Mathematics > Professorship for Mathematical Foundations of Understanding Artificial Intelligence; Mathematics, Computer Science and Statistics > Computer Science > Artificial Intelligence and Machine Learning
Subject areas:	000 Computer science, information and general works > 000 Computer science, knowledge and systems
URN:	urn:nbn:de:bvb:19-epub-107490-4
Language:	English
Document ID:	107490
Date published on Open Access LMU:	23 Oct 2023 10:34
Last modified:	20 May 2025 11:06
DFG:	Funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) - 160364472