
Nguegnang, Gabin Maxime (ORCID: https://orcid.org/0000-0002-6310-075X); Rauhut, Holger (ORCID: https://orcid.org/0000-0003-4750-5092) and Terstiege, Ulrich (18 July 2024): Convergence of gradient descent for learning linear neural networks. In: Advances in Continuous and Discrete Models, Vol. 2024, 23 [PDF, 2MB]

Creative Commons: Attribution 4.0 (CC-BY)
Published Version

Abstract

We study the convergence properties of gradient descent for training deep linear neural networks, i.e., deep matrix factorizations, by extending a previous analysis for the related gradient flow. We show that, under suitable conditions on the step sizes, gradient descent converges to a critical point of the loss function, here the square loss. Furthermore, we demonstrate that for almost all initializations gradient descent converges to a global minimum in the case of two layers. In the case of three or more layers, we show that gradient descent converges to a global minimum on the manifold of matrices of some fixed rank, where the rank cannot be determined a priori.
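
To make the setting concrete, the following is a minimal sketch (not the authors' code) of gradient descent on a deep linear network with square loss, the setup studied in the article. All concrete choices, including the dimensions, depth, random data, and the constant step size eta, are illustrative assumptions; the article's conditions on the step sizes are more refined than a single fixed constant.

    # Minimal sketch, assuming a fully connected deep linear network
    # f(x) = W_N ... W_1 x trained on the (mean) square loss.
    import numpy as np

    rng = np.random.default_rng(0)

    d, n = 4, 20                  # feature dimension and number of samples
    X = rng.standard_normal((d, n))
    Y = rng.standard_normal((d, n))

    depth = 3                     # number of layers N
    Ws = [rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(depth)]
    eta = 1e-2                    # constant step size (illustrative assumption)

    def product(Ws):
        """End-to-end matrix W_N ... W_1."""
        P = np.eye(d)
        for W in Ws:
            P = W @ P
        return P

    def loss(Ws):
        return np.linalg.norm(product(Ws) @ X - Y, "fro") ** 2 / (2 * n)

    for _ in range(5000):
        R = (product(Ws) @ X - Y) @ X.T / n   # factor shared by all gradients
        grads = []
        for j in range(depth):
            # dL/dW_{j+1} = (W_N ... W_{j+2})^T R (W_j ... W_1)^T
            left = np.eye(d)
            for W in Ws[j + 1:]:
                left = W @ left
            right = np.eye(d)
            for W in Ws[:j]:
                right = W @ right
            grads.append(left.T @ R @ right.T)
        for j in range(depth):                # update all layers simultaneously
            Ws[j] -= eta * grads[j]

    print("loss after training:", loss(Ws))

With three or more layers, the run illustrates the phenomenon described in the abstract: the iterates approach a global minimum over products of some fixed rank, but which rank the limit attains is not determined a priori by the initialization.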
