Nguegnang, Gabin Maxime ORCID: https://orcid.org/0000-0002-6310-075X; Rauhut, Holger ORCID: https://orcid.org/0000-0003-4750-5092; and Terstiege, Ulrich ORCID: https://orcid.org/0000-0003-4750-5092 (18 July 2024):
Convergence of gradient descent for learning linear neural networks.
In: Advances in Continuous and Discrete Models, Vol. 2024, 23

Abstract
We study the convergence properties of gradient descent for training deep linear neural networks, i.e., deep matrix factorizations, by extending a previous analysis for the related gradient flow. We show that, under suitable conditions on the stepsizes, gradient descent converges to a critical point of the loss function, here the square loss. Furthermore, we demonstrate that in the case of two layers, gradient descent converges to a global minimum for almost all initializations. In the case of three or more layers, we show that gradient descent converges to a global minimum on the manifold of matrices of some fixed rank, where the rank cannot be determined a priori.
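To make the setting of the abstract concrete, the following is a minimal sketch of gradient descent on the square loss of a deep linear network, i.e., the loss L(W1, ..., WN) = ||W_N ··· W_1 X − Y||_F² minimized over the layer matrices. It is an illustration of the setup studied in the paper, not the authors' code; the square d×d layers, Gaussian data, initialization scale, and constant stepsize `eta` are all illustrative assumptions.

```python
import numpy as np

# Minimal sketch (illustrative, not the authors' code): gradient descent on
#     L(W1, ..., WN) = ||W_N ... W_1 X - Y||_F^2,
# the square loss of a deep linear network (deep matrix factorization).
# Dimensions, initialization scale, and stepsize are assumptions.

def prod(mats, d):
    """Return mats[-1] @ ... @ mats[0]; the d x d identity for an empty list."""
    out = np.eye(d)
    for M in mats:
        out = M @ out
    return out

rng = np.random.default_rng(0)
d, n, N = 5, 20, 3                     # layer width, sample count, number of layers
X = rng.standard_normal((d, n))        # input data
Y = rng.standard_normal((d, n))        # targets
Ws = [0.1 * rng.standard_normal((d, d)) for _ in range(N)]  # layer matrices

eta = 1e-3                             # constant stepsize; the paper's convergence
                                       # result requires suitable stepsize conditions
for step in range(5000):
    R = prod(Ws, d) @ X - Y            # residual of the end-to-end linear map
    grads = []
    for k in range(N):
        A = prod(Ws[k + 1:], d)        # product of the layers above layer k
        B = prod(Ws[:k], d) @ X        # product of the layers below, applied to X
        grads.append(2 * A.T @ R @ B.T)  # gradient of the square loss w.r.t. Ws[k]
    for k, g in enumerate(grads):
        Ws[k] -= eta * g               # simultaneous gradient descent update

print("final square loss:", np.linalg.norm(prod(Ws, d) @ X - Y) ** 2)
```

With N = 2 this corresponds to the two-layer case where, per the abstract, gradient descent reaches a global minimum for almost all initializations; for N ≥ 3 the limit is a global minimum only on a manifold of matrices of some fixed rank.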
Document type: | Journal article
---|---
Faculty: | Mathematics, Computer Science and Statistics > Mathematics > Chair of Mathematics of Information Processing
Subject areas: | 500 Natural sciences and mathematics > 510 Mathematics
URN: | urn:nbn:de:bvb:19-epub-127184-1
ISSN: | 2731-4235
Language: | English
Document ID: | 127184
Date published on Open Access LMU: | 30 Jun 2025 06:38
Last modified: | 30 Jun 2025 06:38