ORCID: https://orcid.org/0000-0003-4750-5092 and Terstiege, Ulrich
(2024):
Convergence of gradient descent for learning linear neural networks.
In: Advances in Continuous and Discrete Models, 23
Abstract
We study the convergence properties of gradient descent for training deep linear neural networks, i.e., deep matrix factorizations, by extending a previous analysis of the related gradient flow. We show that, under suitable conditions on the step sizes, gradient descent converges to a critical point of the loss function, which in this article is the square loss. Furthermore, we demonstrate that for almost all initializations, gradient descent converges to a global minimum in the case of two layers. In the case of three or more layers, we show that gradient descent converges to a global minimum on the manifold of matrices of some fixed rank, where the rank cannot be determined a priori.
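To make the setting of the abstract concrete, the following is a minimal pure-Python sketch of gradient descent on the square loss L(W_1, ..., W_N) = ½‖W_N ⋯ W_1 X − Y‖_F² of a deep linear network. It assumes a data matrix X, a target matrix Y, and a constant step size; the paper's actual step-size conditions are not reproduced here, and all function names (`chain`, `gradients`, `gradient_descent`, `init_weights`) are hypothetical helpers for illustration only.

```python
import random

Matrix = list[list[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(row) for row in zip(*a)]


def identity(n: int) -> Matrix:
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def chain(ws: list[Matrix], dim: int) -> Matrix:
    """Product ws[-1] @ ... @ ws[0]; the dim x dim identity if ws is empty."""
    p = identity(dim)
    for w in ws:
        p = matmul(w, p)
    return p


def residual(ws: list[Matrix], x: Matrix, y: Matrix) -> Matrix:
    """W X - Y with end-to-end matrix W = W_N ... W_1 (ws = [W_1, ..., W_N])."""
    wx = matmul(chain(ws, len(x)), x)
    return [[wx[i][j] - y[i][j] for j in range(len(y[0]))] for i in range(len(y))]


def loss(ws: list[Matrix], x: Matrix, y: Matrix) -> float:
    """Square loss 1/2 * ||W_N ... W_1 X - Y||_F^2."""
    return 0.5 * sum(v * v for row in residual(ws, x, y) for v in row)


def gradients(ws: list[Matrix], x: Matrix, y: Matrix) -> list[Matrix]:
    """dL/dW_j = (W_N...W_{j+1})^T (W X - Y) X^T (W_{j-1}...W_1)^T."""
    rxt = matmul(residual(ws, x, y), transpose(x))
    grads = []
    for j, wj in enumerate(ws):
        below = chain(ws[:j], len(x))        # W_{j-1} ... W_1
        above = chain(ws[j + 1:], len(wj))   # W_N ... W_{j+1}
        grads.append(matmul(matmul(transpose(above), rxt), transpose(below)))
    return grads


def gradient_descent(ws, x, y, step=0.01, iters=2000):
    """Gradient descent with a constant step size (an assumption of this sketch)."""
    for _ in range(iters):
        grads = gradients(ws, x, y)
        ws = [[[wij - step * gij for wij, gij in zip(wrow, grow)]
               for wrow, grow in zip(w, g)]
              for w, g in zip(ws, grads)]
    return ws


def init_weights(dims: list[int], scale=0.5, seed=0) -> list[Matrix]:
    """Hypothetical helper: random W_j of shape dims[j+1] x dims[j], entries in [-scale, scale]."""
    rng = random.Random(seed)
    return [[[rng.uniform(-scale, scale) for _ in range(dims[j])] for _ in range(dims[j + 1])]
            for j in range(len(dims) - 1)]


if __name__ == "__main__":
    x = identity(2)
    y = [[1.0, 0.5], [0.0, 2.0]]
    ws0 = init_weights([2, 2, 2, 2], seed=1)  # three layers
    ws = gradient_descent(ws0, x, y)
    print("initial loss:", loss(ws0, x, y), "final loss:", loss(ws, x, y))
```

For two layers the abstract states convergence to a global minimum for almost all initializations; for three or more layers it only guarantees a global minimum on a manifold of matrices of fixed but a priori unknown rank, so the final loss printed by the three-layer example need not be zero.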
| Item Type: | Journal article |
|---|---|
| Faculties: | Mathematics, Computer Science and Statistics > Mathematics > Chair of Mathematics of Information Processing |
| Subjects: | 500 Science > 510 Mathematics |
| URN: | urn:nbn:de:bvb:19-epub-127245-0 |
| ISSN: | 2731-4235 |
| Item ID: | 127245 |
| Date Deposited: | 31 Jul 2025 14:17 |
| Last Modified: | 21 Nov 2025 11:49 |
