Learning deep linear neural networks

Bah, Bubacarr; Rauhut, Holger ORCID: https://orcid.org/0000-0003-4750-5092; Terstiege, Ulrich and Westdickenberg, Michael (2021): Learning deep linear neural networks. Riemannian gradient flows and convergence to global minimizers. In: Information and Inference, Vol. 11, No. 1: pp. 307-353

Full text not available from 'Open Access LMU'.

DOI: 10.1093/imaiai/iaaa039

Abstract

We study the convergence of gradient flows related to learning deep linear neural networks (where the activation function is the identity map) from data. In this case, the composition of the network layers amounts to simply multiplying the weight matrices of all layers together, resulting in an overparameterized problem. The gradient flow with respect to these factors can be re-interpreted as a Riemannian gradient flow on the manifold of rank-r matrices endowed with a suitable Riemannian metric. We show that the flow always converges to a critical point of the underlying functional. Moreover, we establish that, for almost all initializations, the flow converges to a global minimum on the manifold of rank-k matrices for some k ≤ r.
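To make the abstract's setting concrete, here is a minimal NumPy sketch, assuming a squared-error loss 0.5·‖W X − Y‖_F² on the end-to-end matrix W = W_N ⋯ W_1 and synthetic random data. By the chain rule, the gradient with respect to a single factor W_j is (W_N ⋯ W_{j+1})ᵀ ∇L(W) (W_{j−1} ⋯ W_1)ᵀ. The sketch approximates the gradient flow in the factors by explicit Euler steps (plain gradient descent with a small step size); it does not implement the Riemannian formulation on the rank-r manifold studied in the paper. The helper names (chain, factor_gradients, euler_gradient_flow), the layer widths, the step size and the step count are hypothetical choices for illustration only.

```python
import numpy as np


def chain(mats, dim):
    """Return mats[-1] @ ... @ mats[0], or the dim x dim identity if mats is empty."""
    P = np.eye(dim)
    for M in mats:
        P = M @ P
    return P


def loss(W, X, Y):
    """Assumed squared-error loss 0.5 * ||W X - Y||_F^2 of the end-to-end matrix W."""
    return 0.5 * np.linalg.norm(W @ X - Y, "fro") ** 2


def factor_gradients(weights, X, Y):
    """Gradient of loss(W_N ... W_1) with respect to each factor W_j (chain rule)."""
    d0 = weights[0].shape[1]
    W = chain(weights, d0)                 # end-to-end matrix W_N ... W_1
    G = (W @ X - Y) @ X.T                  # gradient of the loss w.r.t. W
    grads = []
    for j, Wj in enumerate(weights):
        above = chain(weights[j + 1:], Wj.shape[0])  # W_N ... W_{j+1}
        below = chain(weights[:j], d0)               # W_{j-1} ... W_1
        grads.append(above.T @ G @ below.T)
    return grads


def euler_gradient_flow(weights, X, Y, step=0.01, n_steps=20000):
    """Hypothetical helper: explicit Euler steps along the negative factor gradients.

    Returns the final factors and the loss recorded before each step and at the end.
    """
    d0 = weights[0].shape[1]
    history = []
    for _ in range(n_steps):
        history.append(loss(chain(weights, d0), X, Y))
        grads = factor_gradients(weights, X, Y)
        weights = [Wj - step * g for Wj, g in zip(weights, grads)]
    history.append(loss(chain(weights, d0), X, Y))
    return weights, history


rng = np.random.default_rng(0)
dims = [4, 5, 5, 3]                                    # d_0 (input), two hidden widths, d_N (output)
X = rng.standard_normal((dims[0], 50)) / np.sqrt(50)   # hypothetical input data
W_true = rng.standard_normal((dims[-1], dims[0]))      # hypothetical target linear map
Y = W_true @ X
weights = [rng.standard_normal((dims[i + 1], dims[i])) / np.sqrt(dims[i])
           for i in range(len(dims) - 1)]

weights, history = euler_gradient_flow(weights, X, Y)
W_final = chain(weights, dims[0])
print(f"loss: {history[0]:.3e} -> {history[-1]:.3e}")
print("distance to least-squares solution:", np.linalg.norm(W_final - W_true))
```

In this toy setup the smallest layer width is 3 = min(d_0, d_N), so the rank bound r is not binding and X has full row rank; the product should therefore approach the unique least-squares solution W_true. Making one hidden width smaller than 3 would instead restrict the product to matrices of rank at most that width, which is the situation the rank-r manifold in the abstract describes.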

Document type:	Journal article
Keywords:	Riemannian; gradient flow; manifolds; deep learning; neural networks
Faculty:	Mathematics, Computer Science and Statistics > Mathematics > Chair of Mathematics of Information Processing
Subject areas:	500 Natural sciences and mathematics > 510 Mathematics
ISSN:	2049-8772
Language:	English
Document ID:	125104
Date published on Open Access LMU:	28 Apr 2025 12:23
Last modified:	21 Nov 2025 11:50