Abstract
Despite the widespread success of deep learning in various applications, neural network theory has been lagging behind. The choice of activation function plays a critical role in the expressivity of a neural network, but for reasons that are not yet fully understood. While the rectified linear unit (ReLU) is currently one of the most popular activation functions, ReLU squared has only recently been shown empirically to be pivotal in producing consistently superior results for state-of-the-art deep learning tasks (So et al., 2021). To analyze the expressivity of neural networks with ReLU powers, we employ the novel framework of Gribonval et al. (2022), which is based on the classical concept of approximation spaces. We consider the class of functions for which the approximation error decays at a sufficiently fast rate as network complexity, measured by the number of weights, increases. We show that when approximating sufficiently smooth functions that cannot be represented by sufficiently low-degree polynomials, networks with ReLU powers need less depth than those with ReLU. Moreover, at equal depth, networks with ReLU powers can achieve potentially faster approximation rates. Lastly, our computational experiments on approximating the Rastrigin and Ackley functions with deep neural networks show that ReLU squared and ReLU cubed networks consistently outperform ReLU networks.
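
The following is a minimal notation sketch of the objects the abstract refers to, written in the form commonly used in the approximation-space literature that Gribonval et al. (2022) build on; the paper's exact definitions and normalizations may differ.

```latex
% Sketch of assumed notation; the paper's exact normalizations may differ.
% ReLU power activation: p = 1 is ReLU, p = 2 is ReLU squared.
\[
  \rho_p(x) = \max(0, x)^p .
\]
% Approximation error of f by the set \Sigma_n of functions realized by
% networks with at most n nonzero weights:
\[
  E_n(f)_X = \inf_{g \in \Sigma_n} \lVert f - g \rVert_X .
\]
% Approximation class: functions whose error decays at rate n^{-\alpha}
% (for q = \infty the condition becomes \sup_n n^{\alpha} E_n(f)_X < \infty):
\[
  A^{\alpha}_{q}(X) = \Bigl\{ f \in X \;:\;
    \sum_{n \ge 1} \frac{1}{n} \bigl( n^{\alpha} E_{n-1}(f)_X \bigr)^{q} < \infty \Bigr\}.
\]
```

As a purely illustrative companion to the experiments mentioned in the abstract, the sketch below trains a small ReLU-power network to approximate the Rastrigin function in PyTorch. The architecture, domain, and hyperparameters here are assumptions for illustration, not the authors' experimental setup.

```python
import torch
import torch.nn as nn

def rastrigin(x, A=10.0):
    # Standard Rastrigin test function: A*n + sum_i (x_i^2 - A*cos(2*pi*x_i)).
    return A * x.shape[-1] + (x**2 - A * torch.cos(2 * torch.pi * x)).sum(dim=-1, keepdim=True)

class ReLUPower(nn.Module):
    # Activation max(0, x)^p; p = 1 is ReLU, p = 2 is ReLU squared, p = 3 ReLU cubed.
    def __init__(self, p=2):
        super().__init__()
        self.p = p

    def forward(self, x):
        return torch.relu(x) ** self.p

def make_net(dim=2, width=64, depth=3, p=2):
    # Fully connected network with ReLU-power activations (illustrative sizes).
    layers, d_in = [], dim
    for _ in range(depth):
        layers += [nn.Linear(d_in, width), ReLUPower(p)]
        d_in = width
    layers.append(nn.Linear(d_in, 1))
    return nn.Sequential(*layers)

net = make_net(p=2)                       # swap p=1 / p=3 to compare activations
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
for step in range(2000):
    x = torch.empty(256, 2).uniform_(-5.12, 5.12)   # usual Rastrigin domain
    loss = nn.functional.mse_loss(net(x), rastrigin(x))
    opt.zero_grad()
    loss.backward()
    opt.step()
```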
| Document type: | Journal article |
|---|---|
| Keywords: | ReLU powers; Deep neural networks; Approximation spaces |
| Faculty: | Mathematik, Informatik und Statistik > Mathematik > Lehrstuhl für Mathematik der Informationsverarbeitung |
| Subject areas: | 000 Computer science, information and general works > 000 Computer science, knowledge, systems; 500 Natural sciences and mathematics > 510 Mathematics |
| ISSN: | 0893-6080 |
| Document ID: | 127300 |
| Date published on Open Access LMU: | 07 Aug 2025 07:07 |
| Last modified: | 07 Aug 2025 07:07 |