Abstract
Despite the widespread success of deep learning in various applications, neural network theory has been lagging behind. The choice of activation function plays a critical role in the expressivity of a neural network, but the reasons for this are not yet fully understood. While the rectified linear unit (ReLU) is currently one of the most popular activation functions, ReLU squared has only recently been empirically shown to be pivotal in producing consistently superior results for state-of-the-art deep learning tasks (So et al., 2021). To analyze the expressivity of neural networks with ReLU powers, we employ the novel framework of Gribonval et al. (2022) based on the classical concept of approximation spaces. We consider the class of functions for which the approximation error decays at a sufficiently fast rate as network complexity, measured by the number of weights, increases. We show that when approximating sufficiently smooth functions that cannot be represented by sufficiently low-degree polynomials, networks with ReLU powers need less depth than those with ReLU. Moreover, at equal depth, networks with ReLU powers can achieve faster approximation rates. Lastly, our computational experiments on approximating the Rastrigin and Ackley functions with deep neural networks show that ReLU squared and ReLU cubed networks consistently outperform ReLU networks.
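As a minimal illustrative sketch (not taken from the paper), the ReLU-power activation and the Rastrigin benchmark function mentioned in the abstract can be written as follows; the function names `relu_power` and `rastrigin` are our own, and the Rastrigin form below is the standard one with parameter A = 10:

```python
import numpy as np

def relu_power(x, p=2):
    """ReLU raised to the p-th power: max(0, x)**p.
    p=1 is the standard ReLU; p=2 is ReLU squared, p=3 ReLU cubed."""
    return np.maximum(0.0, x) ** p

def rastrigin(x):
    """Standard Rastrigin test function (highly multimodal benchmark),
    with its global minimum of 0 at the origin."""
    x = np.asarray(x, dtype=float)
    return 10.0 * x.size + np.sum(x**2 - 10.0 * np.cos(2.0 * np.pi * x))

# Unlike ReLU, ReLU squared has a continuous first derivative at 0.
print(relu_power(np.array([-1.0, 0.0, 2.0]), p=2))  # [0. 0. 4.]
print(rastrigin(np.zeros(2)))  # 0.0
```

Raising ReLU to the power p makes the activation (p-1)-times continuously differentiable, which is one intuition for why higher powers can help when approximating smooth targets.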
| Item Type: | Journal article |
|---|---|
| Keywords: | ReLU powers; Deep neural networks; Approximation spaces |
| Faculties: | Mathematics, Computer Science and Statistics > Mathematics > Chair of Mathematics of Information Processing |
| Subjects: | 000 Computer science, information and general works > 000 Computer science, knowledge, and systems; 500 Science > 510 Mathematics |
| ISSN: | 0893-6080 |
| Item ID: | 127300 |
| Date Deposited: | 07. Aug 2025 07:07 |
| Last Modified: | 07. Aug 2025 07:07 |
