Please use this identifier to cite or link to this item: https://hdl.handle.net/10356/179411
Title: Non-asymptotic estimates for TUSLA algorithm for non-convex learning with applications to neural networks with ReLU activation function
Authors: Lim, Dong-Young
Neufeld, Ariel
Sabanis, Sotirios
Zhang, Ying
Keywords: Mathematical Sciences
Issue Date: 2024
Source: Lim, D., Neufeld, A., Sabanis, S. & Zhang, Y. (2024). Non-asymptotic estimates for TUSLA algorithm for non-convex learning with applications to neural networks with ReLU activation function. IMA Journal of Numerical Analysis, 44(3), 1464-1559. https://dx.doi.org/10.1093/imanum/drad038
Project: NTU NAP 
Journal: IMA Journal of Numerical Analysis
Abstract: We consider nonconvex stochastic optimization problems where the objective functions have super-linearly growing and discontinuous stochastic gradients. In such a setting, we provide a nonasymptotic analysis for the tamed unadjusted stochastic Langevin algorithm (TUSLA) introduced in Lovas et al. (2020). In particular, we establish nonasymptotic error bounds for the TUSLA algorithm in Wasserstein-1 and Wasserstein-2 distances. The latter result enables us to further derive nonasymptotic estimates for the expected excess risk. To illustrate the applicability of the main results, we consider an example from transfer learning with ReLU neural networks, which represents a key paradigm in machine learning. Numerical experiments supporting our theoretical findings are presented for this example. Hence, in this setting, we demonstrate both theoretically and numerically that the TUSLA algorithm can solve the optimization problem involving neural networks with ReLU activation function. In addition, we provide simulation results for synthetic examples where popular algorithms, e.g., ADAM, AMSGrad, RMSProp and (vanilla) stochastic gradient descent algorithm, may fail to find the minimizer of the objective functions due to the super-linear growth and the discontinuity of the corresponding stochastic gradient, while the TUSLA algorithm converges rapidly to the optimal solution. Moreover, we provide an empirical comparison of the performance of TUSLA with popular stochastic optimizers on real-world datasets, as well as investigate the effect of the key hyperparameters of TUSLA on its performance.
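Note: For readers unfamiliar with the algorithm, the following is a minimal NumPy sketch of one plausible form of the TUSLA iteration summarized in the abstract, following the taming scheme of Lovas et al. (2020). The function names, the taming exponent r, and all hyperparameter defaults below are illustrative assumptions, not values taken from the paper.

import numpy as np

def tusla(grad_estimate, theta0, step=1e-3, beta=1e8, r=1.0, n_iters=10_000, rng=None):
    # Assumed form of the TUSLA update (after Lovas et al. (2020)):
    #   theta_{n+1} = theta_n
    #                 - step * H(theta_n, X_{n+1}) / (1 + sqrt(step) * ||theta_n||^(2r))
    #                 + sqrt(2 * step / beta) * xi_{n+1}
    # The taming denominator keeps the step size of the drift term bounded even
    # when the stochastic gradient H grows super-linearly in theta, while the
    # Gaussian term xi injects Langevin noise at inverse temperature beta.
    rng = np.random.default_rng() if rng is None else rng
    theta = np.asarray(theta0, dtype=float).copy()
    sqrt_step = np.sqrt(step)
    for _ in range(n_iters):
        h = grad_estimate(theta)  # stochastic gradient H(theta, X)
        taming = 1.0 + sqrt_step * np.linalg.norm(theta) ** (2.0 * r)
        noise = rng.standard_normal(theta.shape)
        theta = theta - step * h / taming + np.sqrt(2.0 * step / beta) * noise
    return theta

# Toy usage: a non-convex objective f(x) = (x^2 - 1)^2 whose gradient is
# perturbed by Gaussian noise; the minimizers are at x = +/- 1.
rng = np.random.default_rng(0)
noisy_grad = lambda x: 4.0 * x * (x * x - 1.0) + 0.1 * rng.standard_normal(x.shape)
print(tusla(noisy_grad, theta0=np.array([3.0]), rng=rng))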
URI: https://hdl.handle.net/10356/179411
ISSN: 0272-4979
DOI: 10.1093/imanum/drad038
Schools: School of Physical and Mathematical Sciences 
Rights: © The Author(s) 2023. Published by Oxford University Press on behalf of the Institute of Mathematics and its Applications. All rights reserved.
Fulltext Permission: none
Fulltext Availability: No Fulltext
Appears in Collections:SPMS Journal Articles
