Adaptive SGD with Polyak stepsize and Line-search: Robust Convergence and Variance Reduction

Jiang, Xiaowen and Stich, Sebastian U.
(2023) Adaptive SGD with Polyak stepsize and Line-search: Robust Convergence and Variance Reduction.
In: Conference on Neural Information Processing Systems, 11.12.2023-16.12.2023, New Orleans, USA.
Conference: NeurIPS Conference on Neural Information Processing Systems

14329_adaptive_sgd_with_polyak_steps.pdf - Published Version

Abstract

The recently proposed stochastic Polyak stepsize (SPS) and stochastic line-search (SLS) for SGD have shown remarkable effectiveness when training overparameterized models. However, two issues remain unsolved in this line of work. First, in non-interpolation settings, both algorithms only guarantee convergence to a neighborhood of a solution, which may result in an output worse than the initial guess. While artificially decreasing the adaptive stepsize has been proposed to address this issue (Orvieto et al. 2022), this approach results in slower convergence rates under interpolation. Second, intuitive line-search methods equipped with variance reduction (VR) fail to converge (Dubois-Taine et al. 2022). So far, no VR method is known to accelerate these two stepsizes with a convergence guarantee. In this work, we make two contributions. First, we propose two new robust variants of SPS and SLS, called AdaSPS and AdaSLS, which achieve optimal asymptotic rates in both strongly-convex and convex settings, with or without interpolation, except for the case that combines strong convexity with non-interpolation. AdaSLS requires no knowledge of problem-dependent parameters, and AdaSPS requires only a lower bound of the optimal function value as input. Second, we propose a novel VR method that can use Polyak stepsizes or line-search to achieve acceleration. When equipped with AdaSPS or AdaSLS, the resulting algorithms obtain the optimal rate for optimizing convex smooth functions. Finally, numerical experiments on synthetic and real datasets validate our theory and demonstrate the effectiveness and robustness of our algorithms.
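The abstract builds on two existing stepsize rules for SGD. As a hedged illustration of those baselines only (not of AdaSPS, AdaSLS, or the proposed VR method, whose exact update rules are given in the paper and are not reproduced here), the sketch below implements the SPS_max rule of Loizou et al. (2021) and a backtracking Armijo stochastic line-search in the style of Vaswani et al. (2019). It runs both on a consistent least-squares problem, where interpolation holds and each per-sample optimum f_i* equals 0. The function names (sps_max_step, armijo_step, run_sgd, full_loss), the constants c, gamma_max and beta, and the toy problem are all assumptions chosen for illustration.

```python
import numpy as np


def sps_max_step(loss_i, grad_i, lower_bound_i, c=0.5, gamma_max=1.0):
    """SPS_max rule (Loizou et al., 2021): min{(f_i(x) - f_i*) / (c ||g||^2), gamma_max}."""
    sq_norm = float(grad_i @ grad_i)
    if sq_norm == 0.0:
        return 0.0  # zero stochastic gradient: no step is needed
    return min((loss_i - lower_bound_i) / (c * sq_norm), gamma_max)


def armijo_step(f_i, x, grad_i, gamma_init=1.0, c=0.5, beta=0.5, max_backtracks=50):
    """Backtracking Armijo line-search on the sampled loss f_i.

    Shrinks gamma by beta until f_i(x - gamma g) <= f_i(x) - c * gamma * ||g||^2.
    Resetting to gamma_init on every call is a simplifying assumption of this sketch.
    """
    sq_norm = float(grad_i @ grad_i)
    fx = f_i(x)
    gamma = gamma_init
    for _ in range(max_backtracks):
        if f_i(x - gamma * grad_i) <= fx - c * gamma * sq_norm:
            break
        gamma *= beta
    return gamma


def full_loss(A, b, x):
    """Average of f_i(x) = 0.5 * (a_i^T x - b_i)^2 over all rows of A."""
    return 0.5 * float(np.mean((A @ x - b) ** 2))


def run_sgd(A, b, stepsize="sps", iters=5000, seed=0):
    """Plain SGD on least squares, sampling one row uniformly at random per step."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x = np.zeros(d)
    for _ in range(iters):
        i = rng.integers(n)
        a_i, b_i = A[i], b[i]

        def f_i(z):
            return 0.5 * (a_i @ z - b_i) ** 2

        g = (a_i @ x - b_i) * a_i  # gradient of f_i at x
        if stepsize == "sps":
            # Interpolation holds for a consistent system, so f_i* = 0 is exact here.
            gamma = sps_max_step(f_i(x), g, lower_bound_i=0.0)
        else:
            gamma = armijo_step(f_i, x, g)
        x = x - gamma * g
    return x


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    A = rng.standard_normal((50, 10))
    b = A @ rng.standard_normal(10)  # consistent system: all f_i share a minimizer
    print("initial loss:", full_loss(A, b, np.zeros(10)))
    for rule in ("sps", "armijo"):
        print(rule, "final loss:", full_loss(A, b, run_sgd(A, b, stepsize=rule)))
```

With c = 0.5 and f_i* = 0, the SPS step on this problem equals min{1/||a_i||^2, gamma_max}, so SGD with SPS behaves like the randomized Kaczmarz method. For the Armijo rule with c = 0.5 and a nonzero residual, the accepted step lies in (1/(2||a_i||^2), 1/||a_i||^2] whenever ||a_i||^2 >= 1. Both rules therefore adapt to the local curvature of the sampled loss without a hand-tuned learning rate. In a non-interpolation problem, these baseline rules only reach a neighborhood of a solution, which is the first issue the abstract describes.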
