Short Paper - Quadratic minimization: from conjugate gradient to an adaptive Polyak’s momentum method with Polyak step-sizes
Open Journal of Mathematical Optimization, Volume 5 (2024), article no. 9, 10 p.

In this work, we propose an adaptive variation on the classical Heavy-ball method for convex quadratic minimization. The adaptivity crucially relies on so-called “Polyak step-sizes”, which consist of using the knowledge of the optimal value of the optimization problem at hand instead of problem parameters such as a few eigenvalues of the Hessian. The method also turns out to be equivalent to a variation of the classical conjugate gradient method, and thereby inherits many of its attractive features, including finite-time convergence, instance optimality, and worst-case convergence rates.
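For context, the sketch below shows the classical (non-adaptive) Heavy-ball iteration on a convex quadratic, with Polyak's standard tuning of the step-size and momentum from the extreme Hessian eigenvalues mu and L; this is precisely the spectral knowledge that the adaptive variant proposed in the paper replaces with the optimal value of the problem. The function name heavy_ball_quadratic and the illustration itself are ours, not taken from the paper.

import numpy as np

def heavy_ball_quadratic(A, b, x0, mu, L, n_iter=100):
    # Classical Heavy-ball on f(x) = 0.5 * x^T A x - b^T x (not the paper's
    # adaptive method). The tuning below is Polyak's classical choice for
    # quadratics and requires the extreme Hessian eigenvalues mu and L,
    # exactly the spectral knowledge the adaptive variant avoids.
    alpha = 4.0 / (np.sqrt(L) + np.sqrt(mu)) ** 2                          # step-size
    beta = ((np.sqrt(L) - np.sqrt(mu)) / (np.sqrt(L) + np.sqrt(mu))) ** 2  # momentum
    x_prev, x = x0.copy(), x0.copy()
    for _ in range(n_iter):
        grad = A @ x - b                                   # gradient of the quadratic
        x, x_prev = x - alpha * grad + beta * (x - x_prev), x
    return x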

The classical gradient method with Polyak step-sizes is known to perform very well in the situations where it can be used, and whether momentum can be incorporated into this method, and whether doing so improves it, appeared to be an open question. We provide a definitive answer to this question for the minimization of convex quadratic functions, an arguably necessary first step for developing such methods in more general setups.
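To make the comparison concrete, here is a minimal sketch of the classical gradient method with Polyak step-sizes on a convex quadratic: each step-size equals (f(x_k) - f*) / ||∇f(x_k)||², so the only problem knowledge used is the optimal value f*. The function name gradient_polyak_step and the small usage example are illustrative; the precise update rules of the momentum variant studied in the paper are given in the paper itself.

import numpy as np

def gradient_polyak_step(A, b, x0, f_star, n_iter=100):
    # Gradient descent with Polyak step-sizes on f(x) = 0.5 * x^T A x - b^T x.
    # Each step-size is gamma_k = (f(x_k) - f*) / ||grad f(x_k)||^2, so the
    # only problem knowledge required is the optimal value f* (f_star below).
    f = lambda z: 0.5 * z @ (A @ z) - b @ z
    x = x0.copy()
    for _ in range(n_iter):
        grad = A @ x - b
        gamma = (f(x) - f_star) / (grad @ grad + 1e-16)    # Polyak step-size
        x = x - gamma * grad
    return x

# Usage on a small quadratic whose optimal value is known in closed form.
A = np.diag([1.0, 10.0]); b = np.array([1.0, 2.0])
x_star = np.linalg.solve(A, b)
f_star = 0.5 * x_star @ (A @ x_star) - b @ x_star
x_hat = gradient_polyak_step(A, b, np.zeros(2), f_star)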

DOI: 10.5802/ojmo.36
Keywords: Optimization, Quadratic, Conjugate Gradient, Heavy-ball, Polyak step-sizes, Optimality
Baptiste Goujaud 1; Adrien Taylor 2; Aymeric Dieuleveut 1

1 CMAP, École Polytechnique, Institut Polytechnique de Paris
2 INRIA, École Normale Supérieure, PSL Research University, Paris
License: CC-BY 4.0
Copyrights: The authors retain unrestricted copyrights and publishing rights
@article{OJMO_2024__5__A7_0,
     author = {Baptiste Goujaud and Adrien Taylor and Aymeric Dieuleveut},
     title = {Short {Paper} - {Quadratic} minimization: from conjugate gradient to an adaptive {Polyak{\textquoteright}s} momentum method with {Polyak} step-sizes},
     journal = {Open Journal of Mathematical Optimization},
     eid = {9},
     pages = {1--10},
     publisher = {Universit\'e de Montpellier},
     volume = {5},
     year = {2024},
     doi = {10.5802/ojmo.36},
     language = {en},
     url = {https://ojmo.centre-mersenne.org/articles/10.5802/ojmo.36/}
}
Baptiste Goujaud; Adrien Taylor; Aymeric Dieuleveut. Short Paper - Quadratic minimization: from conjugate gradient to an adaptive Polyak’s momentum method with Polyak step-sizes. Open Journal of Mathematical Optimization, Volume 5 (2024), article no. 9, 10 p. doi: 10.5802/ojmo.36. https://ojmo.centre-mersenne.org/articles/10.5802/ojmo.36/
