A fully stochastic pth-order adaptive-regularization method for unconstrained nonconvex optimization is presented which never computes the objective-function value, yet achieves the optimal $\mathcal{O}(\epsilon^{-(p+1)/p})$ complexity bound for finding first-order critical points. When stochastic gradients and Hessians are used (the case $p=2$), this recovers the optimal $\mathcal{O}(\epsilon^{-3/2})$ bound. The method is noise-tolerant, and the inexactness conditions required for convergence depend on the history of past steps. Applications to cases where derivative evaluations are inexact and to the minimization of finite sums by sampling are discussed. Numerical experiments on large binary classification problems illustrate the potential of the new method.
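For context, a generic $p$th-order adaptive-regularization (AR$p$) step minimizes a regularized Taylor model built from (possibly stochastic) derivative estimates; the display below is a standard formulation of such a model and may differ in detail from the algorithm analyzed in the paper:
\[
m_k(s) \;=\; \sum_{j=1}^{p} \frac{1}{j!}\,\overline{\nabla^j f}(x_k)[s]^j \;+\; \frac{\sigma_k}{p+1}\,\|s\|^{p+1},
\]
where $\overline{\nabla^j f}(x_k)$ denotes an estimate of the $j$th derivative tensor and $\sigma_k>0$ is the regularization parameter. Since the constant term $f(x_k)$ does not affect the minimizer of $m_k$, the step can be computed, and $\sigma_k$ updated, from derivative information alone; this is what makes an objective-function-free variant possible. For $p=2$ the subproblem is a cubic regularization of a stochastic quadratic model, matching the $\mathcal{O}(\epsilon^{-3/2})$ case mentioned in the abstract.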
Serge Gratton; Sadok Jerad; Philippe L. Toint

Serge Gratton; Sadok Jerad; Philippe L. Toint. A Stochastic Objective-Function-Free Adaptive Regularization Method with Optimal Complexity. Open Journal of Mathematical Optimization, Volume 6 (2025), article no. 5, 24 p. doi: 10.5802/ojmo.41. https://ojmo.centre-mersenne.org/articles/10.5802/ojmo.41/