A fully stochastic pth-order adaptive-regularization method for unconstrained nonconvex optimization is presented which never computes the objective-function value, yet achieves the optimal $\mathcal{O}(\epsilon^{-(p+1)/p})$ complexity bound for finding first-order critical points. When stochastic gradients and Hessians are used (the case $p=2$), this recovers the optimal $\mathcal{O}(\epsilon^{-3/2})$ bound. The method is noise-tolerant, and the inexactness conditions required for convergence depend on the history of past steps. Applications to cases where derivative evaluations are inexact and to the minimization of finite sums by sampling are discussed. Numerical experiments on large binary classification problems illustrate the potential of the new method.
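For context, a generic $p$th-order adaptive-regularization (AR$p$) step minimizes a regularized Taylor model built from (possibly stochastic) derivative estimates; the display below is a standard formulation of such a model and may differ in detail from the algorithm analyzed in the paper:
\[
m_k(s) \;=\; \sum_{j=1}^{p} \frac{1}{j!}\,\overline{\nabla^j f}(x_k)[s]^j \;+\; \frac{\sigma_k}{p+1}\,\|s\|^{p+1},
\]
where $\overline{\nabla^j f}(x_k)$ denotes an estimate of the $j$th derivative tensor and $\sigma_k>0$ is the regularization parameter. Since the constant term $f(x_k)$ does not affect the minimizer of $m_k$, the step can be computed, and $\sigma_k$ updated, from derivative information alone; this is what makes an objective-function-free variant possible. For $p=2$ the subproblem is a cubic regularization of a stochastic quadratic model, matching the $\mathcal{O}(\epsilon^{-3/2})$ case mentioned in the abstract.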
Serge Gratton; Sadok Jerad; Philippe L. Toint

Serge Gratton; Sadok Jerad; Philippe L. Toint. A Stochastic Objective-Function-Free Adaptive Regularization Method with Optimal Complexity. Open Journal of Mathematical Optimization, Volume 6 (2025), article no. 5, 24 p. doi: 10.5802/ojmo.41. https://ojmo.centre-mersenne.org/articles/10.5802/ojmo.41/