Theoretical analysis of the randomized subspace regularized Newton method for non-convex optimization
Open Journal of Mathematical Optimization, Volume 6 (2025), article no. 8, 35 p.

While randomized subspace Newton methods that restrict the search direction to a random subspace already exist for convex functions, we propose a randomized subspace regularized Newton method for non-convex functions and, more generally, give the first thorough investigation of the local convergence rate of randomized subspace Newton methods. In our proposed algorithm, we use a modified Hessian of the function restricted to a random subspace so that, with high probability, the function value decreases at each iteration even when the objective function is non-convex. We show that our method converges globally under appropriate assumptions, and that its convergence rate matches that of the full regularized Newton method. Furthermore, we obtain a local linear convergence rate under additional assumptions, and prove that this rate is, in general, the best one can hope for when using a random subspace. Finally, we prove that super-linear convergence holds if the Hessian at the local optimum is rank deficient.
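To make the abstract's update rule concrete, the following is a minimal Python sketch of one iteration of a randomized subspace regularized Newton step, assuming a Gaussian sketch matrix and an Armijo backtracking line search. The helper names (rsrn_step, f, grad, hess) and the constants c1, c2, gamma are illustrative assumptions, not the paper's exact regularization rule.

import numpy as np

def rsrn_step(f, grad, hess, x, s, c1=1.0, c2=1e-2, gamma=0.5,
              rho=0.5, armijo=1e-4, rng=None):
    # One iteration of a randomized subspace regularized Newton step
    # (illustrative sketch; parameter names and constants are hypothetical).
    rng = np.random.default_rng() if rng is None else rng
    n = x.size
    # Gaussian sketch: P maps R^n onto a random s-dimensional subspace.
    P = rng.standard_normal((s, n)) / np.sqrt(s)
    g = P @ grad(x)                # gradient restricted to the subspace
    H = P @ hess(x) @ P.T          # Hessian restricted to the subspace
    # Regularize the subspace Hessian so it is positive definite even for
    # non-convex f: shift by the negative curvature plus a gradient term.
    lam_min = np.linalg.eigvalsh(H)[0]
    mu = c1 * max(0.0, -lam_min) + c2 * np.linalg.norm(g) ** gamma
    d = -P.T @ np.linalg.solve(H + mu * np.eye(s), g)  # lift back to R^n
    # Backtracking (Armijo) line search enforces sufficient decrease; the
    # direction is a descent direction since H + mu*I is positive definite.
    t, fx, slope = 1.0, f(x), float(grad(x) @ d)
    while f(x + t * d) > fx + armijo * t * slope:
        t *= rho
    return x + t * d

In the intended regime one iterates x = rsrn_step(f, grad, hess, x, s=20) with the subspace dimension s much smaller than the ambient dimension n, so that only an s-by-s linear system is solved per iteration instead of an n-by-n one.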

DOI: 10.5802/ojmo.46
Keywords: Random projection, Newton method, non-convex optimization, local convergence rate

Terunari Fuji 1; Pierre-Louis Poirion 2; Akiko Takeda 3

1 Graduate School of Information Science and Technology, University of Tokyo
2 Center for Advanced Intelligence Project, RIKEN
3 Graduate School of Information Science and Technology, University of Tokyo; Center for Advanced Intelligence Project, RIKEN
License: CC-BY 4.0
Copyright: The authors retain unrestricted copyright and publishing rights
@article{OJMO_2025__6__A9_0,
     author = {Terunari Fuji and Pierre-Louis Poirion and Akiko Takeda},
     title = {Theoretical analysis of the randomized subspace regularized {Newton} method for non-convex optimization},
     journal = {Open Journal of Mathematical Optimization},
     eid = {8},
     pages = {1--35},
     year = {2025},
     publisher = {Universit\'e de Montpellier},
     volume = {6},
     doi = {10.5802/ojmo.46},
     language = {en},
     url = {https://ojmo.centre-mersenne.org/articles/10.5802/ojmo.46/}
}
Terunari Fuji; Pierre-Louis Poirion; Akiko Takeda. Theoretical analysis of the randomized subspace regularized Newton method for non-convex optimization. Open Journal of Mathematical Optimization, Volume 6 (2025), article no. 8, 35 p. doi: 10.5802/ojmo.46
