Tight analyses for subgradient descent I: Lower bounds
Open Journal of Mathematical Optimization, Volume 5 (2024), article no. 4, 17 p.

Consider the problem of minimizing functions that are Lipschitz and convex, but not necessarily differentiable. We construct a function from this class for which the T-th iterate of subgradient descent has error Ω(log(T)/√T). This matches a known upper bound of O(log(T)/√T). We prove analogous results for functions that are additionally strongly convex. There exists such a function for which the error of the T-th iterate of subgradient descent is Ω(log(T)/T), matching a known upper bound of O(log(T)/T). These results resolve a question posed by Shamir (2012).
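For context, the following is a minimal sketch (illustrative, not the paper's lower-bound construction) of the projected subgradient method the abstract refers to, run on the toy function f(x) = |x| over [-1, 1] with the standard step size η_t = 1/√t used in the Lipschitz-convex setting; the example function, step size, and variable names are assumptions for illustration only.

# Minimal sketch of projected subgradient descent (illustrative assumptions,
# not the paper's construction). The final iterate x_T is the one whose
# worst-case error the paper bounds from below.
import numpy as np

def subgradient_descent(subgrad, project, x0, T, step):
    """Run T projected subgradient steps and return the final iterate x_T."""
    x = x0
    for t in range(1, T + 1):
        x = project(x - step(t) * subgrad(x))
    return x

# Toy example: f(x) = |x| is 1-Lipschitz and convex, minimized at x* = 0.
subgrad = lambda x: np.sign(x)                 # a valid subgradient of |x|
project = lambda x: np.clip(x, -1.0, 1.0)      # projection onto [-1, 1]
step = lambda t: 1.0 / np.sqrt(t)              # eta_t = 1/sqrt(t)

x_T = subgradient_descent(subgrad, project, x0=0.9, T=1000, step=step)
print(abs(x_T))  # final-iterate error f(x_T) - f(x*)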

DOI: 10.5802/ojmo.31
Keywords: Gradient descent, First-order methods, Lower bounds
Nicholas J. A. Harvey 1; Chris Liaw 2; Sikander Randhawa 1

1 University of British Columbia
2 Google Research
License: CC-BY 4.0
Copyright: The authors retain unrestricted copyright and publishing rights
@article{OJMO_2024__5__A4_0,
     author = {Nicholas J. A. Harvey and Chris Liaw and Sikander Randhawa},
     title = {Tight analyses for subgradient descent {I:} {Lower} bounds},
     journal = {Open Journal of Mathematical Optimization},
     eid = {4},
     pages = {1--17},
     publisher = {Universit\'e de Montpellier},
     volume = {5},
     year = {2024},
     doi = {10.5802/ojmo.31},
     language = {en},
     url = {https://ojmo.centre-mersenne.org/articles/10.5802/ojmo.31/}
}
Nicholas J. A. Harvey; Chris Liaw; Sikander Randhawa. Tight analyses for subgradient descent I: Lower bounds. Open Journal of Mathematical Optimization, Volume 5 (2024), article no. 4, 17 p. doi: 10.5802/ojmo.31. https://ojmo.centre-mersenne.org/articles/10.5802/ojmo.31/

[1] Viorel Barbu; Teodor Precupanu Convexity and optimization in Banach spaces, Springer, 2012 | DOI

[2] Heinz H. Bauschke Projection Algorithms and Monotone Operators, Ph. D. Thesis, Simon Fraser University (1996)

[3] Heinz H. Bauschke; Patrick L. Combettes Convex Analysis and Monotone Operator Theory in Hilbert Spaces, Springer, 2017 | DOI

[4] Elad Hazan Introduction to Online Convex Optimization, Found. Trends Optim., Volume 2 (2015) no. 3–4

[5] Elad Hazan; Amit Agarwal; Satyen Kale Logarithmic regret algorithms for online convex optimization, Mach. Learn., Volume 69 (2007) no. 2-3, pp. 169-192 | DOI | Zbl

[6] Elad Hazan; Satyen Kale Beyond the regret minimization barrier: optimal algorithms for stochastic strongly-convex optimization, J. Mach. Learn. Res., Volume 15 (2014) no. 1, pp. 2489-2512 | Zbl

[7] Jean-Baptiste Hiriart-Urruty; Claude Lemaréchal Convex Analysis and Minimization Algorithms I, Springer, 1996

[8] Prateek Jain; Dheeraj Nagaraj; Praneeth Netrapalli Making the Last Iterate of SGD Information Theoretically Optimal, Conference on Learning Theory (2019), pp. 1752-1755

[9] Simon Lacoste-Julien; Mark W. Schmidt; Francis R. Bach A simpler approach to obtaining an O(1/t) convergence rate for the projected stochastic subgradient method (2012) | arXiv

[10] Arkadi S. Nemirovsky; David B. Yudin Problem complexity and method efficiency in optimization, John Wiley & Sons, 1983

[11] Yurii Nesterov; Vladimir Shikhman Quasi-monotone Subgradient Methods for Nonsmooth Convex Minimization, J. Optim. Theory Appl., Volume 165 (2015) no. 3, pp. 917-940 | DOI | Zbl

[12] Boris T. Polyak; Anatoli B. Juditsky Acceleration of stochastic approximation by averaging, SIAM J. Control Optim., Volume 30 (1992) no. 4, pp. 838-855 | DOI | Zbl

[13] Alexander Rakhlin; Ohad Shamir; Karthik Sridharan Making gradient descent optimal for strongly convex stochastic optimization, Proceedings of ICML (2012)

[14] R. Tyrrell Rockafellar Convex Analysis, Princeton University Press, 1970 | DOI

[15] David Ruppert Efficient estimations from a slowly convergent Robbins-Monro process (1988) (Technical report)

[16] Ohad Shamir Open Problem: Is Averaging Needed for Strongly Convex Stochastic Gradient Descent?, Proceedings of the 25th Annual Conference on Learning Theory (Proceedings of Machine Learning Research), Volume 23, 2012, pp. 47.1-47.3

[17] Ohad Shamir; Tong Zhang Stochastic Gradient Descent for Non-smooth Optimization: Convergence Results and Optimal Averaging Schemes, Proceedings of the 30th International Conference on Machine Learning (Proceedings of Machine Learning Research), Volume 28, 2013, pp. 71-79
