We present a new algorithm for solving min-max and min-min problems outside the convex setting. We rely on rigidity assumptions, ubiquitous in learning, which make our method, the backtrack Hölder algorithm, applicable to many optimization problems. Our approach exploits hidden regularity properties and allows us, in particular, to devise a simple algorithm of ridge type. An original feature of our method is its automatic step size adaptation, which departs from the usual overly cautious backtracking methods. In a general framework, we provide theoretical convergence guarantees and rates. We apply our findings to simple Generative Adversarial Network (GAN) problems, obtaining promising numerical results. It is worth mentioning that a byproduct of our approach is a simple recipe for general Hölderian backtracking optimization.
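The flavor of Hölderian backtracking with automatic step size adaptation can be conveyed by a minimal sketch. This is an illustrative assumption, not the paper's exact algorithm: a standard Armijo-type backtracking line search in which the trial step is allowed to grow between iterations rather than restarting from a fixed value, tested on an objective whose gradient is Hölder but not Lipschitz continuous at the origin. The function name `backtracking_gradient` and all parameter values are hypothetical.

```python
import numpy as np

def backtracking_gradient(f, grad, x0, t0=1.0, shrink=0.5, grow=2.0,
                          c=1e-4, max_iter=500, tol=1e-8):
    """Gradient descent with an adaptive backtracking line search.

    A generic sketch under assumed parameters: the trial step size is
    warm-started from the previous iteration and allowed to grow, so the
    method is not overly cautious when the local Hölder constant is small.
    """
    x, t = np.asarray(x0, dtype=float), t0
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            break
        # Shrink the step until an Armijo-type sufficient-decrease test holds.
        while t > 1e-16 and f(x - t * g) > f(x) - c * t * np.dot(g, g):
            t *= shrink
        x = x - t * g
        t *= grow  # warm-start the next search with a larger trial step
    return x

# Example objective: f(x) = ||x||^{3/2}, whose gradient 1.5 ||x||^{-1/2} x
# is Hölder continuous (exponent 1/2) but not Lipschitz at the origin.
f = lambda x: np.linalg.norm(x) ** 1.5
grad = lambda x: (1.5 * np.linalg.norm(x) ** (-0.5) * x
                  if np.linalg.norm(x) > 0 else 0.0 * x)
x_star = backtracking_gradient(f, grad, np.array([2.0, -1.0]))
```

Because the step size carries over and grows between iterations, a single well-scaled step can move most of the way to the minimizer, whereas a backtracking scheme that resets to a small fixed step each time would be far more conservative.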
@article{OJMO_2023__4__A8_0,
  author    = {J\'er\^ome Bolte and Lilian Glaudin and Edouard Pauwels and Mathieu Serrurier},
  title     = {The backtrack {H\"older} gradient method with application to min-max and min-min problems},
  journal   = {Open Journal of Mathematical Optimization},
  eid       = {8},
  pages     = {1--17},
  publisher = {Universit\'e de Montpellier},
  volume    = {4},
  year      = {2023},
  doi       = {10.5802/ojmo.24},
  language  = {en},
  url       = {https://ojmo.centre-mersenne.org/articles/10.5802/ojmo.24/}
}