The backtrack Hölder gradient method with application to min-max and min-min problems
Open Journal of Mathematical Optimization, Volume 4 (2023), article no. 8, 17 p.

We present a new algorithm to solve min-max or min-min problems outside the convex world. We use rigidity assumptions, ubiquitous in learning, that make our method, the backtrack Hölder algorithm, applicable to many optimization problems. Our approach exploits hidden regularity properties and allows us, in particular, to devise a simple algorithm of ridge type. An original feature of our method is its automatic step size adaptation, which departs from the usual overly cautious backtracking methods. In a general framework, we provide theoretical convergence guarantees and rates. We apply our findings to simple Generative Adversarial Network (GAN) problems and obtain promising numerical results. A byproduct of our approach is a simple recipe for general Hölderian backtracking optimization.
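For intuition only, here is a minimal sketch of a backtracking gradient method in the spirit of the Hölderian backtracking idea the abstract describes: the sufficient-decrease test uses no Lipschitz constant, and each iteration first tries to grow the step before shrinking it, so the step size adapts automatically rather than staying cautiously small. The function holder_backtrack_gd, its parameters (t0, beta, grow, alpha), and the step-growth heuristic are illustrative assumptions, not the authors' exact algorithm.

# Hedged sketch of a Hölder-friendly backtracking gradient descent.
# The Armijo-type decrease test below is valid without a Lipschitz
# gradient; only a Hölder-type regularity of grad_f is implicitly used.
import numpy as np

def holder_backtrack_gd(f, grad_f, x0, t0=1.0, beta=0.5, grow=2.0,
                        alpha=1e-4, max_iter=500, tol=1e-8):
    x, t = np.asarray(x0, dtype=float), t0
    for _ in range(max_iter):
        g = grad_f(x)
        gnorm2 = float(g @ g)
        if gnorm2 <= tol**2:
            break
        t *= grow  # try a larger step first (non-cautious adaptation)
        # Backtrack until the sufficient-decrease test holds.
        while f(x - t * g) > f(x) - alpha * t * gnorm2:
            t *= beta
        x = x - t * g
    return x

# Toy usage on an objective whose gradient is only 0.5-Hölder:
f = lambda x: np.sum(np.abs(x) ** 1.5)
grad_f = lambda x: 1.5 * np.sign(x) * np.sqrt(np.abs(x))
print(holder_backtrack_gd(f, grad_f, x0=[1.0, -2.0]))

Note that classical backtracking analyses assume a Lipschitz gradient to bound the accepted step; on the toy objective above that assumption fails near the origin, which is exactly the regime where a Hölder-aware step rule is useful.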

DOI: 10.5802/ojmo.24
Keywords: Hölder gradient, backtracking line search, min-max optimization, ridge method, semi-algebraic optimization
Jérôme Bolte 1; Lilian Glaudin 2; Edouard Pauwels 3; Mathieu Serrurier 4

1 Toulouse School of Economics, Université Toulouse Capitole, Toulouse, France
2 ANITI, University of Toulouse, Toulouse, France
3 Toulouse School of Economics, Institut Universitaire de France, Toulouse, France
4 Université Paul Sabatier, IRIT, Toulouse, France
License: CC-BY 4.0
Copyright: The authors retain unrestricted copyright and publishing rights
@article{OJMO_2023__4__A8_0,
     author = {J\'er\^ome Bolte and Lilian Glaudin and Edouard Pauwels and Mathieu Serrurier},
     title = {The backtrack {H\"older} gradient method with application to min-max and min-min problems},
     journal = {Open Journal of Mathematical Optimization},
     eid = {8},
     pages = {1--17},
     publisher = {Universit\'e de Montpellier},
     volume = {4},
     year = {2023},
     doi = {10.5802/ojmo.24},
     language = {en},
     url = {https://ojmo.centre-mersenne.org/articles/10.5802/ojmo.24/}
}
TY  - JOUR
AU  - Jérôme Bolte
AU  - Lilian Glaudin
AU  - Edouard Pauwels
AU  - Mathieu Serrurier
TI  - The backtrack Hölder gradient method with application to min-max and min-min problems
JO  - Open Journal of Mathematical Optimization
PY  - 2023
SP  - 1
EP  - 17
VL  - 4
PB  - Université de Montpellier
UR  - https://ojmo.centre-mersenne.org/articles/10.5802/ojmo.24/
DO  - 10.5802/ojmo.24
LA  - en
ID  - OJMO_2023__4__A8_0
ER  - 
%0 Journal Article
%A Jérôme Bolte
%A Lilian Glaudin
%A Edouard Pauwels
%A Mathieu Serrurier
%T The backtrack Hölder gradient method with application to min-max and min-min problems
%J Open Journal of Mathematical Optimization
%D 2023
%P 1-17
%V 4
%I Université de Montpellier
%U https://ojmo.centre-mersenne.org/articles/10.5802/ojmo.24/
%R 10.5802/ojmo.24
%G en
%F OJMO_2023__4__A8_0
Jérôme Bolte; Lilian Glaudin; Edouard Pauwels; Mathieu Serrurier. The backtrack Hölder gradient method with application to min-max and min-min problems. Open Journal of Mathematical Optimization, Volume 4 (2023), article no. 8, 17 p. doi: 10.5802/ojmo.24. https://ojmo.centre-mersenne.org/articles/10.5802/ojmo.24/
