Screening for a Reweighted Penalized Conditional Gradient Method
Open Journal of Mathematical Optimization, Volume 3 (2022), article no. 3, 35 p.

The conditional gradient method (CGM) is widely used in large-scale sparse convex optimization: it has a low per-iteration computational cost for structured sparse regularizers and collects nonzeros greedily. We explore the sparsity-acquiring properties of a general penalized CGM (P-CGM) for convex regularizers and a reweighted penalized CGM (RP-CGM) for nonconvex regularizers, replacing the usual convex constraints with gauge-inspired penalties. This generalization does not noticeably increase the per-iteration complexity. Without assuming bounded iterates or using line search, we show O(1/t) convergence of the gap of each subproblem, which measures the distance to a stationary point. We couple this with a screening rule that is safe in the convex case, converging to the true support at a rate O(1/δ₀²), where δ₀ measures how close the problem is to degeneracy. In the nonconvex case the screening rule converges to the true support in a finite number of iterations, but is not necessarily safe at intermediate iterates. In our experiments, we verify the consistency of the method and adjust the aggressiveness of the screening rule by tuning the concavity of the regularizer.
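The greedy, atom-at-a-time behavior described above is easiest to see in the classic constrained CGM (Frank-Wolfe) on an ℓ1-ball; the paper's P-CGM replaces the constraint with a gauge-inspired penalty while keeping the same per-iteration structure. The sketch below is illustrative only (the function name, step-size rule, and random problem data are our assumptions, not the paper's algorithm):

```python
import numpy as np

def frank_wolfe_l1(A, b, tau, iters=500):
    """Classic CGM (Frank-Wolfe) for min 0.5*||Ax - b||^2 s.t. ||x||_1 <= tau.

    Each step moves toward a single signed vertex of the scaled l1 ball,
    so the iterate after t steps has at most t nonzeros: this is the
    greedy collection of nonzeros that makes CGM attractive for sparsity.
    """
    n = A.shape[1]
    x = np.zeros(n)
    for t in range(iters):
        grad = A.T @ (A @ x - b)        # gradient of the smooth loss
        i = np.argmax(np.abs(grad))     # atom most correlated with -grad
        s = np.zeros(n)
        s[i] = -tau * np.sign(grad[i])  # linear minimizer: l1-ball vertex
        gamma = 2.0 / (t + 2.0)         # open-loop step size, no line search
        x = (1 - gamma) * x + gamma * s
    return x

# Small synthetic instance: a 3-sparse signal recoverable inside the ball.
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 100))
x_true = np.zeros(100)
x_true[:3] = [2.0, -1.5, 1.0]
b = A @ x_true
x_hat = frank_wolfe_l1(A, b, tau=5.0)
```

Since every iterate is a convex combination of ℓ1-ball vertices, `x_hat` stays feasible by construction, and the open-loop step size mirrors the no-line-search setting analyzed in the paper.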

DOI: 10.5802/ojmo.14
Keywords: Dual screening, conditional gradient method, atomic sparsity, reweighted optimization
Yifan Sun 1; Francis Bach 2

1 Stony Brook University - Department of Computer Science, Stony Brook, New York, USA
2 INRIA - Département d’Informatique de l’École Normale Supérieure, PSL Research University, Paris, France
License: CC-BY 4.0
Copyrights: The authors retain unrestricted copyrights and publishing rights
@article{OJMO_2022__3__A3_0,
     author = {Yifan Sun and Francis Bach},
     title = {Screening for a {Reweighted} {Penalized} {Conditional} {Gradient} {Method}},
     journal = {Open Journal of Mathematical Optimization},
     eid = {3},
     publisher = {Universit\'e de Montpellier},
     volume = {3},
     year = {2022},
     doi = {10.5802/ojmo.14},
     language = {en},
     url = {}
}
Yifan Sun; Francis Bach. Screening for a Reweighted Penalized Conditional Gradient Method. Open Journal of Mathematical Optimization, Volume 3 (2022), article no. 3, 35 p. doi: 10.5802/ojmo.14.
