The conditional gradient method (CGM) is widely used in large-scale sparse convex optimization: it has a low per-iteration computational cost for structured sparse regularizers and collects nonzeros greedily. We explore the sparsity-acquiring properties of a general penalized CGM (P-CGM) for convex regularizers and a reweighted penalized CGM (RP-CGM) for nonconvex regularizers, replacing the usual convex constraints with gauge-inspired penalties. This generalization does not noticeably increase the per-iteration complexity. Without assuming bounded iterates or using line search, we show $O(1/t)$ convergence of the gap of each subproblem, which measures distance to a stationary point. We couple this with a screening rule which is safe in the convex case, converging to the true support at a rate $O(1/\delta^2)$, where $\delta \ge 0$ measures how close the problem is to degeneracy. In the nonconvex case the screening rule converges to the true support in a finite number of iterations, but is not necessarily safe at intermediate iterates. In our experiments, we verify the consistency of the method and adjust the aggressiveness of the screening rule by tuning the concavity of the regularizer.
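For readers unfamiliar with the base algorithm, the classical conditional gradient (Frank–Wolfe) iteration the abstract builds on can be sketched as follows. This is a minimal illustrative loop for an $\ell_1$-ball-constrained least-squares problem with the standard open-loop step size $2/(t+2)$, not the paper's penalized P-CGM or RP-CGM subproblems; the function name and test problem are our own. Note how each iteration activates at most one new coordinate, which is the greedy nonzero collection the abstract refers to.

```python
import numpy as np

def cgm_l1(A, b, tau, iters=500):
    """Conditional gradient (Frank-Wolfe) for
    min_x 0.5 * ||A x - b||^2  subject to  ||x||_1 <= tau."""
    m, n = A.shape
    x = np.zeros(n)
    for t in range(iters):
        grad = A.T @ (A @ x - b)
        # Linear minimization oracle over the l1 ball: the minimizer
        # is a signed vertex, i.e. a single (signed, scaled) coordinate.
        i = np.argmax(np.abs(grad))
        s = np.zeros(n)
        s[i] = -tau * np.sign(grad[i])
        gamma = 2.0 / (t + 2.0)  # open-loop step size, no line search
        x = (1.0 - gamma) * x + gamma * s
    return x

# Small synthetic sparse-recovery instance.
rng = np.random.default_rng(0)
A = rng.standard_normal((40, 100))
x_true = np.zeros(100)
x_true[:3] = [2.0, -1.5, 1.0]
b = A @ x_true
x = cgm_l1(A, b, tau=np.sum(np.abs(x_true)))
```

Because every iterate is a convex combination of $\ell_1$-ball vertices, feasibility is maintained automatically and the support grows by at most one atom per iteration; the paper's penalized variants keep this per-iteration structure while replacing the hard constraint with a gauge penalty.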

Yifan Sun ^{1}; Francis Bach ^{2}

@article{OJMO_2022__3__A3_0,
  author = {Yifan Sun and Francis Bach},
  title = {Screening for a {Reweighted} {Penalized} {Conditional} {Gradient} {Method}},
  journal = {Open Journal of Mathematical Optimization},
  eid = {3},
  publisher = {Universit\'e de Montpellier},
  volume = {3},
  year = {2022},
  doi = {10.5802/ojmo.14},
  language = {en},
  url = {https://ojmo.centre-mersenne.org/articles/10.5802/ojmo.14/}
}


Yifan Sun; Francis Bach. Screening for a Reweighted Penalized Conditional Gradient Method. Open Journal of Mathematical Optimization, Volume 3 (2022), article no. 3, 35 p. doi: 10.5802/ojmo.14. https://ojmo.centre-mersenne.org/articles/10.5802/ojmo.14/
