Estimating Shape Parameters of Piecewise Linear-Quadratic Problems
Open Journal of Mathematical Optimization, Volume 2 (2021), article no. 8, 18 p.

Piecewise Linear-Quadratic (PLQ) penalties are widely used to develop models in statistical inference, signal processing, and machine learning. Common examples of PLQ penalties include least squares, Huber, Vapnik, 1-norm, and their asymmetric generalizations. Properties of these estimators depend on the choice of penalty and its shape parameters, such as degree of asymmetry for the quantile loss, and transition point between linear and quadratic pieces for the Huber function.

In this paper, we develop a statistical framework that can help the modeler to automatically tune shape parameters once the shape of the penalty has been chosen. The choice of the parameter is informed by the basic notion that each PLQ penalty should correspond to a true statistical density. The normalization constant inherent in this requirement helps to inform the optimization over shape parameters, giving a joint optimization problem over these as well as primary parameters of interest. A second contribution is to consider optimization methods for these joint problems. We show that basic first-order methods can be immediately brought to bear, and design specialized extensions of interior point (IP) methods for PLQ problems that can quickly and efficiently solve the joint problem. Synthetic problems and larger-scale practical examples illustrate the utility of the approach. Code for the new IP method is implemented using the Julia language (https://github.com/UW-AMO/shape-parameters/tree/ojmo).

Received:
Accepted:
Revised after acceptance:
Published online:
DOI: https://doi.org/10.5802/ojmo.10
@article{OJMO_2021__2__A8_0,
     author = {Peng Zheng and Karthikeyan Natesan Ramamurthy and Aleksandr Y. Aravkin},
     title = {Estimating {Shape} {Parameters} of {Piecewise} {Linear-Quadratic} {Problems}},
     journal = {Open Journal of Mathematical Optimization},
     eid = {8},
     publisher = {Universit\'e de Montpellier},
     volume = {2},
     year = {2021},
     doi = {10.5802/ojmo.10},
     language = {en},
     url = {https://ojmo.centre-mersenne.org/articles/10.5802/ojmo.10/}
}
Peng Zheng; Karthikeyan Natesan Ramamurthy; Aleksandr Y. Aravkin. Estimating Shape Parameters of Piecewise Linear-Quadratic Problems. Open Journal of Mathematical Optimization, Volume 2 (2021), article  no. 8, 18 p. doi : 10.5802/ojmo.10. https://ojmo.centre-mersenne.org/articles/10.5802/ojmo.10/

[1] Aleksandr Y. Aravkin; James V. Burke; Gianluigi Pillonetto Sparse/Robust Estimation and Kalman Smoothing with Nonsmooth Log-Concave Densities: Modeling, Computation, and Theory, J. Mach. Learn. Res., Volume 14 (2013), pp. 2689-2728 | MR 3121655 | Zbl 1318.62237

[2] Aleksandr Y. Aravkin; James V. Burke; Gianluigi Pillonetto Generalized system identification with stable spline kernels, SIAM J. Sci. Comput., Volume 40 (2018) no. 5, p. B1419-B1443 | Article | MR 3867622 | Zbl 1404.62051

[3] Aleksandr Y. Aravkin; Dmitriy Drusvyatskiy; Tristan van Leeuwen Efficient quadratic penalization through the partial minimization technique, IEEE Trans. Autom. Control, Volume 63 (2017) no. 7, pp. 2131-2138 | Article | MR 3820213 | Zbl 1423.49020

[4] Aleksandr Y. Aravkin; Tristan Van Leeuwen Estimating nuisance parameters in inverse problems, Inverse Probl., Volume 28 (2012) no. 11, 115016, 13 pages | MR 2997233 | Zbl 1253.49021

[5] Sylvain Arlot; Alain Celisse et al. A survey of cross-validation procedures for model selection, Stat. Surv., Volume 4 (2010), pp. 40-79 | MR 2602303 | Zbl 1190.62080

[6] Hédy Attouch; Jérôme Bolte; Patrick Redont; Antoine Soubeyran Proximal alternating minimization and projection methods for nonconvex problems: An approach based on the Kurdyka–Łojasiewicz inequality, Math. Oper. Res., Volume 35 (2010) no. 2, pp. 438-457 | Article | Zbl 1214.65036

[7] Bradley M. Bell; James V. Burke; Alan Schumitzky A relative weighting method for estimating parameters and variances in multiple data sets, Comput. Stat. Data Anal., Volume 22 (1996) no. 2, pp. 119-135 | Article | MR 1410383 | Zbl 0900.65402

[8] Anil K. Bera; Antonio F. Galvao; Gabriel V. Montes-Rojas; Sung Y. Park Asymmetric Laplace Regression: Maximum Likelihood, Maximum Entropy and Quantile Regression, J. Econom. Methods, Volume 5 (2016) no. 1, pp. 79-101 | MR 3441120 | Zbl 1345.62169

[9] James S. Bergstra; Rémi Bardenet; Yoshua Bengio; Balázs Kégl Algorithms for hyper-parameter optimization, Advances in Neural Information Processing Systems (2011), pp. 2546-2554

[10] James S. Bergstra; Yoshua Bengio Random search for hyper-parameter optimization, J. Mach. Learn. Res., Volume 13 (2012), pp. 281-305 | MR 2913701 | Zbl 1283.68282

[11] Jérôme Bolte; Shoham Sabach; Marc Teboulle Proximal alternating linearized minimization for nonconvex and nonsmooth problems, Math. Program., Volume 146 (2014) no. 1-2, pp. 459-494 | Article | MR 3232623 | Zbl 1297.90125

[12] Stephen Boyd; Lieven Vandenberghe Convex optimization, Cambridge University Press, 2004 | Zbl 1058.90049

[13] Emmanuel J Candès; Xiaodong Li; Yi Ma; John Wright Robust principal component analysis?, J. ACM, Volume 58 (2011) no. 3, 11, 37 pages | MR 2811000 | Zbl 1327.62369

[14] Venkat Chandrasekaran; Sujay Sanghavi; Pablo A. Parrilo; Alan S. Willsky Sparse and low-rank matrix decompositions, IFAC Proceedings Volumes, Volume 42 (2009) no. 10, pp. 1493-1498 | Article

[15] David L Donoho Compressed sensing, IEEE Trans. Inf. Theory, Volume 52 (2006) no. 4, pp. 1289-1306 | Article | MR 2241189 | Zbl 1288.94016

[16] Derek Driggs; Stephen Becker; Aleksandr Y. Aravkin Adapting regularized low-rank models for parallel architectures, SIAM J. Sci. Comput., Volume 41 (2019) no. 1, p. A163-A189 | Article | MR 3895341 | Zbl 1405.65079

[17] Peter J. Huber Robust statistics, 523, John Wiley & Sons, 2004

[18] Frank Hutter; Holger H. Hoos; Kevin Leyton-Brown Sequential model-based optimization for general algorithm configuration, International Conference on Learning and Intelligent Optimization (2011), pp. 507-523 | Article

[19] Aaron Klein; Stefan Falkner; Simon Bartels; Philipp Hennig; Frank Hutter Fast Laplace optimization of machine learning hyperparameters on large datasets (2016) (https://arxiv.org/abs/1605.07079)

[20] Roger Koenker; Kevin F. Hallock Quantile regression, J. Econom. Perspect., Volume 15 (2001) no. 4, pp. 143-156 | Article

[21] Masakazu Kojima; Nimrod Megiddo; Toshihito Noma; Akiko Yoshise A Unified Approach to Interior Point Algorithms for Linear Complementarity Problems, Lecture Notes in Computer Science, 538, Springer, 1991 | MR 1226025

[22] Lisha Li; Kevin Jamieson; Giulia DeSalvo; Afshin Rostamizadeh; Ameet Talwalkar Hyperband: A Novel Bandit-Based Approach to Hyperparameter Optimization (2016) (https://arxiv.org/abs/1603.06560)

[23] Liyuan Li; Weimin Huang; Irene Yu-Hua Gu; Qi Tian Statistical modeling of complex backgrounds for foreground object detection, IEEE Trans. Image Process., Volume 13 (2004) no. 11, pp. 1459-1472

[24] Arkadi Nemirovski; Yurii Nesterov Interior-Point Polynomial Algorithms in Convex Programming, Studies in Applied Mathematics, 13, Society for Industrial and Applied Mathematics, 1994 | MR 1258086

[25] Dominique Orban; Mario Arioli Iterative Solution of Symmetric Quasi-Definite Linear Systems, SIAM Spotlights, 3, Society for Industrial and Applied Mathematics, 2017 | MR 3660695 | Zbl 1409.65004

[26] Yigang Peng; Arvind Ganesh; John Wright; Wenli Xu; Yi Ma RASL: Robust alignment by sparse and low-rank decomposition for linearly correlated images, IEEE Trans. Pattern Anal. Mach. Intell., Volume 34 (2012) no. 11, pp. 2233-2246 | Article

[27] R. Tyrrell Rockafellar; Roger J.-B. Wets Variational analysis, 317, Springer, 2009

[28] Jasper Snoek; Hugo Larochelle; Ryan P. Adams Practical Bayesian optimization of machine learning algorithms, Advances in neural information processing systems (2012), pp. 2951-2959

[29] Robert Tibshirani Regression shrinkage and selection via the lasso, J. R. Stat. Soc., Ser. B, Stat. Methodol., Volume 58 (1996) no. 1, pp. 267-288 | MR 1379242 | Zbl 0850.62538

[30] Shiyi Tu; Min Wang; Xiaoqian Sun Bayesian variable selection and estimation in maximum entropy quantile regression, J. Appl. Stat., Volume 44 (2017) no. 2, pp. 253-269 | Article | MR 3607735 | Zbl 07282037

[31] Matthew A. Turk; Alex P. Pentland Face recognition using eigenfaces, Computer Vision and Pattern Recognition (1991), pp. 586-591 | Article

[32] Stephen J. Wright Primal-Dual Interior-Point Methods, Society for Industrial and Applied Mathematics, 1997 | Zbl 0863.65031

[33] Keming Yu; Rana A. Moyeed Bayesian quantile regression, Stat. Probab. Lett., Volume 54 (2001) no. 4, pp. 437-447 | MR 1861390 | Zbl 0983.62017

[34] Zhengdong Zhang; Arvind Ganesh; Xiao Liang; Yi Ma Tilt: Transform invariant low-rank textures, Int. J. Comput. Vision, Volume 99 (2012) no. 1, pp. 1-24 | Article | MR 2917017 | Zbl 1254.68290

Cited by document(s). Sources: