Piecewise Linear-Quadratic (PLQ) penalties are widely used to develop models in statistical inference, signal processing, and machine learning. Common examples of PLQ penalties include least squares, Huber, Vapnik, 1-norm, and their asymmetric generalizations. Properties of these estimators depend on the choice of penalty and its shape parameters, such as degree of asymmetry for the quantile loss, and transition point between linear and quadratic pieces for the Huber function.

In this paper, we develop a statistical framework that can help the modeler to automatically tune shape parameters once the shape of the penalty has been chosen. The choice of the parameter is informed by the basic notion that each PLQ penalty should correspond to a true statistical density. The normalization constant inherent in this requirement helps to inform the optimization over shape parameters, giving a joint optimization problem over these as well as primary parameters of interest. A second contribution is to consider optimization methods for these joint problems. We show that basic first-order methods can be immediately brought to bear, and design specialized extensions of interior point (IP) methods for PLQ problems that can quickly and efficiently solve the joint problem. Synthetic problems and larger-scale practical examples illustrate the utility of the approach. Code for the new IP method is implemented using the Julia language (https://github.com/UW-AMO/shape-parameters/tree/ojmo).

Accepted:

Revised after acceptance:

Published online:

^{1}; Karthikeyan Natesan Ramamurthy

^{2}; Aleksandr Y. Aravkin

^{3}

@article{OJMO_2021__2__A8_0, author = {Peng Zheng and Karthikeyan Natesan Ramamurthy and Aleksandr Y. Aravkin}, title = {Estimating {Shape} {Parameters} of {Piecewise} {Linear-Quadratic} {Problems}}, journal = {Open Journal of Mathematical Optimization}, eid = {8}, publisher = {Universit\'e de Montpellier}, volume = {2}, year = {2021}, doi = {10.5802/ojmo.10}, language = {en}, url = {https://ojmo.centre-mersenne.org/articles/10.5802/ojmo.10/} }

TY - JOUR TI - Estimating Shape Parameters of Piecewise Linear-Quadratic Problems JO - Open Journal of Mathematical Optimization PY - 2021 DA - 2021/// VL - 2 PB - Université de Montpellier UR - https://ojmo.centre-mersenne.org/articles/10.5802/ojmo.10/ UR - https://doi.org/10.5802/ojmo.10 DO - 10.5802/ojmo.10 LA - en ID - OJMO_2021__2__A8_0 ER -

Peng Zheng; Karthikeyan Natesan Ramamurthy; Aleksandr Y. Aravkin. Estimating Shape Parameters of Piecewise Linear-Quadratic Problems. Open Journal of Mathematical Optimization, Volume 2 (2021), article no. 8, 18 p. doi : 10.5802/ojmo.10. https://ojmo.centre-mersenne.org/articles/10.5802/ojmo.10/

[1] Sparse/Robust Estimation and Kalman Smoothing with Nonsmooth Log-Concave Densities: Modeling, Computation, and Theory, J. Mach. Learn. Res., Volume 14 (2013), pp. 2689-2728 | MR | Zbl

[2] Generalized system identification with stable spline kernels, SIAM J. Sci. Comput., Volume 40 (2018) no. 5, p. B1419-B1443 | DOI | MR | Zbl

[3] Efficient quadratic penalization through the partial minimization technique, IEEE Trans. Autom. Control, Volume 63 (2017) no. 7, pp. 2131-2138 | DOI | MR | Zbl

[4] Estimating nuisance parameters in inverse problems, Inverse Probl., Volume 28 (2012) no. 11, 115016, 13 pages | MR | Zbl

[5] et al. A survey of cross-validation procedures for model selection, Stat. Surv., Volume 4 (2010), pp. 40-79 | MR | Zbl

[6] Proximal alternating minimization and projection methods for nonconvex problems: An approach based on the Kurdyka–Łojasiewicz inequality, Math. Oper. Res., Volume 35 (2010) no. 2, pp. 438-457 | DOI | Zbl

[7] A relative weighting method for estimating parameters and variances in multiple data sets, Comput. Stat. Data Anal., Volume 22 (1996) no. 2, pp. 119-135 | DOI | MR | Zbl

[8] Asymmetric Laplace Regression: Maximum Likelihood, Maximum Entropy and Quantile Regression, J. Econom. Methods, Volume 5 (2016) no. 1, pp. 79-101 | MR | Zbl

[9] Algorithms for hyper-parameter optimization, Advances in Neural Information Processing Systems (2011), pp. 2546-2554

[10] Random search for hyper-parameter optimization, J. Mach. Learn. Res., Volume 13 (2012), pp. 281-305 | MR | Zbl

[11] Proximal alternating linearized minimization for nonconvex and nonsmooth problems, Math. Program., Volume 146 (2014) no. 1-2, pp. 459-494 | DOI | MR | Zbl

[12] Convex optimization, Cambridge University Press, 2004 | Zbl

[13] Robust principal component analysis?, J. ACM, Volume 58 (2011) no. 3, 11, 37 pages | MR | Zbl

[14] Sparse and low-rank matrix decompositions, IFAC Proceedings Volumes, Volume 42 (2009) no. 10, pp. 1493-1498 | DOI

[15] Compressed sensing, IEEE Trans. Inf. Theory, Volume 52 (2006) no. 4, pp. 1289-1306 | DOI | MR | Zbl

[16] Adapting regularized low-rank models for parallel architectures, SIAM J. Sci. Comput., Volume 41 (2019) no. 1, p. A163-A189 | DOI | MR | Zbl

[17] Robust statistics, 523, John Wiley & Sons, 2004

[18] Sequential model-based optimization for general algorithm configuration, International Conference on Learning and Intelligent Optimization (2011), pp. 507-523 | DOI

[19] Fast Laplace optimization of machine learning hyperparameters on large datasets (2016) (https://arxiv.org/abs/1605.07079)

[20] Quantile regression, J. Econom. Perspect., Volume 15 (2001) no. 4, pp. 143-156 | DOI

[21] A Unified Approach to Interior Point Algorithms for Linear Complementarity Problems, Lecture Notes in Computer Science, 538, Springer, 1991 | MR

[22] Hyperband: A Novel Bandit-Based Approach to Hyperparameter Optimization (2016) (https://arxiv.org/abs/1603.06560)

[23] Statistical modeling of complex backgrounds for foreground object detection, IEEE Trans. Image Process., Volume 13 (2004) no. 11, pp. 1459-1472

[24] Interior-Point Polynomial Algorithms in Convex Programming, Studies in Applied Mathematics, 13, Society for Industrial and Applied Mathematics, 1994 | MR

[25] Iterative Solution of Symmetric Quasi-Definite Linear Systems, SIAM Spotlights, 3, Society for Industrial and Applied Mathematics, 2017 | MR | Zbl

[26] RASL: Robust alignment by sparse and low-rank decomposition for linearly correlated images, IEEE Trans. Pattern Anal. Mach. Intell., Volume 34 (2012) no. 11, pp. 2233-2246 | DOI

[27] Variational analysis, 317, Springer, 2009

[28] Practical Bayesian optimization of machine learning algorithms, Advances in neural information processing systems (2012), pp. 2951-2959

[29] Regression shrinkage and selection via the lasso, J. R. Stat. Soc., Ser. B, Stat. Methodol., Volume 58 (1996) no. 1, pp. 267-288 | MR | Zbl

[30] Bayesian variable selection and estimation in maximum entropy quantile regression, J. Appl. Stat., Volume 44 (2017) no. 2, pp. 253-269 | DOI | MR | Zbl

[31] Face recognition using eigenfaces, Computer Vision and Pattern Recognition (1991), pp. 586-591 | DOI

[32] Primal-Dual Interior-Point Methods, Society for Industrial and Applied Mathematics, 1997 | Zbl

[33] Bayesian quantile regression, Stat. Probab. Lett., Volume 54 (2001) no. 4, pp. 437-447 | MR | Zbl

[34] Tilt: Transform invariant low-rank textures, Int. J. Comput. Vision, Volume 99 (2012) no. 1, pp. 1-24 | DOI | MR | Zbl

*Cited by Sources: *