Smoothing, regularisation, penalization and friends

Printer-friendly version

In nonparametric statistics we might estimate simultaneously what look like
many, many parameters, which we constrain in some clever fashion,
which usually boils down to something we can interpret as a “smoothing”
parameters, controlling how many parameters we still have to model
from a subset of the original.

The “regularisation” nomenclature claims descent from Tikhonov, (eg TiGl65 etc) who wanted to solve ill-conditioned integral and differential equations, so it’s slightly more general.
“Smoothing” seems to be common in the
spline and
kernel estimate communities of
Wahba (Wahb90) and Silverman (Silv84) et al,
who usually actually want to smooth curves.

Penalization” has a geneology unknown to me, but is probably the least abstruse for common usage.

These are, AFAICT, more or less the same thing.
“smoothing” is more common in my communities which is fine,
but we have to remember that “smoothing” an estimator might not always infer smooth dynamics in the estimand;
it could be something else being smoothed, such as variance in the estimate of parameters of a rough function.

In every case, you wish to solve an ill-conditioned inverse problem, so you tame it by adding a penalty to solutions you feel one should be reluctant to accept.

TODO: make comprehensible

TODO: examples

TODO: discuss connection with model selection

TODO: discuss connection with compressed sensing.

The real classic approach here is spline smoothing of functional data.
More recent approaches are things like sparse regression.

Refs

Bach00
Bach, F. (n.d.) Model-Consistent Sparse Estimation through the Bootstrap.
ChHS15
Chernozhukov, V., Hansen, C., & Spindler, M. (2015) Valid Post-Selection and Post-Regularization Inference: An Elementary, General Approach. Annual Review of Economics, 7(1), 649–688. DOI.
EHJT04
Efron, B., Hastie, T., Johnstone, I., & Tibshirani, R. (2004) Least angle regression. The Annals of Statistics, 32(2), 407–499. DOI.
FlHS13
Flynn, C. J., Hurvich, C. M., & Simonoff, J. S.(2013) Efficiency for Regularization Parameter Selection in Penalized Likelihood Estimation of Misspecified Models. arXiv:1302.2068 [Stat].
FrHT10
Friedman, J., Hastie, T., & Tibshirani, R. (2010) Regularization Paths for Generalized Linear Models via Coordinate Descent. Journal of Statistical Software, 33(1), 1–22. DOI.
JaFH13
Janson, L., Fithian, W., & Hastie, T. (2013) Effective Degrees of Freedom: A Flawed Metaphor. arXiv:1312.7851 [Stat].
KaRo14
Kaufman, S., & Rosset, S. (2014) When does more regularization imply fewer degrees of freedom? Sufficient conditions and counterexamples. Biometrika, 101(4), 771–784. DOI.
KoMi06
Koenker, R., & Mizera, I. (2006) Density estimation by total variation regularization. Advances in Statistical Modeling and Inference, 613–634.
LiRW10
Liu, H., Roeder, K., & Wasserman, L. (2010) Stability Approach to Regularization Selection (StARS) for High Dimensional Graphical Models. In J. D. Lafferty, C. K. I. Williams, J. Shawe-Taylor, R. S. Zemel, & A. Culotta (Eds.), Advances in Neural Information Processing Systems 23 (pp. 1432–1440). Curran Associates, Inc.
MeBü10
Meinshausen, N., & Bühlmann, P. (2010) Stability selection. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 72(4), 417–473. DOI.
Meye08
Meyer, M. C.(2008) Inference using shape-restricted regression splines. The Annals of Applied Statistics, 2(3), 1013–1033. DOI.
Silv84
Silverman, B. W.(1984) Spline Smoothing: The Equivalent Variable Kernel Method. The Annals of Statistics, 12(3), 898–916. DOI.
SmSM98
Smola, A. J., Schölkopf, B., & Müller, K.-R. (1998) The connection between regularization operators and support vector kernels. Neural Networks, 11(4), 637–649. DOI.
TKPS14
Tansey, W., Koyejo, O., Poldrack, R. A., & Scott, J. G.(2014) False discovery rate smoothing. arXiv:1411.6144 [Stat].
TiGl65
Tikhonov, A. N., & Glasko, V. B.(1965) Use of the regularization method in non-linear problems. USSR Computational Mathematics and Mathematical Physics, 5(3), 93–107. DOI.
Geer14
van de Geer, S. (2014) Weakly decomposable regularization penalties and structured sparsity. Scandinavian Journal of Statistics, 41(1), 72–86. DOI.
Wahb90
Wahba, G. (1990) Spline Models for Observational Data. . SIAM
WeMZ16
Weng, H., Maleki, A., & Zheng, L. (2016) Overcoming The Limitations of Phase Transition by Higher Order Analysis of Regularization Techniques. arXiv:1603.07377 [Cs, Math, Stat].
Wood00
Wood, S. N.(2000) Modelling and smoothing parameter estimation with multiple quadratic penalties. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 62(2), 413–428. DOI.
Wood08
Wood, S. N.(2008) Fast stable direct fitting and smoothness selection for generalized additive models. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 70(3), 495–518. DOI.
ZoHa05
Zou, H., & Hastie, T. (2005) Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 67(2), 301–320. DOI.
ZoHT07
Zou, H., Hastie, T., & Tibshirani, R. (2007) On the “degrees of freedom” of the lasso. The Annals of Statistics, 35(5), 2173–2192. DOI.

See original: The Living Thing / Notebooks Smoothing, regularisation, penalization and friends