Smoothing, regularisation, penalization and friends


In nonparametric statistics we might simultaneously estimate what look like
many, many parameters, which we constrain in some clever fashion.
This usually boils down to something we can interpret as a “smoothing”
parameter, controlling how many of the original parameters
we effectively still have to model.
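
A minimal sketch of that idea, assuming ridge regression as the illustrating estimator (my choice for the example, not something the text above prescribes). For a linear smoother, one common proxy for "how many parameters are left" is the trace of the hat matrix, which for ridge is the sum of d_i^2 / (d_i^2 + lambda) over the singular values d_i of the design. The design matrix `X` and the grid of `lam` values here are hypothetical.

```python
import numpy as np

def ridge_effective_df(X, lam):
    """Trace of the ridge hat matrix X (X'X + lam I)^{-1} X'."""
    d = np.linalg.svd(X, compute_uv=False)  # singular values of the design
    return float(np.sum(d**2 / (d**2 + lam)))

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 10))  # hypothetical design: 50 observations, 10 nominal parameters

# Larger penalties leave fewer effective parameters.
for lam in [0.0, 1.0, 10.0, 100.0, 1000.0]:
    print(f"lambda={lam:7.1f}  effective df={ridge_effective_df(X, lam):.2f}")
```

With no penalty the count equals the nominal number of columns, and it shrinks towards zero as the penalty grows. Janson, Fithian and Hastie (2013) in the reference list argue that this metaphor can mislead outside such simple linear cases.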

The “regularisation” nomenclature claims descent from Tikhonov (e.g. TiGl65), who wanted to solve ill-conditioned integral and differential equations, so it’s slightly more general.
“Smoothing” seems to be common in the
spline and
kernel estimate communities of
Wahba (Wahb90), Silverman (Silv84) et al.,
who usually actually want to smooth curves.

“Penalization” has a genealogy unknown to me, but is probably the least abstruse for common usage.

These are, AFAICT, more or less the same thing.
“Smoothing” is more common in my communities, which is fine,
but we have to remember that “smoothing” an estimator might not always imply smooth dynamics in the estimand;
it could be something else being smoothed, such as variance in the estimate of parameters of a rough function.

In every case, you wish to solve an ill-conditioned inverse problem, so you tame it by adding a penalty to solutions you would be reluctant to accept.
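
As a rough illustration of taming an ill-conditioned problem with a penalty, here is a sketch of Tikhonov regularisation on a small linear system. The Hilbert matrix, the noise level, the seed and the penalty weight `lam` are all assumptions chosen for the demonstration; the penalised solution minimises ||Ax - b||^2 + lam ||x||^2, i.e. it is reluctant to accept solutions with large norm.

```python
import numpy as np

n = 10
idx = np.arange(n)
A = 1.0 / (idx[:, None] + idx[None, :] + 1.0)  # Hilbert matrix, notoriously ill-conditioned
x_true = np.ones(n)

rng = np.random.default_rng(1)
b = A @ x_true + 1e-6 * rng.normal(size=n)  # slightly noisy observations

def tikhonov_solve(A, b, lam):
    """Minimise ||A x - b||^2 + lam * ||x||^2 via the penalised normal equations."""
    return np.linalg.solve(A.T @ A + lam * np.eye(A.shape[1]), A.T @ b)

x_naive = np.linalg.solve(A, b)         # unpenalised: noise is hugely amplified
x_reg = tikhonov_solve(A, b, lam=1e-6)  # penalised: large-norm solutions are discouraged

err_naive = np.linalg.norm(x_naive - x_true)
err_reg = np.linalg.norm(x_reg - x_true)
print(f"condition number of A: {np.linalg.cond(A):.2e}")
print(f"error without penalty: {err_naive:.3e}")
print(f"error with penalty:    {err_reg:.3e}")
```

The penalty weight is picked by hand here; choosing it from data is the regularisation parameter selection problem that several of the references address.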

TODO: make comprehensible

TODO: examples

TODO: discuss connection with model selection

TODO: discuss connection with compressed sensing.

The real classic approach here is spline smoothing of functional data.
More recent approaches are things like sparse regression.
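
Both flavours can be sketched in a few lines of NumPy. The first part below is not a true smoothing spline: it swaps in a discrete second-difference roughness penalty on a grid as a stand-in. The second part fits a lasso by proximal gradient descent (ISTA), which is one of several ways to do sparse regression and not the algorithm of any particular reference. The data, seeds, penalty weights and the helper names `soft_threshold` and `lasso_ista` are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)

# Part 1: roughness-penalised smoothing of a noisy curve.
n = 100
t = np.linspace(0.0, 1.0, n)
y = np.sin(2 * np.pi * t) + 0.3 * rng.normal(size=n)
D = np.diff(np.eye(n), n=2, axis=0)  # (n-2) x n second-difference operator
lam = 50.0
# Minimise ||y - f||^2 + lam * ||D f||^2, which has a closed-form solution.
f_hat = np.linalg.solve(np.eye(n) + lam * D.T @ D, y)
rough_raw = np.sum((D @ y) ** 2)
rough_fit = np.sum((D @ f_hat) ** 2)
print(f"roughness of data: {rough_raw:.3f}, of fit: {rough_fit:.5f}")

# Part 2: sparse regression with an L1 penalty, fitted by ISTA.
def soft_threshold(z, thresh):
    """Shrink each entry towards zero, setting small entries exactly to zero."""
    return np.sign(z) * np.maximum(np.abs(z) - thresh, 0.0)

def lasso_ista(X, y, lam, n_iter=500):
    """Minimise (1/2n) ||y - X beta||^2 + lam * ||beta||_1 by proximal gradient steps."""
    n_obs, p = X.shape
    step = n_obs / np.linalg.norm(X, 2) ** 2  # 1 / Lipschitz constant of the gradient
    beta = np.zeros(p)
    for _ in range(n_iter):
        grad = -X.T @ (y - X @ beta) / n_obs
        beta = soft_threshold(beta - step * grad, step * lam)
    return beta

X = rng.normal(size=(100, 20))
beta_true = np.zeros(20)
beta_true[:3] = [3.0, -2.0, 1.5]  # only three of twenty coefficients matter
y_reg = X @ beta_true + 0.1 * rng.normal(size=100)
beta_hat = lasso_ista(X, y_reg, lam=0.1)
print("nonzero coefficients:", np.flatnonzero(beta_hat))
```

In the first part the penalty trades fidelity to the data against wiggliness of the fit; in the second it trades fidelity against the number and size of active coefficients, which is where the connection to model selection comes from.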


Bach, F. (n.d.) Model-Consistent Sparse Estimation through the Bootstrap.
Chernozhukov, V., Hansen, C., & Spindler, M. (2015) Valid Post-Selection and Post-Regularization Inference: An Elementary, General Approach. Annual Review of Economics, 7(1), 649–688.
Efron, B., Hastie, T., Johnstone, I., & Tibshirani, R. (2004) Least angle regression. The Annals of Statistics, 32(2), 407–499.
Flynn, C. J., Hurvich, C. M., & Simonoff, J. S. (2013) Efficiency for Regularization Parameter Selection in Penalized Likelihood Estimation of Misspecified Models. arXiv:1302.2068 [Stat].
Friedman, J., Hastie, T., & Tibshirani, R. (2010) Regularization Paths for Generalized Linear Models via Coordinate Descent. Journal of Statistical Software, 33(1), 1–22.
Janson, L., Fithian, W., & Hastie, T. (2013) Effective Degrees of Freedom: A Flawed Metaphor. arXiv:1312.7851 [Stat].
Kaufman, S., & Rosset, S. (2014) When does more regularization imply fewer degrees of freedom? Sufficient conditions and counterexamples. Biometrika, 101(4), 771–784.
Koenker, R., & Mizera, I. (2006) Density estimation by total variation regularization. Advances in Statistical Modeling and Inference, 613–634.
Liu, H., Roeder, K., & Wasserman, L. (2010) Stability Approach to Regularization Selection (StARS) for High Dimensional Graphical Models. In J. D. Lafferty, C. K. I. Williams, J. Shawe-Taylor, R. S. Zemel, & A. Culotta (Eds.), Advances in Neural Information Processing Systems 23 (pp. 1432–1440). Curran Associates, Inc.
Meinshausen, N., & Bühlmann, P. (2010) Stability selection. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 72(4), 417–473.
Meyer, M. C. (2008) Inference using shape-restricted regression splines. The Annals of Applied Statistics, 2(3), 1013–1033.
Silverman, B. W. (1984) Spline Smoothing: The Equivalent Variable Kernel Method. The Annals of Statistics, 12(3), 898–916.
Smola, A. J., Schölkopf, B., & Müller, K.-R. (1998) The connection between regularization operators and support vector kernels. Neural Networks, 11(4), 637–649.
Tansey, W., Koyejo, O., Poldrack, R. A., & Scott, J. G. (2014) False discovery rate smoothing. arXiv:1411.6144 [Stat].
Tikhonov, A. N., & Glasko, V. B. (1965) Use of the regularization method in non-linear problems. USSR Computational Mathematics and Mathematical Physics, 5(3), 93–107.
van de Geer, S. (2014) Weakly decomposable regularization penalties and structured sparsity. Scandinavian Journal of Statistics, 41(1), 72–86.
Wahba, G. (1990) Spline Models for Observational Data. SIAM.
Weng, H., Maleki, A., & Zheng, L. (2016) Overcoming The Limitations of Phase Transition by Higher Order Analysis of Regularization Techniques. arXiv:1603.07377 [Cs, Math, Stat].
Wood, S. N. (2000) Modelling and smoothing parameter estimation with multiple quadratic penalties. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 62(2), 413–428.
Wood, S. N. (2008) Fast stable direct fitting and smoothness selection for generalized additive models. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 70(3), 495–518.
Zou, H., & Hastie, T. (2005) Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 67(2), 301–320.
Zou, H., Hastie, T., & Tibshirani, R. (2007) On the “degrees of freedom” of the lasso. The Annals of Statistics, 35(5), 2173–2192.

See original: The Living Thing / Notebooks Smoothing, regularisation, penalization and friends