Statistical estimation of Information and other fiddly functionals


Say I would like to know the mutual information of the process generating two streams of observations, with weak assumptions on the form of the generation process.

(Why would I want to do this by itself? I don’t know. I’m sure a use case will come along.)

Because low-frequency observations have high influence on the estimate, this can be tricky. It is easy to get a uselessly biased (or even inconsistent) estimator, especially in the nonparametric case.

A typical technique is to construct a joint histogram from your samples, treat the bins as a finite alphabet, and then apply the usual discrete plug-in estimator.
That throws out a lot of information, and it feels clunky and stupid, especially if you suspect your distributions might have some other kind of smoothness that you'd like to exploit.
Moreover, this method is highly sensitive to the binning and can be arbitrarily wrong if you don't do it right (see Paninski, 2003).
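To make the complaint concrete, here is a minimal sketch of that histogram plug-in estimator (the function name `plugin_mi` and the bin count are my own choices, not from any of the papers below). Note how the answer depends entirely on the arbitrary `bins` parameter, which is the sensitivity at issue:

```python
import numpy as np

def plugin_mi(x, y, bins=16):
    """Naive plug-in estimate of mutual information (in nats) from a
    joint histogram. Biased upward for small samples, and the bias
    grows with the number of bins (cf. Paninski, 2003)."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()                  # empirical joint pmf over bins
    px = pxy.sum(axis=1, keepdims=True)        # marginal of x, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)        # marginal of y, shape (1, bins)
    mask = pxy > 0                             # 0 log 0 = 0 by convention
    return float((pxy[mask] * np.log(pxy[mask] / (px @ py)[mask])).sum())
```

For independent streams this returns a small but strictly positive number (the plug-in bias), and the number changes if you change `bins`.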

So, better alternatives?

To consider:

  • Based on authorship alone, Kandasamy et al. (2014), “KKPW14”, is the best place to start.
  • Kraskov et al.’s (2004) nearest-neighbour method looks nice, but doesn’t yet have any guarantees that I know of.
  • the relationship between mutual information and 2-dimensional
    spatial statistics.
  • relationship between mutual information and copula entropy.
  • those occasional mentions of calculating mutual information from recurrence plots:
    how do they work?
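For reference, the Kraskov et al. (2004) nearest-neighbour estimator mentioned above is simple enough to sketch in full. This is a brute-force, pure-numpy rendering of their “algorithm 1” (the function name `ksg_mi` is mine; a serious implementation would use a k-d tree rather than the O(n²) distance matrices here):

```python
import numpy as np

def ksg_mi(x, y, k=3):
    """Kraskov-Stoegbauer-Grassberger kNN estimator of mutual
    information (their algorithm 1), in nats. Brute-force distances,
    max-norm in the joint space."""
    x = np.asarray(x, float).reshape(len(x), -1)
    y = np.asarray(y, float).reshape(len(y), -1)
    n = len(x)
    # Pairwise Chebyshev (max-norm) distances in each marginal space.
    dx = np.max(np.abs(x[:, None, :] - x[None, :, :]), axis=-1)
    dy = np.max(np.abs(y[:, None, :] - y[None, :, :]), axis=-1)
    dz = np.maximum(dx, dy)          # max-norm distance in the joint space
    np.fill_diagonal(dz, np.inf)     # a point is not its own neighbour
    eps = np.sort(dz, axis=1)[:, k - 1]   # distance to k-th joint neighbour
    # Count strictly-closer neighbours in each marginal (minus self).
    nx = (dx < eps[:, None]).sum(axis=1) - 1
    ny = (dy < eps[:, None]).sum(axis=1) - 1

    def psi(m):
        # digamma at positive integers: psi(m) = -gamma + sum_{i<m} 1/i
        return -0.5772156649015329 + sum(1.0 / i for i in range(1, m))

    return psi(k) + psi(n) - np.mean(
        [psi(a + 1) + psi(b + 1) for a, b in zip(nx, ny)]
    )
```

The appeal is that there is no binning parameter at all, only the neighbour count `k`; the lack of general guarantees is the caveat noted above.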

To read

Barnett, L., & Bossomaier, T. (2012) Transfer Entropy as a Log-likelihood Ratio. arXiv:1205.6339.
Beirlant, J., Dudewicz, E. J., Györfi, L., & van der Meulen, E. C.(1997) Nonparametric entropy estimation: An overview. Journal of Mathematical and Statistical Sciences, 6(1), 17–39.
Chao, A., & Shen, T.-J. (2003) Nonparametric estimation of Shannon’s index of diversity when there are unseen species in sample. Environmental and Ecological Statistics, 10(4), 429–443. DOI.
Darbellay, G. A., & Vajda, I. (1999) Estimation of the information by an adaptive partitioning of the observation space. IEEE Transactions on Information Theory, 45, 1315–1321. DOI.
Darbellay, G. A., & Wuertz, D. (2000) The entropy as a tool for analysing statistical dependences in financial time series. Physica A: Statistical Mechanics and Its Applications, 287(3–4), 429–439. DOI.
Daub, C. O., Steuer, R., Selbig, J., & Kloska, S. (2004) Estimating mutual information using B-spline functions - an improved similarity measure for analysing gene expression data. BMC Bioinformatics, 5(1), 118. DOI.
Doucet, A., Jacob, P. E., & Rubenthaler, S. (2013) Derivative-Free Estimation of the Score Vector and Observed Information Matrix with Application to State-Space Models. arXiv:1304.5768 [Stat].
Gao, S., Ver Steeg, G., & Galstyan, A. (n.d.) Estimating Mutual Information by Local Gaussian Approximation.
Hausser, J., & Strimmer, K. (2009) Entropy Inference and the James-Stein Estimator, with Application to Nonlinear Gene Association Networks. Journal of Machine Learning Research, 10, 1469.
Jiao, J., Venkat, K., Han, Y., & Weissman, T. (2014) Maximum Likelihood Estimation of Functionals of Discrete Distributions. arXiv:1406.6959 [Cs, Math, Stat].
Jiao, J., Venkat, K., Han, Y., & Weissman, T. (2015) Minimax Estimation of Functionals of Discrete Distributions. IEEE Transactions on Information Theory, 61(5), 2835–2885. DOI.
Kandasamy, K., Krishnamurthy, A., Poczos, B., Wasserman, L., & Robins, J. M.(2014) Influence Functions for Machine Learning: Nonparametric Estimators for Entropies, Divergences and Mutual Informations. arXiv:1411.4342 [Stat].
Kennel, M. B., Shlens, J., Abarbanel, H. D. I., & Chichilnisky, E. J.(2005) Estimating Entropy Rates with Bayesian Confidence Intervals. Neural Computation, 17(7). DOI.
Kraskov, A., Stögbauer, H., & Grassberger, P. (2004) Estimating mutual information. Physical Review E, 69, 66138. DOI.
Liese, F., & Vajda, I. (2006) On Divergences and Informations in Statistics and Information Theory. IEEE Transactions on Information Theory, 52(10), 4394–4412. DOI.
Lizier, J. T., Prokopenko, M., & Zomaya, A. Y.(2008) A framework for the local information dynamics of distributed computation in complex systems.
Marton, K., & Shields, P. C.(1994) Entropy and the consistent estimation of joint distributions. The Annals of Probability, 22(2), 960–977.
Moon, Y. I., Rajagopalan, B., & Lall, U. (1995) Estimation of mutual information using kernel density estimators. Physical Review E, 52, 2318–2321. DOI.
Nemenman, I., Bialek, W., & de Ruyter Van Steveninck, R. (2004) Entropy and information in neural spike trains: Progress on the sampling problem. Physical Review E, 69(5), 56111.
Nemenman, I., Shafee, F., & Bialek, W. (2002) Entropy and inference, revisited. In Advances in Neural Information Processing Systems 14 (Vol. 14). Cambridge, MA, USA: The MIT Press
Paninski, L. (2003) Estimation of entropy and mutual information. Neural Computation, 15(6), 1191–1253. DOI.
Panzeri, S., Senatore, R., Montemurro, M. A., & Petersen, R. S.(2007) Correcting for the sampling bias problem in spike train information measures. Journal of Neurophysiology, 98, 1064–1072. DOI.
Panzeri, S., & Treves, A. (1996) Analytical estimates of limited sampling biases in different information measures. Network: Computation in Neural Systems, 7(1), 87–107.
Robinson, P. M.(1991) Consistent Nonparametric Entropy-Based Testing. The Review of Economic Studies, 58(3), 437. DOI.
Roulston, M. S.(1999) Estimating the errors on measured entropy and mutual information. Physica D: Nonlinear Phenomena, 125(3–4), 285–294. DOI.
Schürmann, T. (2015) A Note on Entropy Estimation. Neural Computation, 27(10), 2097–2106. DOI.
Staniek, M., & Lehnertz, K. (2008) Symbolic transfer entropy. Physical Review Letters, 100(15), 158101. DOI.
Vejmelka, M., & Paluš, M. (2008) Inferring the directionality of coupling with conditional mutual information. Phys. Rev. E, 77(2), 26214. DOI.
Victor, J. D.(2002) Binless strategies for estimation of information from neural data. Physical Review E, 66, 51903. DOI.
Wolf, D. R., & Wolpert, D. H.(1994a) Estimating Functions of Distributions from A Finite Set of Samples, Part 2: Bayes Estimators for Mutual Information, Chi-Squared, Covariance and other Statistics. arXiv:comp-gas/9403002.
Wolpert, D. H., & Wolf, D. R.(1994b) Estimating Functions of Probability Distributions from a Finite Set of Samples, Part 1: Bayes Estimators and the Shannon Entropy. arXiv:comp-gas/9403001.
Wu, Y., & Yang, P. (2014) Minimax rates of entropy estimation on large alphabets via best polynomial approximation. arXiv:1407.0381 [Cs, Math, Stat].

See original: The Living Thing / Notebooks Statistical estimation of Information and other fiddly functionals