Statistical estimation of information and other fiddly functionals

Say I would like to know the mutual information of the process generating two streams of observations, with weak assumptions on the form of the generation process.

(Why would I want to do this by itself? I don’t know. I’m sure a use case will come along.)

Because observations with low frequency have high influence on the estimate, this can be tricky. It is easy to get a uselessly biased, or even inconsistent, estimator, especially in the nonparametric case.

A typical technique is to construct a joint histogram from your
samples, treat the bins as a finite alphabet and then do the usual
discrete calculation.
That throws out a lot of information, and it feels clunky and stupid, especially if you suspect your distributions might have some other kind of smoothness that you’d like to exploit.
Moreover, this method is highly sensitive to the choice of bins and can be arbitrarily wrong if you don’t do it right (see Paninski, 2003).
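
To make the histogram approach concrete, here is a minimal sketch in plain Python. It assumes equal-width bins over the observed range and the naive plug-in formula (empirical frequencies substituted straight into the definition of mutual information, in nats). The function name `plugin_mutual_information` and the `bins` parameter are my own illustrative choices, not anything from the cited papers.

```python
import math
import random
from collections import Counter


def plugin_mutual_information(xs, ys, bins=8):
    """Naive plug-in MI estimate (nats) from an equal-width joint histogram."""

    def bin_index(v, lo, hi):
        # Map v in [lo, hi] to an integer bin in 0..bins-1.
        if hi == lo:
            return 0
        return min(int((v - lo) / (hi - lo) * bins), bins - 1)

    x_lo, x_hi = min(xs), max(xs)
    y_lo, y_hi = min(ys), max(ys)
    pairs = [
        (bin_index(x, x_lo, x_hi), bin_index(y, y_lo, y_hi))
        for x, y in zip(xs, ys)
    ]
    n = len(pairs)
    joint = Counter(pairs)             # counts per joint cell
    px = Counter(i for i, _ in pairs)  # marginal counts for x
    py = Counter(j for _, j in pairs)  # marginal counts for y
    # I = sum_ij p_ij * log(p_ij / (p_i * p_j)); empty cells contribute 0.
    return sum(
        (c / n) * math.log(c * n / (px[i] * py[j]))
        for (i, j), c in joint.items()
    )


# Two independent uniform streams: the true mutual information is 0.
random.seed(0)
xs = [random.random() for _ in range(200)]
ys = [random.random() for _ in range(200)]
for b in (4, 8, 16):
    print(b, plugin_mutual_information(xs, ys, bins=b))
```

On a fixed sample the plug-in estimate is never negative, and with these nested bin counts (4, 8, 16) it can only grow as the bins get finer, even though the true value here is zero. That is the sensitivity complained about above, in miniature.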

So, better alternatives?

To consider:

  • Based on authorship alone, KKPW14 is the best place to start.
  • Kraskov’s (2004) NN-method looks nice, but doesn’t yet have any guarantees that I know of.
  • the relationship between mutual information and 2-dimensional
    spatial statistics.
  • relationship between mutual information and copula entropy.
  • those occasional mentions of calculating mutual information from recurrence plots:
    how do they work?
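
For orientation, the nearest-neighbour idea in the list above can be sketched in a few lines. This is my reading of the first of the two estimators in Kraskov et al. (2004), assuming scalar streams, the max-norm in the joint space, and a brute-force O(n²) neighbour search. The names `ksg_mutual_information` and `digamma_int` are hypothetical, and the code is meant as a toy to experiment with, not a reference implementation.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329


def digamma_int(n):
    """Digamma at a positive integer: psi(n) = -gamma + sum_{j=1}^{n-1} 1/j."""
    return -EULER_GAMMA + sum(1.0 / j for j in range(1, n))


def ksg_mutual_information(xs, ys, k=4):
    """k-nearest-neighbour MI estimate (nats) for two scalar streams."""
    n = len(xs)
    total = 0.0
    for i in range(n):
        # Max-norm distances from point i to every other point in (x, y).
        dists = sorted(
            max(abs(xs[i] - xs[j]), abs(ys[i] - ys[j]))
            for j in range(n)
            if j != i
        )
        eps = dists[k - 1]  # distance to the k-th nearest joint neighbour
        # Count marginal neighbours strictly closer than eps.
        nx = sum(1 for j in range(n) if j != i and abs(xs[i] - xs[j]) < eps)
        ny = sum(1 for j in range(n) if j != i and abs(ys[i] - ys[j]) < eps)
        total += digamma_int(nx + 1) + digamma_int(ny + 1)
    # I = psi(k) + psi(n) - mean(psi(nx + 1) + psi(ny + 1))
    return digamma_int(k) + digamma_int(n) - total / n


# Correlated Gaussians, where the true value is -0.5 * log(1 - rho^2).
random.seed(0)
rho = 0.9
xs = [random.gauss(0, 1) for _ in range(400)]
ys = [rho * x + math.sqrt(1 - rho**2) * random.gauss(0, 1) for x in xs]
print("estimate:", ksg_mutual_information(xs, ys, k=4))
print("truth:   ", -0.5 * math.log(1 - rho**2))
```

There is no binning step here. The only tuning knob is `k`, which trades bias against variance. Exact ties in the data (for example, heavily rounded observations) would break the strict-inequality counts, so this sketch assumes continuous-valued streams.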

