Learning Gamelan

On online learning of
sparse basis dictionaries,
for music.
Blind IIR deconvolution with an unusual loss function,
or “shift invariant sparse coding”.

It seems like this would boil down to something like sparse dictionary
learning, with the sparse activations, and a dictionary
sparse in LPC components.

There are two ways to do this - time domain, and frequency domain.

For the latter, sparse time-domain activations are non local in Fourier components, but possibly simple to recover.

For the former, one could solve the Yule-Walker equations in the time domain (e.g. by Levinson-Durbin recursion), although we expect that to be unstable.
We could go for sparse simultaneous kernel inference in the time domain, which might be better, or directly infer the Horner-form.
Then we have a lot of simultaneous filter components and tedious inference for them.
Otherwise, we could do it directly in the FFT domain, although this makes MIMO harder, and excludes the potential for non-linearities.
The fact that I am expecting to identify many distinct systems in Fourier space as atoms complicates this slightly.

Thought: can I use HPSS to do this with the purely harmonic components?
And use the percussive components as priors for the activations?
How do you enforce causality for triggering in the FFT-transformed domain?

We have activations and components: the activations are a KxT matrix, and
the K components are the rows of a KxL matrix.
We wish the convolution of one with the other to approximately recover the
original signal with a certain loss function.
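The shapes above can be made concrete with a small sketch. It assumes one activation row per component, a plain sum of per-component convolutions, and squared error standing in for the unspecified loss; the names `reconstruct` and `squared_error` are illustrative only.

```python
import numpy as np

def reconstruct(activations, atoms):
    """Sum over k of (activation row k) convolved with (atom k).

    activations: K x T matrix, ideally sparse along time.
    atoms:       K x L matrix, one component (kernel) per row.
    Returns a signal of length T + L - 1.
    """
    K, T = activations.shape
    K_atoms, L = atoms.shape
    assert K == K_atoms, "need one activation row per atom"
    signal = np.zeros(T + L - 1)
    for k in range(K):
        signal += np.convolve(activations[k], atoms[k])
    return signal

def squared_error(signal, activations, atoms):
    # Assumption: squared error is only a placeholder for the real loss.
    residual = signal - reconstruct(activations, atoms)
    return float(np.sum(residual ** 2))

rng = np.random.default_rng(0)
K, T, L = 3, 200, 16
atoms = rng.standard_normal((K, L))

# Three isolated "strikes", one per component.
activations = np.zeros((K, T))
activations[0, 10] = 1.0
activations[1, 50] = 0.5
activations[2, 120] = -2.0

signal = reconstruct(activations, atoms)
```

The learning problem is then to recover both `activations` and `atoms` from `signal` alone, which this sketch does not attempt.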

Why gamelan?
It’s tuned percussion, with a non-trivial tuning system, and no pitch bending.


Other questions:
Infer chained biquads? Even restrict them to be bandpass?
Or sparse, high-order filters of some description?

See original: The Living Thing / Notebooks Learning Gamelan

Statistical estimation of Information and other fiddly functionals

Say I would like to know the mutual information of the process generating two streams of observations, with weak assumptions on the form of the generation process.

(Why would I want to do this by itself? I don’t know. I’m sure a use case will come along.)

Because observations with low frequency have high influence on the estimate, this can be tricky. It is easy to get a uselessly biased, or even inconsistent, estimator, especially in the nonparametric case.

A typical technique is to construct a joint histogram from your
samples, treat the bins as a finite alphabet and then do the usual discrete (plug-in) calculation.
That throws out a lot of information, and it feels clunky and stupid, especially if you suspect your distributions might have some other kind of smoothness that you’d like to exploit.
Moreover this method is highly sensitive and can be arbitrarily wrong if you don’t do it right (see Paninski, 2003).
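For concreteness, here is a minimal sketch of the histogram approach, assuming equal-width bins and the naive plug-in formula with no bias correction. The function name and the bin count are arbitrary choices for illustration.

```python
import numpy as np

def plugin_mutual_information(x, y, bins=16):
    """Naive plug-in mutual information (in nats) from a joint histogram."""
    counts, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = counts / counts.sum()            # empirical joint distribution
    px = pxy.sum(axis=1, keepdims=True)    # marginal of x, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)    # marginal of y, shape (1, bins)
    nonzero = pxy > 0                      # skip empty cells (0 log 0 := 0)
    return float(np.sum(pxy[nonzero] * np.log(pxy[nonzero] / (px @ py)[nonzero])))

rng = np.random.default_rng(0)
x = rng.standard_normal(2000)
y_independent = rng.standard_normal(2000)

# True mutual information is zero here, but the estimate comes out positive.
mi_independent = plugin_mutual_information(x, y_independent)

# With y identical to x the estimate is the entropy of the binned x,
# so it depends on the bin count.
mi_identical = plugin_mutual_information(x, x)
```

The independent case shows the upward bias directly: the estimate is a Kullback-Leibler divergence between empirical distributions, so it cannot be negative and is zero only if the empirical joint happens to factorise exactly.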

So, better alternatives?

To consider:

  • Based on authorship alone, KKPW14 is the best place to start.
  • Kraskov’s (2004) NN-method looks nice, but doesn’t yet have any guarantees that I know of
  • the relationship between mutual information and 2-dimensional
    spatial statistics.
  • relationship between mutual information and copula entropy.
  • those occasional mentions of calculating mutual information from recurrence plots:
    how do they work?

To read

Barnett, L., & Bossomaier, T. (2012) Transfer Entropy as a Log-likelihood Ratio. arXiv:1205.6339.
Beirlant, J., Dudewicz, E. J., Györfi, L., & van der Meulen, E. C.(1997) Nonparametric entropy estimation: An overview. Journal of Mathematical and Statistical Sciences, 6(1), 17–39.
Chao, A., & Shen, T.-J. (2003) Nonparametric estimation of Shannon’s index of diversity when there are unseen species in sample. Environmental and Ecological Statistics, 10(4), 429–443. DOI.
Darbellay, G. A., & Vajda, I. (1999) Estimation of the information by an adaptive partitioning of the observation space. IEEE Transactions on Information Theory, 45, 1315–1321. DOI.
Darbellay, G. A., & Wuertz, D. (2000) The entropy as a tool for analysing statistical dependences in financial time series. Physica A: Statistical Mechanics and Its Applications, 287(3–4), 429–439. DOI.
Daub, C. O., Steuer, R., Selbig, J., & Kloska, S. (2004) Estimating mutual information using B-spline functions - an improved similarity measure for analysing gene expression data. BMC Bioinformatics, 5(1), 118. DOI.
Doucet, A., Jacob, P. E., & Rubenthaler, S. (2013) Derivative-Free Estimation of the Score Vector and Observed Information Matrix with Application to State-Space Models. arXiv:1304.5768 [Stat].
Gao, S., Ver Steeg, G., & Galstyan, A. (n.d.) Estimating Mutual Information by Local Gaussian Approximation.
Hausser, J., & Strimmer, K. (2009) Entropy Inference and the James-Stein Estimator, with Application to Nonlinear Gene Association Networks. Journal of Machine Learning Research, 10, 1469.
Jiao, J., Venkat, K., Han, Y., & Weissman, T. (2014) Maximum Likelihood Estimation of Functionals of Discrete Distributions. arXiv:1406.6959 [Cs, Math, Stat].
Jiao, J., Venkat, K., Han, Y., & Weissman, T. (2015) Minimax Estimation of Functionals of Discrete Distributions. IEEE Transactions on Information Theory, 61(5), 2835–2885. DOI.
Kandasamy, K., Krishnamurthy, A., Poczos, B., Wasserman, L., & Robins, J. M.(2014) Influence Functions for Machine Learning: Nonparametric Estimators for Entropies, Divergences and Mutual Informations. arXiv:1411.4342 [Stat].
Kennel, M. B., Shlens, J., Abarbanel, H. D. I., & Chichilnisky, E. J.(2005) Estimating Entropy Rates with Bayesian Confidence Intervals. Neural Computation, 17(7). DOI.
Kraskov, A., Stögbauer, H., & Grassberger, P. (2004) Estimating mutual information. Physical Review E, 69, 66138. DOI.
Liese, F., & Vajda, I. (2006) On Divergences and Informations in Statistics and Information Theory. IEEE Transactions on Information Theory, 52(10), 4394–4412. DOI.
Lizier, J. T., Prokopenko, M., & Zomaya, A. Y.(2008) A framework for the local information dynamics of distributed computation in complex systems.
Marton, K., & Shields, P. C.(1994) Entropy and the consistent estimation of joint distributions. The Annals of Probability, 22(2), 960–977.
Moon, Y. I., Rajagopalan, B., & Lall, U. (1995) Estimation of mutual information using kernel density estimators. Physical Review E, 52, 2318–2321. DOI.
Nemenman, I., Bialek, W., & de Ruyter Van Steveninck, R. (2004) Entropy and information in neural spike trains: Progress on the sampling problem. Physical Review E, 69(5), 56111.
Nemenman, I., Shafee, F., & Bialek, W. (2002) Entropy and inference, revisited. In Advances in Neural Information Processing Systems 14 (Vol. 14). Cambridge, MA, USA: The MIT Press
Paninski, L. (2003) Estimation of entropy and mutual information. Neural Computation, 15(6), 1191–1253. DOI.
Panzeri, S., Senatore, R., Montemurro, M. A., & Petersen, R. S.(2007) Correcting for the sampling bias problem in spike train information measures. Journal of Neurophysiology, 98, 1064–1072. DOI.
Panzeri, S., & Treves, A. (1996) Analytical estimates of limited sampling biases in different information measures. Network: Computation in Neural Systems, 7(1), 87–107.
Robinson, P. M.(1991) Consistent Nonparametric Entropy-Based Testing. The Review of Economic Studies, 58(3), 437. DOI.
Roulston, M. S.(1999) Estimating the errors on measured entropy and mutual information. Physica D: Nonlinear Phenomena, 125(3–4), 285–294. DOI.
Schürmann, T. (2015) A Note on Entropy Estimation. Neural Computation, 27(10), 2097–2106. DOI.
Staniek, M., & Lehnertz, K. (2008) Symbolic transfer entropy. Physical Review Letters, 100(15), 158101. DOI.
Vejmelka, M., & Paluš, M. (2008) Inferring the directionality of coupling with conditional mutual information. Phys. Rev. E, 77(2), 26214. DOI.
Victor, J. D.(2002) Binless strategies for estimation of information from neural data. Physical Review E, 66, 51903. DOI.
Wolf, D. R., & Wolpert, D. H.(1994a) Estimating Functions of Distributions from A Finite Set of Samples, Part 2: Bayes Estimators for Mutual Information, Chi-Squared, Covariance and other Statistics. arXiv:comp-gas/9403002.
Wolpert, D. H., & Wolf, D. R.(1994b) Estimating Functions of Probability Distributions from a Finite Set of Samples, Part 1: Bayes Estimators and the Shannon Entropy. arXiv:comp-gas/9403001.
Wu, Y., & Yang, P. (2014) Minimax rates of entropy estimation on large alphabets via best polynomial approximation. arXiv:1407.0381 [Cs, Math, Stat].

See original: The Living Thing / Notebooks Statistical estimation of Information and other fiddly functionals

Content aggregators

Upon the efficient consumption and summarizing of news from around the world.

I have been told to do this through twitter or facebook, but, seriously… no.
Those are systems designed to waste time with stupid distractions to benefit someone else.

Contrarily, I would like to find ways to summarise and condense information to save time for myself.

Telling me to use someone’s social website to gain information is like telling me to play poker machines to fix my financial troubles. Stop it.

Feed readers

The classic.

You know what podcasts are?
Podcasts are a type of feed. An audio feed.
If I care about news articles and tumblr posts and whatever, not just audio, then I use feeds, feeds of text instead of audio. Any website can have a feed. Many do.



Remember when we thought the web would be a useful tool for researching and learning, and that automated research assistants would trawl the web for us?
RSS feeds were often discussed as a piece of that machine.

Little updates dripped from the web, to be sliced, diced, prioritised and analysed by our software to keep us aware of… whatever.

Most feed readers don’t do any of that fancy analysis though,
they just give you a list of new items ordered by date.
Still, whatever. Better than nothing.

  • commercial offerings

  • Indie-style

    I will run a server if the application is good enough, but it has to be worth the time investment. Let’s say between backups, security issues, confusing DNS failures etc, that’s 8 hours per year of miscellaneous computer wrangling, best case, and more hours if you have complicated things like some multi-user database like MySQL. Very few things are good enough to be worth the opportunity cost of that time.
    Why people insist on running enterprise databases to hold a reading list is an ongoing mystery to me. The capacity to scale to many users is nice, I suppose, but by that logic everyone should drive everywhere in a school bus.

    • miniflux is open-source, but also offers a hosted version for $15/year.
    • stringer looks like a nice little ruby app but needs postgresql. Bloat!
    • tinytinyrss is the original “minimalist” RSS reader; it still needs more databases than is sensible.
    • fever is a weird commercial ($30) application that you host on your own server. It claims to learn your information preferences, negating my previous complaint. But I cannot be arsed installing some database-wanting app with suspiciously machine-learning-inappropriate language requirements (PHP3) that also costs money to try, so I will never know.

See original: The Living Thing / Notebooks Content aggregators

Practical workshop in magnetite nanoparticles preparation

16/07/2016 10:30

Naqaa Nanotechnology Network is organizing a one-day practical workshop on magnetite nanoparticle preparation, from
10:30 am till 3:30 pm on Saturday 16 July 2016. It will contain lectures about different applications of magnetite nanoparticles and the practical preparation of
magnetite nanoparticles.

Important: Don't forget to bring your lab coat with you for the practical part

Fees are 200 EGP

Spaces will be limited to 12 participants, so we ask attendees to register ahead of time

Fees include: Lectures on CD+ Practical part + lunch break+ Certificate.

Certificates will be accredited by NNN

For more information please call 01098915757, 01115831621
Those who would like to register:
Just send us an email at naqaafoundation@gmail.com containing:

1- Your full triple name as you want in Certificate

2- Your position

3-Your mobile

4-your email

Subject of email:Practical Workshop i
email message: I want to attend

Best regards

5 Ahmed Amged street

Composition, music theory, mostly Western.

Sometimes you don’t want to generate a chord, or measure a chord, or
learn a chord,
you just want to write a chord.

Helpful software for the musically vexed

  • Fabrizio Poce’s
    J74 progressive and J74 bassline
    are some chord progression
    generators from his library of very clever chord generators linked in to
    Ableton Live’s scripting engine,
    so if you
    are using Ableton they might be very handy.
    They are cheap (EUR12 + EUR15).
    I use them myself, but they DO make Ableton crash a wee bit, so not really
    suited for live performance, which is a pity because that would be a
    wonderful unique selling point.
    The realtime-oriented J74 HarmoTools from the same guy
    are less sophisticated but worth trying, especially since they are free, and
    he has a lot of other clever hacks there too.
    Basically, just go to this guy’s
    site and try his stuff out. You don’t have to stop there.
  • Odesi
    (USD49) has been doing lots of advertising and has a very nice pop-interface.
    It’s like Synfire-lite with a library of pop tricks and rhythms.
    The desktop version tries to install gigabytes of synths of meagre merit on your machine,
    which is a giant waste of space and time if you are using a computer with synths on,
    which you are because this is 2016.
  • Helio is free and cross platform and totally worth a shot.
    There is a chord model in there and version control (!) but you might not notice the chord thing if you aren’t careful
  • Mixtikl / Noatikl are grandaddy apps for this. Although the creators doubtless put much effort into the sleek user interfaces, their complete inability to explain their app or provide compelling demonstrations or use cases leaves me cold.
    I get the feeling they had high-art aspirations but have ended up basically doing ambient noodles in order to sell product; maybe I’m not being fair. (USD25/USD40)
  • Rapid Compose (USD99/USD249) might make decent software, but can’t really explain why their app is nice or provide a demo version.
  • synfire explains how it uses music theory to do large-scale scoring etc. Get the string section to behave itself or you’ll replace them with MIDIbots. (EUR996, so I won’t be buying it, but great demo video.)
  • harmony builder does classical music theory for you.
    USD39-USD219 depending on heinously complex pricing schemes.
    Will pass your conservatorium finals.
  • You can’t resist rolling your own?
    sharp11 is a node.js music theory library for javascript with demo application to create jazz improv.
  • Supercollider of course does this and everything else, but designing user interfaces for it will take years off your life. OTOH, if you are happy with text, this might be a goer.


Constraint Composition

All of that too mainstream? Try a weird alternative formalism!
How about constraint composition? That’s
declarative musical composition by defining constraints which the notes must satisfy.
Sounds fun in the abstract but the details don’t grab me somehow.

The reference here is strasheela, built on an obscure, unpopular, and apparently discontinued Prolog-like language called “Oz” or “Mozart”, because using existing languages is not as grand a gesture as claiming none of them are quite Turing complete enough for your special thingy.

That is a bit of a ghost town;
if you wanted to actually do this, you’d probably use overtone + minikanren (prolog-for-lisp), as with
the composing schemer,
or to be even more mainstream, just use a normal constraint solver in a normal language.
I am fond of python and ncvx.
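To make the declarative idea concrete, here is a toy sketch in plain Python. Brute-force enumeration stands in for a real constraint solver, and the scale, the three constraints and all the names are made up for illustration; none of this comes from strasheela or ncvx.

```python
from itertools import product

# Hypothetical search space: one octave of C major as MIDI note numbers.
C_MAJOR = [60, 62, 64, 65, 67, 69, 71, 72]
TONIC = 60

def starts_and_ends_on_tonic(melody):
    return melody[0] == TONIC and melody[-1] == TONIC

def moves_by_small_intervals(melody, max_interval=4):
    # No repeated notes, and no leap larger than a major third.
    return all(0 < abs(b - a) <= max_interval
               for a, b in zip(melody, melody[1:]))

def has_single_peak(melody):
    # The highest note occurs exactly once.
    return melody.count(max(melody)) == 1

CONSTRAINTS = [starts_and_ends_on_tonic, moves_by_small_intervals, has_single_peak]

def solve(length=5):
    """Return every melody of the given length satisfying all constraints."""
    return [melody for melody in product(C_MAJOR, repeat=length)
            if all(constraint(melody) for constraint in CONSTRAINTS)]

melodies = solve()
```

The composer states what a valid melody looks like and leaves the search to the machine. A proper solver would prune the search rather than enumerate all 8^5 candidates.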

Anyway, prolog fans read on.

  • Anders, T., & Miranda, E. R.(2008). Higher-Order Constraint Applicators for Music Constraint Programming. In Proceedings of the 2008 International Computer Music Conference. Belfast, UK.
  • Anders, T., & Miranda, E. R.(2010). Constraint Application with Higher-Order Programming for Modeling Music Theories. Computer Music Journal, 34(2), 25–38. DOI. Online.
  • Anders, T., & Miranda, E. R.(2011). Constraint programming systems for modeling music theories and composition. ACM Computing Surveys, 43(4), 1–38. DOI. Online.
  • Anders, T., & Miranda, E. R.(2009). A computational model that generalises Schoenberg’s guidelines for favourable chord progressions. In Proceedings of the Sound and Music Computing Conference. Citeseer. Online.

See original: The Living Thing / Notebooks Composition, music theory, mostly Western.

Gaussian distribution and Erf and Normality

Stunts with Gaussian distributions.

Let’s start here with the basic thing.
The (univariate) standard Gaussian pdf

\psi:x\mapsto \frac{1}{\sqrt{2\pi}}\exp\left(-\frac{x^2}{2}\right)

We define

\Psi:x\mapsto \int_{-\infty}^x\psi(t)\, dt

This erf function is popular, isn’t it?
Unavoidable if you do computer algebra.
But I can never remember what it is.
There’s this scaling factor tacked on.


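\operatorname{erf}(x)\; =\; \frac{1}{\sqrt{\pi}} \int_{-x}^x e^{-t^2} \, dt

so that

\int_{-\infty}^x e^{-t^2/2} \, dt = \sqrt{\frac{\pi }{2}} \left(\operatorname{erf}\left(\frac{x}{\sqrt{2}}\right)+1\right)

and hence

\Psi(x) = \frac{1}{2} \left(\operatorname{erf}\left(\frac{x}{\sqrt{2}}\right)+1\right)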
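Since the scaling is easy to forget, here is a quick numerical check. It is a sketch assuming the relation \Psi(x) = (1 + erf(x/\sqrt{2}))/2 and comparing it with a crude trapezoid-rule integral of the standard Gaussian pdf. The lower limit of -10 and the step count are arbitrary choices.

```python
import math

def std_normal_pdf(t):
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)

def std_normal_cdf_via_erf(x):
    # The relation being checked: Psi(x) = (1 + erf(x / sqrt(2))) / 2
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def std_normal_cdf_numeric(x, lower=-10.0, steps=20_000):
    # Trapezoid rule on [lower, x]; the mass below -10 is negligible.
    h = (x - lower) / steps
    total = 0.5 * (std_normal_pdf(lower) + std_normal_pdf(x))
    for i in range(1, steps):
        total += std_normal_pdf(lower + i * h)
    return total * h

for x in (-1.0, 0.0, 1.5):
    gap = abs(std_normal_cdf_via_erf(x) - std_normal_cdf_numeric(x))
    print(f"x={x:+.1f}  |erf form - quadrature| = {gap:.2e}")
```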

Differential representation

Univariate ODE representation (linear in f, with non-constant coefficients).

\begin{aligned}
\sigma ^2 f'(x)+f(x) (x-\mu )&=0\\
f(0) &=\frac{e^{-\mu ^2/(2\sigma ^2)}}{\sqrt{2 \sigma^2\pi } }\\
L(x) &=(\sigma^2 D+x-\mu)
\end{aligned}

Linear PDE representation as a diffusion equation (see, e.g. BoGK10)

\frac{\partial}{\partial t}f(x;t) =\frac{1}{2}\frac{\partial^2}{\partial x^2}f(x;t)

Look, it’s the diffusion equation of the Wiener process.


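With \phi_\sigma the density of a centred Gaussian with standard deviation \sigma, the squared L_2 norms of its derivatives are

\begin{aligned}
\left\| \frac{d}{dx}\phi_\sigma \right\|_2^2 &= \frac{1}{4\sqrt{\pi}\sigma^3}\\
\left\| \left(\frac{d}{dx}\right)^n \phi_\sigma \right\|_2^2 &= \frac{\prod_{i=1}^{n}(2i-1)}{2^{n+1}\sqrt{\pi}\sigma^{2n+1}}
\end{aligned}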


Botev, Z. I.(2016) The Normal Law Under Linear Restrictions: Simulation and Estimation via Minimax Tilting. Journal of the Royal Statistical Society: Series B (Statistical Methodology), n/a-n/a. DOI.
Botev, Z. I., Grotowski, J. F., & Kroese, D. P.(2010) Kernel density estimation via diffusion. The Annals of Statistics, 38(5), 2916–2957. DOI.

See original: The Living Thing / Notebooks Gaussian distribution and Erf and Normality

Sparse regression and things that look a bit like it.

Related to compressed sensing but here we consider sampling complexity and the effect of measurement noise.

See also matrix factorisations,
model selection,
multiple testing,
concentration inequalities,
sparse flavoured icecream.

To discuss:

LARS, LASSO, de-biassed LASSO, Elastic net, etc.


I’m not going to mention LASSO in (generalised) linear regression,
since everything does that these days. (Oh alright,
Jerome Friedman’s glmnet for R is the fastest,
and has a MATLAB version.)

But SPAMS (C++, MATLAB, R, python) by Mairal himself looks interesting.
It’s an optimisation library for many varieties of sparse problems.

See original: The Living Thing / Notebooks Sparse regression and things that look a bit like it.

Eating Japanese Knotweed (and other daft ideas)

Image: Wikipedia

There have been a number of calls(1,2,3,4) in recent weeks and months to control the invasive plant Japanese Knotweed, at least partially, by eating it. In recent days, Kerry County Council in Ireland heard from one member who, albeit with tongue-in-cheek, urged citizens to make wine, jelly and other sweet treats from the plant.

This strikes me as a terrible idea.

The plant itself is certainly edible - the Japanese have been eating it for years. Its Japanese name, itadori, means 'well being' and it seems to have some medicinal properties. It also tastes a bit like rhubarb apparently. I wouldn't know, I haven't tried it.

I haven't tried it for the same reason I don't advise you try it. Encouraging people to harvest and transport a regulated, invasive species is the perfect recipe (if you'll pardon the pun) for its continued and accelerated spread.

Japanese Knotweed (Fallopia japonica) is, as you will have guessed, native to Japan and the neighbouring region. It was introduced to the UK in the mid-19th century and quickly spread to Ireland and other parts of the world. Introduced as an ornamental plant, it quickly became a real problem.

The plant is capable of growing at a tremendous rate - 1 metre in a month - and forms big stands 2-3 metres in height. The early shoots are spear-like, similar to asparagus in appearance, and the plants produce delicate white flowers in late Summer. The real problem is underground where the plant forms tough rhizomes, adapted root-like organs, which remain in the soil even during the Winter when the rest of the plant dies back.

Japanese Knotweed thrives on disturbance and it is mainly spread by fragments of rhizome, crown or stem being accidentally or deliberately moved. This leads to some real (and expensive) problems including a massive reduction in biodiversity under the alien canopy; structural damage to buildings and infrastructure; and the significant cost of its removal.

Data from 2010 suggest that the plant costs the UK £165 million a year to control. If the plant were to be eradicated in the UK by current methods it would cost £1.56 billion. For one site alone, the 2012 London Olympic site, it cost £88 million to deal with this one invasive plant. Nobody wants Japanese Knotweed on their land.

Image: Wikipedia

Imagine you go to the supermarket and buy a bunch of rhubarb. The first thing you do is chop the top and bottom off the stalks and chuck them on your compost heap. Do this with Japanese Knotweed and you end up costing yourself (and potentially your neighbours) thousands in a cleanup bill.

Harvesting Japanese Knotweed from the wild, no matter how careful you are, is also fraught with problems. The plant can easily regrow from small fragments the size of your fingernail. If we're lucky, you'll drop these fragments at the original, infested site. If not, you'll drop them on your walk back to the car or in your front garden when you unload the car.

Simply put, encouraging people to mess around with an invasive species like Japanese Knotweed is, in my view, irresponsible. It may also be illegal.

In Ireland, it is an offence to "plant, disperse or cause to disperse or otherwise cause to grow" the plant. It is also an offence if "he/she has in his/her possession for sale or for breeding/reproduction/transport....anything from which the plant can be reproduced or propagated".

In the meantime, there are chemical and physical control options and scientists in the UK are developing a biological control approach using a sap-sucking insect called Aphalara itadori. This is an old enemy of the plant, found in Japan and currently being tested in the UK to see if it will do the same job in this part of the world (and not eat anything else, by accident). The trials haven't been a total success with numbers surviving over winter too low to have much of an effect, but the tests are ongoing. Hopefully, before too long we will have a sustainable control option for this invasive plant. In the meantime, stop eating it.

See original: Communicate Science Eating Japanese Knotweed (and other daft ideas)

Smoothing, regularisation, penalization and friends

In nonparametric statistics we might simultaneously estimate what look like
many, many parameters, which we constrain in some clever fashion.
This usually boils down to something we can interpret as a “smoothing”
parameter, controlling how many of the original parameters we effectively
still have to model.

The “regularisation” nomenclature claims descent from Tikhonov (e.g. TiGl65), who wanted to solve ill-conditioned integral and differential equations, so it’s slightly more general.
“Smoothing” seems to be common in the
spline and
kernel estimate communities of
Wahba (Wahb90) and Silverman (Silv84) et al,
who usually actually want to smooth curves.

“Penalization” has a genealogy unknown to me, but is probably the least abstruse for common usage.

These are, AFAICT, more or less the same thing.
“Smoothing” is more common in my communities, which is fine,
but we have to remember that “smoothing” an estimator might not always infer smooth dynamics in the estimand;
it could be something else being smoothed, such as variance in the estimate of parameters of a rough function.

In every case, you wish to solve an ill-conditioned inverse problem, so you tame it by adding a penalty to solutions you feel one should be reluctant to accept.
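As a minimal illustration of taming an ill-conditioned problem with a penalty, here is a sketch of ridge (Tikhonov-style) regression on a deliberately ill-conditioned polynomial design. The squared-norm penalty, the value of `lam` and the toy data are all assumptions chosen for the example.

```python
import numpy as np

def ridge_solve(A, b, lam):
    """Minimise ||A x - b||^2 + lam * ||x||^2 via the penalised normal equations."""
    n_params = A.shape[1]
    return np.linalg.solve(A.T @ A + lam * np.eye(n_params), A.T @ b)

rng = np.random.default_rng(0)

# A monomial basis on [0, 1] has nearly collinear columns.
t = np.linspace(0.0, 1.0, 50)
A = np.vander(t, 8)
x_true = rng.standard_normal(8)
b = A @ x_true + 0.01 * rng.standard_normal(50)

# Unpenalised least squares: best fit to the data, but noise gets amplified.
x_unpenalised, *_ = np.linalg.lstsq(A, b, rcond=None)

# Penalised solution: slightly worse fit, much smaller coefficient norm.
x_ridge = ridge_solve(A, b, lam=1e-3)

print("coefficient norms:", np.linalg.norm(x_unpenalised), np.linalg.norm(x_ridge))
print("residual norms:   ", np.linalg.norm(A @ x_unpenalised - b),
      np.linalg.norm(A @ x_ridge - b))
```

The penalty trades a little data fit for a solution one is less reluctant to accept. Here "reluctance" is encoded as a large coefficient norm, and other penalties encode other prejudices.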

TODO: make comprehensible

TODO: examples

TODO: discuss connection with model selection

TODO: discuss connection with compressed sensing.

The real classic approach here is spline smoothing of functional data.
More recent approaches are things like sparse regression.


Bach, F. (n.d.) Model-Consistent Sparse Estimation through the Bootstrap.
Chernozhukov, V., Hansen, C., & Spindler, M. (2015) Valid Post-Selection and Post-Regularization Inference: An Elementary, General Approach. Annual Review of Economics, 7(1), 649–688. DOI.
Efron, B., Hastie, T., Johnstone, I., & Tibshirani, R. (2004) Least angle regression. The Annals of Statistics, 32(2), 407–499. DOI.
Flynn, C. J., Hurvich, C. M., & Simonoff, J. S.(2013) Efficiency for Regularization Parameter Selection in Penalized Likelihood Estimation of Misspecified Models. arXiv:1302.2068 [Stat].
Friedman, J., Hastie, T., & Tibshirani, R. (2010) Regularization Paths for Generalized Linear Models via Coordinate Descent. Journal of Statistical Software, 33(1), 1–22. DOI.
Janson, L., Fithian, W., & Hastie, T. (2013) Effective Degrees of Freedom: A Flawed Metaphor. arXiv:1312.7851 [Stat].
Kaufman, S., & Rosset, S. (2014) When does more regularization imply fewer degrees of freedom? Sufficient conditions and counterexamples. Biometrika, 101(4), 771–784. DOI.
Koenker, R., & Mizera, I. (2006) Density estimation by total variation regularization. Advances in Statistical Modeling and Inference, 613–634.
Liu, H., Roeder, K., & Wasserman, L. (2010) Stability Approach to Regularization Selection (StARS) for High Dimensional Graphical Models. In J. D. Lafferty, C. K. I. Williams, J. Shawe-Taylor, R. S. Zemel, & A. Culotta (Eds.), Advances in Neural Information Processing Systems 23 (pp. 1432–1440). Curran Associates, Inc.
Meinshausen, N., & Bühlmann, P. (2010) Stability selection. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 72(4), 417–473. DOI.
Meyer, M. C.(2008) Inference using shape-restricted regression splines. The Annals of Applied Statistics, 2(3), 1013–1033. DOI.
Silverman, B. W.(1984) Spline Smoothing: The Equivalent Variable Kernel Method. The Annals of Statistics, 12(3), 898–916. DOI.
Smola, A. J., Schölkopf, B., & Müller, K.-R. (1998) The connection between regularization operators and support vector kernels. Neural Networks, 11(4), 637–649. DOI.
Tansey, W., Koyejo, O., Poldrack, R. A., & Scott, J. G.(2014) False discovery rate smoothing. arXiv:1411.6144 [Stat].
Tikhonov, A. N., & Glasko, V. B.(1965) Use of the regularization method in non-linear problems. USSR Computational Mathematics and Mathematical Physics, 5(3), 93–107. DOI.
van de Geer, S. (2014) Weakly decomposable regularization penalties and structured sparsity. Scandinavian Journal of Statistics, 41(1), 72–86. DOI.
Wahba, G. (1990) Spline Models for Observational Data. SIAM.
Weng, H., Maleki, A., & Zheng, L. (2016) Overcoming The Limitations of Phase Transition by Higher Order Analysis of Regularization Techniques. arXiv:1603.07377 [Cs, Math, Stat].
Wood, S. N.(2000) Modelling and smoothing parameter estimation with multiple quadratic penalties. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 62(2), 413–428. DOI.
Wood, S. N.(2008) Fast stable direct fitting and smoothness selection for generalized additive models. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 70(3), 495–518. DOI.
Zou, H., & Hastie, T. (2005) Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 67(2), 301–320. DOI.
Zou, H., Hastie, T., & Tibshirani, R. (2007) On the “degrees of freedom” of the lasso. The Annals of Statistics, 35(5), 2173–2192. DOI.

See original: The Living Thing / Notebooks Smoothing, regularisation, penalization and friends


DJ Rupture:

Yet our sounds are also a vocabulary for those who detest the walled-off concentrations of wealth, and steal property back: the collectives that build their own sound systems, stage free parties, and invite DJs to perform. The international DJ becomes emblematic of global capitalism’s complicated cultural dimension. On flights and at the free Continental breakfasts in hotels, often the same soul-destroying hotel chains in each city, we get stuck chatting with our fellow Americans and Western Europeans, the executives eager to find compatriots. We make small talk with these consultants and deal-makers in the descending elevators in the evening—then go out to the city’s dead-end and unowned spaces or its luxury venues to soundtrack the night of the region’s youth, hungry for something new. DJ music is now the common art form of squatters and the nouveau riche; it is the soundtrack both for capital and for its opposition.

tangerine echonest

see also machine listening,
audio software

DJing software

So many choices, now. I use Ableton, but Traktor and Serato are more designed for this.

Open source/ lower cost alternatives?

  • flow8deck is made by the people who made mixedinkey, software for the musically vexed. It handles key changes well.
  • Traktor
  • Serato
  • Djay

See original: The Living Thing / Notebooks DJing

Moving the poors to marginal electorate

OK, let’s start treating politics like the favour machine it is and behave accordingly;
NSW under Mike Baird is a system where you buy favours with leverage.
I’d like it to be otherwise, but let’s look

Optimal marginalness.
Invade marginal electorates
Organised opposition means we are more likely to claim council seats as a side benefit.

See original: The Living Thing / Notebooks Moving the poors to marginal electorate

Recurrent neural networks

Feedback neural networks structured to have memory and a notion of “current” and “past” states, which can encode time (or whatever).

As someone who does a lot of signal processing for music, the notion that these generalise linear systems theory is suggestive of lots of interesting DSP applications.

The connection between these (IIR) and “convolutional” (FIR) neural networks is suggestive for the same reason.



The main problem here is that they are unstable in the training phase unless you are clever.
See BeSF94. One solution is LSTM; see next.

Long Short Term Memory (LSTM)

As always, Christopher Olah wins the visual explanation prize:
Understanding LSTM Networks
LSTM Networks for Sentiment Analysis:

In a traditional recurrent neural network, during the gradient back-propagation phase, the gradient signal can end up being multiplied a large number of times (as many as the number of timesteps) by the weight matrix associated with the connections between the neurons of the recurrent hidden layer. This means that, the magnitude of weights in the transition matrix can have a strong impact on the learning process.[…]

These issues are the main motivation behind the LSTM model which introduces a new structure called a memory cell […]. A memory cell is composed of four main elements: an input gate, a neuron with a self-recurrent connection (a connection to itself), a forget gate and an output gate. […]The gates serve to modulate the interactions between the memory cell itself and its environment.
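The four elements in that description can be sketched as a single update step. This is the commonly used formulation with a forget gate, written in NumPy; the parameter names (`W_i`, `b_f`, and so on), the initialisation and the sizes are assumptions for illustration, not taken from the quoted tutorial.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_params(n_in, n_hidden, seed=0):
    """Small random weights for the gates i, f, o and the candidate g."""
    rng = np.random.default_rng(seed)
    params = {}
    for name in "ifog":
        params["W_" + name] = 0.1 * rng.standard_normal((n_hidden, n_hidden + n_in))
        params["b_" + name] = np.zeros(n_hidden)
    return params

def lstm_step(x, h, c, params):
    """One LSTM update; returns the new hidden state and memory cell."""
    z = np.concatenate([h, x])
    i = sigmoid(params["W_i"] @ z + params["b_i"])   # input gate
    f = sigmoid(params["W_f"] @ z + params["b_f"])   # forget gate
    o = sigmoid(params["W_o"] @ z + params["b_o"])   # output gate
    g = np.tanh(params["W_g"] @ z + params["b_g"])   # candidate cell input
    c_new = f * c + i * g          # self-recurrent memory cell
    h_new = o * np.tanh(c_new)     # gated view of the cell
    return h_new, c_new

# Run a short sequence of 3-dimensional inputs through a 4-unit cell.
params = init_params(n_in=3, n_hidden=4)
h, c = np.zeros(4), np.zeros(4)
for x in np.ones((5, 3)):
    h, c = lstm_step(x, h, c, params)
```

When the forget gate is near one and the input gate near zero, the cell state passes through a step almost unchanged, which is what keeps the gradient from being rescaled at every timestep.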


A mini-genre.
KaDG15 connect recurrent cells across multiple axes, leading to a higher-rank MIMO system.
This is natural in many kinds of spatial random fields, and I am amazed it was uncommon enough to need formalizing in a paper; but it was and it did and good on Kalchbrenner et al.

Gated Recurrent Unit (GRU)


Liquid/ Echo State Machines

This sounds deliciously lazy.
Very roughly speaking, your first layer is a reservoir of random saturating IIR filters.
You fit a classifier on the outputs of this.
Easy to implement, that.
I wonder when it actually works, constraints on topology etc.
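To see how little there is to implement, here is a minimal echo-state sketch in NumPy. Assumptions are mine throughout: a dense Gaussian reservoir rescaled to a chosen spectral radius, tanh saturation, a ridge-regression readout instead of a classifier, and a toy task (recall the input from two steps ago).

```python
import numpy as np

def make_reservoir(n_in, n_res, spectral_radius=0.9, seed=0):
    """Random, untrained input and recurrent weights."""
    rng = np.random.default_rng(seed)
    w_in = rng.uniform(-0.5, 0.5, (n_res, n_in))
    w = rng.standard_normal((n_res, n_res))
    w *= spectral_radius / np.max(np.abs(np.linalg.eigvals(w)))
    return w_in, w

def run_reservoir(u, w_in, w):
    """Drive the reservoir with the input sequence u and record its states."""
    h = np.zeros(w.shape[0])
    states = np.zeros((len(u), w.shape[0]))
    for t, ut in enumerate(u):
        h = np.tanh(w @ h + w_in @ np.atleast_1d(ut))
        states[t] = h
    return states

def fit_readout(states, targets, ridge=1e-6):
    """Ridge regression from reservoir states to targets: the only training."""
    a = states.T @ states + ridge * np.eye(states.shape[1])
    return np.linalg.solve(a, states.T @ targets)

# Toy memory task: predict the input delayed by two steps.
rng = np.random.default_rng(1)
u = rng.uniform(-1, 1, 600)
target = np.roll(u, 2)

w_in, w = make_reservoir(n_in=1, n_res=50)
states = run_reservoir(u, w_in, w)

washout = 50  # discard the initial transient before fitting
w_out = fit_readout(states[washout:500], target[washout:500])
pred = states[500:] @ w_out
test_mse = np.mean((pred - target[500:]) ** 2)
```

Only `w_out` is fitted; the recurrent weights are never touched, so the training instability of ordinary RNNs does not arise. The spectral radius and input scaling are the knobs that decide whether the reservoir is useful at all.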

I wonder if you can use some kind of sparsifying transform on the recurrence operator?

These claim to be based on spiky models, but AFAICT this is not at all necessary.

Various claims are made that they avoid the training difficulty of similarly basic RNNs by being essentially untrained; you use them as a feature factory for another supervised output algorithm.

Suggestive parallel with random projections.


From a dynamical systems perspective, there are two main classes of RNNs.
Models from the first class are characterized by an energy-minimizing
stochastic dynamics and symmetric connections.
The best known instantiations are Hopfield networks, Boltzmann machines, and
the recently emerging Deep Belief Networks.
These networks are mostly trained in some unsupervised learning scheme.
Typical targeted network functionalities in this field are associative
memories, data compression, the unsupervised modeling of data distributions,
and static pattern classification, where the model is run for multiple time
steps per single input instance to reach some type of convergence or
equilibrium (but see e.g., TaHR06 for extension to temporal data).
The mathematical background is rooted in statistical physics.
In contrast, the second big class of RNN models typically features a
deterministic update dynamics and directed connections.
Systems from this class implement nonlinear filters, which
transform an input time series into an output time series.
The mathematical background here is nonlinear dynamical systems.
The standard training mode is supervised.
This survey is concerned only with RNNs of this second type, and
when we speak of RNNs later on, we will exclusively refer to such systems.


It’s still the wild west. Invent a category, name it and stake a claim.


Variable sequence length:

Danijar Hafner:
* Introduction to Recurrent Networks in TensorFlow
* https://danijar.com/variable-sequence-lengths-in-tensorflow/
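The usual trick for variable-length sequences is padding plus a mask. This is a framework-free NumPy sketch of that idea, with hypothetical helper names, rather than the TensorFlow code from the posts above.

```python
import numpy as np

def pad_sequences(seqs, pad_value=0.0):
    """Stack ragged 1-D sequences into a padded batch plus a boolean mask."""
    lengths = np.array([len(s) for s in seqs])
    batch = np.full((len(seqs), lengths.max()), pad_value)
    for i, s in enumerate(seqs):
        batch[i, : len(s)] = s
    mask = np.arange(lengths.max())[None, :] < lengths[:, None]
    return batch, mask, lengths

def masked_mean(values, mask):
    """Per-sequence mean that ignores padded positions."""
    return (values * mask).sum(axis=1) / mask.sum(axis=1)

def last_valid(values, lengths):
    """Value at the final real timestep of each sequence."""
    return values[np.arange(len(lengths)), lengths - 1]

batch, mask, lengths = pad_sequences([[1, 2, 3], [4, 5]])
means = masked_mean(batch, mask)
lasts = last_valid(batch, lengths)
```

The same mask would be applied to per-timestep losses so that padded steps contribute nothing to the gradient.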

seq2seq models with GRUs : Fun with Recurrent Neural Nets: One More Dive into CNTK and TensorFlow


Auer, P., Burgsteiner, H., & Maass, W. (2008) A learning rule for very simple universal approximators consisting of a single layer of perceptrons. Neural Networks, 21(5), 786–795. DOI.
Bengio, Y., Simard, P., & Frasconi, P. (1994) Learning long-term dependencies with gradient descent is difficult. IEEE Transactions on Neural Networks, 5(2), 157–166. DOI.
Boulanger-Lewandowski, N., Bengio, Y., & Vincent, P. (2012) Modeling Temporal Dependencies in High-Dimensional Sequences: Application to Polyphonic Music Generation and Transcription. In 29th International Conference on Machine Learning.
Bown, O., & Lexer, S. (2006) Continuous-Time Recurrent Neural Networks for Generative and Interactive Musical Performance. In F. Rothlauf, J. Branke, S. Cagnoni, E. Costa, C. Cotta, R. Drechsler, … H. Takagi (Eds.), Applications of Evolutionary Computing (pp. 652–663). Springer Berlin Heidelberg
Buhusi, C. V., & Meck, W. H. (2005) What makes us tick? Functional and neural mechanisms of interval timing. Nature Reviews Neuroscience, 6(10), 755–765. DOI.
Cho, K., van Merriënboer, B., Bahdanau, D., & Bengio, Y. (2014) On the properties of neural machine translation: Encoder-decoder approaches. arXiv Preprint arXiv:1409.1259.
Chung, J., Gulcehre, C., Cho, K., & Bengio, Y. (2014) Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling. In NIPS.
Chung, J., Gulcehre, C., Cho, K., & Bengio, Y. (2015) Gated Feedback Recurrent Neural Networks. arXiv:1502.02367 [Cs, Stat].
Doelling, K. B., & Poeppel, D. (2015) Cortical entrainment to music and its modulation by expertise. Proceedings of the National Academy of Sciences, 112(45), E6233–E6242. DOI.
Duan, Q., Park, J. H., & Wu, Z.-G. (2014) Exponential state estimator design for discrete-time neural networks with discrete and distributed time-varying delays. Complexity, 20(1), 38–48. DOI.
Gal, Y. (2015) A Theoretically Grounded Application of Dropout in Recurrent Neural Networks. arXiv:1512.05287 [Stat].
Gers, F. A., Schmidhuber, J., & Cummins, F. (2000) Learning to Forget: Continual Prediction with LSTM. Neural Computation, 12(10), 2451–2471. DOI.
Gregor, K., Danihelka, I., Graves, A., Rezende, D. J., & Wierstra, D. (2015) DRAW: A Recurrent Neural Network For Image Generation. arXiv:1502.04623 [Cs].
Grzyb, B. J., Chinellato, E., Wojcik, G. M., & Kaminski, W. A. (2009) Which model to use for the Liquid State Machine? In 2009 International Joint Conference on Neural Networks (pp. 1018–1024). DOI.
Hazan, H., & Manevitz, L. M. (2012) Topological constraints and robustness in liquid state machines. Expert Systems with Applications, 39(2), 1597–1606. DOI.
Hinton, G., Deng, L., Yu, D., Dahl, G. E., Mohamed, A., Jaitly, N., … Kingsbury, B. (2012) Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups. IEEE Signal Processing Magazine, 29(6), 82–97. DOI.
Hochreiter, S., & Schmidhuber, J. (1997) Long Short-Term Memory. Neural Computation, 9(8), 1735–1780. DOI.
Jozefowicz, R., Zaremba, W., & Sutskever, I. (2015) An empirical exploration of recurrent network architectures. In Proceedings of the 32nd International Conference on Machine Learning (ICML-15) (pp. 2342–2350).
Kalchbrenner, N., Danihelka, I., & Graves, A. (2015) Grid Long Short-Term Memory. arXiv:1507.01526 [Cs].
Karpathy, A., Johnson, J., & Fei-Fei, L. (2015) Visualizing and Understanding Recurrent Networks. arXiv:1506.02078 [Cs].
LeCun, Y. (1998) Gradient-based learning applied to document recognition. Proc. IEEE, 86(11), 2278–2324. DOI.
Legenstein, R., Naeger, C., & Maass, W. (2005) What Can a Neuron Learn with Spike-Timing-Dependent Plasticity? Neural Computation, 17(11), 2337–2382. DOI.
Lipton, Z. C., Berkowitz, J., & Elkan, C. (2015) A Critical Review of Recurrent Neural Networks for Sequence Learning. arXiv:1506.00019 [Cs].
Lukoševičius, M., & Jaeger, H. (2009) Reservoir computing approaches to recurrent neural network training. Computer Science Review, 3(3), 127–149. DOI.
Maass, W., Natschläger, T., & Markram, H. (2004) Computational Models for Generic Cortical Microcircuits. In Computational Neuroscience: A Comprehensive Approach (pp. 575–605). Chapman & Hall/CRC
Miconi, T. (2015) Training recurrent neural networks with sparse, delayed rewards for flexible decision tasks. arXiv:1507.08973 [Q-Bio].
Mnih, V. (2015) Human-level control through deep reinforcement learning. Nature, 518, 529–533. DOI.
Mohamed, A. r, Dahl, G. E., & Hinton, G. (2012) Acoustic Modeling Using Deep Belief Networks. IEEE Transactions on Audio, Speech, and Language Processing, 20(1), 14–22. DOI.
Owens, A., Isola, P., McDermott, J., Torralba, A., Adelson, E. H., & Freeman, W. T. (2015) Visually Indicated Sounds. arXiv:1512.08512 [Cs].
Rohrbach, A., Rohrbach, M., & Schiele, B. (2015) The Long-Short Story of Movie Description. arXiv:1506.01698 [Cs].
Schwenk, H. (2007) Continuous space language models. Computer Speech Lang., 21, 492–518. DOI.
Taylor, G. W., Hinton, G. E., & Roweis, S. T. (2006) Modeling human motion using binary latent variables. In Advances in neural information processing systems (pp. 1345–1352).
Theis, L., & Bethge, M. (2015) Generative Image Modeling Using Spatial LSTMs. arXiv:1506.03478 [Cs, Stat].
Visin, F., Kastner, K., Cho, K., Matteucci, M., Courville, A., & Bengio, Y. (2015) ReNet: A Recurrent Neural Network Based Alternative to Convolutional Networks. arXiv:1505.00393 [Cs].
Waibel, A. (1989) Phoneme recognition using time-delay neural networks. IEEE Trans. Acoustics Speech Signal Process., 37(3), 328–339. DOI.
Yao, L., Torabi, A., Cho, K., Ballas, N., Pal, C., Larochelle, H., & Courville, A. (2015) Describing Videos by Exploiting Temporal Structure. arXiv:1502.08029 [Cs, Stat].

See original: The Living Thing / Notebooks Recurrent neural networks

Generalised linear models

Using the machinery of linear regression to predict in
somewhat more general regressions.

This means you are still doing maximum likelihood regression,
but outside the setting of homoskedastic Gaussian noise and linear response.

Not quite as fancy as generalised additive models,
but if you have to implement such models yourself,
less work. If you are using R this is not you.

To learn:

  1. When can we do this? E.g. must the response be from an exponential family for really real? Wikipedia mentions the “overdispersed exponential family” which is no such thing.
  2. Does anything funky happen with regularisation?
  3. Whether to merge this in with quasilikelihood.
  4. Fitting variance parameters.

Pieces of the method follow.

Response distribution

TBD. What constraints do we have here?

Linear Predictor

Link function

An invertible (monotonic?) function
relating the mean of the response distribution
to the linear predictor.
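To make the three pieces concrete, here is a sketch of one particular GLM fitted by iteratively reweighted least squares (IRLS). Everything specific is my choice: a Poisson response, a log link, simulated data and the hypothetical function name `fit_poisson_glm`.

```python
import numpy as np

def fit_poisson_glm(X, y, n_iter=25, tol=1e-10):
    """Poisson regression with log link, fitted by IRLS."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = X @ beta            # linear predictor
        mu = np.exp(eta)          # inverse link gives the response mean
        z = eta + (y - mu) / mu   # working response for the log link
        w = mu                    # IRLS weights (Poisson variance equals mean)
        xtw = X.T * w
        beta_new = np.linalg.solve(xtw @ X, xtw @ z)
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta

# Simulated data with known coefficients (illustrative values).
rng = np.random.default_rng(0)
n = 2000
X = np.column_stack([np.ones(n), rng.standard_normal(n)])
beta_true = np.array([0.5, -0.3])
y = rng.poisson(np.exp(X @ beta_true))

beta_hat = fit_poisson_glm(X, y)
score = X.T @ (y - np.exp(X @ beta_hat))  # likelihood score, zero at the MLE
```

Each iteration is an ordinary weighted least squares solve, which is the sense in which this reuses the machinery of linear regression. A different response family or link only changes the `mu`, `z` and `w` lines.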


Buja, A., Hastie, T., & Tibshirani, R. (1989) Linear Smoothers and Additive Models. The Annals of Statistics, 17(2), 453–510.
Currie, I. D., Durban, M., & Eilers, P. H. C. (2006) Generalized linear array models with applications to multidimensional smoothing. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 68(2), 259–280. DOI.
Friedman, J., Hastie, T., & Tibshirani, R. (2010) Regularization Paths for Generalized Linear Models via Coordinate Descent. Journal of Statistical Software, 33(1), 1–22. DOI.
Hansen, N. R. (2010) Penalized maximum likelihood estimation for generalized linear point processes. arXiv:1003.0848 [Math, Stat].
Hosseinian, S. (2009) Robust inference for generalized linear models: binary and Poisson regression. École Polytechnique Fédérale de Lausanne
Lee, Y., Nelder, J. A., & Pawitan, Y. (2006) Generalized linear models with random effects. Boca Raton, FL: Chapman & Hall/CRC
McCullagh, P. (1984) Generalized linear models. European Journal of Operational Research, 16(3), 285–292. DOI.
Nelder, J. A., & Baker, R. J. (2004) Generalized Linear Models. In Encyclopedia of Statistical Sciences. John Wiley & Sons, Inc.
Nelder, J. A., & Wedderburn, R. W. M. (1972) Generalized Linear Models. Journal of the Royal Statistical Society. Series A (General), 135(3), 370–384. DOI.
Proietti, T., & Luati, A. (2013) Generalised Linear Spectral Models (CEIS Research Paper No. 290). Tor Vergata University, CEIS
Wedderburn, R. W. M. (1974) Quasi-likelihood functions, generalized linear models, and the Gauss-Newton method. Biometrika, 61(3), 439–447. DOI.
Wedderburn, R. W. M. (1976) On the existence and uniqueness of the maximum likelihood estimates for certain generalized linear models. Biometrika, 63(1), 27–32. DOI.
Wood, S. N. (2008) Fast stable direct fitting and smoothness selection for generalized additive models. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 70(3), 495–518. DOI.
Xia, T., Wang, X.-R., & Jiang, X.-J. (2014) Asymptotic properties of maximum quasi-likelihood estimator in quasi-likelihood nonlinear models with misspecified variance function. Statistics, 48(4), 778–786. DOI.

See original: The Living Thing / Notebooks Generalised linear models