Learning Gamelan
Sun, 03/07/2016 4:35am by dan mackinlay
On online learning of sparse basis dictionaries, for music.
Blind IIR deconvolution with an unusual loss function, or “shift-invariant sparse coding”.
It seems like this would boil down to something like sparse dictionary
learning, with the sparse activations, and a dictionary
sparse in LPC components.
There are two ways to do this: time domain and frequency domain.
For the latter, sparse time-domain activations are non-local in Fourier components, but possibly simple to recover.
For the former, one could solve the Durbin-Watson equations in the time domain, although we expect that to be unstable.
We could go for sparse simultaneous kernel inference in the time domain, which might be better, or directly infer the Horner form.
Then we have a lot of simultaneous filter components and tedious inference for them.
Otherwise, we could do it directly in the FFT domain, although this makes MIMO harder, and excludes the potential for nonlinearities.
The fact that I am expecting to identify many distinct systems in Fourier space as atoms complicates this slightly.
Thought: can I use HPSS to do this with the purely harmonic components?
And use the percussive components as priors for the activations?
How do you enforce causality for triggering in the FFT-transformed domain?
We have activations and components, but the activations form a K×T matrix, and
the K components are the rows of a K×L matrix.
We wish the convolution of one with the other to approximately recover the
original signal under a certain loss function.
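The shapes and the objective above can be sketched in a few lines of numpy. This is a hypothetical illustration, not anyone's actual implementation; `reconstruct`, `loss` and the penalty weight `lam` are made-up names, and a real solver would also need update rules for the activations and the dictionary.

```python
import numpy as np

def reconstruct(activations, dictionary):
    """Sum of convolutions of each row of activations (K x T)
    with the matching atom in dictionary (K x L)."""
    K, T = activations.shape
    L = dictionary.shape[1]
    out = np.zeros(T + L - 1)
    for k in range(K):
        out += np.convolve(activations[k], dictionary[k])
    return out

def loss(signal, activations, dictionary, lam=0.1):
    """Squared-error fit plus an l1 penalty pushing activations sparse."""
    resid = signal - reconstruct(activations, dictionary)[:len(signal)]
    return 0.5 * np.sum(resid ** 2) + lam * np.abs(activations).sum()
```

With the true sparse activations the loss reduces to the l1 penalty alone, which is the sense in which minimising it recovers spike-like triggerings.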
Why gamelan?
It’s tuned percussion, with a non-trivial tuning system and no pitch bending.
Theory:
TBD
Other questions:
Infer chained biquads? Even restrict them to be bandpass?
Or sparse, high-order filters of some description?
See original: Learning Gamelan
Statistical estimation of Information and other fiddly functionals
Wed, 29/06/2016 3:31am by dan mackinlay
Say I would like to know the mutual information of the process generating two streams of observations, with weak assumptions on the form of the generation process.
(Why would I want to do this by itself? I don’t know. I’m sure a use case will come along.)
Because observations with low frequency have high influence on the estimate, this can be tricky. It is easy to get a uselessly biased — or even inconsistent — estimator, especially in the nonparametric case.
A typical technique is to construct a joint histogram from your samples, treat the bins as a finite alphabet, and then do the usual calculation.
That throws out a lot of information, and it feels clunky and stupid, especially if you suspect your distributions might have some other kind of smoothness that you’d like to exploit.
Moreover this method is highly sensitive and can be arbitrarily wrong if you don’t do it right (see Paninski, 2003).
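As a concrete baseline, here is a minimal sketch of that plug-in histogram estimator in numpy (the function name `plugin_mi` and the default bin count are mine); it exhibits exactly the upward small-sample bias just mentioned.

```python
import numpy as np

def plugin_mi(x, y, bins=10):
    """Naive plug-in estimator, in nats: histogram the joint sample,
    treat the bins as a finite alphabet, apply the discrete MI formula.
    Biased upward for small samples (see Paninski, 2003)."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)   # marginal of x, column vector
    py = pxy.sum(axis=0, keepdims=True)   # marginal of y, row vector
    mask = pxy > 0
    return float(np.sum(pxy[mask] * np.log(pxy[mask] / (px @ py)[mask])))
```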
So, better alternatives?
To consider:
- Based on authorship alone, KKPW14 is the best place to start.
- Kraskov’s (2004) nearest-neighbour method looks nice, but doesn’t yet have any guarantees that I know of.
- The relationship between mutual information and 2-dimensional spatial statistics.
- The relationship between mutual information and copula entropy.
- Those occasional mentions of calculating mutual information from recurrence plots: how do they work?
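For comparison with the histogram approach, a sketch of Kraskov, Stögbauer and Grassberger's (2004) first nearest-neighbour estimator, assuming scipy is available; the tie-breaking constant and the name `ksg_mi` are my own, and production use should prefer a vetted implementation.

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.special import digamma

def ksg_mi(x, y, k=4):
    """Kraskov-Stogbauer-Grassberger MI estimator (algorithm 1), in nats."""
    x = np.asarray(x).reshape(len(x), -1)
    y = np.asarray(y).reshape(len(y), -1)
    n = len(x)
    joint = np.hstack([x, y])
    # distance to the k-th neighbour in the joint space, max-norm
    d, _ = cKDTree(joint).query(joint, k=k + 1, p=np.inf)
    eps = d[:, -1]
    # count strictly-closer neighbours in each marginal (excluding self)
    nx = cKDTree(x).query_ball_point(x, eps - 1e-12, p=np.inf,
                                     return_length=True) - 1
    ny = cKDTree(y).query_ball_point(y, eps - 1e-12, p=np.inf,
                                     return_length=True) - 1
    return float(digamma(k) + digamma(n)
                 - np.mean(digamma(nx + 1) + digamma(ny + 1)))
```

For bivariate Gaussians with correlation rho the truth is -0.5*log(1 - rho^2), which gives a handy check.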
To read
 BaBo12
 Barnett, L., & Bossomaier, T. (2012) Transfer Entropy as a Log-likelihood Ratio. arXiv:1205.6339.
 BDGM97
 Beirlant, J., Dudewicz, E. J., Györfi, L., & van der Meulen, E. C.(1997) Nonparametric entropy estimation: An overview. Journal of Mathematical and Statistical Sciences, 6(1), 17–39.
 ChSh03
 Chao, A., & Shen, T.-J. (2003) Nonparametric estimation of Shannon’s index of diversity when there are unseen species in sample. Environmental and Ecological Statistics, 10(4), 429–443. DOI.
 DaVa99
 Darbellay, G. A., & Vajda, I. (1999) Estimation of the information by an adaptive partitioning of the observation space. IEEE Transactions on Information Theory, 45, 1315–1321. DOI.
 DaWu00
 Darbellay, G. A., & Wuertz, D. (2000) The entropy as a tool for analysing statistical dependences in financial time series. Physica A: Statistical Mechanics and Its Applications, 287(3–4), 429–439. DOI.
 DSSK04
 Daub, C. O., Steuer, R., Selbig, J., & Kloska, S. (2004) Estimating mutual information using B-spline functions – an improved similarity measure for analysing gene expression data. BMC Bioinformatics, 5(1), 118. DOI.
 DoJR13
 Doucet, A., Jacob, P. E., & Rubenthaler, S. (2013) Derivative-Free Estimation of the Score Vector and Observed Information Matrix with Application to State-Space Models. arXiv:1304.5768 [Stat].
 GaVG00
 Gao, S., Ver Steeg, G., & Galstyan, A. (n.d.) Estimating Mutual Information by Local Gaussian Approximation.
 HaSt09
 Hausser, J., & Strimmer, K. (2009) Entropy Inference and the James-Stein Estimator, with Application to Nonlinear Gene Association Networks. Journal of Machine Learning Research, 10, 1469.
 JVHW14
 Jiao, J., Venkat, K., Han, Y., & Weissman, T. (2014) Maximum Likelihood Estimation of Functionals of Discrete Distributions. arXiv:1406.6959 [Cs, Math, Stat].
 JVHW15
 Jiao, J., Venkat, K., Han, Y., & Weissman, T. (2015) Minimax Estimation of Functionals of Discrete Distributions. IEEE Transactions on Information Theory, 61(5), 2835–2885. DOI.
 KKPW14
 Kandasamy, K., Krishnamurthy, A., Poczos, B., Wasserman, L., & Robins, J. M.(2014) Influence Functions for Machine Learning: Nonparametric Estimators for Entropies, Divergences and Mutual Informations. arXiv:1411.4342 [Stat].
 KSAC05
 Kennel, M. B., Shlens, J., Abarbanel, H. D. I., & Chichilnisky, E. J.(2005) Estimating Entropy Rates with Bayesian Confidence Intervals. Neural Computation, 17(7). DOI.
 KrSG04
 Kraskov, A., Stögbauer, H., & Grassberger, P. (2004) Estimating mutual information. Physical Review E, 69, 66138. DOI.
 LiVa06
 Liese, F., & Vajda, I. (2006) On Divergences and Informations in Statistics and Information Theory. IEEE Transactions on Information Theory, 52(10), 4394–4412. DOI.
 LiPZ08
 Lizier, J. T., Prokopenko, M., & Zomaya, A. Y.(2008) A framework for the local information dynamics of distributed computation in complex systems.
 MaSh94
 Marton, K., & Shields, P. C.(1994) Entropy and the consistent estimation of joint distributions. The Annals of Probability, 22(2), 960–977.
 MoRL95
 Moon, Y. I., Rajagopalan, B., & Lall, U. (1995) Estimation of mutual information using kernel density estimators. Physical Review E, 52, 2318–2321. DOI.
 NeBR04
 Nemenman, I., Bialek, W., & de Ruyter Van Steveninck, R. (2004) Entropy and information in neural spike trains: Progress on the sampling problem. Physical Review E, 69(5), 56111.
 NeSB02
 Nemenman, I., Shafee, F., & Bialek, W. (2002) Entropy and inference, revisited. In Advances in Neural Information Processing Systems 14 (Vol. 14). Cambridge, MA, USA: The MIT Press
 Pani03
 Paninski, L. (2003) Estimation of entropy and mutual information. Neural Computation, 15(6), 1191–1253. DOI.
 PSMP07
 Panzeri, S., Senatore, R., Montemurro, M. A., & Petersen, R. S.(2007) Correcting for the sampling bias problem in spike train information measures. Journal of Neurophysiology, 98, 1064–1072. DOI.
 PaTr96
 Panzeri, S., & Treves, A. (1996) Analytical estimates of limited sampling biases in different information measures. Network: Computation in Neural Systems, 7(1), 87–107.
 Robi91
 Robinson, P. M. (1991) Consistent Nonparametric Entropy-Based Testing. The Review of Economic Studies, 58(3), 437. DOI.
 Roul99
 Roulston, M. S.(1999) Estimating the errors on measured entropy and mutual information. Physica D: Nonlinear Phenomena, 125(3–4), 285–294. DOI.
 Schü15
 Schürmann, T. (2015) A Note on Entropy Estimation. Neural Computation, 27(10), 2097–2106. DOI.
 StLe08
 Staniek, M., & Lehnertz, K. (2008) Symbolic transfer entropy. Physical Review Letters, 100(15), 158101. DOI.
 VePa08
 Vejmelka, M., & Paluš, M. (2008) Inferring the directionality of coupling with conditional mutual information. Phys. Rev. E, 77(2), 26214. DOI.
 Vict02
 Victor, J. D.(2002) Binless strategies for estimation of information from neural data. Physical Review E, 66, 51903. DOI.
 WoWo94a
 Wolf, D. R., & Wolpert, D. H. (1994a) Estimating Functions of Distributions from A Finite Set of Samples, Part 2: Bayes Estimators for Mutual Information, Chi-Squared, Covariance and other Statistics. arXiv:comp-gas/9403002.
 WoWo94b
 Wolpert, D. H., & Wolf, D. R. (1994b) Estimating Functions of Probability Distributions from a Finite Set of Samples, Part 1: Bayes Estimators and the Shannon Entropy. arXiv:comp-gas/9403001.
 WuYa14
 Wu, Y., & Yang, P. (2014) Minimax rates of entropy estimation on large alphabets via best polynomial approximation. arXiv:1407.0381 [Cs, Math, Stat].
See original: Statistical estimation of Information and other fiddly functionals
Content aggregators
Wed, 29/06/2016 1:17am by dan mackinlay
Upon the efficient consumption and summarizing of news from around the world.
I have been told to do this through twitter or facebook, but, seriously… no.
Those are systems designed to waste time with stupid distractions to benefit someone else.
Contrarily, I would like to find ways to summarise and condense information to save time for myself.
Feed readers
The classic.
You know what podcasts are?
Podcasts are a type of feed. An audio feed.
If I care about news articles and tumblr posts and whatever, not just audio, then I use feeds, feeds of text instead of audio. Any website can have a feed. Many do.
So…
Aside:
Remember when we thought the web would be a useful tool for researching and learning, and that automated research assistants would trawl the web for us?
RSS feeds were often discussed as a piece of that machine.
Little updates dripped from the web, to be sliced, diced, prioritised and analysed by our software to keep us aware of… whatever.
Most feed readers don’t do any of that fancy analysis though,
they just give you a list of new items ordered by date.
Still, whatever. Better than nothing.
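For the curious, "a list of new items" really is about all there is to it mechanically: an RSS feed is just XML. A toy parser using only the standard library (the sample feed below is invented):

```python
import xml.etree.ElementTree as ET

SAMPLE_RSS = """<rss version="2.0"><channel>
  <title>Example blog</title>
  <item><title>First post</title><link>http://example.com/1</link>
        <pubDate>Wed, 29 Jun 2016 01:17:00 GMT</pubDate></item>
  <item><title>Second post</title><link>http://example.com/2</link>
        <pubDate>Thu, 30 Jun 2016 09:00:00 GMT</pubDate></item>
</channel></rss>"""

def feed_items(rss_text):
    """Return (title, link) pairs from an RSS 2.0 document --
    the list of new items a minimal reader would show."""
    root = ET.fromstring(rss_text)
    return [(item.findtext("title"), item.findtext("link"))
            for item in root.iter("item")]
```

A real reader would fetch the XML over HTTP, deduplicate against what it has seen, and sort by pubDate; everything else is interface.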

commercial offerings
- feedly is the current boss. It targets commercial uses, like web “community managers” or marketing types, but probably works for humans too. This is how you would subscribe to my site in Feedly.
- newsblur is a quirky little option that I happen to use currently. The interface defies the last 10 years of user interface conventions, which is confusing, but it works and is cheap. This is how you would subscribe to my site in Newsblur.
- Feeder is a browser extension that reads feeds.
- The Old Reader reads feeds, including activity updates for people you follow on social media. Not sure if that is the worst or best of all worlds.

Indie-style
I will run a server if the application is good enough, but it has to be worth the time investment. Let’s say between backups, security issues, confusing DNS failures etc, that’s 8 hours per year of miscellaneous computer wrangling, best case, and more hours if you have complicated things like some multiuser database like MySQL. Very few things are good enough to be worth the opportunity cost of that time.
Why people insist on running enterprise databases to hold a reading list is an ongoing mystery to me. The capacity to scale to many users is nice, I suppose, but by that logic everyone should drive everywhere in a school bus.
- miniflux is open-source, but also offers a hosted version for $15/year.
- stringer looks like a nice little ruby app but needs PostgreSQL. Bloat!
- tiny tiny rss is the original “minimalist” RSS reader; it still needs more databases than is sensible.
- fever is a weird commercial ($30) application that you host on your own server. It claims to learn your information preferences, negating my previous complaint. But I cannot be arsed installing some database-wanting app with suspiciously machine-learning-inappropriate language requirements (PHP) that also costs money to try, so I will never know.
See original: Content aggregators
Practical workshop in magnetite nanoparticles preparation
Tue, 28/06/2016 9:03am by Wesam Ahmed Tawfik
Naqaa Nanotechnology Network is organizing a practical workshop in magnetite nanoparticles preparation for one day, from 10:30 am till 3:30 pm on Saturday 16 July 2016, which will contain lectures about different applications of magnetite nanoparticles and practical preparation of magnetite nanoparticles.
Important: Don't forget to bring your lab coat for the practical part.
Fees are 200 EGP
Spaces will be limited to 12 participants, so we ask attendees to register ahead of time
Fees include: lectures on CD + practical part + lunch break + certificate.
Certificates will be accredited by NNN
For more information please call 01098915757, 01115831621
Those who would like to register:
Just send us an email at naqaafoundation@gmail.com containing:
1. Your full triple name as you want it on the certificate
2. Your position
3. Your mobile
4. Your email
Subject of email: Practical Workshop i
Email message: I want to attend
Best regards
Composition, music theory, mostly Western.
Mon, 27/06/2016 2:54am by dan mackinlay
Sometimes you don’t want to generate a chord, or measure a chord, or learn a chord; you just want to write a chord.
Helpful software for the musically vexed
- Fabrizio Poce’s J74 progressive and J74 bassline are chord progression generators from his library of very clever chord generators linked in to Ableton Live’s scripting engine, so if you are using Ableton they might be very handy. They are cheap (EUR 12 + EUR 15). I use them myself, but they DO make Ableton crash a wee bit, so they are not really suited to live performance, which is a pity because that would be a wonderful unique selling point. The realtime-oriented J74 HarmoTools from the same guy are less sophisticated but worth trying, especially since they are free, and he has lots of other clever hacks there too. Basically, just go to this guy’s site and try his stuff out. You don’t have to stop there.
- Odesi (USD 49) has been doing lots of advertising and has a very nice pop interface. It’s like Synfire Lite with a library of pop tricks and rhythms. The desktop version tries to install gigabytes of synths of meagre merit on your machine, which is a giant waste of space and time if you are using a computer with synths on, which you are, because this is 2016.
- Helio is free and cross-platform and totally worth a shot. There is a chord model in there, and version control (!), but you might not notice the chord thing if you aren’t careful.
- Mixtikl / Noatikl are granddaddy apps for this. Although the creators doubtless put much effort into the sleek user interfaces, their complete inability to explain their app or provide compelling demonstrations or use cases leaves me cold. I get the feeling they had high-art aspirations but have ended up basically doing ambient noodles in order to sell product; maybe I’m not being fair. (USD 25/USD 40)
- Rapid Compose (USD 99/USD 249) might make decent software, but they can’t really explain why their app is nice or provide a demo version.
- synfire explains how it uses music theory to do large-scale scoring etc. Get the string section to behave itself or you’ll replace them with MIDI-bots. (EUR 996, so I won’t be buying it, but great demo video.)
- harmony builder does classical music theory for you. USD 39–USD 219, depending on heinously complex pricing schemes. Will pass your conservatorium finals.
- You can’t resist rolling your own? sharp11 is a node.js music theory library for javascript with a demo application to create jazz improv.
- Supercollider of course does this and everything else, but designing user interfaces for it will take years off your life. OTOH, if you are happy with text, this might be a goer.
Arpeggiators
- Bluearp vst does 2-note chord extrapolation (free)
- Hypercyclic is an LFO-able arpeggiator (free)
- kirnu (free) and kirnu cream
- Polyrhythmus
Constraint Composition
All of that too mainstream? Try a weird alternative formalism!
How about constraint composition? That’s declarative musical composition, defining constraints which the notes must satisfy.
Sounds fun in the abstract but the details don’t grab me somehow.
The reference here is strasheela, built on an obscure, unpopular, and apparently discontinued Prolog-like language called “Oz” or “Mozart”, because using existing languages is not as grand a gesture as claiming none of them are quite Turing-complete enough for your special thingy.
That is a bit of a ghost town;
If you wanted to actually do this, you’d probably use overtone + minikanren (prolog-for-lisp), as with
the composing schemer,
or to be even more mainstream, just use a normal constraint solver in a normal language.
I am fond of python and ncvx.
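As a tiny illustration of the declarative idea in plain Python — no constraint library, just brute force over a scale; the specific constraints (bounded leaps, no immediate repeats) are invented for the example:

```python
from itertools import product

C_MAJOR = [0, 2, 4, 5, 7, 9, 11]  # pitch classes of the C major scale

def melodies(length=4, max_leap=4):
    """Brute-force constraint solving: enumerate every melody over the
    scale whose consecutive intervals stay within max_leap semitones
    and which never repeats a note immediately."""
    out = []
    for seq in product(C_MAJOR, repeat=length):
        if all(0 < abs(b - a) <= max_leap for a, b in zip(seq, seq[1:])):
            out.append(seq)
    return out
```

A real system replaces the exhaustive loop with propagation and search, which is the whole point of the Oz/strasheela machinery, but the declarative shape (generate, then filter by constraints) is the same.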
Anyway, prolog fans read on.
 Anders, T., & Miranda, E. R. (2008). Higher-Order Constraint Applicators for Music Constraint Programming. In Proceedings of the 2008 International Computer Music Conference. Belfast, UK.
 Anders, T., & Miranda, E. R. (2010). Constraint Application with Higher-Order Programming for Modeling Music Theories. Computer Music Journal, 34(2), 25–38. DOI. Online.
 Anders, T., & Miranda, E. R.(2011). Constraint programming systems for modeling music theories and composition. ACM Computing Surveys, 43(4), 1–38. DOI. Online.
 Anders, T., & Miranda, E. R.(2009). A computational model that generalises Schoenberg’s guidelines for favourable chord progressions. In Proceedings of the Sound and Music Computing Conference. Citeseer. Online.
See original: Composition, music theory, mostly Western.
Gaussian distribution and Erf and Normality
Mon, 27/06/2016 2:52am by dan mackinlay
Stunts with Gaussian distributions.
Let’s start here with the basic thing.
The (univariate) standard Gaussian pdf:
\begin{equation*}
\psi:x\mapsto \frac{1}{\sqrt{2\pi}}\exp\left(-\frac{x^2}{2}\right)
\end{equation*}
We define
\begin{equation*}
\Psi:x\mapsto \int_{-\infty}^x\psi(t)\, dt
\end{equation*}
This erf function is popular, isn’t it?
Unavoidable if you do computer algebra.
But I can never remember what it is.
There’s this scaling factor tacked on.
Well…
\begin{equation*}
\operatorname{erf}(x)\; =\; \frac{1}{\sqrt{\pi}} \int_{-x}^x e^{-t^2} \, dt
\end{equation*}
so that
\begin{equation*}
\Psi(x) = \frac{1}{2}\left(\operatorname{erf}\left(\frac{x}{\sqrt{2}}\right)+1\right)
\end{equation*}
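A quick numerical sanity check that the standard Gaussian cdf can be written via the standard library's `math.erf` (the function names `Psi` and `psi` are mine, matching the notation here):

```python
from math import erf, sqrt, exp, pi

def Psi(x):
    """Standard Gaussian cdf expressed via erf."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def psi(x):
    """Standard Gaussian pdf."""
    return exp(-x * x / 2.0) / sqrt(2.0 * pi)
```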
Differential representation
Nonlinear univariate ODE representation:
\begin{align*}
\sigma ^2 f'(x)+f(x) (x-\mu )&=0\\
f(0) &=\frac{e^{-\mu ^2/(2\sigma ^2)}}{\sqrt{2 \sigma^2\pi } }\\
L(x) &=(\sigma^2 D+x-\mu)
\end{align*}
Linear PDE representation as a diffusion equation (see, e.g. BoGK10)
\begin{align*}
\frac{\partial}{\partial t}f(x;t) &=\frac{1}{2}\frac{\partial^2}{\partial x^2}f(x;t)\\
f(x;0)&=\delta(x-\mu)
\end{align*}
Look, it’s the diffusion equation of a Wiener process.
Roughness
\begin{align*}
\left\| \frac{d}{dx}\phi_\sigma \right\|_2^2 &= \frac{1}{4\sqrt{\pi}\sigma^3}\\
\left\| \left(\frac{d}{dx}\right)^n \phi_\sigma \right\|_2^2 &= \frac{\prod_{i<n}(2i+1)}{2^{n+1}\sqrt{\pi}\sigma^{2n+1}}
\end{align*}
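A numerical check of the n=1 case, i.e. that the integral of the squared first derivative of the N(0, sigma^2) density equals 1/(4 sqrt(pi) sigma^3); the discretisation scheme and function names are mine.

```python
import numpy as np

def gaussian_pdf(x, sigma):
    """Density of N(0, sigma^2)."""
    return np.exp(-x**2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))

def roughness(sigma, n=200_001, span=10.0):
    """Numerically integrate the squared first derivative of the
    density on a fine grid covering +-span standard deviations."""
    x = np.linspace(-span * sigma, span * sigma, n)
    d = np.gradient(gaussian_pdf(x, sigma), x)  # central differences
    return float(np.sum(d ** 2) * (x[1] - x[0]))
```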
Refs
 Bote16
 Botev, Z. I. (2016) The Normal Law Under Linear Restrictions: Simulation and Estimation via Minimax Tilting. Journal of the Royal Statistical Society: Series B (Statistical Methodology). DOI.
 BoGK10
 Botev, Z. I., Grotowski, J. F., & Kroese, D. P.(2010) Kernel density estimation via diffusion. The Annals of Statistics, 38(5), 2916–2957. DOI.
See original: Gaussian distribution and Erf and Normality
Sparse regression and things that look a bit like it.
Thu, 23/06/2016 6:50am by dan mackinlay
Related to compressed sensing, but here we consider sampling complexity and the effect of measurement noise.
See also matrix factorisations,
optimisation,
model selection,
multiple testing,
concentration inequalities,
sparse flavoured ice cream.
To discuss:
LARS, LASSO, debiased LASSO, elastic net, etc.
Implementations
I’m not going to mention LASSO in (generalised) linear regression, since everything does that these days. (Oh alright, Jerome Friedman’s glmnet for R is the fastest, and has a MATLAB version.)
But SPAMS (C++, MATLAB, R, python), by Mairal himself, looks interesting.
It’s an optimisation library for many variants of sparse problems.
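For the record, the LASSO objective itself needs nothing fancy: a bare-bones proximal-gradient (ISTA) solver is a few lines of numpy. This is a pedagogical sketch (names mine), not a substitute for glmnet or SPAMS.

```python
import numpy as np

def ista_lasso(X, y, lam=0.1, n_iter=500):
    """Minimise 0.5*||y - Xw||^2 + lam*||w||_1 by iterative
    soft-thresholding (proximal gradient descent)."""
    n, p = X.shape
    step = 1.0 / np.linalg.norm(X, 2) ** 2  # 1 / Lipschitz constant of grad
    w = np.zeros(p)
    for _ in range(n_iter):
        g = X.T @ (X @ w - y)               # gradient of the smooth part
        z = w - step * g
        # proximal operator of the l1 penalty: soft threshold
        w = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)
    return w
```

The soft-threshold step is what produces exact zeros, i.e. the sparsity; LARS traces the same solution path as lam varies.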
See original: Sparse regression and things that look a bit like it.
Eating Japanese Knotweed (and other daft ideas)
Wed, 22/06/2016 5:38pm by sciencewriterIR
[Image: Wikipedia]
There have been a number of calls (1, 2, 3, 4) in recent weeks and months to control the invasive plant Japanese Knotweed, at least partially, by eating it. In recent days, Kerry County Council in Ireland heard from one member who, albeit tongue-in-cheek, urged citizens to make wine, jelly and other sweet treats from the plant.
This strikes me as a terrible idea.
The plant itself is certainly edible – the Japanese have been eating it for years. Its Japanese name, itadori, means 'well being' and it seems to have some medicinal properties. It also tastes a bit like rhubarb, apparently. I wouldn't know, I haven't tried it.
I haven't tried it for the same reason I don't advise you try it. Encouraging people to harvest and transport a regulated, invasive species is the perfect recipe (if you'll pardon the pun) for its continued and accelerated spread.
Japanese Knotweed (Fallopia japonica) is, as you will have guessed, native to Japan and the neighbouring region. It was introduced to the UK in the mid-19th century and quickly spread to Ireland and other parts of the world. Introduced as an ornamental plant, it quickly became a real problem.
The plant is capable of growing at a tremendous rate – 1 metre in a month – and forms big stands 2–3 metres in height. The early shoots are spear-like, similar to asparagus in appearance, and the plants produce delicate white flowers in late summer. The real problem is underground, where the plant forms tough rhizomes, adapted root-like organs, which remain in the soil even during the winter when the rest of the plant dies back.
Japanese Knotweed thrives on disturbance and it is mainly spread by fragments of rhizome, crown or stem being accidentally or deliberately moved. This leads to some real (and expensive) problems including a massive reduction in biodiversity under the alien canopy; structural damage to buildings and infrastructure; and the significant cost of its removal.
Data from 2010 suggest that the plant costs the UK £165 million a year to control. If the plant were to be eradicated in the UK by current methods it would cost £1.56 billion. For one site alone, the 2012 London Olympic site, it cost £88 million to deal with this one invasive plant. Nobody wants Japanese Knotweed on their land.
[Image: Wikipedia]
Imagine you go to the supermarket and buy a bunch of rhubarb. The first thing you do is chop the top and bottom off the stalks and chuck them on your compost heap. Do this with Japanese Knotweed and you end up costing yourself (and potentially your neighbours) thousands in a cleanup bill.
Harvesting Japanese Knotweed from the wild, no matter how careful you are, is also fraught with problems. The plant can easily regrow from small fragments the size of your fingernail. If we're lucky, you'll drop these fragments at the original, infested site. If not, you'll drop them on your walk back to the car or in your front garden when you unload the car.
Simply put, encouraging people to mess around with an invasive species like Japanese Knotweed is, in my view, irresponsible. It may also be illegal.
In Ireland, it is an offence to "plant, disperse or cause to disperse or otherwise cause to grow" the plant. It is also an offence if "he/she has in his/her possession for sale or for breeding/reproduction/transport....anything from which the plant can be reproduced or propagated".
In the meantime, there are chemical and physical control options, and scientists in the UK are developing a biological control approach using a sap-sucking insect called Aphalara itadori. This is an old enemy of the plant, found in Japan and currently being tested in the UK to see if it will do the same job in this part of the world (and not eat anything else, by accident). The trials haven't been a total success, with numbers surviving over winter too low to have much of an effect, but the tests are ongoing. Hopefully, before too long we will have a sustainable control option for this invasive plant. In the meantime, stop eating it.
See original: Eating Japanese Knotweed (and other daft ideas)
Smoothing, regularisation, penalization and friends
Tue, 21/06/2016 10:04am by dan mackinlay
In nonparametric statistics we might estimate simultaneously what look like many, many parameters, which we constrain in some clever fashion, which usually boils down to something we can interpret as a “smoothing” parameter, controlling how many parameters we still have to model from a subset of the original.
The “regularisation” nomenclature claims descent from Tikhonov (e.g. TiGl65 etc.), who wanted to solve ill-conditioned integral and differential equations, so it’s slightly more general.
“Smoothing” seems to be common in the
spline and
kernel estimate communities of
Wahba (Wahb90) and Silverman (Silv84) et al,
who usually actually want to smooth curves.
“Penalization” has a genealogy unknown to me, but is probably the least abstruse for common usage.
These are, AFAICT, more or less the same thing.
“Smoothing” is more common in my communities, which is fine,
but we have to remember that “smoothing” an estimator might not always infer smooth dynamics in the estimand;
it could be something else being smoothed, such as variance in the estimate of parameters of a rough function.
In every case, you wish to solve an ill-conditioned inverse problem, so you tame it by adding a penalty to solutions you feel one should be reluctant to accept.
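In the simplest Tikhonov/ridge case the penalized solution even has a closed form, which makes the "taming" concrete (a sketch; the function name is mine):

```python
import numpy as np

def tikhonov(X, y, lam):
    """Penalized least squares: argmin ||y - Xw||^2 + lam*||w||^2,
    via the closed form (X'X + lam*I)^{-1} X'y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
```

With an ill-conditioned design (e.g. nearly collinear columns) the unpenalized solution blows up in the near-null direction, while the ridge solution stays tame; its norm is monotonically non-increasing in lam.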
TODO: make comprehensible
TODO: examples
TODO: discuss connection with model selection
TODO: discuss connection with compressed sensing.
The real classic approach here is spline smoothing of functional data.
More recent approaches are things like sparse regression.
Refs
 Bach00
 Bach, F. (n.d.) Model-Consistent Sparse Estimation through the Bootstrap.
 ChHS15
 Chernozhukov, V., Hansen, C., & Spindler, M. (2015) Valid Post-Selection and Post-Regularization Inference: An Elementary, General Approach. Annual Review of Economics, 7(1), 649–688. DOI.
 EHJT04
 Efron, B., Hastie, T., Johnstone, I., & Tibshirani, R. (2004) Least angle regression. The Annals of Statistics, 32(2), 407–499. DOI.
 FlHS13
 Flynn, C. J., Hurvich, C. M., & Simonoff, J. S.(2013) Efficiency for Regularization Parameter Selection in Penalized Likelihood Estimation of Misspecified Models. arXiv:1302.2068 [Stat].
 FrHT10
 Friedman, J., Hastie, T., & Tibshirani, R. (2010) Regularization Paths for Generalized Linear Models via Coordinate Descent. Journal of Statistical Software, 33(1), 1–22. DOI.
 JaFH13
 Janson, L., Fithian, W., & Hastie, T. (2013) Effective Degrees of Freedom: A Flawed Metaphor. arXiv:1312.7851 [Stat].
 KaRo14
 Kaufman, S., & Rosset, S. (2014) When does more regularization imply fewer degrees of freedom? Sufficient conditions and counterexamples. Biometrika, 101(4), 771–784. DOI.
 KoMi06
 Koenker, R., & Mizera, I. (2006) Density estimation by total variation regularization. Advances in Statistical Modeling and Inference, 613–634.
 LiRW10
 Liu, H., Roeder, K., & Wasserman, L. (2010) Stability Approach to Regularization Selection (StARS) for High Dimensional Graphical Models. In J. D. Lafferty, C. K. I. Williams, J. ShaweTaylor, R. S. Zemel, & A. Culotta (Eds.), Advances in Neural Information Processing Systems 23 (pp. 1432–1440). Curran Associates, Inc.
 MeBü10
 Meinshausen, N., & Bühlmann, P. (2010) Stability selection. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 72(4), 417–473. DOI.
 Meye08
 Meyer, M. C. (2008) Inference using shape-restricted regression splines. The Annals of Applied Statistics, 2(3), 1013–1033. DOI.
 Silv84
 Silverman, B. W.(1984) Spline Smoothing: The Equivalent Variable Kernel Method. The Annals of Statistics, 12(3), 898–916. DOI.
 SmSM98
 Smola, A. J., Schölkopf, B., & Müller, K.-R. (1998) The connection between regularization operators and support vector kernels. Neural Networks, 11(4), 637–649. DOI.
 TKPS14
 Tansey, W., Koyejo, O., Poldrack, R. A., & Scott, J. G.(2014) False discovery rate smoothing. arXiv:1411.6144 [Stat].
 TiGl65
 Tikhonov, A. N., & Glasko, V. B.(1965) Use of the regularization method in nonlinear problems. USSR Computational Mathematics and Mathematical Physics, 5(3), 93–107. DOI.
 Geer14
 van de Geer, S. (2014) Weakly decomposable regularization penalties and structured sparsity. Scandinavian Journal of Statistics, 41(1), 72–86. DOI.
 Wahb90
 Wahba, G. (1990) Spline Models for Observational Data. . SIAM
 WeMZ16
 Weng, H., Maleki, A., & Zheng, L. (2016) Overcoming The Limitations of Phase Transition by Higher Order Analysis of Regularization Techniques. arXiv:1603.07377 [Cs, Math, Stat].
 Wood00
 Wood, S. N.(2000) Modelling and smoothing parameter estimation with multiple quadratic penalties. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 62(2), 413–428. DOI.
 Wood08
 Wood, S. N.(2008) Fast stable direct fitting and smoothness selection for generalized additive models. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 70(3), 495–518. DOI.
 ZoHa05
 Zou, H., & Hastie, T. (2005) Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 67(2), 301–320. DOI.
 ZoHT07
 Zou, H., Hastie, T., & Tibshirani, R. (2007) On the “degrees of freedom” of the lasso. The Annals of Statistics, 35(5), 2173–2192. DOI.
See original: Smoothing, regularisation, penalization and friends
DJing
Tue, 21/06/2016 4:09am by dan mackinlay
Yet our sounds are also a vocabulary for those who detest the walled-off concentrations of wealth, and steal property back: the collectives that build their own sound systems, stage free parties, and invite DJs to perform. The international DJ becomes emblematic of global capitalism’s complicated cultural dimension. On flights and at the free Continental breakfasts in hotels, often the same soul-destroying hotel chains in each city, we get stuck chatting with our fellow Americans and Western Europeans, the executives eager to find compatriots. We make small talk with these consultants and dealmakers in the descending elevators in the evening—then go out to the city’s dead-end and unowned spaces or its luxury venues to soundtrack the night of the region’s youth, hungry for something new. DJ music is now the common art form of squatters and the nouveau riche; it is the soundtrack both for capital and for its opposition.
http://www.ibrahimshaath.co.uk/keyfinder/
tangerine echonest
see also machine listening,
audio software
DJing software
So many choices now. I use Ableton, but Traktor and Serato are more designed for this.
Open source/ lower cost alternatives?
- flow8deck is made by the people who made Mixed In Key, software for the musically vexed. It handles key changes well.
- Traktor
- Serato
- Djay
See original: DJing
Prepping
Sun, 19/06/2016 8:43am by dan mackinlay
Surviving the collapse of civilisation
Kickstarter for a New Civilization
https://emergentbydesign.com/2015/01/14/kickstarter-for-a-new-civilization/
https://medium.com/emergent-culture/reinvent-everything-556860b63308#.7cttqqium
See original: Prepping
Moving the poors to marginal electorate
Fri, 17/06/2016 6:32am by dan mackinlay
OK, let’s start treating politics like the favour machine it is and behave accordingly.
NSW under Mike Baird is a system where you buy favours with leverage.
I’d like it to be otherwise, but let’s look.
Optimal marginalness.
Invade marginal electorates.
Organised opposition means we are more likely to claim council seats as a side benefit.
See original: Moving the poors to marginal electorate
Recurrent neural networks
Fri, 17/06/2016 5:21am by dan mackinlay
Feedback neural networks structured to have memory and a notion of “current” and “past” states, which can encode time (or whatever).
As someone who does a lot of signal processing for music, the notion that these generalise linear systems theory is suggestive of lots of interesting DSP applications.
The connection between these (IIR) and “convolutional” (FIR) neural networks is suggestive for the same reason.
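The linear-systems analogy can be made concrete: a vanilla RNN step is a nonlinear state-space (IIR-like) filter. A toy sketch in numpy (random, untrained weights, purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hidden = 2, 8

# Random weights; in practice these are learned by backpropagation through time.
W_in = rng.normal(scale=0.5, size=(n_hidden, n_in))
W_rec = rng.normal(scale=0.5, size=(n_hidden, n_hidden))

def rnn_filter(x_seq):
    """Run a vanilla RNN over a (T, n_in) input sequence, returning hidden states.

    With the tanh removed, this is exactly a linear (IIR) state-space filter:
    h[t] = W_rec @ h[t-1] + W_in @ x[t].
    """
    h = np.zeros(n_hidden)
    states = []
    for x in x_seq:
        h = np.tanh(W_rec @ h + W_in @ x)
        states.append(h)
    return np.array(states)

states = rnn_filter(rng.normal(size=(20, n_in)))
```

The recurrent weight matrix plays the role of the filter’s feedback coefficients, which is why stability arguments from linear systems (spectral radius, pole locations) keep reappearing in the RNN training literature.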
 Awesome RNN is a curated links list of implementations.
 Andrej Karpathy: The unreasonable effectiveness of RNN
 Christopher Olah: Understanding LSTM RNNs
 Jeff Donahue Long term recurrent NN
 Ross Goodwin: Adventures in Narrated Reality gives an overview of text generation using RNNs
Flavours
Vanilla
The main problem here is that vanilla RNNs are unstable in the training phase unless you are clever: gradients backpropagated through many time steps tend to vanish or explode.
See BeSF94. One solution is LSTM; see next.
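The instability is easy to demonstrate: in a linear recurrence, the backpropagated gradient is repeatedly multiplied by the recurrent matrix, so it scales roughly like the spectral radius raised to the sequence length. A toy illustration (random matrices, not a training procedure):

```python
import numpy as np

rng = np.random.default_rng(1)

def gradient_norm_after(T, scale):
    """Norm of a backpropagated gradient after T steps through a linear recurrence.

    The gradient is multiplied by W.T at each step, so it grows or shrinks
    roughly like rho(W)**T, where rho is the spectral radius (~= scale here,
    by the circular law for this normalization).
    """
    W = rng.normal(scale=scale / np.sqrt(50), size=(50, 50))
    g = np.ones(50)
    for _ in range(T):
        g = W.T @ g
    return np.linalg.norm(g)

small = gradient_norm_after(50, scale=0.5)   # shrinks towards zero (vanishing)
large = gradient_norm_after(50, scale=2.0)   # blows up (exploding)
```

This is exactly the BeSF94 observation; LSTM’s additive memory cell is one way of routing the gradient around the repeated multiplication.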
Long Short Term Memory (LSTM)
As always, Christopher Olah wins the visual explanation prize:
Understanding LSTM Networks
LSTM Networks for Sentiment Analysis:
In a traditional recurrent neural network, during the gradient backpropagation phase, the gradient signal can end up being multiplied a large number of times (as many as the number of timesteps) by the weight matrix associated with the connections between the neurons of the recurrent hidden layer. This means that the magnitude of weights in the transition matrix can have a strong impact on the learning process. […]
These issues are the main motivation behind the LSTM model, which introduces a new structure called a memory cell […]. A memory cell is composed of four main elements: an input gate, a neuron with a self-recurrent connection (a connection to itself), a forget gate and an output gate. […] The gates serve to modulate the interactions between the memory cell itself and its environment.
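The gate structure described above can be sketched in a few lines of numpy (random weights, forward pass only; in practice everything is trained by backpropagation through time):

```python
import numpy as np

rng = np.random.default_rng(2)
n_in, n_hid = 3, 5

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One weight matrix per gate, acting on [h_prev, x]; random here, learned in practice.
W_i, W_f, W_o, W_c = (rng.normal(scale=0.3, size=(n_hid, n_hid + n_in)) for _ in range(4))

def lstm_step(h_prev, c_prev, x):
    """One LSTM step: input, forget and output gates modulate the memory cell c."""
    z = np.concatenate([h_prev, x])
    i = sigmoid(W_i @ z)                    # input gate: how much new content to write
    f = sigmoid(W_f @ z)                    # forget gate: how much old memory to keep
    o = sigmoid(W_o @ z)                    # output gate: how much memory to expose
    c = f * c_prev + i * np.tanh(W_c @ z)   # self-recurrent memory cell
    h = o * np.tanh(c)
    return h, c

h = c = np.zeros(n_hid)
for x in rng.normal(size=(10, n_in)):
    h, c = lstm_step(h, c, x)
```

The key line is the cell update: because `c` is carried forward additively rather than through a squashing nonlinearity, gradients can flow across long time spans when the forget gate stays near 1. (Bias terms omitted for brevity.)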
GridRNN
A minigenre.
KaDG15 connect recurrent cells across multiple axes, leading to a higher-rank MIMO system.
This is natural in many kinds of spatial random fields, and I am amazed it was uncommon enough to need formalizing in a paper; but it was, and it did, and good on Kalchbrenner et al.
Gated Recurrent Unit (GRU)
TBD
Liquid/ Echo State Machines
This sounds deliciously lazy;
Very roughly speaking, your first layer is a reservoir of random saturating IIR filters.
You fit a classifier on the outputs of this.
Easy to implement, that.
I wonder when it actually works, and what constraints on topology etc. are needed.
I wonder if you can use some kind of sparsifying transform on the recurrence operator?
These claim to be based on spiking models, but AFAICT this is not at all necessary.
Various claims are made about how they avoid the training difficulty of similarly basic RNNs by being essentially untrained; you use them as a feature factory for another supervised output algorithm.
Suggestive parallel with random projections.
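A minimal sketch of that recipe under the usual assumptions (reservoir rescaled to spectral radius below 1 for the “echo state property”, only a linear least-squares readout trained; the toy task and all names here are illustrative, not from any particular library):

```python
import numpy as np

rng = np.random.default_rng(3)
n_res = 100

# Fixed random reservoir, rescaled so its spectral radius is 0.9;
# only the linear readout below is ever trained.
W = rng.normal(size=(n_res, n_res))
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))
W_in = rng.normal(size=n_res)

def reservoir_states(u):
    """Drive the fixed random reservoir with a scalar input sequence u."""
    x = np.zeros(n_res)
    X = []
    for u_t in u:
        x = np.tanh(W @ x + W_in * u_t)
        X.append(x)
    return np.array(X)

# Toy memory task: reproduce the input delayed by 2 steps.
u = rng.uniform(-1, 1, size=500)
X = reservoir_states(u)
target = np.roll(u, 2)
w_out, *_ = np.linalg.lstsq(X[50:], target[50:], rcond=None)  # skip washout
pred = X[50:] @ w_out
```

Training reduces to one least-squares solve, which is the “deliciously lazy” part; the open questions above (topology, sparsification of the recurrence operator) all live in how `W` is constructed.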
From the LuJa09 survey:
From a dynamical systems perspective, there are two main classes of RNNs.
Models from the first class are characterized by an energy-minimizing
stochastic dynamics and symmetric connections.
The best known instantiations are Hopfield networks, Boltzmann machines, and
the recently emerging Deep Belief Networks.
These networks are mostly trained in some unsupervised learning scheme.
Typical targeted network functionalities in this field are associative
memories, data compression, the unsupervised modeling of data distributions,
and static pattern classification, where the model is run for multiple time
steps per single input instance to reach some type of convergence or
equilibrium
(but see e.g., TaHR06 for extension to temporal data).
The mathematical background is rooted in statistical physics.
In contrast, the second big class of RNN models typically features a
deterministic update dynamics and directed connections.
Systems from this class implement nonlinear filters, which
transform an input time series into an output time series.
The mathematical background here is nonlinear dynamical systems.
The standard training mode is supervised.
This survey is concerned only with RNNs of this second type, and
when we speak of RNNs later on, we will exclusively refer to such systems.
Other
It’s still the wild west. Invent a category, name it and stake a claim.
Practicalities
Variable sequence length:
https://gist.github.com/evanthebouncy/8e16148687e807a46e3f
Danijar Hafner:
* Introduction to Recurrent Networks in TensorFlow
* https://danijar.com/variable-sequence-lengths-in-tensorflow/
seq2seq models with GRUs: Fun with Recurrent Neural Nets: One More Dive into CNTK and TensorFlow
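The padding-and-masking trick behind those links can be sketched library-agnostically (these helper names are mine; the real APIs differ per framework):

```python
import numpy as np

def sequence_mask(lengths, max_len):
    """Boolean mask of shape (batch, max_len): True where t < length."""
    return np.arange(max_len)[None, :] < np.asarray(lengths)[:, None]

def last_relevant(outputs, lengths):
    """Pick each sequence's final valid RNN output from padded (batch, T, d) outputs."""
    idx = np.asarray(lengths) - 1
    return outputs[np.arange(outputs.shape[0]), idx]

# Two padded sequences of true lengths 2 and 4, hidden dimension 3.
outputs = np.arange(2 * 4 * 3).reshape(2, 4, 3).astype(float)
mask = sequence_mask([2, 4], max_len=4)
finals = last_relevant(outputs, [2, 4])
```

Losses are then summed only over the `True` entries of the mask, so the padding steps never contribute gradient.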
Refs
 AuBM08
 Auer, P., Burgsteiner, H., & Maass, W. (2008) A learning rule for very simple universal approximators consisting of a single layer of perceptrons. Neural Networks, 21(5), 786–795. DOI.
 BeSF94
 Bengio, Y., Simard, P., & Frasconi, P. (1994) Learning long-term dependencies with gradient descent is difficult. IEEE Transactions on Neural Networks, 5(2), 157–166. DOI.
 BoBV12
 Boulanger-Lewandowski, N., Bengio, Y., & Vincent, P. (2012) Modeling Temporal Dependencies in High-Dimensional Sequences: Application to Polyphonic Music Generation and Transcription. In 29th International Conference on Machine Learning.
 BoLe06
 Bown, O., & Lexer, S. (2006) Continuous-Time Recurrent Neural Networks for Generative and Interactive Musical Performance. In F. Rothlauf, J. Branke, S. Cagnoni, E. Costa, C. Cotta, R. Drechsler, … H. Takagi (Eds.), Applications of Evolutionary Computing (pp. 652–663). Springer Berlin Heidelberg
 BuMe05
 Buhusi, C. V., & Meck, W. H.(2005) What makes us tick? Functional and neural mechanisms of interval timing. Nature Reviews Neuroscience, 6(10), 755–765. DOI.
 CMBB14
 Cho, K., van Merriënboer, B., Bahdanau, D., & Bengio, Y. (2014) On the properties of neural machine translation: Encoder-decoder approaches. arXiv Preprint arXiv:1409.1259.
 CGCB14
 Chung, J., Gulcehre, C., Cho, K., & Bengio, Y. (2014) Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling. In NIPS.
 CGCB15
 Chung, J., Gulcehre, C., Cho, K., & Bengio, Y. (2015) Gated Feedback Recurrent Neural Networks. arXiv:1502.02367 [Cs, Stat].
 DoPo15
 Doelling, K. B., & Poeppel, D. (2015) Cortical entrainment to music and its modulation by expertise. Proceedings of the National Academy of Sciences, 112(45), E6233–E6242. DOI.
 DuPW14
 Duan, Q., Park, J. H., & Wu, Z.-G. (2014) Exponential state estimator design for discrete-time neural networks with discrete and distributed time-varying delays. Complexity, 20(1), 38–48. DOI.
 Gal15
 Gal, Y. (2015) A Theoretically Grounded Application of Dropout in Recurrent Neural Networks. arXiv:1512.05287 [Stat].
 GeSC00
 Gers, F. A., Schmidhuber, J., & Cummins, F. (2000) Learning to Forget: Continual Prediction with LSTM. Neural Computation, 12(10), 2451–2471. DOI.
 GDGR15
 Gregor, K., Danihelka, I., Graves, A., Rezende, D. J., & Wierstra, D. (2015) DRAW: A Recurrent Neural Network For Image Generation. arXiv:1502.04623 [Cs].
 GCWK09
 Grzyb, B. J., Chinellato, E., Wojcik, G. M., & Kaminski, W. A.(2009) Which model to use for the Liquid State Machine?. In 2009 International Joint Conference on Neural Networks (pp. 1018–1024). DOI.
 HaMa12
 Hazan, H., & Manevitz, L. M.(2012) Topological constraints and robustness in liquid state machines. Expert Systems with Applications, 39(2), 1597–1606. DOI.
 HDYD12
 Hinton, G., Deng, L., Yu, D., Dahl, G. E., Mohamed, A., Jaitly, N., … Kingsbury, B. (2012) Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups. IEEE Signal Processing Magazine, 29(6), 82–97. DOI.
 HoSc97
 Hochreiter, S., & Schmidhuber, J. (1997) Long Short-Term Memory. Neural Computation, 9(8), 1735–1780. DOI.
 JoZS15
 Jozefowicz, R., Zaremba, W., & Sutskever, I. (2015) An empirical exploration of recurrent network architectures. In Proceedings of the 32nd International Conference on Machine Learning (ICML15) (pp. 2342–2350).
 KaDG15
 Kalchbrenner, N., Danihelka, I., & Graves, A. (2015) Grid Long Short-Term Memory. arXiv:1507.01526 [Cs].
 KaJF15
 Karpathy, A., Johnson, J., & FeiFei, L. (2015) Visualizing and Understanding Recurrent Networks. arXiv:1506.02078 [Cs].
 Lecu98
 LeCun, Y. (1998) Gradient-based learning applied to document recognition. Proc. IEEE, 86(11), 2278–2324. DOI.
 LeNM05
 Legenstein, R., Naeger, C., & Maass, W. (2005) What Can a Neuron Learn with Spike-Timing-Dependent Plasticity?. Neural Computation, 17(11), 2337–2382. DOI.
 LiBE15
 Lipton, Z. C., Berkowitz, J., & Elkan, C. (2015) A Critical Review of Recurrent Neural Networks for Sequence Learning. arXiv:1506.00019 [Cs].
 LuJa09
 Lukoševičius, M., & Jaeger, H. (2009) Reservoir computing approaches to recurrent neural network training. Computer Science Review, 3(3), 127–149. DOI.
 MaNM04
 Maass, W., Natschläger, T., & Markram, H. (2004) Computational Models for Generic Cortical Microcircuits. In Computational Neuroscience: A Comprehensive Approach (pp. 575–605). Chapman & Hall/CRC
 Mico15
 Miconi, T. (2015) Training recurrent neural networks with sparse, delayed rewards for flexible decision tasks. arXiv:1507.08973 [q-bio].
 Mnih15
 Mnih, V. (2015) Human-level control through deep reinforcement learning. Nature, 518, 529–533. DOI.
 MoDH12
 Mohamed, A.-r., Dahl, G. E., & Hinton, G. (2012) Acoustic Modeling Using Deep Belief Networks. IEEE Transactions on Audio, Speech, and Language Processing, 20(1), 14–22. DOI.
 OIMT15
 Owens, A., Isola, P., McDermott, J., Torralba, A., Adelson, E. H., & Freeman, W. T.(2015) Visually Indicated Sounds. arXiv:1512.08512 [Cs].
 RoRS15
 Rohrbach, A., Rohrbach, M., & Schiele, B. (2015) The Long-Short Story of Movie Description. arXiv:1506.01698 [Cs].
 Schw07
 Schwenk, H. (2007) Continuous space language models. Computer Speech Lang., 21, 492–518. DOI.
 TaHR06
 Taylor, G. W., Hinton, G. E., & Roweis, S. T.(2006) Modeling human motion using binary latent variables. In Advances in neural information processing systems (pp. 1345–1352).
 ThBe15
 Theis, L., & Bethge, M. (2015) Generative Image Modeling Using Spatial LSTMs. arXiv:1506.03478 [Cs, Stat].
 VKCM15
 Visin, F., Kastner, K., Cho, K., Matteucci, M., Courville, A., & Bengio, Y. (2015) ReNet: A Recurrent Neural Network Based Alternative to Convolutional Networks. arXiv:1505.00393 [Cs].
 Waib89
 Waibel, A. (1989) Phoneme recognition using time-delay neural networks. IEEE Trans. Acoustics Speech Signal Process., 37(3), 328–339. DOI.
 YTCB15
 Yao, L., Torabi, A., Cho, K., Ballas, N., Pal, C., Larochelle, H., & Courville, A. (2015) Describing Videos by Exploiting Temporal Structure. arXiv:1502.08029 [Cs, Stat].
See original: Recurrent neural networks
Generalised linear models
Wed, 15/06/2016 4:38am by dan mackinlay
Using the machinery of linear regression in somewhat more general regression settings.
This means you are still doing maximum likelihood regression,
but outside the setting of homoskedastic Gaussian noise and linear response.
Not quite as fancy as generalised additive models,
but if you have to implement such models yourself, less work.
If you are using R, this is not you.
To learn:
 When can we do this? e.g. must the response be from an exponential family for really real? Wikipedia mentions the “overdispersed exponential family”, which is no such thing.
 Does anything funky happen with regularisation?
 Whether to merge this in with quasilikelihood.
 Fitting variance parameters.
Pieces of the method follow.
Response distribution
TBD. What constraints do we have here?
Linear Predictor
Link function
An invertible (monotonic?) function
relating the linear predictor to
the mean of the response distribution: g(μ) = η = Xβ.
Refs
 BuHT89
 Buja, A., Hastie, T., & Tibshirani, R. (1989) Linear Smoothers and Additive Models. The Annals of Statistics, 17(2), 453–510.
 CuDE06
 Currie, I. D., Durban, M., & Eilers, P. H. C.(2006) Generalized linear array models with applications to multidimensional smoothing. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 68(2), 259–280. DOI.
 FrHT10
 Friedman, J., Hastie, T., & Tibshirani, R. (2010) Regularization Paths for Generalized Linear Models via Coordinate Descent. Journal of Statistical Software, 33(1), 1–22. DOI.
 Hans10
 Hansen, N. R.(2010) Penalized maximum likelihood estimation for generalized linear point processes. arXiv:1003.0848 [Math, Stat].
 Hoss09
 Hosseinian, S. (2009) Robust inference for generalized linear models: binary and Poisson regression. École Polytechnique Fédérale de Lausanne
 LeNP06
 Lee, Y., Nelder, J. A., & Pawitan, Y. (2006) Generalized linear models with random effects. Boca Raton, FL: Chapman & Hall/CRC
 Mccu84
 McCullagh, P. (1984) Generalized linear models. European Journal of Operational Research, 16(3), 285–292. DOI.
 NeBa04
 Nelder, J. A., & Baker, R. J.(2004) Generalized Linear Models. In Encyclopedia of Statistical Sciences. John Wiley & Sons, Inc.
 NeWe72
 Nelder, J. A., & Wedderburn, R. W. M.(1972) Generalized Linear Models. Journal of the Royal Statistical Society. Series A (General), 135(3), 370–384. DOI.
 PrLu13
 Proietti, T., & Luati, A. (2013) Generalised Linear Spectral Models (CEIS Research Paper No. 290). Tor Vergata University, CEIS
 Wedd74
 Wedderburn, R. W. M.(1974) Quasi-likelihood functions, generalized linear models, and the Gauss-Newton method. Biometrika, 61(3), 439–447. DOI.
 Wedd76
 Wedderburn, R. W. M.(1976) On the existence and uniqueness of the maximum likelihood estimates for certain generalized linear models. Biometrika, 63(1), 27–32. DOI.
 Wood08
 Wood, S. N.(2008) Fast stable direct fitting and smoothness selection for generalized additive models. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 70(3), 495–518. DOI.
 XiWJ14
 Xia, T., Wang, X.-R., & Jiang, X.-J. (2014) Asymptotic properties of maximum quasi-likelihood estimator in quasi-likelihood nonlinear models with misspecified variance function. Statistics, 48(4), 778–786. DOI.
See original: Generalised linear models