Composition, music theory, mostly Western.

Sometimes you don’t want to generate a chord, or measure a chord, or
learn a chord,
you just want to write a chord.

Helpful software for the musically vexed

  • Fabrizio Poce’s
    J74 progressive and J74 bassline
    are chord progression
    generators from his library of very clever chord tools hooked into
    Ableton Live’s scripting engine,
    so if you
    are using Ableton they might be very handy.
    They are cheap (EUR12 + EUR15).
    I use them myself, but they DO make Ableton crash a wee bit, so they are not really
    suited to live performance, which is a pity because that would be a
    wonderful unique selling point.
    The realtime-oriented J74 HarmoTools from the same guy
    are less sophisticated but worth trying, especially since they are free, and
    he has a lot of other clever hacks there too.
    Basically, just go to this guy’s
    site and try his stuff out. You don’t have to stop there.
  • Odesi
    (USD49) has been doing lots of advertising and has a very nice pop-interface.
    It’s like Synfire-lite with a library of pop tricks and rhythms.
    The desktop version tries to install gigabytes of synths of meagre merit on your machine,
    which is a giant waste of space and time if you are using a computer with synths on it,
    which you are, because this is 2016.
  • Helio is free and cross platform and totally worth a shot.
    There is a chord model in there and version control (!), but you might not notice the chord thing if you aren’t careful.
  • Mixtikl / Noatikl are the grandaddy apps for this. Although the creators have doubtless put much effort into the sleek user interfaces, their complete inability to explain their apps or provide compelling demonstrations or use cases leaves me cold.
    I get the feeling they had high-art aspirations but have ended up basically doing ambient noodles in order to sell product; maybe I’m not being fair. (USD25/USD40)
  • Rapid Compose (USD99/USD249) might make decent software, but they can’t really explain why their app is nice, nor provide a demo version.
  • synfire explains how it uses music theory to do large-scale scoring etc. Get the string section to behave itself or you’ll replace them with MIDIbots. (EUR996, so I won’t be buying it, but great demo video.)
  • harmony builder does classical music theory for you.
    USD39-USD219 depending on heinously complex pricing schemes.
    Will pass your conservatorium finals.
  • You can’t resist rolling your own?
    sharp11 is a node.js music theory library for javascript, with a demo application that creates jazz improv.
  • Supercollider of course does this and everything else, but designing user interfaces for it will take years off your life. OTOH, if you are happy with text, this might be a goer.

Arpeggiators

Constraint Composition

All of that too mainstream? Try a weird alternative formalism!
How about constraint composition? That’s
declarative musical composition by defining constraints which the notes must satisfy.
Sounds fun in the abstract but the details don’t grab me somehow.

The reference here is strasheela, built on an obscure, unpopular, and apparently discontinued Prolog-like language called “Oz” or “Mozart”, because using existing languages is not as grand a gesture as claiming none of them are quite Turing complete enough for your special thingy.

That project is a bit of a ghost town;
if you wanted to actually do this, you’d probably use overtone + minikanren (Prolog-for-Lisp), as with
the composing schemer,
or, to be even more mainstream, just use a normal constraint solver in a normal language.
I am fond of python and ncvx.
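
For instance, rolling your own in python is only a few lines. A minimal sketch, assuming the python-constraint package (my choice of solver, not mentioned above): pick a four-voice voicing of a C major triad subject to a couple of declarative rules.

    # Toy constraint composition: choose MIDI notes for four voices such that
    # every note is a C major chord tone, voices do not cross, and all three
    # pitch classes are covered. Swap in whatever rules you fancy.
    from constraint import Problem

    problem = Problem()
    ranges = {"bass": range(36, 60), "tenor": range(48, 72),
              "alto": range(55, 79), "soprano": range(60, 84)}
    for voice, rng in ranges.items():
        # each voice may only take C major chord tones (pitch classes 0, 4, 7)
        problem.addVariable(voice, [n for n in rng if n % 12 in (0, 4, 7)])

    voices = ("bass", "tenor", "alto", "soprano")
    # no voice crossing
    problem.addConstraint(lambda b, t, a, s: b < t < a < s, voices)
    # all three chord tones must appear somewhere
    problem.addConstraint(
        lambda b, t, a, s: {n % 12 for n in (b, t, a, s)} == {0, 4, 7}, voices)

    print(problem.getSolution())  # one satisfying voicing, as MIDI note numbers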

Anyway, prolog fans read on.

  • Anders, T., & Miranda, E. R.(2008). Higher-Order Constraint Applicators for Music Constraint Programming. In Proceedings of the 2008 International Computer Music Conference. Belfast, UK.
  • Anders, T., & Miranda, E. R.(2010). Constraint Application with Higher-Order Programming for Modeling Music Theories. Computer Music Journal, 34(2), 25–38. DOI. Online.
  • Anders, T., & Miranda, E. R.(2011). Constraint programming systems for modeling music theories and composition. ACM Computing Surveys, 43(4), 1–38. DOI. Online.
  • Anders, T., & Miranda, E. R.(2009). A computational model that generalises Schoenberg’s guidelines for favourable chord progressions. In Proceedings of the Sound and Music Computing Conference. Citeseer. Online.


Gaussian distribution and Erf and Normality

Stunts with Gaussian distributions.

Let’s start here with the basic thing.
The (univariate) standard Gaussian pdf

\begin{equation*}
\psi:x\mapsto \frac{1}{\sqrt{2\pi}}\exp\left(-\frac{x^2}{2}\right)
\end{equation*}

We define

\begin{equation*}
\Psi:x\mapsto \int_{-\infty}^x\psi(t)\, dt
\end{equation*}

This erf function is popular, isn’t it?
Unavoidable if you do computer algebra.
But I can never remember what it is.
There’s this scaling factor tacked on.

Well…

\begin{equation*}
\operatorname{erf}(x)\; =\; \frac{1}{\sqrt{\pi}} \int_{-x}^x e^{-t^2} \, dt
\end{equation*}

so that

\begin{equation*}
\int_{-\infty}^x e^{-t^2/2}\, dt = \sqrt{\frac{\pi }{2}} \left(\operatorname{erf}\left(\frac{x}{\sqrt{2}}\right)+1\right)
\end{equation*}

and hence

\begin{equation*}
\Psi(x) = \frac{1}{2}\left(1+\operatorname{erf}\left(\frac{x}{\sqrt{2}}\right)\right).
\end{equation*}
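
If, like me, you never trust your own algebra, a quick numerical check of that relationship, assuming scipy is to hand:

    # Check Psi(x) = 0.5 * (1 + erf(x / sqrt(2))) numerically.
    import numpy as np
    from scipy.special import erf
    from scipy.stats import norm

    x = np.linspace(-4, 4, 9)
    via_cdf = norm.cdf(x)                         # standard Gaussian CDF
    via_erf = 0.5 * (1.0 + erf(x / np.sqrt(2.0)))
    print(np.allclose(via_cdf, via_erf))          # True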

Differential representation

Univariate ODE representation:

\begin{equation*}
\begin{aligned}
\sigma ^2 f'(x)+f(x) (x-\mu )&=0\\
f(0) &=\frac{e^{-\mu ^2/(2\sigma ^2)}}{\sqrt{2 \sigma^2\pi } }\\
L(x) &=(\sigma^2 D+x-\mu)
\end{aligned}
\end{equation*}
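
A lazy numerical check that the Gaussian density satisfies that ODE (numpy and scipy assumed):

    # Residual of sigma^2 f'(x) + (x - mu) f(x) for the N(mu, sigma^2) density
    # should be zero up to finite-difference error.
    import numpy as np
    from scipy.stats import norm

    mu, sigma = 0.3, 1.7
    x = np.linspace(-8, 8, 2001)
    f = norm.pdf(x, loc=mu, scale=sigma)
    residual = sigma**2 * np.gradient(f, x) + (x - mu) * f
    print(np.max(np.abs(residual)))   # tiny, i.e. zero up to discretisation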

Linear PDE representation as a diffusion equation (see, e.g. BoGK10)

\begin{equation*}
\begin{aligned}
\frac{\partial}{\partial t}f(x;t) &=\frac{1}{2}\frac{\partial^2}{\partial x^2}f(x;t)\\
f(x;0)&=\delta(x-\mu)
\end{aligned}
\end{equation*}

Look, it’s the diffusion equation of the Wiener process.

Roughness

\begin{equation*}
\begin{aligned}
\left\| \frac{d}{dx}\phi_\sigma \right\|_2^2 &= \frac{1}{4\sqrt{\pi}\sigma^3}\\
\left\| \left(\frac{d}{dx}\right)^n \phi_\sigma \right\|_2^2 &= \frac{\prod_{i=1}^{n}(2i-1)}{2^{n+1}\sqrt{\pi}\sigma^{2n+1}}
\end{aligned}
\end{equation*}
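
A quadrature sanity check of the first of those identities (numpy and scipy assumed):

    # ||phi_sigma'||_2^2 should equal 1 / (4 sqrt(pi) sigma^3).
    import numpy as np
    from scipy.stats import norm

    sigma = 0.8
    x = np.linspace(-10 * sigma, 10 * sigma, 20001)
    dphi = np.gradient(norm.pdf(x, scale=sigma), x)
    print(np.trapz(dphi**2, x))                  # numerical squared norm
    print(1 / (4 * np.sqrt(np.pi) * sigma**3))   # closed form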

Refs

Bote16
Botev, Z. I.(2016) The Normal Law Under Linear Restrictions: Simulation and Estimation via Minimax Tilting. Journal of the Royal Statistical Society: Series B (Statistical Methodology). DOI.
BoGK10
Botev, Z. I., Grotowski, J. F., & Kroese, D. P.(2010) Kernel density estimation via diffusion. The Annals of Statistics, 38(5), 2916–2957. DOI.


Sparse regression and things that look a bit like it.

Related to compressed sensing but here we consider sampling complexity and the effect of measurement noise.

See also matrix factorisations,
optimisation,
model selection,
multiple testing,
concentration inequalities,
sparse flavoured icecream.

To discuss:

LARS, LASSO, de-biased LASSO, Elastic net, etc.

Implementations

I’m not going to mention LASSO in (generalised) linear regression,
since everything does that these days.
(Oh alright: Jerome Friedman’s glmnet for R is the fastest,
and has a MATLAB version.)

But SPAMS (C++, MATLAB, R, python), by Mairal himself, looks interesting.
It’s an optimisation library for various flavours of sparse problem.
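
For completeness, the python equivalent is short too; a minimal sketch with scikit-learn (my choice here, not an endorsement over glmnet):

    # Lasso with cross-validated penalty on a synthetic sparse regression problem.
    import numpy as np
    from sklearn.linear_model import LassoCV

    rng = np.random.default_rng(0)
    n, p, k = 200, 50, 5                    # n samples, p features, k truly nonzero
    X = rng.standard_normal((n, p))
    beta = np.zeros(p)
    beta[:k] = 3 * rng.standard_normal(k)
    y = X @ beta + 0.5 * rng.standard_normal(n)   # noisy linear observations

    model = LassoCV(cv=5).fit(X, y)         # picks the penalty by cross-validation
    print("selected penalty:", model.alpha_)
    print("nonzero coefficients:", int(np.sum(model.coef_ != 0)))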


Eating Japanese Knotweed (and other daft ideas)

Image: Wikipedia

There have been a number of calls(1,2,3,4) in recent weeks and months to control the invasive plant Japanese Knotweed, at least partially, by eating it. In recent days, Kerry County Council in Ireland heard from one member who, albeit with tongue-in-cheek, urged citizens to make wine, jelly and other sweet treats from the plant.

This strikes me as a terrible idea.

The plant itself is certainly edible - the Japanese have been eating it for years. Its Japanese name, itadori, means 'well being' and it seems to have some medicinal properties. It also tastes a bit like rhubarb, apparently. I wouldn't know; I haven't tried it.

I haven't tried it for the same reason I don't advise you try it. Encouraging people to harvest and transport a regulated, invasive species is the perfect recipe (if you'll pardon the pun) for its continued and accelerated spread.

Japanese Knotweed (Fallopia japonica) is, as you will have guessed, native to Japan and the neighbouring region. It was introduced to the UK in the mid-19th century and quickly spread to Ireland and other parts of the world. Introduced as an ornamental plant, it quickly became a real problem.

The plant is capable of growing at a tremendous rate - 1 metre in a month - and forms big stands 2-3 metres in height. The early shoots are spear-like, similar to asparagus in appearance, and the plants produce delicate white flowers in late Summer. The real problem is underground, where the plant forms tough rhizomes, adapted root-like organs, which remain in the soil even during the Winter when the rest of the plant dies back.

Japanese Knotweed thrives on disturbance and it is mainly spread by fragments of rhizome, crown or stem being accidentally or deliberately moved. This leads to some real (and expensive) problems including a massive reduction in biodiversity under the alien canopy; structural damage to buildings and infrastructure; and the significant cost of its removal.

Data from 2010 suggest that the plant costs the UK £165 million a year to control. If the plant were to be eradicated in the UK by current methods it would cost £1.56 billion. For one site alone, the 2012 London Olympic site, it cost £88 million to deal with this one invasive plant. Nobody wants Japanese Knotweed on their land.

Image: Wikipedia

Imagine you go to the supermarket and buy a bunch of rhubarb. The first thing you do is chop the top and bottom off the stalks and chuck them on your compost heap. Do this with Japanese Knotweed and you end up costing yourself (and potentially your neighbours) thousands in a cleanup bill.

Harvesting Japanese Knotweed from the wild, no matter how careful you are, is also fraught with problems. The plant can easily regrow from small fragments the size of your fingernail. If we're lucky, you'll drop these fragments at the original, infested site. If not, you'll drop them on your walk back to the car or in your front garden when you unload the car.

Simply put, encouraging people to mess around with an invasive species like Japanese Knotweed is, in my view, irresponsible. It may also be illegal.

In Ireland, it is an offence to "plant, disperse or cause to disperse or otherwise cause to grow" the plant. It is also an offence if "he/she has in his/her possession for sale or for breeding/reproduction/transport....anything from which the plant can be reproduced or propagated".

In the meantime, there are chemical and physical control options and scientists in the UK are developing a biological control approach using a sap-sucking insect called Aphalara itadori. This is an old enemy of the plant, found in Japan and currently being tested in the UK to see if it will do the same job in this part of the world (and not eat anything else, by accident). The trials haven't been a total success with numbers surviving over winter too low to have much of an effect, but the tests are ongoing. Hopefully, before too long we will have a sustainable control option for this invasive plant. In the meantime, stop eating it.


Smoothing, regularisation, penalization and friends

In nonparametric statistics we might estimate simultaneously what look like
many, many parameters, which we constrain in some clever fashion
that usually boils down to something we can interpret as a “smoothing”
parameter, controlling how many parameters we effectively have to estimate
from the original set.

The “regularisation” nomenclature claims descent from Tikhonov (e.g. TiGl65), who wanted to solve ill-conditioned integral and differential equations, so it’s slightly more general.
“Smoothing” seems to be common in the
spline and
kernel estimate communities of
Wahba (Wahb90) and Silverman (Silv84) et al,
who usually actually want to smooth curves.

“Penalization” has a genealogy unknown to me, but is probably the least abstruse term for common usage.

These are, AFAICT, more or less the same thing.
“Smoothing” is more common in my communities, which is fine,
but we have to remember that “smoothing” an estimator might not imply smooth dynamics in the estimand;
it could be something else being smoothed, such as the variance of the estimates of the parameters of a rough function.

In every case, you wish to solve an ill-conditioned inverse problem, so you tame it by adding a penalty to solutions that one should be reluctant to accept.
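
The bluntest concrete example is Tikhonov-regularised (ridge) least squares, where the penalty is the squared norm of the coefficients. A toy numpy sketch, with an artificially ill-conditioned design:

    # Closed-form ridge regression: argmin ||y - X b||^2 + lam * ||b||^2.
    import numpy as np

    def ridge(X, y, lam):
        p = X.shape[1]
        return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

    # nearly collinear columns: unpenalised least squares is unstable
    rng = np.random.default_rng(1)
    X = rng.standard_normal((100, 2))
    X[:, 1] = X[:, 0] + 1e-6 * rng.standard_normal(100)
    y = X[:, 0] + rng.standard_normal(100)
    print(ridge(X, y, lam=0.0))   # huge, wildly varying coefficients
    print(ridge(X, y, lam=1.0))   # tamed by the penalty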

TODO: make comprehensible

TODO: examples

TODO: discuss connection with model selection

TODO: discuss connection with compressed sensing.

The real classic approach here is spline smoothing of functional data.
More recent approaches are things like sparse regression.

Refs

Bach00
Bach, F. (n.d.) Model-Consistent Sparse Estimation through the Bootstrap.
ChHS15
Chernozhukov, V., Hansen, C., & Spindler, M. (2015) Valid Post-Selection and Post-Regularization Inference: An Elementary, General Approach. Annual Review of Economics, 7(1), 649–688. DOI.
EHJT04
Efron, B., Hastie, T., Johnstone, I., & Tibshirani, R. (2004) Least angle regression. The Annals of Statistics, 32(2), 407–499. DOI.
FlHS13
Flynn, C. J., Hurvich, C. M., & Simonoff, J. S.(2013) Efficiency for Regularization Parameter Selection in Penalized Likelihood Estimation of Misspecified Models. arXiv:1302.2068 [Stat].
FrHT10
Friedman, J., Hastie, T., & Tibshirani, R. (2010) Regularization Paths for Generalized Linear Models via Coordinate Descent. Journal of Statistical Software, 33(1), 1–22. DOI.
JaFH13
Janson, L., Fithian, W., & Hastie, T. (2013) Effective Degrees of Freedom: A Flawed Metaphor. arXiv:1312.7851 [Stat].
KaRo14
Kaufman, S., & Rosset, S. (2014) When does more regularization imply fewer degrees of freedom? Sufficient conditions and counterexamples. Biometrika, 101(4), 771–784. DOI.
KoMi06
Koenker, R., & Mizera, I. (2006) Density estimation by total variation regularization. Advances in Statistical Modeling and Inference, 613–634.
LiRW10
Liu, H., Roeder, K., & Wasserman, L. (2010) Stability Approach to Regularization Selection (StARS) for High Dimensional Graphical Models. In J. D. Lafferty, C. K. I. Williams, J. Shawe-Taylor, R. S. Zemel, & A. Culotta (Eds.), Advances in Neural Information Processing Systems 23 (pp. 1432–1440). Curran Associates, Inc.
MeBü10
Meinshausen, N., & Bühlmann, P. (2010) Stability selection. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 72(4), 417–473. DOI.
Meye08
Meyer, M. C.(2008) Inference using shape-restricted regression splines. The Annals of Applied Statistics, 2(3), 1013–1033. DOI.
Silv84
Silverman, B. W.(1984) Spline Smoothing: The Equivalent Variable Kernel Method. The Annals of Statistics, 12(3), 898–916. DOI.
SmSM98
Smola, A. J., Schölkopf, B., & Müller, K.-R. (1998) The connection between regularization operators and support vector kernels. Neural Networks, 11(4), 637–649. DOI.
TKPS14
Tansey, W., Koyejo, O., Poldrack, R. A., & Scott, J. G.(2014) False discovery rate smoothing. arXiv:1411.6144 [Stat].
TiGl65
Tikhonov, A. N., & Glasko, V. B.(1965) Use of the regularization method in non-linear problems. USSR Computational Mathematics and Mathematical Physics, 5(3), 93–107. DOI.
Geer14
van de Geer, S. (2014) Weakly decomposable regularization penalties and structured sparsity. Scandinavian Journal of Statistics, 41(1), 72–86. DOI.
Wahb90
Wahba, G. (1990) Spline Models for Observational Data. . SIAM
WeMZ16
Weng, H., Maleki, A., & Zheng, L. (2016) Overcoming The Limitations of Phase Transition by Higher Order Analysis of Regularization Techniques. arXiv:1603.07377 [Cs, Math, Stat].
Wood00
Wood, S. N.(2000) Modelling and smoothing parameter estimation with multiple quadratic penalties. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 62(2), 413–428. DOI.
Wood08
Wood, S. N.(2008) Fast stable direct fitting and smoothness selection for generalized additive models. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 70(3), 495–518. DOI.
ZoHa05
Zou, H., & Hastie, T. (2005) Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 67(2), 301–320. DOI.
ZoHT07
Zou, H., Hastie, T., & Tibshirani, R. (2007) On the “degrees of freedom” of the lasso. The Annals of Statistics, 35(5), 2173–2192. DOI.


DJing

DJ Rupture:

Yet our sounds are also a vocabulary for those who detest the walled-off concentrations of wealth, and steal property back: the collectives that build their own sound systems, stage free parties, and invite DJs to perform. The international DJ becomes emblematic of global capitalism’s complicated cultural dimension. On flights and at the free Continental breakfasts in hotels, often the same soul-destroying hotel chains in each city, we get stuck chatting with our fellow Americans and Western Europeans, the executives eager to find compatriots. We make small talk with these consultants and deal-makers in the descending elevators in the evening—then go out to the city’s dead-end and unowned spaces or its luxury venues to soundtrack the night of the region’s youth, hungry for something new. DJ music is now the common art form of squatters and the nouveau riche; it is the soundtrack both for capital and for its opposition.

http://www.ibrahimshaath.co.uk/keyfinder/
tangerine echonest

see also machine listening,
audio software

DJing software

So many choices now. I use Ableton, but Traktor and Serato are more designed for this.

Open source/ lower cost alternatives?

  • flow8deck is made by the people who made mixedinkey, software for the musically vexed. It handles key changes well.
  • Traktor
  • Serato
  • Djay


Moving the poors to marginal electorates

OK, let’s start treating politics like the favour machine it is and behave accordingly.
NSW under Mike Baird is a system where you buy favours with leverage.
I’d like it to be otherwise, but let’s look:

Optimal marginalness.
Invade marginal electorates.
Organised opposition means we are more likely to claim council seats as a side benefit.


Recurrent neural networks

Feedback neural networks structured to have memory and a notion of “current” and “past” states, which can encode time (or whatever).

As someone who does a lot of signal processing for music, the notion that these generalise linear systems theory is suggestive of lots of interesting DSP applications.

The connection between these (IIR) and “convolutional” (FIR) neural networks is suggestive for the same reason.

Flavours

Vanilla

The main problem here is that they are unstable in the training phase unless you are clever.
See BeSF94. One solution is LSTM; see next.

Long Short Term Memory (LSTM)

As always, Christopher Olah wins the visual explanation prize with
Understanding LSTM Networks.
The following is from LSTM Networks for Sentiment Analysis:

In a traditional recurrent neural network, during the gradient back-propagation phase, the gradient signal can end up being multiplied a large number of times (as many as the number of timesteps) by the weight matrix associated with the connections between the neurons of the recurrent hidden layer. This means that, the magnitude of weights in the transition matrix can have a strong impact on the learning process.[…]

These issues are the main motivation behind the LSTM model which introduces a new structure called a memory cell […]. A memory cell is composed of four main elements: an input gate, a neuron with a self-recurrent connection (a connection to itself), a forget gate and an output gate. […] The gates serve to modulate the interactions between the memory cell itself and its environment.
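
To make the gate bookkeeping concrete, here is a single LSTM cell step in plain numpy. This is my own sketch of the standard equations, not anyone’s reference implementation; shapes and initialisation are arbitrary.

    # One LSTM time step: input, forget and output gates plus a candidate cell.
    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def lstm_step(x, h_prev, c_prev, W, U, b):
        # W, U, b hold the stacked parameters for the input gate, forget gate,
        # output gate and candidate cell, in that order.
        n = h_prev.size
        z = W @ x + U @ h_prev + b
        i = sigmoid(z[0 * n:1 * n])       # input gate
        f = sigmoid(z[1 * n:2 * n])       # forget gate
        o = sigmoid(z[2 * n:3 * n])       # output gate
        g = np.tanh(z[3 * n:4 * n])       # candidate memory
        c = f * c_prev + i * g            # memory cell: gated self-recurrence
        h = o * np.tanh(c)                # output, modulated by the output gate
        return h, c

    d_in, d_hid = 8, 16
    rng = np.random.default_rng(2)
    W = 0.1 * rng.standard_normal((4 * d_hid, d_in))
    U = 0.1 * rng.standard_normal((4 * d_hid, d_hid))
    b = np.zeros(4 * d_hid)
    h = c = np.zeros(d_hid)
    for x in rng.standard_normal((20, d_in)):   # run over a length-20 sequence
        h, c = lstm_step(x, h, c, W, U, b)
    print(h.shape)   # (16,)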

GridRNN

A mini-genre.
Kalchbrenner et al (KaDG15) connect recurrent cells across multiple axes, leading to a higher-rank MIMO system;
this is natural in many kinds of spatial random fields, and I am amazed it was uncommon enough to need formalizing in a paper; but it was, and it did, and good on Kalchbrenner et al.

Gated Recurrent Unit (GRU)

TBD

Liquid/ Echo State Machines

This sounds deliciously lazy;
very roughly speaking, your first layer is a reservoir of random saturating IIR filters,
and you fit a classifier on the outputs of this.
Easy to implement, that.
I wonder when it actually works (constraints on topology etc.).

I wonder if you can use some kind of sparsifying transform on the recurrence operator?

These claim to be based on spiking models, but AFAICT this is not at all necessary.

Various claims are made about how they avoid the training difficulty of similarly basic RNNs by being essentially untrained; you use them as a feature factory for another supervised output algorithm.

Suggestive parallel with random projections.
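
A minimal echo state network, as I understand the recipe, fits in a screenful of numpy: fix a random reservoir with spectral radius below one, drive it with the input, and fit only the linear readout. A sketch with the hyperparameters plucked from the air:

    # Bare-bones echo state network: fixed random reservoir, ridge-fitted readout.
    import numpy as np

    rng = np.random.default_rng(3)
    n_res, n_in = 200, 1
    W_in = rng.uniform(-0.5, 0.5, (n_res, n_in))
    W = rng.standard_normal((n_res, n_res))
    W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))   # spectral radius below 1

    def reservoir_states(u):
        # drive the reservoir with the scalar input sequence u, collect states
        x = np.zeros(n_res)
        states = []
        for u_t in u:
            x = np.tanh(W @ x + W_in @ np.atleast_1d(u_t))
            states.append(x)
        return np.array(states)

    # toy task: predict a sine wave one step ahead
    t = np.arange(0, 60, 0.1)
    u, y = np.sin(t[:-1]), np.sin(t[1:])
    X = reservoir_states(u)
    W_out = np.linalg.solve(X.T @ X + 1e-6 * np.eye(n_res), X.T @ y)  # readout only
    print(np.mean((X @ W_out - y) ** 2))   # small training error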

LuJa09:

From a dynamical systems perspective, there are two main classes of RNNs.
Models from the first class are characterized by an energy-minimizing
stochastic dynamics and symmetric connections.
The best known instantiations are Hopfield networks, Boltzmann machines, and
the recently emerging Deep Belief Networks.
These networks are mostly trained in some unsupervised learning scheme.
Typical targeted network functionalities in this field are associative
memories, data compression, the unsupervised modeling of data distributions,
and static pattern classification, where the model is run for multiple time
steps per single input instance to reach some type of convergence or
equilibrium
(but see e.g., TaHR06 for extension to temporal data).
The mathematical background is rooted in statistical physics.
In contrast, the second big class of RNN models typically features a
deterministic update dynamics and directed connections.
Systems from this class implement nonlinear filters, which
transform an input time series into an output time series.
The mathematical background here is nonlinear dynamical systems.
The standard training mode is supervised.
This survey is concerned only with RNNs of this second type, and
when we speak of RNNs later on, we will exclusively refer to such systems.

Other

It’s still the wild west. Invent a category, name it and stake a claim.

Practicalities

Variable sequence length:
https://gist.github.com/evanthebouncy/8e16148687e807a46e3f

Danijar Hafner:
* Introduction to Recurrent Networks in TensorFlow
* https://danijar.com/variable-sequence-lengths-in-tensorflow/

seq2seq models with GRUs : Fun with Recurrent Neural Nets: One More Dive into CNTK and TensorFlow

Refs

AuBM08
Auer, P., Burgsteiner, H., & Maass, W. (2008) A learning rule for very simple universal approximators consisting of a single layer of perceptrons. Neural Networks, 21(5), 786–795. DOI.
BeSF94
Bengio, Y., Simard, P., & Frasconi, P. (1994) Learning long-term dependencies with gradient descent is difficult. IEEE Transactions on Neural Networks, 5(2), 157–166. DOI.
BoBV12
Boulanger-Lewandowski, N., Bengio, Y., & Vincent, P. (2012) Modeling Temporal Dependencies in High-Dimensional Sequences: Application to Polyphonic Music Generation and Transcription. In 29th International Conference on Machine Learning.
BoLe06
Bown, O., & Lexer, S. (2006) Continuous-Time Recurrent Neural Networks for Generative and Interactive Musical Performance. In F. Rothlauf, J. Branke, S. Cagnoni, E. Costa, C. Cotta, R. Drechsler, … H. Takagi (Eds.), Applications of Evolutionary Computing (pp. 652–663). Springer Berlin Heidelberg
BuMe05
Buhusi, C. V., & Meck, W. H.(2005) What makes us tick? Functional and neural mechanisms of interval timing. Nature Reviews Neuroscience, 6(10), 755–765. DOI.
CMBB14
Cho, K., van Merriënboer, B., Bahdanau, D., & Bengio, Y. (2014) On the properties of neural machine translation: Encoder-decoder approaches. arXiv Preprint arXiv:1409.1259.
CGCB14
Chung, J., Gulcehre, C., Cho, K., & Bengio, Y. (2014) Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling. In NIPS.
CGCB15
Chung, J., Gulcehre, C., Cho, K., & Bengio, Y. (2015) Gated Feedback Recurrent Neural Networks. arXiv:1502.02367 [Cs, Stat].
DoPo15
Doelling, K. B., & Poeppel, D. (2015) Cortical entrainment to music and its modulation by expertise. Proceedings of the National Academy of Sciences, 112(45), E6233–E6242. DOI.
DuPW14
Duan, Q., Park, J. H., & Wu, Z.-G. (2014) Exponential state estimator design for discrete-time neural networks with discrete and distributed time-varying delays. Complexity, 20(1), 38–48. DOI.
Gal15
Gal, Y. (2015) A Theoretically Grounded Application of Dropout in Recurrent Neural Networks. arXiv:1512.05287 [Stat].
GeSC00
Gers, F. A., Schmidhuber, J., & Cummins, F. (2000) Learning to Forget: Continual Prediction with LSTM. Neural Computation, 12(10), 2451–2471. DOI.
GDGR15
Gregor, K., Danihelka, I., Graves, A., Rezende, D. J., & Wierstra, D. (2015) DRAW: A Recurrent Neural Network For Image Generation. arXiv:1502.04623 [Cs].
GCWK09
Grzyb, B. J., Chinellato, E., Wojcik, G. M., & Kaminski, W. A.(2009) Which model to use for the Liquid State Machine?. In 2009 International Joint Conference on Neural Networks (pp. 1018–1024). DOI.
HaMa12
Hazan, H., & Manevitz, L. M.(2012) Topological constraints and robustness in liquid state machines. Expert Systems with Applications, 39(2), 1597–1606. DOI.
HDYD12
Hinton, G., Deng, L., Yu, D., Dahl, G. E., Mohamed, A., Jaitly, N., … Kingsbury, B. (2012) Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups. IEEE Signal Processing Magazine, 29(6), 82–97. DOI.
HoSc97
Hochreiter, S., & Schmidhuber, J. (1997) Long Short-Term Memory. Neural Computation, 9(8), 1735–1780. DOI.
JoZS15
Jozefowicz, R., Zaremba, W., & Sutskever, I. (2015) An empirical exploration of recurrent network architectures. In Proceedings of the 32nd International Conference on Machine Learning (ICML-15) (pp. 2342–2350).
KaDG15
Kalchbrenner, N., Danihelka, I., & Graves, A. (2015) Grid Long Short-Term Memory. arXiv:1507.01526 [Cs].
KaJF15
Karpathy, A., Johnson, J., & Fei-Fei, L. (2015) Visualizing and Understanding Recurrent Networks. arXiv:1506.02078 [Cs].
Lecu98
LeCun, Y. (1998) Gradient-based learning applied to document recognition. Proc. IEEE, 86(11), 2278–2324. DOI.
LeNM05
Legenstein, R., Naeger, C., & Maass, W. (2005) What Can a Neuron Learn with Spike-Timing-Dependent Plasticity?. Neural Computation, 17(11), 2337–2382. DOI.
LiBE15
Lipton, Z. C., Berkowitz, J., & Elkan, C. (2015) A Critical Review of Recurrent Neural Networks for Sequence Learning. arXiv:1506.00019 [Cs].
LuJa09
Lukoševičius, M., & Jaeger, H. (2009) Reservoir computing approaches to recurrent neural network training. Computer Science Review, 3(3), 127–149. DOI.
MaNM04
Maass, W., Natschläger, T., & Markram, H. (2004) Computational Models for Generic Cortical Microcircuits. In Computational Neuroscience: A Comprehensive Approach (pp. 575–605). Chapman & Hall/CRC
Mico15
Miconi, T. (2015) Training recurrent neural networks with sparse, delayed rewards for flexible decision tasks. arXiv:1507.08973 [Q-Bio].
Mnih15
Mnih, V. (2015) Human-level control through deep reinforcement learning. Nature, 518, 529–533. DOI.
MoDH12
Mohamed, A. r, Dahl, G. E., & Hinton, G. (2012) Acoustic Modeling Using Deep Belief Networks. IEEE Transactions on Audio, Speech, and Language Processing, 20(1), 14–22. DOI.
OIMT15
Owens, A., Isola, P., McDermott, J., Torralba, A., Adelson, E. H., & Freeman, W. T.(2015) Visually Indicated Sounds. arXiv:1512.08512 [Cs].
RoRS15
Rohrbach, A., Rohrbach, M., & Schiele, B. (2015) The Long-Short Story of Movie Description. arXiv:1506.01698 [Cs].
Schw07
Schwenk, H. (2007) Continuous space language models. Computer Speech Lang., 21, 492–518. DOI.
TaHR06
Taylor, G. W., Hinton, G. E., & Roweis, S. T.(2006) Modeling human motion using binary latent variables. In Advances in neural information processing systems (pp. 1345–1352).
ThBe15
Theis, L., & Bethge, M. (2015) Generative Image Modeling Using Spatial LSTMs. arXiv:1506.03478 [Cs, Stat].
VKCM15
Visin, F., Kastner, K., Cho, K., Matteucci, M., Courville, A., & Bengio, Y. (2015) ReNet: A Recurrent Neural Network Based Alternative to Convolutional Networks. arXiv:1505.00393 [Cs].
Waib89
Waibel, A. (1989) Phoneme recognition using time-delay neural networks. IEEE Trans. Acoustics Speech Signal Process., 37(3), 328–339. DOI.
YTCB15
Yao, L., Torabi, A., Cho, K., Ballas, N., Pal, C., Larochelle, H., & Courville, A. (2015) Describing Videos by Exploiting Temporal Structure. arXiv:1502.08029 [Cs, Stat].


Generalised linear models

Using the machinery of linear regression to predict in
somewhat more general regressions.

This means you are still doing Maximum Likelihood regression,
but outside the setting of homoskedastic Gaussian noise and linear response.

Not quite as fancy as generalised additive models,
but if you have to implement such models yourself,
less work. If you are using R this is not you.

To learn:

  1. When we can do this? e.g. Must the response be from an exponential family for really real? Wikipedia mentions the “overdispersed exponential family” which is no such thing.
  2. Does anything funky happen with regularisation?
  3. Whether to merge this in with quasilikelihood.
  4. Fitting variance parameters.

Pieces of the method follow.

Response distribution

TBD. What constraints do we have here?

Linear Predictor

Link function

An invertible (monotonic?) function
relating the linear predictor to
the mean of the response distribution.
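
Putting the pieces together, the classic fitting recipe is iteratively reweighted least squares. A toy numpy sketch for a Poisson response with canonical log link (no offsets, prior weights or numerical safeguards):

    # IRLS for a Poisson GLM with log link.
    import numpy as np

    def poisson_glm_irls(X, y, n_iter=25, tol=1e-8):
        beta = np.zeros(X.shape[1])
        for _ in range(n_iter):
            eta = X @ beta                 # linear predictor
            mu = np.exp(eta)               # inverse link
            w = mu                         # working weights (Poisson variance = mu)
            z = eta + (y - mu) / mu        # working response
            new = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * z))
            if np.max(np.abs(new - beta)) < tol:
                return new
            beta = new
        return beta

    rng = np.random.default_rng(5)
    X = np.column_stack([np.ones(500), rng.standard_normal(500)])
    y = rng.poisson(np.exp(0.5 + 0.8 * X[:, 1]))
    print(poisson_glm_irls(X, y))          # roughly [0.5, 0.8]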



Artificial neural networks

Modern computational neural network methods reascend the hype phase transition.
A.k.a. deep learning, or extreme learning, or double-plus-fancy brainbots, or “please can our department have a bigger computation budget, it’s not to play video games, I swear”.

@bhautikj style transfer experiment "Drumpf"

Style transfer will be familiar to anyone who has ever taken hallucinogens or watched movies made by those who have, but you can’t usually put hallucinogens or film nights on the departmental budget so we have to make do with gigantic computing clusters.

But what are “artificial neural networks”?

Either

  • a collection of incremental improvements to machine learning techniques, loosely inspired by real brains, that surprisingly elicit the kind of results from machine learning networks that everyone was hoping we’d get by at least 20 years ago, or,
  • the state-of-the-art in artificial kitten recognition.

Why bother?

There are many answers here.

A classic:

The ultimate regression algorithm

Common answer:
It turns out that this particular learning model (class of learning models),
while often not apparently well suited to a given problem,
does very well in general on lots of things,
and very often can keep on doing better and better the more resources you throw at it.
Why burn three grad students on a perfect regression algorithm when you can use
one algorithm to solve a whole bunch of regression problems just as well?

This is more interesting for the business-dev people.

Cool maths

Regularisation, function approximations, interesting manifold inference.

Even the stuff I’d assumed was trivial, like backpropagation, has a few wrinkles in practice.
See
Michael Nielsen’s chapter and
Christopher Olah’s visual summary.
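
For concreteness, here is the whole backpropagation-plus-gradient-descent loop for a one-hidden-layer regression net in raw numpy; a toy sketch, with the learning rate and architecture plucked from thin air:

    # Fit y = sin(3x) with a tiny tanh network trained by plain gradient descent.
    import numpy as np

    rng = np.random.default_rng(4)
    X = rng.uniform(-1, 1, (256, 1))
    y = np.sin(3 * X)                         # target function

    W1, b1 = 0.5 * rng.standard_normal((1, 32)), np.zeros(32)
    W2, b2 = 0.5 * rng.standard_normal((32, 1)), np.zeros(1)
    lr = 0.1

    for step in range(2000):
        # forward pass
        h = np.tanh(X @ W1 + b1)
        y_hat = h @ W2 + b2
        err = y_hat - y
        # backward pass: chain rule, layer by layer
        dW2 = h.T @ err / len(X)
        db2 = err.mean(axis=0)
        dh = err @ W2.T * (1 - h ** 2)        # tanh' = 1 - tanh^2
        dW1 = X.T @ dh / len(X)
        db1 = dh.mean(axis=0)
        for param, grad in ((W1, dW1), (b1, db1), (W2, dW2), (b2, db2)):
            param -= lr * grad                # in-place gradient descent step
    print(np.mean(err ** 2))                  # should be small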

Insight into the mind

TBD. Maybe.

Trippy art projects

See next.

Generative art applications

Most neural networks are invertible, giving you generative models
(e.g.
run the model forwards, it recognises melodies;
run it “backwards”, it composes melodies).

It’s not quite running it backwards, but in this vein the “deep dreaming” project does something similar.
See, say, the above image from
Google’s tripped-out image recognition systems, or
Gatys, Ecker and Bethge’s deep art:
neural networks do Monet quite well.
I’ve a weakness for ideas that give me plausible deniability for making
generative art while doing my maths homework.

Hip keywords for NN models

Not necessarily mutually exclusive;
some design patterns you can use.

See Tomasz Malisiewicz’s summary of Deep Learning Trends @ ICLR 2016

Adversarial

Train two networks to beat each other.
I have some intuitions about why this might work, but need to learn more.

Convolutional

Signal processing baked in to neural networks. Not so complicated if you have ever done signal processing, apart from the abstruse use of “depth” to mean 2 different things in the literature.

Generally uses FIR filters plus some smudgy “pooling”
(which is nonlinear downsampling),
although IIR is also making an appearance by running RNN on multiple axes.

Terence Broad, go you.
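
The “FIR filter bank plus nonlinear downsampling” reading of a convolutional layer, in a few lines of numpy (purely illustrative; real libraries use learned kernels and much better bookkeeping):

    # A 1-D "conv layer": FIR filtering, a ReLU, then max pooling.
    import numpy as np

    def conv1d_layer(x, kernels):
        # valid-mode FIR filtering of signal x with each row of `kernels`
        return np.array([np.convolve(x, k, mode="valid") for k in kernels])

    def max_pool(feature_maps, width=4):
        # nonlinear downsampling: max over non-overlapping windows
        n = feature_maps.shape[1] // width * width
        return feature_maps[:, :n].reshape(len(feature_maps), -1, width).max(axis=2)

    rng = np.random.default_rng(7)
    x = np.sin(np.linspace(0, 20 * np.pi, 1000)) + 0.1 * rng.standard_normal(1000)
    kernels = 0.1 * rng.standard_normal((8, 16))        # 8 random FIR filters
    features = np.maximum(conv1d_layer(x, kernels), 0)  # ReLU nonlinearity
    print(max_pool(features).shape)                     # (8, 246)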

Spike-based

Most simulated neural networks are based on a continuous activation potential and discrete time, unlike spiking biological ones, which are driven by discrete events in continuous time.
There are a great many other differences.
What difference does this in particular make?
I suspect it makes a difference regarding time.

Recurrent neural networks

Feedback neural networks with memory and therefore a notion of time and state.
As someone who does a lot of signal processing for music, the notion that these generalise linear systems theory is suggestive of lots of interesting DSP applications.

The connection between these and convolutional neural networks is suggestive for the same reason.

Vanilla

The main problem here is that they are unstable in the training phase unless you are clever.
See BeSF94. One solution is LSTM; see next.

Gated Recurrent Unit (GRU)

TBD

Long Short Term Memory (LSTM)

LSTM Networks for Sentiment Analysis:

In a traditional recurrent neural network, during the gradient back-propagation phase, the gradient signal can end up being multiplied a large number of times (as many as the number of timesteps) by the weight matrix associated with the connections between the neurons of the recurrent hidden layer. This means that, the magnitude of weights in the transition matrix can have a strong impact on the learning process.[…]

These issues are the main motivation behind the LSTM model which introduces a new structure called a memory cell […]. A memory cell is composed of four main elements: an input gate, a neuron with a self-recurrent connection (a connection to itself), a forget gate and an output gate. […] The gates serve to modulate the interactions between the memory cell itself and its environment.

Cortical learning algorithms

Is this a real thing, or pure hype? How does it distinguish itself from other deep learning techniques, aside from name-checking biomimetic engineering?
Numenta have made a big splash with their brain-esque learning algorithms and have open-sourced them as NuPIC;
on that basis alone it looks like it could be fun to explore.

Extreme learning machines

Dunno.

Optimisation methods

TBD

Related questions

  • Artificial neural networks are usually layers of linear projections
    sandwiched between saturating nonlinear maps.
    Why not more general nonlinearities?
  • Can you know in advance how long it will take to fit a classifier
    or regression model for data of a given sort?
    The process looks so mechanical…

Regularisation in neural networks

L_1, L_2, dropout…
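
Dropout, at least, is almost trivial to write down; a sketch of the usual “inverted dropout” masking in numpy:

    # Randomly zero activations during training and rescale the survivors,
    # so the expected activation is unchanged; do nothing at test time.
    import numpy as np

    def dropout(activations, p_keep=0.8, training=True):
        if not training:
            return activations
        mask = np.random.random(activations.shape) < p_keep
        return activations * mask / p_keep

    h = np.ones((4, 5))
    print(dropout(h, p_keep=0.8))   # roughly 20% of entries zeroed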

Compression of neural networks

It seems we should be able to do better than a gigantic network with millions of parameters;
once we have trained the graph, how can we simplify it, compress it, or prune it?

Quantizing to single bits.

Encoding for neural networks

Neural networks take an inconvenient encoding format,
so general data has to be massaged.
Convolutional models are an important implicit encoding;
what else can we squeeze [in there/out of there]?

Software stuff

Too many. Neural networks are intuitive enough that everyone builds their own library.

I use Tensorflow, plus a side order of Keras.

  • R/MATLAB/Python/everything: MXNET.

  • Lua: Torch

  • MATLAB/Python: Caffe claims to be a “de facto standard”

  • Python: Theano

  • Python/C++: tensorflow seems to be the same thing as Theano,
    but it’s backed by Google so probably has better long-term prospects.
    The construction of graphs is more explicit than in Theano, which I find easier to understand, although this means that you lose the near-python syntax of Theano.
    It also claims to compile to smartphones etc., although that looks buggy atm.

  • Javascript (!) inference and training: convnetjs
    * plus bonus interview
    * sister project for recurrent networks: recurrentjs

  • synapticjs is a very full-featured javascript library for training, inference and visualisation of neural networks, with really good documentation. Great learning resource, with plausible examples.

  • javascript inference only: neocortexjs, in the browser. Civilised.

  • brainjs is unmaintained now but looked like a nice simple javascript neural network library.

  • mindjs is a simple one where you can see the moving parts.

  • iphone: DeepBeliefSDK

Examples

Howtos

To read

Refs

Amar98
Amari, S. (1998) Natural Gradient Works Efficiently in Learning. Neural Computation, 10(2), 251–276. DOI.
Arau00
Araujo, L. (2000) Evolutionary parsing for a probabilistic context free grammar. In Proc. of the Int. Conf. on on Rough Sets and Current Trends in Computing (RSCTC-2000), Lecture Notes in Computer Science 2005 (p. 590).
ArRK10
Arel, I., Rose, D. C., & Karnowski, T. P.(2010) Deep Machine Learning - A New Frontier in Artificial Intelligence Research [Research Frontier]. IEEE Computational Intelligence Magazine, 5(4), 13–18. DOI.
AGMM15
Arora, S., Ge, R., Ma, T., & Moitra, A. (2015) Simple, Efficient, and Neural Algorithms for Sparse Coding. arXiv:1503.00778 [cs, Stat].
BLPB12
Bastien, F., Lamblin, P., Pascanu, R., Bergstra, J., Goodfellow, I., Bergeron, A., … Bengio, Y. (2012) Theano: new features and speed improvements. arXiv:1211.5590 [cs].
Beng09
Bengio, Y. (2009) Learning deep architectures for AI. Foundations and Trends® in Machine Learning, 2(1), 1–127. DOI.
BeCV13
Bengio, Y., Courville, A., & Vincent, P. (2013) Representation Learning: A Review and New Perspectives. IEEE Trans. Pattern Anal. Machine Intell., 35, 1798–1828. DOI.
BeLe07
Bengio, Y., & LeCun, Y. (2007) Scaling learning algorithms towards AI. Large-Scale Kernel Machines, 34, 1–41.
BeSF94
Bengio, Y., Simard, P., & Frasconi, P. (1994) Learning long-term dependencies with gradient descent is difficult. IEEE Transactions on Neural Networks, 5(2), 157–166. DOI.
Bose91
Boser, B. (1991) An analog neural network processor with programmable topology. J. Solid State Circuits, 26, 2017–2025. DOI.
Bott14
Bottou, L. (2014) From machine learning to machine reasoning. Mach. Learn., 94, 133–149. DOI.
BoBV12
Boulanger-Lewandowski, N., Bengio, Y., & Vincent, P. (2012) Modeling Temporal Dependencies in High-Dimensional Sequences: Application to Polyphonic Music Generation and Transcription. In 29th International Conference on Machine Learning.
Cadi14
Cadieu, C. F.(2014) Deep neural networks rival the representation of primate it cortex for core visual object recognition. PLoS Comp. Biol., 10, e1003963. DOI.
CHMB15
Choromanska, A., Henaff, Mi., Mathieu, M., Ben Arous, G., & LeCun, Y. (2015) The Loss Surfaces of Multilayer Networks. In Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics (pp. 192–204).
Ciod12
Ciodaro, T. (2012) Online particle detection with neural networks based on topological calorimetry information. J. Phys. Conf. Series, 368, 012030. DOI.
Cire12
Ciresan, D. (2012) Multi-column deep neural network for traffic sign classification. Neural Networks, 32, 333–338. DOI.
Dahl12
Dahl, G. E.(2012) Context-dependent pre-trained deep neural networks for large vocabulary speech recognition. IEEE Trans. Audio Speech Lang. Process., 20, 33–42. DOI.
DoSB14
Dosovitskiy, A., Springenberg, J. T., & Brox, T. (2014) Learning to Generate Chairs with Convolutional Neural Networks. arXiv:1411.5928 [cs].
Fara13
Farabet, C. (2013) Learning hierarchical features for scene labeling. IEEE Trans. Pattern Anal. Mach. Intell., 35, 1915–1929. DOI.
Fell91
Felleman, D. J.(1991) Distributed hierarchical processing in the primate cerebral cortex. Cereb. Cortex, 1, 1–47. DOI.
Fuku82
Fukushima, K. (1982) Neocognitron: a new algorithm for pattern recognition tolerant of deformations and shifts in position. Pattern Recognition, 15, 455–469. DOI.
Garc04
Garcia, C. (2004) Convolutional face finder: a neural architecture for fast and robust face detection. IEEE Trans. Pattern Anal. Machine Intell., 26, 1408–1423. DOI.
GaEB15
Gatys, L. A., Ecker, A. S., & Bethge, M. (2015) A Neural Algorithm of Artistic Style. arXiv:1508.06576 [cs, Q-Bio].
GiSB14
Giryes, R., Sapiro, G., & Bronstein, A. M.(2014) On the Stability of Deep Networks. arXiv:1412.5896 [cs, Math, Stat].
Hads09
Hadsell, R. (2009) Learning long-range vision for autonomous off-road driving. J. Field Robot., 26, 120–144. DOI.
HaCL06
Hadsell, R., Chopra, S., & LeCun, Y. (2006) Dimensionality Reduction by Learning an Invariant Mapping. In 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Vol. 2, pp. 1735–1742). DOI.
Helm13
Helmstaedter, M. (2013) Connectomic reconstruction of the inner plexiform layer in the mouse retina. Nature, 500, 168–174. DOI.
Hint10
Hinton, G. (2010) A practical guide to training restricted Boltzmann machines. In Neural Networks: Tricks of the Trade (Vol. 9, p. 926). Springer Berlin Heidelberg
HDYD12
Hinton, G., Deng, L., Yu, D., Dahl, G. E., Mohamed, A., Jaitly, N., … Kingsbury, B. (2012) Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups. IEEE Signal Processing Magazine, 29(6), 82–97. DOI.
Hint95
Hinton, G. E.(1995) The wake-sleep algorithm for unsupervised neural networks. Science, 268, 1558–1161. DOI.
Hint07
Hinton, G. E.(2007) To recognize shapes, first learn to generate images. In T. D. and J. F. K. Paul Cisek (Ed.), Progress in Brain Research (Vol. Volume 165, pp. 535–547). Elsevier
HiSa06
Hinton, G. E., & Salakhutdinov, R. R.(2006) Reducing the dimensionality of data with neural networks. Science, 313(5786), 504–507. DOI.
HiOT06
Hinton, G., Osindero, S., & Teh, Y. (2006) A Fast Learning Algorithm for Deep Belief Nets. Neural Computation, 18(7), 1527–1554. DOI.
HoSc97
Hochreiter, S., & Schmidhuber, J. (1997) Long Short-Term Memory. Neural Computation, 9(8), 1735–1780. DOI.
HuSi05
Huang, G.-B., & Siew, C.-K. (2005) Extreme learning machine with randomly assigned RBF kernels. International Journal of Information Technology, 11(1), 16–24.
HuWL11
Huang, G.-B., Wang, D. H., & Lan, Y. (2011) Extreme learning machines: a survey. International Journal of Machine Learning and Cybernetics, 2(2), 107–122. DOI.
HuZS04
Huang, G.-B., Zhu, Q.-Y., & Siew, C.-K. (2004) Extreme learning machine: a new learning scheme of feedforward neural networks. In 2004 IEEE International Joint Conference on Neural Networks, 2004. Proceedings (Vol. 2, pp. 985–990 vol.2). DOI.
HuZS06
Huang, G.-B., Zhu, Q.-Y., & Siew, C.-K. (2006) Extreme learning machine: Theory and applications. Neurocomputing, 70(1–3), 489–501. DOI.
Hube62
Hubel, D. H.(1962) Receptive fields, binocular interaction, and functional architecture in the cat’s visual cortex. J. Physiol., 160, 106–154. DOI.
HuPC15
Hu, T., Pehlevan, C., & Chklovskii, D. B.(2015) A Hebbian/Anti-Hebbian Network for Online Sparse Dictionary Learning Derived from Symmetric Matrix Factorization. arXiv:1503.00690 [cs, Q-Bio, Stat].
KaRL10
Kavukcuoglu, K., Ranzato, M., & LeCun, Y. (2010) Fast Inference in Sparse Coding Algorithms with Applications to Object Recognition. arXiv:1010.3467 [cs].
KWKT15
Kulkarni, T. D., Whitney, W., Kohli, P., & Tenenbaum, J. B.(2015) Deep Convolutional Inverse Graphics Network. arXiv:1503.03167 [cs].
Lawr97
Lawrence, S. (1997) Face recognition: a convolutional neural-network approach. IEEE Trans. Neural Networks, 8, 98–113. DOI.
Lecu98
LeCun, Y. (1998) Gradient-based learning applied to document recognition. Proc. IEEE, 86, 2278–2324. DOI.
LeBH15
LeCun, Y., Bengio, Y., & Hinton, G. (2015) Deep learning. Nature, 521(7553), 436–444. DOI.
LCHR06
LeCun, Y., Chopra, S., Hadsell, R., Ranzato, M., & Huang, F. (2006) A tutorial on energy-based learning. Predicting Structured Data.
LBRN07
Lee, H., Battle, A., Raina, R., & Ng, A. Y.(2007) Efficient sparse coding algorithms. Advances in Neural Information Processing Systems, 19, 801.
LGRN00
Lee, H., Grosse, R., Ranganath, R., & Ng, A. Y.(n.d.) Convolutional Deep Belief Networks for Scalable Unsupervised Learning of Hierarchical Representations. . Presented at the Proceedings of the 26th International Confer- ence on Machine Learning, 2009
Leun14
Leung, M. K.(2014) Deep learning of the tissue-regulated splicing code. Bioinformatics, 30, i121–i129. DOI.
Ma15
Ma, J. (2015) Deep neural nets as a method for quantitative structure-activity relationships. J. Chem. Inf. Model., 55, 263–274. DOI.
Mall12
Mallat, S. (2012) Group Invariant Scattering. Communications on Pure and Applied Mathematics, 65(10), 1331–1398. DOI.
Mall16
Mallat, S. (2016) Understanding Deep Convolutional Networks. arXiv:1601.04920 [cs, Stat].
MaMD14
Marcus, G., Marblestone, A., & Dean, T. (2014) Neuroscience The atoms of neural computation. Science (New York, N.Y.), 346(6209), 551–552. DOI.
MCCD13
Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013) Efficient Estimation of Word Representations in Vector Space. arXiv:1301.3781 [cs].
MiLS13
Mikolov, T., Le, Q. V., & Sutskever, I. (2013) Exploiting Similarities among Languages for Machine Translation. arXiv:1309.4168 [cs].
Mnih15
Mnih, V. (2015) Human-level control through deep reinforcement learning. Nature, 518, 529–533. DOI.
Moha12
Mohamed, A.-R. (2012) Acoustic modeling using deep belief networks. IEEE Trans. Audio Speech Lang. Process., 20(1), 14–22. DOI.
Mont14
Montufar, G. (2014) When does a mixture of products contain a product of mixtures?. J. Discrete Math., 29, 321–347. DOI.
Ning05
Ning, F. (2005) Toward automatic phenotyping of developing embryos from videos. IEEE Trans. Image Process., 14, 1360–1371. DOI.
OlFi96a
Olshausen, B. A., & Field, D. J.(1996a) Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature, 381(6583), 607–609. DOI.
OlFi96b
Olshausen, B. A., & Field, D. J.(1996b) Natural image statistics and efficient coding. Network (Bristol, England), 7(2), 333–339. DOI.
OlFi04
Olshausen, B. A., & Field, D. J.(2004) Sparse coding of sensory inputs. Current Opinion in Neurobiology, 14(4), 481–487. DOI.
PaVe14
Paul, A., & Venkatasubramanian, S. (2014) Why does Deep Learning work? - A perspective from Group Theory. arXiv:1412.6621 [cs, Stat].
PeCh15
Pehlevan, C., & Chklovskii, D. B.(2015) A Hebbian/Anti-Hebbian Network Derived from Online Non-Negative Matrix Factorization Can Cluster and Discover Sparse Features. arXiv:1503.00680 [cs, Q-Bio, Stat].
RaBC08
Ranzato, M. aurelio, Boureau, Y. -la., & Cun, Y. L.(2008) Sparse Feature Learning for Deep Belief Networks. In J. C. Platt, D. Koller, Y. Singer, & S. T. Roweis (Eds.), Advances in Neural Information Processing Systems 20 (pp. 1185–1192). Curran Associates, Inc.
Ranz13
Ranzato, M. (2013) Modeling natural images using gated MRFs. IEEE Trans. Pattern Anal. Machine Intell., 35, 2206–2222. DOI.
Rume86
Rumelhart, D. E.(1986) Learning representations by back-propagating errors. Nature, 323, 533–536. DOI.
SGAL14
Sagun, L., Guney, V. U., Arous, G. B., & LeCun, Y. (2014) Explorations on high dimensional landscapes. arXiv:1412.6615 [cs, Stat].
Schw07
Schwenk, H. (2007) Continuous space language models. Computer Speech Lang., 21, 492–518. DOI.
SiOl01
Simoncelli, E. P., & Olshausen, B. A.(2001) Natural Image Statistics and Neural Representation. Annual Review of Neuroscience, 24(1), 1193–1216. DOI.
SDBR14
Springenberg, J. T., Dosovitskiy, A., Brox, T., & Riedmiller, M. (2014) Striving for Simplicity: The All Convolutional Net. arXiv:1412.6806 [cs].
Tura10
Turaga, S. C.(2010) Convolutional networks can learn to generate affinity graphs for image segmentation. Neural Comput., 22, 511–538. DOI.
Waib89
Waibel, A. (1989) Phoneme recognition using time-delay neural networks. IEEE Trans. Acoustics Speech Signal Process., 37, 328–339. DOI.
WiBö15
Wiatowski, T., & Bölcskei, H. (2015) A Mathematical Theory of Deep Convolutional Neural Networks for Feature Extraction. arXiv:1512.06293 [cs, Math, Stat].
Xion15
Xiong, H. Y.(2015) The human splicing code reveals new insights into the genetic determinants of disease. Science, 347, 6218. DOI.
ZhCL14
Zhang, S., Choromanska, A., & LeCun, Y. (2014) Deep learning with Elastic Averaging SGD. arXiv:1412.6651 [cs, Stat].


Long memory models

Processes where we know that ancient history is still relevant for future predictions, even if we know the recent history.

In my own mental map this is near-synonymous with stateful models
where we ignore the state, which I suppose is a kind of coarse graining.
If we aren’t concerned with a process but with i.i.d. occurrences we might look at our
hidden variables differently.

This particular approach ends up being a popular simplification, because hidden states can be computationally difficult
to infer, as well as having possibly high
sample complexity.
Maybe for other reasons too?

But in this formulation, we have a Markov process but
because we do not observe the whole state it looks non-Markov.
This is reasonably consistent with reality, where we believe
the current state of reality determines the future,
but we don’t know the whole current state.
Related: hidden variable quantum mechanics.

Note that “long memory” is usually considered as a model for time series, but clearly spatial random fields, or random fields indexed by any number of dimensions, with or without causality constraints, can have this property.

Main questions for me:

  1. Can we use the “memory length” of a system to infer the number of hidden states for some class of interesting systems, or vice versa?
  2. Which classes?
  3. Can we infer the memory length alone as a parameter of interest in some classes? (This needs making precise.) Information criteria don’t do this model order selection consistently.



Sample complexity

The machine-learning-ish approach to analysing estimator convergence.

TBD


Hidden variables and latent factors

This isn’t a thing per se,
but I want a landing page for a few different ways to think about hidden variables.

If the hidden variables are random, I might think of hierarchical models, latent factors, deep networks, informative sampling, censored data…

If the hidden variables are deterministic, I might think of algorithmic complexity.

Random thing:
Matrix factorisation to approximate latent factors and their interaction in recommender systems.

KoBV09
Koren, Y., Bell, R., & Volinsky, C. (2009) Matrix Factorization Techniques for Recommender Systems. Computer, 42(8), 30–37. DOI.
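
The KoBV09-style trick in miniature: factorise a partially observed ratings matrix into user and item latent factors by stochastic gradient descent on the observed entries only. A toy numpy sketch with made-up dimensions:

    # Low-rank matrix factorisation for recommendations, SGD on observed entries.
    import numpy as np

    rng = np.random.default_rng(6)
    n_users, n_items, k = 30, 40, 3
    R = rng.standard_normal((n_users, k)) @ rng.standard_normal((n_items, k)).T
    observed = rng.random(R.shape) < 0.3           # we only see ~30% of ratings

    U = 0.1 * rng.standard_normal((n_users, k))    # latent user factors
    V = 0.1 * rng.standard_normal((n_items, k))    # latent item factors
    lr, lam = 0.02, 0.01
    pairs = np.argwhere(observed)
    for epoch in range(200):
        for i, j in pairs[rng.permutation(len(pairs))]:
            err = R[i, j] - U[i] @ V[j]
            U[i] += lr * (err * V[j] - lam * U[i])   # gradient step, L2 penalty
            V[j] += lr * (err * U[i] - lam * V[j])
    print(np.mean((U @ V.T - R)[~observed] ** 2))    # error on unseen entries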


A multidisciplinary and multi-scalar approach to the landscape dynamics of the Nord-Pas-de-Calais mining basin

This text revisits the participatory and multidisciplinary research process conducted over three years within the ITTECOP programme on the territory of the communauté d’agglomération d’Henin Carvin (CAHC) in the Nord-Pas-de-Calais. One characteristic of this research is that it brought together a team of landscape architects and a team of sociologists. The approach starts from a conception of landscape as an integrating framework for reflection and action, and as a tool capable of bringing about concerted and sustainable territorial action. The argument nevertheless goes beyond the common and by now fairly consensual definition of landscape as a “social construction”, by adding a question about the diversity of actors who shape the territory, beyond the erroneous image of the omnipotence of the public actors and legitimate planners of the territory (or, conversely, of their loss of power and the privatisation of urban spaces to the sole benefit of economic actors who have become all-powerful). The scientific approach…
