Artificial neural networks
Modern computational neural-network methods re-ascend the hype phase transition.
a.k.a. deep learning, or extreme learning, or double-plus-fancy brainbots, or “please can our department have a bigger computation budget, it’s not to play video games, I swear”?
Style transfer will be familiar to anyone who has ever taken hallucinogens or watched movies made by those who have, but you can’t usually put hallucinogens or film nights on the departmental budget so we have to make do with gigantic computing clusters.
But what are “artificial neural networks”?
Either

- a collection of incremental improvements to machine-learning techniques, loosely inspired by real brains, that surprisingly elicit the kind of results everyone was hoping machine learning would deliver at least 20 years ago, or,
- the state-of-the-art in artificial kitten recognition.
Why bother?
There are many answers here.
A classic —
The ultimate regression algorithm
Common answer:
It turns out that this particular learning model (class of learning models),
while often not apparently well suited to a given problem,
does very well in general on lots of things,
and very often can keep on doing better and better the more resources you throw at it.
Why burn three grad students on a perfect regression algorithm when you can use
one algorithm to solve a whole bunch of regression problems just as well?
This is more interesting for the business-dev people.
Cool maths
Regularisation, function approximations, interesting manifold inference.
Even the stuff I’d assumed was trivial like backpropagation has a few wrinkles in practice.
See
Michael Nielsen’s chapter and
Christopher Olah’s visual summary
Insight into the mind
TBD. Maybe.
Trippy art projects
See next.
Generative art applications
- generating music
- messing with copyright lawyers’ minds by compressing films to vectors (More technical version)
The nice hack here is called “generative adversarial networks”
Many neural networks can be run “backwards”, after a fashion, giving you generative models
(e.g.
run the model forwards, it recognises melodies;
run it “backwards”, it composes melodies).
It’s not quite literally running the model backwards, but in this vein the “deep dreaming” project does something similar.
See, say, the above image from
Google’s tripped-out image recognition systems, or
Gatys, Ecker and Bethge’s deep art.
Neural networks do Monet quite well.
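To make the “running it backwards” trick concrete, here is a minimal numpy sketch of the gradient-ascent-on-the-input idea behind deep dreaming. A toy random one-layer “network” stands in for a trained model, so every weight, unit index and step size here is made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(10, 64))   # stand-in for trained weights
x = rng.normal(size=64) * 0.01  # start from a near-blank "image"

def unit_activation(x):
    return np.tanh(W @ x)[3]    # the output unit we want to excite

for _ in range(200):
    z = W @ x
    grad = (1 - np.tanh(z[3]) ** 2) * W[3]  # d activation / d input, by the chain rule
    x += 0.1 * grad                         # gradient *ascent* on the input, not the weights

print(unit_activation(x))  # near 1: x now looks like whatever unit 3 "wants" to see
```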
I’ve a weakness for ideas that give me plausible deniability for making
generative art while doing my maths homework.
Hip keywords for NN models
Not necessarily mutually exclusive;
some design patterns you can use.
See Tomasz Malisiewicz’s summary of Deep Learning Trends @ ICLR 2016
Adversarial
Train two networks to beat each other.
I have some intuitions about why this might work, but need to learn more.
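For the record, the standard formulation is a two-player minimax game: a generator G maps noise z to samples, a discriminator D tries to tell generated samples from data, and each is trained against the other:

\begin{equation*}
\min_G \max_D \; \mathbb{E}_{x \sim p_{\mathrm{data}}}[\log D(x)] + \mathbb{E}_{z \sim p_z}[\log(1 - D(G(z)))]
\end{equation*}

At the (idealised) equilibrium the generator’s distribution matches the data distribution and D can do no better than chance.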
Convolutional
Signal processing baked into neural networks. Not so complicated if you have ever done signal processing, apart from the abstruse use of “depth” to mean two different things in the literature.
Generally uses FIR filters plus some smudgy “pooling”
(which is nonlinear downsampling),
although IIR filtering is also making an appearance, by running RNNs over multiple axes.
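A minimal numpy sketch of those two ingredients in 1-D, for brevity; the filter taps here are arbitrary rather than learned.

```python
import numpy as np

x = np.sin(np.linspace(0, 8 * np.pi, 64))   # a toy input signal
taps = np.array([0.25, 0.5, 0.25])          # an FIR (smoothing) filter: the "convolutional" part

feature_map = np.convolve(x, taps, mode="valid")

def max_pool(v, width=2):
    # "pooling" = nonlinear downsampling: keep the largest value per window
    return v[: len(v) // width * width].reshape(-1, width).max(axis=1)

pooled = max_pool(np.maximum(feature_map, 0))    # ReLU then pool, convnet-style
print(x.shape, feature_map.shape, pooled.shape)  # (64,) -> (62,) -> (31,)
```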
Spike-based
Most simulated neural networks are based on a continuous activation potential and discrete time, unlike spiking biological ones, which are driven by discrete events in continuous time.
There are a great many other differences.
What difference does this in particular make?
I suspect it makes a difference regarding time.
Recurrent neural networks
Feedback neural networks with memory and therefore a notion of time and state.
As someone who does a lot of signal processing for music, I find the notion that these generalise linear systems theory suggestive of lots of interesting DSP applications.
The connection with these and convolutional neural networks is suggestive for the same reason.
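A minimal numpy sketch of the recurrence makes the linear-systems connection plain: drop the tanh and it is exactly a linear state-space (IIR) filter. The weights here are random stand-ins, not trained.

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.normal(scale=0.5, size=(8, 8))  # state-to-state ("feedback") weights
U = rng.normal(scale=0.5, size=(8, 3))  # input-to-state weights

def rnn(xs):
    h = np.zeros(8)                 # initial state
    for x in xs:                    # discrete time, continuous activations
        h = np.tanh(W @ h + U @ x)  # nonlinear state update
    return h

print(rnn(rng.normal(size=(20, 3))))
```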
- Awesome RNN is a curated links list of implementations.
- Andrej Karpathy: The unreasonable effectiveness of RNNs
- Christopher Olah: Understanding LSTM RNNs
- Jeff Donahue: Long-term recurrent NNs
- Ross Goodwin: Adventures in Narrated Reality gives an overview of text generation using RNNs
Vanilla
The main problem here is that they are unstable in the training phase unless you are clever.
See BeSF94. One solution is LSTM; see next.
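The instability has a compact explanation: backpropagated gradients contain products of per-step Jacobians, so for the update h_i = \sigma(z_i), z_i = W h_{i-1} + U x_i,

\begin{equation*}
\frac{\partial h_t}{\partial h_k} = \prod_{i=k+1}^{t} \operatorname{diag}\bigl(\sigma'(z_i)\bigr)\, W
\end{equation*}

and if the largest singular value of W sits below 1 the gradient vanishes exponentially in t - k, while above 1 it can explode.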
Gated Recurrent Unit (GRU)
TBD
Long Short-Term Memory (LSTM)
LSTM Networks for Sentiment Analysis:
In a traditional recurrent neural network, during the gradient back-propagation phase, the gradient signal can end up being multiplied a large number of times (as many as the number of timesteps) by the weight matrix associated with the connections between the neurons of the recurrent hidden layer. This means that the magnitude of weights in the transition matrix can have a strong impact on the learning process. […]
These issues are the main motivation behind the LSTM model, which introduces a new structure called a memory cell. […] A memory cell is composed of four main elements: an input gate, a neuron with a self-recurrent connection (a connection to itself), a forget gate and an output gate. […] The gates serve to modulate the interactions between the memory cell itself and its environment.
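In symbols (one common variant; conventions differ slightly between papers), with \odot denoting elementwise multiplication:

\begin{align*}
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) && \text{(input gate)}\\
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) && \text{(forget gate)}\\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) && \text{(output gate)}\\
c_t &= f_t \odot c_{t-1} + i_t \odot \tanh(W_c x_t + U_c h_{t-1} + b_c) && \text{(memory cell)}\\
h_t &= o_t \odot \tanh(c_t)
\end{align*}

The additive update of the cell state c_t is what lets gradients flow across many timesteps without the repeated matrix multiplication that destabilises the vanilla version.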
Cortical learning algorithms
Is this a real thing, or pure hype? How does it distinguish itself from other deep-learning techniques, aside from name-checking biomimetic engineering?
NuPIC has made a big splash with their brain-esque learning algorithms, which they have open-sourced;
on that basis alone it looks like it could be fun to explore.
- NuPIC is an open-source entrant in the field
- How it works
- More on how it works
Extreme learning machines
Dunno.
Autoencoding
Optimisation methods
TBD
Related questions
- Artificial neural networks are usually layers of linear projections
sandwiched between saturating nonlinear maps.
Why not more general nonlinearities?
- Can you know in advance how long it will take to fit a classifier
or regression model for data of a given sort?
The process looks so mechanical…
Regularisation in neural networks
L_1, L_2, dropout…
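A minimal numpy sketch of where each of those plugs in; the penalty weight and dropout rate are arbitrary toy values.

```python
import numpy as np

rng = np.random.default_rng(2)
W = rng.normal(size=(32, 32))  # a pretend layer's weights
lam = 1e-3                     # regularisation strength (arbitrary)

l1_penalty = lam * np.abs(W).sum()  # L_1: added to the loss, encourages sparse weights
l2_penalty = lam * (W ** 2).sum()   # L_2 ("weight decay"): shrinks weights towards zero

def dropout(h, p=0.5, training=True):
    # randomly silence units during training; rescale so the expected
    # activation matches test time (the "inverted dropout" convention)
    if not training:
        return h
    mask = rng.random(h.shape) > p
    return h * mask / (1 - p)

print(l1_penalty, l2_penalty, dropout(np.ones(8)))
```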
Compression of neural networks
It seems we should be able to do better than a gigantic network with millions of parameters;
once we have trained the network, how can we simplify it, compress it, or prune it?
One approach: quantizing the weights to single bits.
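A minimal numpy sketch of that single-bit idea, in the spirit of binary-weight schemes such as BinaryConnect: keep only the sign of each weight, plus one shared scale (the mean absolute value, which minimises the squared quantization error). All the sizes here are made up.

```python
import numpy as np

# Quantize a (pretend-trained) weight matrix to 1 bit per weight.
rng = np.random.default_rng(3)
W = rng.normal(size=(256, 256)).astype(np.float32)

alpha = np.abs(W).mean()    # L2-optimal shared scale for sign quantization
W_bin = alpha * np.sign(W)  # stores as 1 bit per weight, plus one float

rel_err = np.linalg.norm(W - W_bin) / np.linalg.norm(W)
print(f"~32x smaller at relative error {rel_err:.2f}")
```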
Encoding for neural networks
Neural networks want their inputs in a particular (often inconvenient) encoding,
so general data has to be massaged into it.
Convolutional models are an important implicit encoding;
what else can we squeeze [in there/out of there]?
- Radial basis functions (see the sketch after this list)
- probabilities
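As promised, a minimal numpy sketch of the radial-basis-function option: encode a scalar as a vector of Gaussian bump activations, one per centre, a kind of soft one-hot coding. The grid of centres and the width are arbitrary choices.

```python
import numpy as np

def rbf_encode(x, centres, width=1.0):
    # one Gaussian bump per centre; the centres nearest x respond most
    return np.exp(-((x - centres) ** 2) / (2 * width ** 2))

centres = np.linspace(0.0, 10.0, 11)      # spread centres over the data range
print(rbf_encode(3.2, centres).round(2))  # peaks at the centres around 3.2
```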
Software stuff
Too many. Neural networks are intuitive enough that everyone builds their own library.
I use Tensorflow, plus a side order of Keras.

R/MATLAB/Python/everything: MXNET.

Lua: Torch

MATLAB/Python: Caffe claims to be a “de facto standard”

Python: Theano
- Tastes better with Lasagne
- which in turn likes nolearn
- …Or this minute’s flavour, keras. Keras is a (probably temporary) de facto standard for transporting trained neural networks to new architectures.
- Less trendy (?) — Pylearn2: Machine Learning library based on Theano and Python
- python/cuda: deepnet
- https://github.com/dmlc/cxxnet and https://github.com/tqchen/mshadow: numpy interface, multiple GPU targets.

Python/C++: tensorflow seems to be much the same thing as Theano,
but it’s backed by Google so probably has better long-term prospects.
The construction of graphs is more explicit than in Theano, which I find easier to understand, although it means you lose the near-Python syntax of Theano.
It also claims to compile to smartphones etc., although that looks buggy atm. Keras supports tensorflow as a backend too, for comfort and convenience.
- tensorflow-slim eases some boring bits.
- tflearn wraps the tensorflow machinery in a scikit-learn-style interface.

Javascript (!) inference and training: convnetjs
* plus bonus interview
* sister project for recurrent networks: recurrentjs 
synapticjs is a very full-featured javascript library for training, inference and visualisation of neural networks, with really good documentation. Great learning resource, with plausible examples.

javascript inference only: neocortexjs, in the browser. Civilised.

brainjs is unmaintained now, but looked like a nice simple javascript neural network library.

mindjs is a simple one where you can see the moving parts.

iphone: DeepBeliefSDK
Examples
data
precomputed/trained models
- Caffe format:
  - The Caffe Zoo has lots of nice models, pretrained, on their wiki
  - Here’s a great CV one, Andrej Karpathy’s image captioner, Neuraltalk2
  - for the NVC dataset: http://www.stat.ucla.edu/~junhua.mao/projects/child_learning.html (pretrained feature model at http://www.stat.ucla.edu/~junhua.mao/projects/child_learning_folder/NVC_v201509_image_feat_VGGnet.npy)
  - Alexnet http://arxiv.org/abs/1412.2302
- For Lasagne: https://github.com/Lasagne/Recipes/tree/master/modelzoo
- For Keras:
Howtos
- Beginners’ guide by Google staffers
- What’s wrong with deep learning? is a high-speed diagrammatic introductory presentation with a clickbait title, by one of the founding fathers, Yann LeCun
- Yarin Gal on uncertainty quantification
- not exactly a “deep” network, but a great generative hack in this vein: Generating Sequences With Recurrent Neural Networks
- Memkite’s Deep learning bibliography
- deeplearning.net’s reading list…
- and their tutorials are pretty clear
- Michael Nielsen has a free online textbook with code examples in Python
- Dürr’s tutorial
- Geoffrey Hinton’s video draws the connection between Markov random fields and neural networks, and also links to lots of other video tutorials in the sidebar
- The cat-recogniser team lead, Quoc Le, has some nice lectures
To read

We describe an approach for unsupervised learning of a generic, distributed sentence encoder. Using the continuity of text from books, we train an encoder-decoder model that tries to reconstruct the surrounding sentences of an encoded passage. Sentences that share semantic and syntactic properties are thus mapped to similar vector representations. […] The end result is an off-the-shelf encoder that can produce highly generic sentence representations that are robust and perform well in practice

Deep Neural Networks are Easily Fooled: High Confidence Predictions for Unrecognizable Images

Jeff Dean’s Large Scale Deep Learning at Google
The vector embedding is cool:
\begin{equation*}
E(\text{Rome}) - E(\text{Italy}) + E(\text{Germany}) \approx E(\text{Berlin})
\end{equation*}
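A toy numpy sketch of that arithmetic, with made-up 3-dimensional “embeddings” (real word2vec vectors are learned from text and have hundreds of dimensions): the nearest neighbour of the query vector, by cosine similarity, recovers the analogy.

```python
import numpy as np

E = {  # hypothetical embeddings, chosen by hand so the analogy works
    "Rome":    np.array([1.0, 0.9, 0.1]),
    "Italy":   np.array([1.0, 0.1, 0.1]),
    "Germany": np.array([0.1, 0.1, 1.0]),
    "Berlin":  np.array([0.1, 0.9, 1.0]),
}
query = E["Rome"] - E["Italy"] + E["Germany"]

def cos(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

best = max((w for w in E if w not in ("Rome", "Italy", "Germany")),
           key=lambda w: cos(E[w], query))
print(best)  # Berlin
```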
Refs
 Amar98
 Amari, S. (1998) Natural Gradient Works Efficiently in Learning. Neural Computation, 10(2), 251–276. DOI.
 Arau00
 Araujo, L. (2000) Evolutionary parsing for a probabilistic context free grammar. In Proc. of the Int. Conf. on Rough Sets and Current Trends in Computing (RSCTC-2000), Lecture Notes in Computer Science 2005 (p. 590).
 ArRK10
 Arel, I., Rose, D. C., & Karnowski, T. P.(2010) Deep Machine Learning - A New Frontier in Artificial Intelligence Research [Research Frontier]. IEEE Computational Intelligence Magazine, 5(4), 13–18. DOI.
 AGMM15
 Arora, S., Ge, R., Ma, T., & Moitra, A. (2015) Simple, Efficient, and Neural Algorithms for Sparse Coding. arXiv:1503.00778 [cs, Stat].
 BLPB12
 Bastien, F., Lamblin, P., Pascanu, R., Bergstra, J., Goodfellow, I., Bergeron, A., … Bengio, Y. (2012) Theano: new features and speed improvements. arXiv:1211.5590 [cs].
 Beng09
 Bengio, Y. (2009) Learning deep architectures for AI. Foundations and Trends® in Machine Learning, 2(1), 1–127. DOI.
 BeCV13
 Bengio, Y., Courville, A., & Vincent, P. (2013) Representation Learning: A Review and New Perspectives. IEEE Trans. Pattern Anal. Machine Intell., 35, 1798–1828. DOI.
 BeLe07
 Bengio, Y., & LeCun, Y. (2007) Scaling learning algorithms towards AI. LargeScale Kernel Machines, 34, 1–41.
 BeSF94
 Bengio, Y., Simard, P., & Frasconi, P. (1994) Learning long-term dependencies with gradient descent is difficult. IEEE Transactions on Neural Networks, 5(2), 157–166. DOI.
 Bose91
 Boser, B. (1991) An analog neural network processor with programmable topology. J. Solid State Circuits, 26, 2017–2025. DOI.
 Bott14
 Bottou, L. (2014) From machine learning to machine reasoning. Mach. Learn., 94, 133–149. DOI.
 BoBV12
 Boulanger-Lewandowski, N., Bengio, Y., & Vincent, P. (2012) Modeling Temporal Dependencies in High-Dimensional Sequences: Application to Polyphonic Music Generation and Transcription. In 29th International Conference on Machine Learning.
 Cadi14
 Cadieu, C. F.(2014) Deep neural networks rival the representation of primate IT cortex for core visual object recognition. PLoS Comp. Biol., 10, e1003963. DOI.
 CHMB15
 Choromanska, A., Henaff, Mi., Mathieu, M., Ben Arous, G., & LeCun, Y. (2015) The Loss Surfaces of Multilayer Networks. In Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics (pp. 192–204).
 Ciod12
 Ciodaro, T. (2012) Online particle detection with neural networks based on topological calorimetry information. J. Phys. Conf. Series, 368, 012030. DOI.
 Cire12
 Ciresan, D. (2012) Multi-column deep neural network for traffic sign classification. Neural Networks, 32, 333–338. DOI.
 Dahl12
 Dahl, G. E.(2012) Context-dependent pre-trained deep neural networks for large vocabulary speech recognition. IEEE Trans. Audio Speech Lang. Process., 20, 33–42. DOI.
 DoSB14
 Dosovitskiy, A., Springenberg, J. T., & Brox, T. (2014) Learning to Generate Chairs with Convolutional Neural Networks. arXiv:1411.5928 [cs].
 Fara13
 Farabet, C. (2013) Learning hierarchical features for scene labeling. IEEE Trans. Pattern Anal. Mach. Intell., 35, 1915–1929. DOI.
 Fell91
 Felleman, D. J.(1991) Distributed hierarchical processing in the primate cerebral cortex. Cereb. Cortex, 1, 1–47. DOI.
 Fuku82
 Fukushima, K. (1982) Neocognitron: a new algorithm for pattern recognition tolerant of deformations and shifts in position. Pattern Recognition, 15, 455–469. DOI.
 Garc04
 Garcia, C. (2004) Convolutional face finder: a neural architecture for fast and robust face detection. IEEE Trans. Pattern Anal. Machine Intell., 26, 1408–1423. DOI.
 GaEB15
 Gatys, L. A., Ecker, A. S., & Bethge, M. (2015) A Neural Algorithm of Artistic Style. arXiv:1508.06576 [cs, q-bio].
 GiSB14
 Giryes, R., Sapiro, G., & Bronstein, A. M.(2014) On the Stability of Deep Networks. arXiv:1412.5896 [cs, Math, Stat].
 Hads09
 Hadsell, R. (2009) Learning long-range vision for autonomous off-road driving. J. Field Robot., 26, 120–144. DOI.
 HaCL06
 Hadsell, R., Chopra, S., & LeCun, Y. (2006) Dimensionality Reduction by Learning an Invariant Mapping. In 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Vol. 2, pp. 1735–1742). DOI.
 Helm13
 Helmstaedter, M. (2013) Connectomic reconstruction of the inner plexiform layer in the mouse retina. Nature, 500, 168–174. DOI.
 Hint10
 Hinton, G. (2010) A practical guide to training restricted Boltzmann machines. In Neural Networks: Tricks of the Trade (Vol. 9, p. 926). Springer Berlin Heidelberg
 HDYD12
 Hinton, G., Deng, L., Yu, D., Dahl, G. E., Mohamed, A., Jaitly, N., … Kingsbury, B. (2012) Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups. IEEE Signal Processing Magazine, 29(6), 82–97. DOI.
 Hint95
 Hinton, G. E.(1995) The wake-sleep algorithm for unsupervised neural networks. Science, 268, 1158–1161. DOI.
 Hint07
 Hinton, G. E.(2007) To recognize shapes, first learn to generate images. In P. Cisek, T. Drew, & J. F. Kalaska (Eds.), Progress in Brain Research (Vol. 165, pp. 535–547). Elsevier
 HiSa06
 Hinton, G. E., & Salakhutdinov, R. R.(2006) Reducing the dimensionality of data with neural networks. Science, 313(5786), 504–507. DOI.
 HiOT06
 Hinton, G., Osindero, S., & Teh, Y. (2006) A Fast Learning Algorithm for Deep Belief Nets. Neural Computation, 18(7), 1527–1554. DOI.
 HoSc97
 Hochreiter, S., & Schmidhuber, J. (1997) Long Short-Term Memory. Neural Computation, 9(8), 1735–1780. DOI.
 HuSi05
 Huang, G.-B., & Siew, C.-K. (2005) Extreme learning machine with randomly assigned RBF kernels. International Journal of Information Technology, 11(1), 16–24.
 HuWL11
 Huang, G.-B., Wang, D. H., & Lan, Y. (2011) Extreme learning machines: a survey. International Journal of Machine Learning and Cybernetics, 2(2), 107–122. DOI.
 HuZS04
 Huang, G.-B., Zhu, Q.-Y., & Siew, C.-K. (2004) Extreme learning machine: a new learning scheme of feedforward neural networks. In 2004 IEEE International Joint Conference on Neural Networks, 2004. Proceedings (Vol. 2, pp. 985–990 vol.2). DOI.
 HuZS06
 Huang, G.-B., Zhu, Q.-Y., & Siew, C.-K. (2006) Extreme learning machine: Theory and applications. Neurocomputing, 70(1–3), 489–501. DOI.
 Hube62
 Hubel, D. H.(1962) Receptive fields, binocular interaction, and functional architecture in the cat’s visual cortex. J. Physiol., 160, 106–154. DOI.
 HuPC15
 Hu, T., Pehlevan, C., & Chklovskii, D. B.(2015) A Hebbian/Anti-Hebbian Network for Online Sparse Dictionary Learning Derived from Symmetric Matrix Factorization. arXiv:1503.00690 [cs, q-bio, Stat].
 KaRL10
 Kavukcuoglu, K., Ranzato, M., & LeCun, Y. (2010) Fast Inference in Sparse Coding Algorithms with Applications to Object Recognition. arXiv:1010.3467 [cs].
 KWKT15
 Kulkarni, T. D., Whitney, W., Kohli, P., & Tenenbaum, J. B.(2015) Deep Convolutional Inverse Graphics Network. arXiv:1503.03167 [cs].
 Lawr97
 Lawrence, S. (1997) Face recognition: a convolutional neural-network approach. IEEE Trans. Neural Networks, 8, 98–113. DOI.
 Lecu98
 LeCun, Y. (1998) Gradient-based learning applied to document recognition. Proc. IEEE, 86, 2278–2324. DOI.
 LeBH15
 LeCun, Y., Bengio, Y., & Hinton, G. (2015) Deep learning. Nature, 521(7553), 436–444. DOI.
 LCHR06
 LeCun, Y., Chopra, S., Hadsell, R., Ranzato, M., & Huang, F. (2006) A tutorial on energy-based learning. Predicting Structured Data.
 LBRN07
 Lee, H., Battle, A., Raina, R., & Ng, A. Y.(2007) Efficient sparse coding algorithms. Advances in Neural Information Processing Systems, 19, 801.
 LGRN00
 Lee, H., Grosse, R., Ranganath, R., & Ng, A. Y.(2009) Convolutional Deep Belief Networks for Scalable Unsupervised Learning of Hierarchical Representations. In Proceedings of the 26th International Conference on Machine Learning.
 Leun14
 Leung, M. K.(2014) Deep learning of the tissue-regulated splicing code. Bioinformatics, 30, i121–i129. DOI.
 Ma15
 Ma, J. (2015) Deep neural nets as a method for quantitative structure-activity relationships. J. Chem. Inf. Model., 55, 263–274. DOI.
 Mall12
 Mallat, S. (2012) Group Invariant Scattering. Communications on Pure and Applied Mathematics, 65(10), 1331–1398. DOI.
 Mall16
 Mallat, S. (2016) Understanding Deep Convolutional Networks. arXiv:1601.04920 [cs, Stat].
 MaMD14
 Marcus, G., Marblestone, A., & Dean, T. (2014) Neuroscience: The atoms of neural computation. Science (New York, N.Y.), 346(6209), 551–552. DOI.
 MCCD13
 Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013) Efficient Estimation of Word Representations in Vector Space. arXiv:1301.3781 [cs].
 MiLS13
 Mikolov, T., Le, Q. V., & Sutskever, I. (2013) Exploiting Similarities among Languages for Machine Translation. arXiv:1309.4168 [cs].
 Mnih15
 Mnih, V. (2015) Human-level control through deep reinforcement learning. Nature, 518, 529–533. DOI.
 Moha12
 Mohamed, A.-R. (2012) Acoustic modeling using deep belief networks. IEEE Trans. Audio Speech Lang. Process., 20(1), 14–22. DOI.
 Mont14
 Montufar, G. (2014) When does a mixture of products contain a product of mixtures? J. Discrete Math., 29, 321–347. DOI.
 Ning05
 Ning, F. (2005) Toward automatic phenotyping of developing embryos from videos. IEEE Trans. Image Process., 14, 1360–1371. DOI.
 OlFi96a
 Olshausen, B. A., & Field, D. J.(1996a) Emergence of simplecell receptive field properties by learning a sparse code for natural images. Nature, 381(6583), 607–609. DOI.
 OlFi96b
 Olshausen, B. A., & Field, D. J.(1996b) Natural image statistics and efficient coding. Network (Bristol, England), 7(2), 333–339. DOI.
 OlFi04
 Olshausen, B. A., & Field, D. J.(2004) Sparse coding of sensory inputs. Current Opinion in Neurobiology, 14(4), 481–487. DOI.
 PaVe14
 Paul, A., & Venkatasubramanian, S. (2014) Why does Deep Learning work? - A perspective from Group Theory. arXiv:1412.6621 [cs, Stat].
 PeCh15
 Pehlevan, C., & Chklovskii, D. B.(2015) A Hebbian/Anti-Hebbian Network Derived from Online Non-Negative Matrix Factorization Can Cluster and Discover Sparse Features. arXiv:1503.00680 [cs, q-bio, Stat].
 RaBC08
 Ranzato, M., Boureau, Y.-L., & LeCun, Y. (2008) Sparse Feature Learning for Deep Belief Networks. In J. C. Platt, D. Koller, Y. Singer, & S. T. Roweis (Eds.), Advances in Neural Information Processing Systems 20 (pp. 1185–1192). Curran Associates, Inc.
 Ranz13
 Ranzato, M. (2013) Modeling natural images using gated MRFs. IEEE Trans. Pattern Anal. Machine Intell., 35, 2206–2222. DOI.
 Rume86
 Rumelhart, D. E.(1986) Learning representations by back-propagating errors. Nature, 323, 533–536. DOI.
 SGAL14
 Sagun, L., Guney, V. U., Arous, G. B., & LeCun, Y. (2014) Explorations on high dimensional landscapes. arXiv:1412.6615 [cs, Stat].
 Schw07
 Schwenk, H. (2007) Continuous space language models. Computer Speech Lang., 21, 492–518. DOI.
 SiOl01
 Simoncelli, E. P., & Olshausen, B. A.(2001) Natural Image Statistics and Neural Representation. Annual Review of Neuroscience, 24(1), 1193–1216. DOI.
 SDBR14
 Springenberg, J. T., Dosovitskiy, A., Brox, T., & Riedmiller, M. (2014) Striving for Simplicity: The All Convolutional Net. arXiv:1412.6806 [cs].
 Tura10
 Turaga, S. C.(2010) Convolutional networks can learn to generate affinity graphs for image segmentation. Neural Comput., 22, 511–538. DOI.
 Waib89
 Waibel, A. (1989) Phoneme recognition using time-delay neural networks. IEEE Trans. Acoustics Speech Signal Process., 37, 328–339. DOI.
 WiBö15
 Wiatowski, T., & Bölcskei, H. (2015) A Mathematical Theory of Deep Convolutional Neural Networks for Feature Extraction. arXiv:1512.06293 [cs, Math, Stat].
 Xion15
 Xiong, H. Y.(2015) The human splicing code reveals new insights into the genetic determinants of disease. Science, 347, 6218. DOI.
 ZhCL14
 Zhang, S., Choromanska, A., & LeCun, Y. (2014) Deep learning with Elastic Averaging SGD. arXiv:1412.6651 [cs, Stat].