Artificial neural networks


Modern computational neural network methods reascend the hype phase transition.
a.k.a. deep learning, or extreme learning, or double plus fancy brainbots, or “please can our department have a bigger computation budget, it’s not to play video games, I swear”.

@bhautikj style transfer experiment "Drumpf"

Style transfer will be familiar to anyone who has ever taken hallucinogens or watched movies made by those who have, but you can’t usually put hallucinogens or film nights on the departmental budget, so we have to make do with gigantic computing clusters.

But what are “artificial neural networks”?


  • a collection of incremental improvements to machine learning techniques loosely inspired by real brains, which surprisingly elicit the kind of results that everyone was hoping we’d get at least 20 years ago, or,
  • the state-of-the-art in artificial kitten recognition.

Why bother?

There are many answers here.

A classic:

The ultimate regression algorithm

Common answer:
It turns out that this particular learning model (class of learning models),
while often not obviously well suited to a given problem,
does very well in general on lots of things,
and very often can keep on doing better and better the more resources you throw at it.
Why burn three grad students on a perfect regression algorithm when you can use
one algorithm to solve a whole bunch of regression problems just as well?
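
To make the “one algorithm, many regression problems” point concrete, here is a minimal sketch in plain Python (no libraries assumed): a single-hidden-layer tanh network trained by full-batch gradient descent, fitted unchanged to two different target functions. The width, learning rate and step count are arbitrary illustrative choices, not tuned values.

```python
import math
import random


def fit_mlp(xs, ys, hidden=8, lr=0.02, steps=1000, seed=0):
    """Fit y ≈ sum_j v[j] * tanh(w[j] * x + b[j]) + c by full-batch gradient
    descent on mean squared error. Returns (initial_mse, final_mse)."""
    rng = random.Random(seed)
    w = [rng.gauss(0, 1) for _ in range(hidden)]
    b = [rng.gauss(0, 1) for _ in range(hidden)]
    v = [rng.gauss(0, 0.1) for _ in range(hidden)]
    c = 0.0
    n = len(xs)

    def mse():
        total = 0.0
        for x, y in zip(xs, ys):
            pred = sum(vj * math.tanh(wj * x + bj) for wj, bj, vj in zip(w, b, v)) + c
            total += (pred - y) ** 2
        return total / n

    initial = mse()
    for _ in range(steps):
        gw, gb, gv, gc = [0.0] * hidden, [0.0] * hidden, [0.0] * hidden, 0.0
        for x, y in zip(xs, ys):
            hs = [math.tanh(wj * x + bj) for wj, bj in zip(w, b)]
            err = 2.0 * (sum(vj * hj for vj, hj in zip(v, hs)) + c - y) / n  # dMSE/dpred
            gc += err
            for j in range(hidden):
                gv[j] += err * hs[j]
                delta = err * v[j] * (1.0 - hs[j] ** 2)  # chain rule back through tanh
                gw[j] += delta * x
                gb[j] += delta
        for j in range(hidden):
            w[j] -= lr * gw[j]
            b[j] -= lr * gb[j]
            v[j] -= lr * gv[j]
        c -= lr * gc
    return initial, mse()


xs = [i / 5 for i in range(-10, 11)]  # 21 points on [-2, 2]
for name, f in [("sin", math.sin), ("abs", abs)]:
    before, after = fit_mlp(xs, [f(x) for x in xs])
    print(f"{name}: MSE {before:.3f} -> {after:.4f}")
```

The same function, with the same hyperparameters, is pointed at both targets; only the data changes.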

This is more interesting for the business-dev people.

Cool maths

Regularisation, function approximations, interesting manifold inference.

Even the stuff I’d assumed was trivial, like backpropagation, has a few wrinkles in practice.
See Michael Nielsen’s chapter and
Christopher Olah’s visual summary.
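
A standard sanity check for hand-written backpropagation is to compare it against central finite differences. The sketch below assumes a tiny one-hidden-layer tanh network with squared-error loss and all parameters stored in one flat list; the layout and sizes are my own illustrative choices.

```python
import math
import random

N_IN, N_HID = 3, 4  # illustrative sizes


def unpack(theta):
    """Split a flat parameter list into W1 (N_HID rows of N_IN), b1, w2 and scalar b2."""
    k = N_HID * N_IN
    W1 = [theta[i * N_IN:(i + 1) * N_IN] for i in range(N_HID)]
    b1 = theta[k:k + N_HID]
    w2 = theta[k + N_HID:k + 2 * N_HID]
    b2 = theta[k + 2 * N_HID]
    return W1, b1, w2, b2


def forward(theta, x):
    W1, b1, w2, b2 = unpack(theta)
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + bj) for row, bj in zip(W1, b1)]
    y = sum(w * hj for w, hj in zip(w2, h)) + b2
    return h, y


def loss(theta, x, t):
    return 0.5 * (forward(theta, x)[1] - t) ** 2


def backprop(theta, x, t):
    """Analytic gradient by the chain rule, in the same flat layout as theta."""
    _, _, w2, _ = unpack(theta)
    h, y = forward(theta, x)
    dy = y - t                                               # dL/dy
    da = [dy * w * (1.0 - hj ** 2) for w, hj in zip(w2, h)]  # dL/d(pre-activation) = dL/db1
    dW1 = [d * xi for d in da for xi in x]                   # row-major, matches unpack
    return dW1 + da + [dy * hj for hj in h] + [dy]


def numeric_grad(theta, x, t, eps=1e-6):
    """Central finite differences, one parameter at a time."""
    grad = []
    for i in range(len(theta)):
        up, down = list(theta), list(theta)
        up[i] += eps
        down[i] -= eps
        grad.append((loss(up, x, t) - loss(down, x, t)) / (2 * eps))
    return grad


rng = random.Random(1)
theta = [rng.gauss(0, 0.5) for _ in range(N_HID * N_IN + 2 * N_HID + 1)]
x, t = [0.3, -1.2, 0.8], 0.5
gap = max(abs(a - b) for a, b in zip(backprop(theta, x, t), numeric_grad(theta, x, t)))
print(f"max |analytic - numeric| = {gap:.2e}")
```

If the gap comes out anywhere near the size of the gradients themselves, the chain-rule code is the usual suspect.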

Insight into the mind

TBD. Maybe.

Trippy art projects

See next.

Generative art applications

Many neural networks can, after a fashion, be run in reverse, giving you generative models:
run the model forwards, it recognises melodies;
run it “backwards”, it composes melodies.

It’s not quite running it backwards, but the “deep dreaming” project does something in this vein.
See, say, the above image, Google’s tripped-out image recognition systems, or
Gatys, Ecker and Bethge’s deep art.
Neural networks do Monet quite well.
I’ve a weakness for ideas that give me plausible deniability for making
generative art while doing my maths homework.
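
The “backwards” trick in deep dreaming is, roughly, to freeze the network’s weights and run gradient ascent on the input so that some unit fires harder. A toy sketch of that idea, assuming a small random network standing in for a trained recogniser (so the resulting “dream” is meaningless), plus an L2 penalty on the input that I added to keep it bounded:

```python
import math
import random

rng = random.Random(0)
N_IN, N_HID = 5, 6
# Frozen weights standing in for a trained recogniser (random here, purely illustrative).
W = [[rng.gauss(0, 1) for _ in range(N_IN)] for _ in range(N_HID)]
v = [rng.gauss(0, 1) for _ in range(N_HID)]


def score(x):
    """Output unit of a fixed one-hidden-layer net: v . tanh(W x)."""
    return sum(vk * math.tanh(sum(w * xi for w, xi in zip(row, x))) for vk, row in zip(v, W))


def dream(steps=300, lr=0.01, lam=0.1):
    """Gradient ascent on the input x (weights untouched) to maximise
    score(x) - lam/2 * |x|^2; the penalty keeps x from running off to infinity."""
    x = [0.0] * N_IN
    for _ in range(steps):
        grad = [-lam * xi for xi in x]  # gradient of the penalty term
        for vk, row in zip(v, W):
            hk = math.tanh(sum(w * xi for w, xi in zip(row, x)))
            for i in range(N_IN):
                grad[i] += vk * (1.0 - hk ** 2) * row[i]  # d score / d x_i
        x = [xi + lr * gi for xi, gi in zip(x, grad)]
    return x


x_star = dream()
print(score([0.0] * N_IN), "->", score(x_star))
```

In an image model, x would be the pixels and the objective some layer’s activations; the mechanics are the same.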

Hip keywords for NN models

Not necessarily mutually exclusive;
some design patterns you can use.

See Tomasz Malisiewicz’s summary of Deep Learning Trends @ ICLR 2016

Generative adversarial networks

Train two networks to beat each other.
I have some intuitions about why this might work, but need to learn more.
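
For reference, the game in the original formulation (Goodfellow et al. 2014, not in the reading list below): a discriminator D outputs the probability that its input came from the data rather than from the generator G, which maps noise z to samples. For a fixed G the best discriminator is D*_G, and substituting it back turns the generator’s objective into, up to constants, a Jensen–Shannon divergence between the data and model distributions. In practice the generator is usually trained to maximise log D(G(z)) instead, because the minimax form gives it weak gradients early in training.

```latex
\min_G \max_D \; V(D, G)
  = \mathbb{E}_{x \sim p_{\mathrm{data}}}\bigl[\log D(x)\bigr]
  + \mathbb{E}_{z \sim p_z}\bigl[\log\bigl(1 - D(G(z))\bigr)\bigr],
\qquad
D^*_G(x) = \frac{p_{\mathrm{data}}(x)}{p_{\mathrm{data}}(x) + p_G(x)}
```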

Convolutional neural networks

Signal processing baked into neural networks. Not so complicated if you have ever done signal processing, apart from the abstruse use of “depth” to mean two different things in the literature.

Generally uses FIR filters plus some smudgy “pooling”
(which is nonlinear downsampling),
although IIR is also making an appearance by running RNNs along multiple axes.
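
In 1-D, the core of a convolutional layer is just that FIR filter followed by pooling. A minimal sketch in plain Python; note that most deep learning libraries actually compute cross-correlation (no kernel flip) and still call it convolution. The signal and the difference kernel are made up for illustration:

```python
def conv1d_valid(x, h):
    """'Valid' 1-D cross-correlation, y[n] = sum_k h[k] * x[n + k]: an FIR filter
    (kernel reversed relative to textbook convolution) that skips the edges
    where the kernel would overhang."""
    K = len(h)
    return [sum(h[k] * x[n + k] for k in range(K)) for n in range(len(x) - K + 1)]


def relu(xs):
    return [max(0.0, v) for v in xs]


def max_pool(xs, size=2):
    """Non-overlapping max pooling: nonlinear downsampling by `size` (ragged tail dropped)."""
    return [max(xs[i:i + size]) for i in range(0, len(xs) - size + 1, size)]


signal = [0, 0, 1, 2, 3, 2, 1, 0, 0]
edges = conv1d_valid(signal, [1, -1])  # first-difference filter: y[n] = x[n] - x[n+1]
print(edges)                  # [0, -1, -1, -1, 1, 1, 1, 0]
print(max_pool(relu(edges)))  # [0.0, 0.0, 1, 1]
```

A real layer runs many such kernels in parallel, across several input channels, and learns their coefficients.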

Terence broad go you

Spiking neural networks

Most simulated neural networks are based on a continuous activation potential and discrete time, unlike spiking biological ones, which are driven by discrete events in continuous time.
There are a great many other differences.
What difference does this in particular make?
I suspect it makes a difference regarding time.
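
For contrast, a sketch of the simplest spiking model, a leaky integrate-and-fire neuron, whose output is a list of spike times rather than a continuous activation level. The parameters (20 ms membrane time constant, unit threshold, resting potential 0) are illustrative assumptions. I integrate it on a fixed time grid with forward Euler for simplicity, which rather undercuts the continuous-time point; event-driven simulators solve for the threshold crossings instead.

```python
def lif_spike_times(current, t_max=0.1, dt=1e-4, tau=0.02, v_thresh=1.0, v_reset=0.0):
    """Leaky integrate-and-fire neuron (resting potential 0, unit resistance) under a
    constant input current. Integrates tau * dV/dt = -V + current with forward Euler
    and returns the spike times in seconds."""
    v, spikes = 0.0, []
    for step in range(int(round(t_max / dt))):
        v += (dt / tau) * (current - v)
        if v >= v_thresh:
            spikes.append((step + 1) * dt)  # output is a list of events, not a level
            v = v_reset
    return spikes


for current in (0.5, 2.0, 4.0):
    print(current, len(lif_spike_times(current)))
```

Below threshold (here any current under 1.0) the neuron never fires at all; above it, information lives in the timing and rate of the events.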

Recurrent neural networks

Feedback neural networks with memory and therefore a notion of time and state.
As someone who does a lot of signal processing for music, the notion that these generalise linear systems theory is suggestive of lots of interesting DSP applications.

The connection between these and convolutional neural networks is suggestive for the same reason.
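
The linear-systems connection in miniature: a scalar recurrence h_t = φ(w h_{t−1} + u x_t). With φ the identity it is exactly a one-pole IIR filter whose impulse response is u·w^t; with φ = tanh it becomes a nonlinear system. A sketch with hypothetical parameter values:

```python
import math


def identity(z):
    return z


def scalar_rnn(xs, w, u, phi=math.tanh):
    """Scalar recurrence h_t = phi(w * h_{t-1} + u * x_t) with h_0 = 0."""
    h, out = 0.0, []
    for x in xs:
        h = phi(w * h + u * x)
        out.append(h)
    return out


impulse = [1.0, 0.0, 0.0, 0.0, 0.0]
print(scalar_rnn(impulse, w=0.5, u=1.0, phi=identity))  # [1.0, 0.5, 0.25, 0.125, 0.0625]
print(scalar_rnn(impulse, w=0.5, u=1.0))                 # same recurrence, squashed by tanh
```

Real RNNs use a vector state and matrices W and U, but the structure, a state fed back through a (possibly nonlinear) map, is the same.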


The main problem here is that they are unstable in the training phase unless you are clever: gradients propagated back through many timesteps tend to vanish or explode.
See BeSF94. One solution is LSTM; see next.
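
Why training is unstable, in the scalar case: backpropagating through T steps multiplies T local derivatives together, so the gradient scales like w^T, vanishing for |w| < 1 and exploding for |w| > 1, while saturating nonlinearities shrink it further. This is a minimal illustration of the effect analysed in BeSF94, not a reproduction of their analysis:

```python
import math


def dhT_dh0(w, u, xs, nonlinear=True):
    """Sensitivity d h_T / d h_0 of the scalar recurrence h_t = f(w*h_{t-1} + u*x_t),
    h_0 = 0, accumulated with the chain rule (backpropagation through time)."""
    h, grad = 0.0, 1.0
    for x in xs:
        a = w * h + u * x
        h = math.tanh(a) if nonlinear else a
        local = w * (1.0 - h ** 2) if nonlinear else w  # d h_t / d h_{t-1}
        grad *= local
    return grad


T = 50
print(dhT_dh0(0.9, 1.0, [0.0] * T, nonlinear=False))  # 0.9**50, about 5e-3: vanishing
print(dhT_dh0(1.1, 1.0, [0.0] * T, nonlinear=False))  # 1.1**50, about 117: exploding
print(dhT_dh0(1.1, 1.0, [1.0] * T))                   # saturated tanh: vanishes anyway
```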

Gated Recurrent Unit (GRU)


Long Short Term Memory (LSTM)

LSTM Networks for Sentiment Analysis:

In a traditional recurrent neural network, during the gradient back-propagation phase, the gradient signal can end up being multiplied a large number of times (as many as the number of timesteps) by the weight matrix associated with the connections between the neurons of the recurrent hidden layer. This means that, the magnitude of weights in the transition matrix can have a strong impact on the learning process.[…]

These issues are the main motivation behind the LSTM model which introduces a new structure called a memory cell […]. A memory cell is composed of four main elements: an input gate, a neuron with a self-recurrent connection (a connection to itself), a forget gate and an output gate. […] The gates serve to modulate the interactions between the memory cell itself and its environment.
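
The gating described in the quote, as a sketch of one time step of a scalar LSTM cell in the now-common formulation. The parameter names and values are hypothetical. With the forget gate held open and the input gate held shut, the cell carries its value across 100 steps almost unchanged; the cell’s own step-to-step derivative is just the forget gate f, rather than a product of squashed weights.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h_prev, c_prev, p):
    """One step of a scalar LSTM cell. `p` maps names w_* (input weight), u_* (recurrent
    weight) and b_* (bias), for * in i, f, o, g, to floats."""
    def pre(gate):
        return p["w_" + gate] * x + p["u_" + gate] * h_prev + p["b_" + gate]

    i = sigmoid(pre("i"))    # input gate: how much new content to write
    f = sigmoid(pre("f"))    # forget gate: how much of the old cell to keep
    o = sigmoid(pre("o"))    # output gate: how much of the cell to expose
    g = math.tanh(pre("g"))  # candidate content
    c = f * c_prev + i * g   # the self-recurrent memory cell
    h = o * math.tanh(c)
    return h, c


# Illustrative parameters: forget gate pinned open, input gate pinned shut.
p = {k + "_" + gate: 1.0 for k in ("w", "u") for gate in "ifog"}
p.update(b_i=-20.0, b_f=20.0, b_o=0.0, b_g=0.0)

rng = random.Random(0)
h, c = 0.0, 1.0
for _ in range(100):
    h, c = lstm_step(rng.uniform(-1, 1), h, c, p)
print(c)  # still very close to 1.0 after 100 steps
```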

Cortical learning algorithms

Is this a real thing, or pure hype? How does it distinguish itself from other deep learning techniques aside from name-checking biomimetic engineering?
NuPIC has made a big splash with its brain-esque learning, and has open-sourced it;
on that basis alone it looks like it could be fun to explore.

Extreme learning machines


Optimisation methods


Related questions

  • Artificial neural networks are usually layers of linear projections
    sandwiched between saturating nonlinear maps.
    Why not more general nonlinearities?
  • Can you know in advance how long it will take to fit a classifier
    or regression model for data of a given sort?
    The process looks so mechanical…

Regularisation in neural networks

L_1, L_2, dropout…
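
Minimal sketches of the three, assuming weights and activations are plain Python lists. The L_1 and L_2 penalties are simply added to the training loss (the L_2 term’s gradient, lam·w, is what gets called “weight decay”), and dropout randomly zeroes units during training. This uses the common “inverted dropout” variant, which rescales the surviving units at training time so that nothing changes at test time.

```python
import random


def l1_penalty(weights, lam):
    """lam * sum |w|: encourages sparse weights."""
    return lam * sum(abs(w) for w in weights)


def l2_penalty(weights, lam):
    """lam/2 * sum w^2: its gradient lam * w shrinks every weight ("weight decay")."""
    return 0.5 * lam * sum(w * w for w in weights)


def dropout(activations, p_drop, rng, training=True):
    """Inverted dropout: during training zero each unit with probability p_drop and
    scale survivors by 1/(1 - p_drop), so the expected activation is unchanged;
    at test time pass activations through untouched."""
    if not training or p_drop == 0.0:
        return list(activations)
    keep = 1.0 - p_drop
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


rng = random.Random(0)
print(dropout([1.0] * 8, 0.5, rng))                  # a mix of 0.0 and 2.0
print(dropout([1.0] * 8, 0.5, rng, training=False))  # unchanged
```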

Compression of neural networks

It seems we should be able to do better than a gigantic network with millions of parameters;
once we have trained the graph, how can we simplify it, compress it, or prune it?

Quantizing to single bits.
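
Two of the simplest compression tricks, sketched on a flat list of weights: magnitude pruning (zero the smallest weights) and 1-bit quantization that keeps only each weight’s sign plus one shared scale. Taking the scale as mean |w| minimises the squared reconstruction error for fixed signs; that choice and the example weights are illustrative, not any particular paper’s recipe.

```python
def prune_by_magnitude(weights, fraction):
    """Zero the smallest-magnitude `fraction` of weights."""
    k = int(fraction * len(weights))
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    drop = set(order[:k])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]


def binarize(weights):
    """1-bit quantization: one sign per weight plus a shared scale alpha = mean |w|,
    which minimises sum (w - alpha * sign(w))^2 for the fixed signs."""
    alpha = sum(abs(w) for w in weights) / len(weights)
    signs = [1 if w >= 0 else -1 for w in weights]
    return alpha, signs


weights = [0.9, -0.05, 0.4, -0.7, 0.01, -0.2]
print(prune_by_magnitude(weights, 0.5))  # [0.9, 0.0, 0.4, -0.7, 0.0, 0.0]
alpha, signs = binarize(weights)
print(signs, [alpha * s for s in signs])  # reconstruction from 1 bit per weight
```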

Encoding for neural networks

Neural networks take an inconvenient encoding format,
so general data has to be massaged.
Convolutional models are an important implicit encoding;
what else can we squeeze [in there/out of there]?
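
For the “massaging”, two of the most common conversions, sketched in plain Python: one-hot encoding for categorical columns and standardisation (zero mean, unit variance) for numeric ones. The column values below are made up.

```python
def one_hot(values, categories=None):
    """Map categorical values to one-hot vectors; categories default to the
    sorted distinct values."""
    cats = sorted(set(values)) if categories is None else list(categories)
    index = {c: i for i, c in enumerate(cats)}
    rows = []
    for v in values:
        row = [0] * len(cats)
        row[index[v]] = 1
        rows.append(row)
    return cats, rows


def standardize(xs):
    """Rescale a numeric column to zero mean and unit (population) variance."""
    mean = sum(xs) / len(xs)
    std = (sum((x - mean) ** 2 for x in xs) / len(xs)) ** 0.5
    return [(x - mean) / std for x in xs]


print(one_hot(["cat", "dog", "cat", "fish"]))
# (['cat', 'dog', 'fish'], [[1, 0, 0], [0, 1, 0], [1, 0, 0], [0, 0, 1]])
print(standardize([1.0, 2.0, 3.0]))
```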

Software stuff

Too many. Neural networks are intuitive enough that everyone builds their own library.

I use TensorFlow, plus a side order of Keras.

  • R/MATLAB/Python/everything: MXNet.

  • Lua: Torch

  • MATLAB/Python: Caffe claims to be a “de facto standard”

  • Python: Theano

  • Python/C++: TensorFlow seems to be the same thing as Theano,
    but it’s backed by Google so probably has better long-term prospects.
    The construction of graphs is more explicit than in Theano, which I find easier to understand, although this means that you lose the near-Python syntax of Theano.
    Also claims to compile to smartphones etc, although that looks buggy atm.

  • Javascript (!) inference and training: convnetjs
    * plus bonus interview
    * sister project for recurrent networks: recurrentjs

  • synapticjs is a very full-featured javascript library for training, inference and visualisation of neural networks, with really good documentation. Great learning resource, with plausible examples.

  • javascript inference only: neocortexjs, in the browser. Civilised.

  • brainjs is unmaintained now but looked like a nice simple javascript neural network library.

  • mindjs is a simple one where you can see the moving parts.

  • iPhone: DeepBeliefSDK



To read


Amari, S. (1998) Natural Gradient Works Efficiently in Learning. Neural Computation, 10(2), 251–276. DOI.
Araujo, L. (2000) Evolutionary parsing for a probabilistic context free grammar. In Proc. of the Int. Conf. on Rough Sets and Current Trends in Computing (RSCTC-2000), Lecture Notes in Computer Science 2005 (p. 590).
Arel, I., Rose, D. C., & Karnowski, T. P. (2010) Deep Machine Learning - A New Frontier in Artificial Intelligence Research [Research Frontier]. IEEE Computational Intelligence Magazine, 5(4), 13–18. DOI.
Arora, S., Ge, R., Ma, T., & Moitra, A. (2015) Simple, Efficient, and Neural Algorithms for Sparse Coding. arXiv:1503.00778 [cs, Stat].
Bastien, F., Lamblin, P., Pascanu, R., Bergstra, J., Goodfellow, I., Bergeron, A., … Bengio, Y. (2012) Theano: new features and speed improvements. arXiv:1211.5590 [cs].
Bengio, Y. (2009) Learning deep architectures for AI. Foundations and Trends® in Machine Learning, 2(1), 1–127. DOI.
Bengio, Y., Courville, A., & Vincent, P. (2013) Representation Learning: A Review and New Perspectives. IEEE Trans. Pattern Anal. Machine Intell., 35, 1798–1828. DOI.
Bengio, Y., & LeCun, Y. (2007) Scaling learning algorithms towards AI. Large-Scale Kernel Machines, 34, 1–41.
Bengio, Y., Simard, P., & Frasconi, P. (1994) Learning long-term dependencies with gradient descent is difficult. IEEE Transactions on Neural Networks, 5(2), 157–166. DOI.
Boser, B. (1991) An analog neural network processor with programmable topology. J. Solid State Circuits, 26, 2017–2025. DOI.
Bottou, L. (2014) From machine learning to machine reasoning. Mach. Learn., 94, 133–149. DOI.
Boulanger-Lewandowski, N., Bengio, Y., & Vincent, P. (2012) Modeling Temporal Dependencies in High-Dimensional Sequences: Application to Polyphonic Music Generation and Transcription. In 29th International Conference on Machine Learning.
Cadieu, C. F. (2014) Deep neural networks rival the representation of primate IT cortex for core visual object recognition. PLoS Comp. Biol., 10, e1003963. DOI.
Choromanska, A., Henaff, M., Mathieu, M., Ben Arous, G., & LeCun, Y. (2015) The Loss Surfaces of Multilayer Networks. In Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics (pp. 192–204).
Ciodaro, T. (2012) Online particle detection with neural networks based on topological calorimetry information. J. Phys. Conf. Series, 368, 012030. DOI.
Ciresan, D. (2012) Multi-column deep neural network for traffic sign classification. Neural Networks, 32, 333–338. DOI.
Dahl, G. E. (2012) Context-dependent pre-trained deep neural networks for large vocabulary speech recognition. IEEE Trans. Audio Speech Lang. Process., 20, 33–42. DOI.
Dosovitskiy, A., Springenberg, J. T., & Brox, T. (2014) Learning to Generate Chairs with Convolutional Neural Networks. arXiv:1411.5928 [cs].
Farabet, C. (2013) Learning hierarchical features for scene labeling. IEEE Trans. Pattern Anal. Mach. Intell., 35, 1915–1929. DOI.
Felleman, D. J. (1991) Distributed hierarchical processing in the primate cerebral cortex. Cereb. Cortex, 1, 1–47. DOI.
Fukushima, K. (1982) Neocognitron: a new algorithm for pattern recognition tolerant of deformations and shifts in position. Pattern Recognition, 15, 455–469. DOI.
Garcia, C. (2004) Convolutional face finder: a neural architecture for fast and robust face detection. IEEE Trans. Pattern Anal. Machine Intell., 26, 1408–1423. DOI.
Gatys, L. A., Ecker, A. S., & Bethge, M. (2015) A Neural Algorithm of Artistic Style. arXiv:1508.06576 [cs, Q-Bio].
Giryes, R., Sapiro, G., & Bronstein, A. M. (2014) On the Stability of Deep Networks. arXiv:1412.5896 [cs, Math, Stat].
Hadsell, R. (2009) Learning long-range vision for autonomous off-road driving. J. Field Robot., 26, 120–144. DOI.
Hadsell, R., Chopra, S., & LeCun, Y. (2006) Dimensionality Reduction by Learning an Invariant Mapping. In 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Vol. 2, pp. 1735–1742). DOI.
Helmstaedter, M. (2013) Connectomic reconstruction of the inner plexiform layer in the mouse retina. Nature, 500, 168–174. DOI.
Hinton, G. (2010) A practical guide to training restricted Boltzmann machines. In Neural Networks: Tricks of the Trade (Vol. 9, p. 926). Springer Berlin Heidelberg.
Hinton, G., Deng, L., Yu, D., Dahl, G. E., Mohamed, A., Jaitly, N., … Kingsbury, B. (2012) Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups. IEEE Signal Processing Magazine, 29(6), 82–97. DOI.
Hinton, G. E. (1995) The wake-sleep algorithm for unsupervised neural networks. Science, 268, 1158–1161. DOI.
Hinton, G. E. (2007) To recognize shapes, first learn to generate images. In P. Cisek, T. Drew, & J. F. Kalaska (Eds.), Progress in Brain Research (Vol. 165, pp. 535–547). Elsevier.
Hinton, G. E., & Salakhutdinov, R. R. (2006) Reducing the dimensionality of data with neural networks. Science, 313(5786), 504–507. DOI.
Hinton, G., Osindero, S., & Teh, Y. (2006) A Fast Learning Algorithm for Deep Belief Nets. Neural Computation, 18(7), 1527–1554. DOI.
Hochreiter, S., & Schmidhuber, J. (1997) Long Short-Term Memory. Neural Computation, 9(8), 1735–1780. DOI.
Huang, G.-B., & Siew, C.-K. (2005) Extreme learning machine with randomly assigned RBF kernels. International Journal of Information Technology, 11(1), 16–24.
Huang, G.-B., Wang, D. H., & Lan, Y. (2011) Extreme learning machines: a survey. International Journal of Machine Learning and Cybernetics, 2(2), 107–122. DOI.
Huang, G.-B., Zhu, Q.-Y., & Siew, C.-K. (2004) Extreme learning machine: a new learning scheme of feedforward neural networks. In 2004 IEEE International Joint Conference on Neural Networks, 2004. Proceedings (Vol. 2, pp. 985–990 vol.2). DOI.
Huang, G.-B., Zhu, Q.-Y., & Siew, C.-K. (2006) Extreme learning machine: Theory and applications. Neurocomputing, 70(1–3), 489–501. DOI.
Hubel, D. H. (1962) Receptive fields, binocular interaction, and functional architecture in the cat’s visual cortex. J. Physiol., 160, 106–154. DOI.
Hu, T., Pehlevan, C., & Chklovskii, D. B. (2015) A Hebbian/Anti-Hebbian Network for Online Sparse Dictionary Learning Derived from Symmetric Matrix Factorization. arXiv:1503.00690 [cs, Q-Bio, Stat].
Kavukcuoglu, K., Ranzato, M., & LeCun, Y. (2010) Fast Inference in Sparse Coding Algorithms with Applications to Object Recognition. arXiv:1010.3467 [cs].
Kulkarni, T. D., Whitney, W., Kohli, P., & Tenenbaum, J. B. (2015) Deep Convolutional Inverse Graphics Network. arXiv:1503.03167 [cs].
Lawrence, S. (1997) Face recognition: a convolutional neural-network approach. IEEE Trans. Neural Networks, 8, 98–113. DOI.
LeCun, Y. (1998) Gradient-based learning applied to document recognition. Proc. IEEE, 86, 2278–2324. DOI.
LeCun, Y., Bengio, Y., & Hinton, G. (2015) Deep learning. Nature, 521(7553), 436–444. DOI.
LeCun, Y., Chopra, S., Hadsell, R., Ranzato, M., & Huang, F. (2006) A tutorial on energy-based learning. Predicting Structured Data.
Lee, H., Battle, A., Raina, R., & Ng, A. Y. (2007) Efficient sparse coding algorithms. Advances in Neural Information Processing Systems, 19, 801.
Lee, H., Grosse, R., Ranganath, R., & Ng, A. Y. (2009) Convolutional Deep Belief Networks for Scalable Unsupervised Learning of Hierarchical Representations. In Proceedings of the 26th International Conference on Machine Learning.
Leung, M. K. (2014) Deep learning of the tissue-regulated splicing code. Bioinformatics, 30, i121–i129. DOI.
Ma, J. (2015) Deep neural nets as a method for quantitative structure-activity relationships. J. Chem. Inf. Model., 55, 263–274. DOI.
Mallat, S. (2012) Group Invariant Scattering. Communications on Pure and Applied Mathematics, 65(10), 1331–1398. DOI.
Mallat, S. (2016) Understanding Deep Convolutional Networks. arXiv:1601.04920 [cs, Stat].
Marcus, G., Marblestone, A., & Dean, T. (2014) Neuroscience. The atoms of neural computation. Science (New York, N.Y.), 346(6209), 551–552. DOI.
Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013) Efficient Estimation of Word Representations in Vector Space. arXiv:1301.3781 [cs].
Mikolov, T., Le, Q. V., & Sutskever, I. (2013) Exploiting Similarities among Languages for Machine Translation. arXiv:1309.4168 [cs].
Mnih, V. (2015) Human-level control through deep reinforcement learning. Nature, 518, 529–533. DOI.
Mohamed, A.-R. (2012) Acoustic modeling using deep belief networks. IEEE Trans. Audio Speech Lang. Process., 20(1), 14–22. DOI.
Montufar, G. (2014) When does a mixture of products contain a product of mixtures? J. Discrete Math., 29, 321–347. DOI.
Ning, F. (2005) Toward automatic phenotyping of developing embryos from videos. IEEE Trans. Image Process., 14, 1360–1371. DOI.
Olshausen, B. A., & Field, D. J. (1996a) Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature, 381(6583), 607–609. DOI.
Olshausen, B. A., & Field, D. J. (1996b) Natural image statistics and efficient coding. Network (Bristol, England), 7(2), 333–339. DOI.
Olshausen, B. A., & Field, D. J. (2004) Sparse coding of sensory inputs. Current Opinion in Neurobiology, 14(4), 481–487. DOI.
Paul, A., & Venkatasubramanian, S. (2014) Why does Deep Learning work? - A perspective from Group Theory. arXiv:1412.6621 [cs, Stat].
Pehlevan, C., & Chklovskii, D. B. (2015) A Hebbian/Anti-Hebbian Network Derived from Online Non-Negative Matrix Factorization Can Cluster and Discover Sparse Features. arXiv:1503.00680 [cs, Q-Bio, Stat].
Ranzato, M. A., Boureau, Y.-L., & LeCun, Y. (2008) Sparse Feature Learning for Deep Belief Networks. In J. C. Platt, D. Koller, Y. Singer, & S. T. Roweis (Eds.), Advances in Neural Information Processing Systems 20 (pp. 1185–1192). Curran Associates, Inc.
Ranzato, M. (2013) Modeling natural images using gated MRFs. IEEE Trans. Pattern Anal. Machine Intell., 35, 2206–2222. DOI.
Rumelhart, D. E. (1986) Learning representations by back-propagating errors. Nature, 323, 533–536. DOI.
Sagun, L., Guney, V. U., Arous, G. B., & LeCun, Y. (2014) Explorations on high dimensional landscapes. arXiv:1412.6615 [cs, Stat].
Schwenk, H. (2007) Continuous space language models. Computer Speech Lang., 21, 492–518. DOI.
Simoncelli, E. P., & Olshausen, B. A. (2001) Natural Image Statistics and Neural Representation. Annual Review of Neuroscience, 24(1), 1193–1216. DOI.
Springenberg, J. T., Dosovitskiy, A., Brox, T., & Riedmiller, M. (2014) Striving for Simplicity: The All Convolutional Net. arXiv:1412.6806 [cs].
Turaga, S. C. (2010) Convolutional networks can learn to generate affinity graphs for image segmentation. Neural Comput., 22, 511–538. DOI.
Waibel, A. (1989) Phoneme recognition using time-delay neural networks. IEEE Trans. Acoustics Speech Signal Process., 37, 328–339. DOI.
Wiatowski, T., & Bölcskei, H. (2015) A Mathematical Theory of Deep Convolutional Neural Networks for Feature Extraction. arXiv:1512.06293 [cs, Math, Stat].
Xiong, H. Y. (2015) The human splicing code reveals new insights into the genetic determinants of disease. Science, 347(6218). DOI.
Zhang, S., Choromanska, A., & LeCun, Y. (2014) Deep learning with Elastic Averaging SGD. arXiv:1412.6651 [cs, Stat].

See original: The Living Thing / Notebooks Artificial neural networks