Artificial neural networks


Modern computational neural network methods reascend the hype phase transition.
a.k.a. deep learning or extreme learning or double plus fancy brainbots or please can our department have a bigger computation budget it’s not to play video games I swear?

@bhautikj style transfer experiment "Drumpf"

Style transfer will be familiar to anyone who has ever taken hallucinogens or watched movies made by those who have, but you can’t usually put hallucinogens or film nights on the departmental budget so we have to make do with gigantic computing clusters.

But what are “artificial neural networks”?

Either

  • a collection of incremental improvements to machine learning techniques, loosely inspired by real brains, that surprisingly elicit the kind of results everyone was hoping machine learning would deliver at least 20 years ago, or,
  • the state-of-the-art in artificial kitten recognition.

Why bother?

There are many answers here.

A classic:

The ultimate regression algorithm

Common answer:
It turns out that this particular learning model (class of learning models),
while often not apparently well suited to a given problem,
does very well in general, on lots of things,
and very often can keep on doing better and better the more resources you throw at it.
Why burn three grad students on a perfect regression algorithm when you can use
one algorithm to solve a whole bunch of regression problems just as well?

This is more interesting for the business-dev people.

Cool maths

Regularisation, function approximations, interesting manifold inference.

Even the stuff I’d assumed was trivial like backpropagation has a few wrinkles in practice.
See
Michael Nielsen’s chapter and
Christopher Olah’s visual summary.

Insight into the mind

TBD. Maybe.

Trippy art projects

See next.

Generative art applications

Many neural networks can be run in a generative direction, giving you generative models:
run the model forwards, it recognises melodies;
run it “backwards”, it composes melodies.

It’s not quite literally running the network backwards, but that is the spirit in which
the “deep dreaming” projects work.
See, say, the above image from
Google’s tripped-out image recognition systems, or
Gatys, Ecker and Bethge’s deep art (GaEB15).
Neural networks do Monet quite well.
I’ve a weakness for ideas that give me plausible deniability for making
generative art while doing my maths homework.

Hip keywords for NN models

Not necessarily mutually exclusive;
some design patterns you can use.

See Tomasz Malisiewicz’s summary of Deep Learning Trends @ ICLR 2016

Adversarial

Train two networks to beat each other.
I have some intuitions about why this might work, but need to learn more.
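For concreteness, the best-known version is the generative adversarial setup, in which a generator G and a discriminator D play a minimax game over a value function (standard notation, not tied to any reference listed here):

    \[
    \min_G \max_D \;
    \mathbb{E}_{x \sim p_{\mathrm{data}}}\!\left[\log D(x)\right]
    + \mathbb{E}_{z \sim p_z}\!\left[\log\bigl(1 - D(G(z))\bigr)\right]
    \]

D is trained to tell real data from generated data; G is trained to fool D.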

Convolutional

Signal processing baked into neural networks. Not so complicated if you have ever done signal processing, apart from the abstruse use of “depth” to mean two different things in the literature.

Generally uses FIR filters plus some smudgy “pooling”
(which is nonlinear downsampling),
although IIR is also making an appearance by running RNN on multiple axes.
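To make that concrete, here is a toy sketch of one such layer in numpy (my own toy construction, not any particular library’s API): a bank of learned FIR filters, a pointwise nonlinearity, then max-pooling as the nonlinear downsampling step.

    import numpy as np

    def conv_layer(x, filters, pool=2):
        """One toy 1-D convolutional layer: FIR filtering, ReLU, then max-pooling.

        x       : (n,) input signal
        filters : (k, m) bank of k FIR filters of length m (the learned weights)
        pool    : downsampling factor for the max-pooling step
        """
        # FIR filtering: one feature map per filter ("valid" convolution).
        maps = np.stack([np.convolve(x, f, mode="valid") for f in filters])
        # Pointwise nonlinearity.
        maps = np.maximum(maps, 0.0)
        # Max-pooling: nonlinear downsampling along the time axis.
        n = maps.shape[1] // pool
        return maps[:, : n * pool].reshape(len(filters), n, pool).max(axis=2)

    # Example: 8 random filters of length 5 applied to a noisy sine wave.
    rng = np.random.RandomState(0)
    x = np.sin(np.linspace(0, 20, 200)) + 0.1 * rng.randn(200)
    features = conv_layer(x, rng.randn(8, 5))
    print(features.shape)  # (8, 98)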

Terence Broad, go you.

Spike-based

Most simulated neural networks are based on a continuous activation potential and discrete time, unlike spiking biological ones, which are driven by discrete events in continuous time.
There are a great many other differences.
What difference does this in particular make?
I suspect it makes a difference regarding time.

Recurrent neural networks

Feedback neural networks with memory and therefore a notion of time and state.
As someone who does a lot of signal processing for music, I find the notion that these generalise linear systems theory suggestive of lots of interesting DSP applications.

The connection between these and convolutional neural networks is suggestive for the same reason.
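To make the DSP analogy concrete: the basic recurrence is a nonlinear IIR filter, with the hidden state fed back through the nonlinearity at each step. A toy numpy sketch (my notation, not any particular library):

    import numpy as np

    def rnn_scan(x, W_in, W_rec, b):
        """Run a plain (Elman-style) recurrent network over a sequence.

        x     : (T, d_in) input sequence
        W_in  : (d_hidden, d_in) input weights
        W_rec : (d_hidden, d_hidden) recurrent weights -- the "IIR" part
        b     : (d_hidden,) bias
        Returns the (T, d_hidden) sequence of hidden states.
        """
        h = np.zeros(W_rec.shape[0])
        states = []
        for x_t in x:
            # Feed the previous state back in: h_t = tanh(W_in x_t + W_rec h_{t-1} + b)
            h = np.tanh(W_in @ x_t + W_rec @ h + b)
            states.append(h)
        return np.stack(states)

    rng = np.random.RandomState(0)
    states = rnn_scan(rng.randn(100, 3), rng.randn(16, 3),
                      0.5 * rng.randn(16, 16), np.zeros(16))
    print(states.shape)  # (100, 16)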

Vanilla

The main problem here is that the gradients tend to vanish or explode during training unless you are clever;
see BeSF94. One solution is the LSTM; see below.
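Roughly why, following BeSF94: the gradient of a late state with respect to an early one is a product of per-step Jacobians, so (in my notation for the vanilla recurrence sketched above) it shrinks or blows up geometrically with the number of time steps unless the recurrent weights sit in a delicate regime:

    \[
    \frac{\partial h_T}{\partial h_1}
    = \prod_{t=2}^{T} \frac{\partial h_t}{\partial h_{t-1}}
    = \prod_{t=2}^{T} \operatorname{diag}\!\bigl(\phi'(a_t)\bigr)\, W_{\mathrm{rec}},
    \qquad a_t = W_{\mathrm{rec}} h_{t-1} + W_{\mathrm{in}} x_t + b.
    \]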

Gated Recurrent Unit (GRU)

TBD

Long Short Term Memory (LSTM)

LSTM Networks for Sentiment Analysis:

In a traditional recurrent neural network, during the gradient back-propagation phase, the gradient signal can end up being multiplied a large number of times (as many as the number of timesteps) by the weight matrix associated with the connections between the neurons of the recurrent hidden layer. This means that, the magnitude of weights in the transition matrix can have a strong impact on the learning process.[…]

These issues are the main motivation behind the LSTM model which introduces a new structure called a memory cell […]. A memory cell is composed of four main elements: an input gate, a neuron with a self-recurrent connection (a connection to itself), a forget gate and an output gate. […] The gates serve to modulate the interactions between the memory cell itself and its environment.
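For reference, one common parameterisation of those four elements, in my notation rather than the quoted tutorial’s (σ is the logistic sigmoid, ⊙ elementwise multiplication):

    \[
    \begin{aligned}
    i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) \\
    f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) \\
    o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) \\
    \tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) \\
    c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
    h_t &= o_t \odot \tanh(c_t)
    \end{aligned}
    \]

The point is that the cell state c_t is carried forward additively and gated, rather than repeatedly squashed and multiplied by the transition matrix, which is what tames the gradient problem quoted above.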

Cortical learning algorithms

Is this a real thing, or pure hype? How does it distinguish itself from other deep learning techniques aside from name-checking biomimetic engineering?
NuPIC has made a big splash with its brain-esque learning algorithms, and has open-sourced the code;
on that basis alone it looks like it could be fun to explore.

Extreme learning machines

Dunno.

Optimisation methods

TBD

Related questions

  • Artificial neural networks are usually layers of linear projections
    sandwiched between saturating nonlinear maps (written out after this list).
    Why not more general nonlinearities?
  • Can you know in advance how long it will take to fit a classifier
    or regression model for data of a given sort?
    The process looks so mechanical…
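For reference, the standard layer mentioned in the first question is, in my notation, an affine map followed by a fixed pointwise nonlinearity:

    \[
    h^{(\ell)} = \phi\!\left(W^{(\ell)} h^{(\ell-1)} + b^{(\ell)}\right),
    \qquad h^{(0)} = x,
    \]

where φ is a sigmoid, tanh, ReLU or similar.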

Regularisation in neural networks

L_1, L_2, dropout…
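A minimal sketch of the two workhorse tricks in plain numpy, rather than any particular framework (names and shapes are my own): an L_2 penalty whose gradient is just weight decay, and (inverted) dropout, which randomly zeroes hidden units during training.

    import numpy as np

    rng = np.random.RandomState(0)

    def l2_penalty_grad(W, lam=1e-3):
        """Gradient contribution of the L2 penalty (lam/2)*||W||^2, i.e. weight decay."""
        return lam * W

    def dropout(h, p=0.5, training=True):
        """Randomly zero each hidden unit with probability p during training.

        Uses "inverted dropout": surviving units are rescaled by 1/(1-p)
        so that no rescaling is needed at test time.
        """
        if not training:
            return h
        mask = rng.binomial(1, 1.0 - p, size=h.shape)
        return h * mask / (1.0 - p)

    # Usage inside a hypothetical training step:
    W = rng.randn(64, 32)
    h = np.maximum(rng.randn(10, 64) @ W, 0.0)      # some hidden activations
    h = dropout(h, p=0.5)                           # applied only while training
    grad_W = np.zeros_like(W) + l2_penalty_grad(W)  # added to whatever the data gradient is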

Compression of neural networks

It seems we should be able to do better than a gigantic network with millions of parameters;
once we have trained the graph, how can we simplify it, compress it, or prune it?

One extreme option is quantizing weights down to single bits.
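Two of the simplest moves, sketched in numpy as a toy illustration rather than a recipe from any particular paper: magnitude pruning, which zeroes small weights so the matrix can be stored sparsely, and one-bit quantization, which keeps only the sign of each weight plus a single scale.

    import numpy as np

    def prune_by_magnitude(W, keep_fraction=0.1):
        """Zero out all but the largest-magnitude weights (candidates for sparse storage)."""
        k = int(keep_fraction * W.size)
        threshold = np.sort(np.abs(W), axis=None)[-k]
        return np.where(np.abs(W) >= threshold, W, 0.0)

    def binarize(W):
        """One-bit quantization: store sign(W) and a single per-matrix scale."""
        scale = np.mean(np.abs(W))
        return np.sign(W), scale  # reconstruct approximately as scale * sign(W)

    rng = np.random.RandomState(0)
    W = rng.randn(256, 128)
    W_sparse = prune_by_magnitude(W, keep_fraction=0.1)
    signs, scale = binarize(W)
    print((W_sparse != 0).mean())            # ~0.1 of weights survive
    print(np.abs(W - scale * signs).mean())  # mean reconstruction error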

Encoding for neural networks

Neural networks take an inconvenient encoding format,
so general data has to be massaged.
Convolutional models are an important implicit encoding;
what else can we squeeze [in there/out of there]?
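The most mundane example of that massaging: categorical data has to become vectors somehow, e.g. by one-hot encoding (a toy numpy version of my own):

    import numpy as np

    def one_hot(labels):
        """Encode a list of categorical labels as rows of a 0/1 indicator matrix."""
        categories = sorted(set(labels))
        index = {c: i for i, c in enumerate(categories)}
        out = np.zeros((len(labels), len(categories)))
        for row, label in enumerate(labels):
            out[row, index[label]] = 1.0
        return out, categories

    X, cats = one_hot(["cat", "dog", "cat", "kitten"])
    print(cats)  # ['cat', 'dog', 'kitten']
    print(X)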

Software stuff

Too many. Neural networks are intuitive enough that everyone builds their own library.

I use Tensorflow, plus a side order of Keras.

  • R/MATLAB/Python/everything: MXNET.

  • Lua: Torch

  • C++/Python/MATLAB: Caffe claims to be a “de facto standard”

  • Python: Theano

  • Python/C++: Tensorflow seems to be much the same thing as Theano,
    but it’s backed by Google so probably has better long-term prospects.
    The construction of graphs is more explicit than in Theano, which I find easier to understand, although it means that you lose the near-Python syntax of Theano.
    Also claims to compile to smartphones etc., although that looks buggy at the moment.

  • Javascript (!) inference and training: convnetjs
    * plus bonus interview
    * sister project for recurrent networks: recurrentjs

  • synapticjs is a very full-featured javascript library for training, inference and visualisation of neural networks, with really good documentation. Great learning resource, with plausible examples.

  • Javascript, inference only: neocortexjs runs pretrained networks in the browser. Civilised.

  • brainjs is unmaintained now but looked like a nice simple javascript neural network library.

  • mindjs is a simple one where you can see the moving parts.

  • iPhone: DeepBeliefSDK

Examples

Howtos

To read

Refs

Amar98
Amari, S. (1998) Natural Gradient Works Efficiently in Learning. Neural Computation, 10(2), 251–276. DOI.
Arau00
Araujo, L. (2000) Evolutionary parsing for a probabilistic context free grammar. In Proc. of the Int. Conf. on on Rough Sets and Current Trends in Computing (RSCTC-2000), Lecture Notes in Computer Science 2005 (p. 590).
ArRK10
Arel, I., Rose, D. C., & Karnowski, T. P.(2010) Deep Machine Learning - A New Frontier in Artificial Intelligence Research [Research Frontier]. IEEE Computational Intelligence Magazine, 5(4), 13–18. DOI.
AGMM15
Arora, S., Ge, R., Ma, T., & Moitra, A. (2015) Simple, Efficient, and Neural Algorithms for Sparse Coding. arXiv:1503.00778 [cs, Stat].
BLPB12
Bastien, F., Lamblin, P., Pascanu, R., Bergstra, J., Goodfellow, I., Bergeron, A., … Bengio, Y. (2012) Theano: new features and speed improvements. arXiv:1211.5590 [cs].
Beng09
Bengio, Y. (2009) Learning deep architectures for AI. Foundations and Trends® in Machine Learning, 2(1), 1–127. DOI.
BeCV13
Bengio, Y., Courville, A., & Vincent, P. (2013) Representation Learning: A Review and New Perspectives. IEEE Trans. Pattern Anal. Machine Intell., 35, 1798–1828. DOI.
BeLe07
Bengio, Y., & LeCun, Y. (2007) Scaling learning algorithms towards AI. Large-Scale Kernel Machines, 34, 1–41.
BeSF94
Bengio, Y., Simard, P., & Frasconi, P. (1994) Learning long-term dependencies with gradient descent is difficult. IEEE Transactions on Neural Networks, 5(2), 157–166. DOI.
Bose91
Boser, B. (1991) An analog neural network processor with programmable topology. J. Solid State Circuits, 26, 2017–2025. DOI.
Bott14
Bottou, L. (2014) From machine learning to machine reasoning. Mach. Learn., 94, 133–149. DOI.
BoBV12
Boulanger-Lewandowski, N., Bengio, Y., & Vincent, P. (2012) Modeling Temporal Dependencies in High-Dimensional Sequences: Application to Polyphonic Music Generation and Transcription. In 29th International Conference on Machine Learning.
Cadi14
Cadieu, C. F.(2014) Deep neural networks rival the representation of primate it cortex for core visual object recognition. PLoS Comp. Biol., 10, e1003963. DOI.
CHMB15
Choromanska, A., Henaff, M., Mathieu, M., Ben Arous, G., & LeCun, Y. (2015) The Loss Surfaces of Multilayer Networks. In Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics (pp. 192–204).
Ciod12
Ciodaro, T. (2012) Online particle detection with neural networks based on topological calorimetry information. J. Phys. Conf. Series, 368, 012030. DOI.
Cire12
Ciresan, D. (2012) Multi-column deep neural network for traffic sign classification. Neural Networks, 32, 333–338. DOI.
Dahl12
Dahl, G. E.(2012) Context-dependent pre-trained deep neural networks for large vocabulary speech recognition. IEEE Trans. Audio Speech Lang. Process., 20, 33–42. DOI.
DoSB14
Dosovitskiy, A., Springenberg, J. T., & Brox, T. (2014) Learning to Generate Chairs with Convolutional Neural Networks. arXiv:1411.5928 [cs].
Fara13
Farabet, C. (2013) Learning hierarchical features for scene labeling. IEEE Trans. Pattern Anal. Mach. Intell., 35, 1915–1929. DOI.
Fell91
Felleman, D. J.(1991) Distributed hierarchical processing in the primate cerebral cortex. Cereb. Cortex, 1, 1–47. DOI.
Fuku82
Fukushima, K. (1982) Neocognitron: a new algorithm for pattern recognition tolerant of deformations and shifts in position. Pattern Recognition, 15, 455–469. DOI.
Garc04
Garcia, C. (2004) Convolutional face finder: a neural architecture for fast and robust face detection. IEEE Trans. Pattern Anal. Machine Intell., 26, 1408–1423. DOI.
GaEB15
Gatys, L. A., Ecker, A. S., & Bethge, M. (2015) A Neural Algorithm of Artistic Style. arXiv:1508.06576 [cs, Q-Bio].
GiSB14
Giryes, R., Sapiro, G., & Bronstein, A. M.(2014) On the Stability of Deep Networks. arXiv:1412.5896 [cs, Math, Stat].
Hads09
Hadsell, R. (2009) Learning long-range vision for autonomous off-road driving. J. Field Robot., 26, 120–144. DOI.
HaCL06
Hadsell, R., Chopra, S., & LeCun, Y. (2006) Dimensionality Reduction by Learning an Invariant Mapping. In 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Vol. 2, pp. 1735–1742). DOI.
Helm13
Helmstaedter, M. (2013) Connectomic reconstruction of the inner plexiform layer in the mouse retina. Nature, 500, 168–174. DOI.
Hint10
Hinton, G. (2010) A practical guide to training restricted Boltzmann machines. In Neural Networks: Tricks of the Trade (Vol. 9, p. 926). Springer Berlin Heidelberg
HDYD12
Hinton, G., Deng, L., Yu, D., Dahl, G. E., Mohamed, A., Jaitly, N., … Kingsbury, B. (2012) Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups. IEEE Signal Processing Magazine, 29(6), 82–97. DOI.
Hint95
Hinton, G. E.(1995) The wake-sleep algorithm for unsupervised neural networks. Science, 268, 1158–1161. DOI.
Hint07
Hinton, G. E.(2007) To recognize shapes, first learn to generate images. In P. Cisek, T. Drew, & J. F. Kalaska (Eds.), Progress in Brain Research (Vol. 165, pp. 535–547). Elsevier
HiSa06
Hinton, G. E., & Salakhutdinov, R. R.(2006) Reducing the dimensionality of data with neural networks. Science, 313(5786), 504–507. DOI.
HiOT06
Hinton, G., Osindero, S., & Teh, Y. (2006) A Fast Learning Algorithm for Deep Belief Nets. Neural Computation, 18(7), 1527–1554. DOI.
HoSc97
Hochreiter, S., & Schmidhuber, J. (1997) Long Short-Term Memory. Neural Computation, 9(8), 1735–1780. DOI.
HuSi05
Huang, G.-B., & Siew, C.-K. (2005) Extreme learning machine with randomly assigned RBF kernels. International Journal of Information Technology, 11(1), 16–24.
HuWL11
Huang, G.-B., Wang, D. H., & Lan, Y. (2011) Extreme learning machines: a survey. International Journal of Machine Learning and Cybernetics, 2(2), 107–122. DOI.
HuZS04
Huang, G.-B., Zhu, Q.-Y., & Siew, C.-K. (2004) Extreme learning machine: a new learning scheme of feedforward neural networks. In 2004 IEEE International Joint Conference on Neural Networks, 2004. Proceedings (Vol. 2, pp. 985–990 vol.2). DOI.
HuZS06
Huang, G.-B., Zhu, Q.-Y., & Siew, C.-K. (2006) Extreme learning machine: Theory and applications. Neurocomputing, 70(1–3), 489–501. DOI.
Hube62
Hubel, D. H.(1962) Receptive fields, binocular interaction, and functional architecture in the cat’s visual cortex. J. Physiol., 160, 106–154. DOI.
HuPC15
Hu, T., Pehlevan, C., & Chklovskii, D. B.(2015) A Hebbian/Anti-Hebbian Network for Online Sparse Dictionary Learning Derived from Symmetric Matrix Factorization. arXiv:1503.00690 [cs, Q-Bio, Stat].
KaRL10
Kavukcuoglu, K., Ranzato, M., & LeCun, Y. (2010) Fast Inference in Sparse Coding Algorithms with Applications to Object Recognition. arXiv:1010.3467 [cs].
KWKT15
Kulkarni, T. D., Whitney, W., Kohli, P., & Tenenbaum, J. B.(2015) Deep Convolutional Inverse Graphics Network. arXiv:1503.03167 [cs].
Lawr97
Lawrence, S. (1997) Face recognition: a convolutional neural-network approach. IEEE Trans. Neural Networks, 8, 98–113. DOI.
Lecu98
LeCun, Y. (1998) Gradient-based learning applied to document recognition. Proc. IEEE, 86, 2278–2324. DOI.
LeBH15
LeCun, Y., Bengio, Y., & Hinton, G. (2015) Deep learning. Nature, 521(7553), 436–444. DOI.
LCHR06
LeCun, Y., Chopra, S., Hadsell, R., Ranzato, M., & Huang, F. (2006) A tutorial on energy-based learning. Predicting Structured Data.
LBRN07
Lee, H., Battle, A., Raina, R., & Ng, A. Y.(2007) Efficient sparse coding algorithms. Advances in Neural Information Processing Systems, 19, 801.
LGRN00
Lee, H., Grosse, R., Ranganath, R., & Ng, A. Y.(2009) Convolutional Deep Belief Networks for Scalable Unsupervised Learning of Hierarchical Representations. In Proceedings of the 26th International Conference on Machine Learning.
Leun14
Leung, M. K.(2014) Deep learning of the tissue-regulated splicing code. Bioinformatics, 30, i121–i129. DOI.
Ma15
Ma, J. (2015) Deep neural nets as a method for quantitative structure-activity relationships. J. Chem. Inf. Model., 55, 263–274. DOI.
Mall12
Mallat, S. (2012) Group Invariant Scattering. Communications on Pure and Applied Mathematics, 65(10), 1331–1398. DOI.
Mall16
Mallat, S. (2016) Understanding Deep Convolutional Networks. arXiv:1601.04920 [cs, Stat].
MaMD14
Marcus, G., Marblestone, A., & Dean, T. (2014) The atoms of neural computation. Science (New York, N.Y.), 346(6209), 551–552. DOI.
MCCD13
Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013) Efficient Estimation of Word Representations in Vector Space. arXiv:1301.3781 [cs].
MiLS13
Mikolov, T., Le, Q. V., & Sutskever, I. (2013) Exploiting Similarities among Languages for Machine Translation. arXiv:1309.4168 [cs].
Mnih15
Mnih, V. (2015) Human-level control through deep reinforcement learning. Nature, 518, 529–533. DOI.
Moha12
Mohamed, A.-R. (2012) Acoustic modeling using deep belief networks. IEEE Trans. Audio Speech Lang. Process., 20(1), 14–22. DOI.
Mont14
Montufar, G. (2014) When does a mixture of products contain a product of mixtures?. SIAM J. Discrete Math., 29, 321–347. DOI.
Ning05
Ning, F. (2005) Toward automatic phenotyping of developing embryos from videos. IEEE Trans. Image Process., 14, 1360–1371. DOI.
OlFi96a
Olshausen, B. A., & Field, D. J.(1996a) Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature, 381(6583), 607–609. DOI.
OlFi96b
Olshausen, B. A., & Field, D. J.(1996b) Natural image statistics and efficient coding. Network (Bristol, England), 7(2), 333–339. DOI.
OlFi04
Olshausen, B. A., & Field, D. J.(2004) Sparse coding of sensory inputs. Current Opinion in Neurobiology, 14(4), 481–487. DOI.
PaVe14
Paul, A., & Venkatasubramanian, S. (2014) Why does Deep Learning work? - A perspective from Group Theory. arXiv:1412.6621 [cs, Stat].
PeCh15
Pehlevan, C., & Chklovskii, D. B.(2015) A Hebbian/Anti-Hebbian Network Derived from Online Non-Negative Matrix Factorization Can Cluster and Discover Sparse Features. arXiv:1503.00680 [cs, Q-Bio, Stat].
RaBC08
Ranzato, M., Boureau, Y.-L., & LeCun, Y. (2008) Sparse Feature Learning for Deep Belief Networks. In J. C. Platt, D. Koller, Y. Singer, & S. T. Roweis (Eds.), Advances in Neural Information Processing Systems 20 (pp. 1185–1192). Curran Associates, Inc.
Ranz13
Ranzato, M. (2013) Modeling natural images using gated MRFs. IEEE Trans. Pattern Anal. Machine Intell., 35, 2206–2222. DOI.
Rume86
Rumelhart, D. E.(1986) Learning representations by back-propagating errors. Nature, 323, 533–536. DOI.
SGAL14
Sagun, L., Guney, V. U., Arous, G. B., & LeCun, Y. (2014) Explorations on high dimensional landscapes. arXiv:1412.6615 [cs, Stat].
Schw07
Schwenk, H. (2007) Continuous space language models. Computer Speech Lang., 21, 492–518. DOI.
SiOl01
Simoncelli, E. P., & Olshausen, B. A.(2001) Natural Image Statistics and Neural Representation. Annual Review of Neuroscience, 24(1), 1193–1216. DOI.
SDBR14
Springenberg, J. T., Dosovitskiy, A., Brox, T., & Riedmiller, M. (2014) Striving for Simplicity: The All Convolutional Net. arXiv:1412.6806 [cs].
Tura10
Turaga, S. C.(2010) Convolutional networks can learn to generate affinity graphs for image segmentation. Neural Comput., 22, 511–538. DOI.
Waib89
Waibel, A. (1989) Phoneme recognition using time-delay neural networks. IEEE Trans. Acoustics Speech Signal Process., 37, 328–339. DOI.
WiBö15
Wiatowski, T., & Bölcskei, H. (2015) A Mathematical Theory of Deep Convolutional Neural Networks for Feature Extraction. arXiv:1512.06293 [cs, Math, Stat].
Xion15
Xiong, H. Y.(2015) The human splicing code reveals new insights into the genetic determinants of disease. Science, 347, 6218. DOI.
ZhCL14
Zhang, S., Choromanska, A., & LeCun, Y. (2014) Deep learning with Elastic Averaging SGD. arXiv:1412.6651 [cs, Stat].
