Generative art
Tue, 08/12/2015 2:48pm by dan mackinlay
If you want the lowbrow version of this header, try “creative code”.
Either way, it means, more or less, “using algorithms to make pretty things.”
If you’ve seen a CGI film in the last 20 years, you’ve seen this.
Flocking, L-systems,
agents, evolutionary systems,
pattern formation
and so on.
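L-systems are the easiest of these to show in a few lines. Here is a minimal sketch, assuming Python, that uses Lindenmayer’s textbook algae rules (A → AB, B → A) as the example. Rendering the strings as pictures (e.g. with turtle graphics) is left out:

```python
def lsystem(axiom: str, rules: dict[str, str], iterations: int) -> str:
    """Rewrite every symbol in parallel, `iterations` times.

    Symbols without a rule are copied through unchanged.
    """
    s = axiom
    for _ in range(iterations):
        s = "".join(rules.get(ch, ch) for ch in s)
    return s


# Lindenmayer's algae system: A -> AB, B -> A
ALGAE = {"A": "AB", "B": "A"}

if __name__ == "__main__":
    for n in range(5):
        print(n, lsystem("A", ALGAE, n))
```

Each step rewrites every symbol simultaneously. That is what distinguishes an L-system from a sequential grammar. For this rule set the string lengths follow the Fibonacci numbers.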
My interest here reflects my High Art, pontifical sensibility.
But video games are totes sick too, if that’s your bag.
Also 3D printing, augmented reality blah blah.
But you can google all that stuff without my help.
Here is stuff I frequently refer to.
Missing from here: the prehistory of such art, early software art and pre-computer algorithmic art.
Later, I will raid Neil Jenkins’ excellent
garden of forking paths for some pointers.
Examples of praxis
 make art not apps
 abandoned art by zenbullets
 postspectacular
 precious forever
 Creative applications
 icarus
 Daniel Jones
 Jonathan McCabe
 Ollie Bown
 runme is an echo from another time: “…a software art repository, launched in January 2003. It is an open, moderated database to which people are welcome to submit projects they consider to be interesting examples of software art.”
Praxis yourself, why don’t you?
 supercollider
 javascript audio
 processing
 school of machines making and makebelieve
 art python
 art lisp
 visuals
 openFrameworks, Cinder, etc.
I praxis myself
 pattern machine, my live electronic AV ensemble
 parking sun, my generative music remix project (see also my soundcloud)
 feral, my generative iPhone app for imaginary mechanico-tropical jungles
 synestizer
Here’s how you might do that with neural networks
Alex Graves on RNN predictive synthesis: https://www.youtube.com/watch?v=yX1SYeDHbg
Matt Vitelli on music generation https://www.youtube.com/watch?v=0VTI1BBLydE https://github.com/MattVitelli/GRUV
Adversarial generation is a cool hack if you hate boring stuff like labelling data sets https://github.com/goodfeli/adversarial
chair generation http://www.cvfoundation.org/openaccess/content_cvpr_2015/papers/Dosovitskiy_Learning_to_Generate_2015_CVPR_paper.pdf
 Denton, E., Chintala, S., Szlam, A., & Fergus, R. (2015). Deep Generative Image Models using a Laplacian Pyramid of Adversarial Networks. arXiv:1506.05751 [cs].
 Dosovitskiy, A., Springenberg, J. T., Tatarchenko, M., & Brox, T. (2014). Learning to Generate Chairs, Tables and Cars with Convolutional Networks. arXiv:1411.5928 [cs].
 Goodfellow, I. J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., … Bengio, Y. (2014). Generative Adversarial Networks. arXiv:1406.2661 [cs, Stat].
 Gregor, K., Danihelka, I., Graves, A., Rezende, D. J., & Wierstra, D. (2015). DRAW: A Recurrent Neural Network For Image Generation. arXiv:1502.04623 [cs].
 Lazaridou, A., Nguyen, D. T., Bernardi, R., & Baroni, M. (2015). Unveiling the Dreams of Word Embeddings: Towards Language-Driven Image Generation. arXiv:1506.03500 [cs].
 Theis, L., & Bethge, M. (2015). Generative Image Modeling Using Spatial LSTMs. arXiv:1506.03478 [cs, Stat].
 Wu, Q., Shen, C., Hengel, A. van den, Liu, L., & Dick, A. (2015). What value high level concepts in vision to language problems?. arXiv:1506.01144 [cs].
General reading
 Data is nature
 Mitchell Whitelaw and his amazing teeming void
 Boden, M. A., & Edmonds, E. A. (2009). What is generative art? Digital Creativity, 20(1–2), 21–46. DOI.
 Boulanger-Lewandowski, N., Bengio, Y., & Vincent, P. (2012). Modeling Temporal Dependencies in High-Dimensional Sequences: Application to Polyphonic Music Generation and Transcription. In 29th International Conference on Machine Learning.
 Bown, O. (2009). A Framework for Eco-System-Based Generative Music. Proceedings of the SMC, 195–200.
 Bown, O. (2009). Ecosystem models for real-time generative music: A methodology and framework. Ann Arbor, MI: MPublishing, University of Michigan Library.
 Bown, O. (2011). Experiments in modular design for the creative composition of live algorithms. Computer Music Journal, 35(3), 73–85.
 Bown, O., & Lexer, S. (2006). Continuous-Time Recurrent Neural Networks for Generative and Interactive Musical Performance. In F. Rothlauf, J. Branke, S. Cagnoni, E. Costa, C. Cotta, R. Drechsler, … H. Takagi (Eds.), Applications of Evolutionary Computing (pp. 652–663). Springer Berlin Heidelberg.
 Bown, O., & McCormack, J. (2010). Taming nature: tapping the creative potential of ecosystem models in the arts. Digital Creativity, 21(4), 215–231. DOI.
 Bown, O., McCormack, J., & Kowaliw, T. (2011). Ecosystemic methods for creative domains: Niche construction and boundary formation. In 2011 IEEE Symposium on Artificial Life (ALIFE) (pp. 132–139). DOI.
 Brown, P. (2003). Generative computation and the arts. Digital Creativity, 14(1), 1–2. DOI.
 Collins, N. (2006). Towards Autonomous Agents for Live Computer Music: Real-time Machine Listening and Interactive Music Systems.
 Denton, E., Chintala, S., Szlam, A., & Fergus, R. (2015). Deep Generative Image Models using a Laplacian Pyramid of Adversarial Networks. arXiv:1506.05751 [cs].
 Dorin, A., McCabe, J., McCormack, J., Monro, G., & Whitelaw, M. (2012). A framework for understanding generative art. Digital Creativity, 23(3–4), 239–259. DOI.
 Dosovitskiy, A., Springenberg, J. T., Tatarchenko, M., & Brox, T. (2014). Learning to Generate Chairs, Tables and Cars with Convolutional Networks. arXiv:1411.5928 [cs].
 Garcia, R. A. (2001). Growing sound synthesizers using evolutionary methods. Proceedings of ALMMA 2002 Workshop on Artificial Models for Musical Applications, 99–107.
 Goodfellow, I. J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., … Bengio, Y. (2014). Generative Adversarial Networks. arXiv:1406.2661 [cs, Stat].
 Gregor, K., Danihelka, I., Graves, A., Rezende, D. J., & Wierstra, D. (2015). DRAW: A Recurrent Neural Network For Image Generation. arXiv:1502.04623 [cs].
 Hadsell, R., Chopra, S., & LeCun, Y. (2006). Dimensionality Reduction by Learning an Invariant Mapping. In 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Vol. 2, pp. 1735–1742). DOI.
 Hinton, G. E., & Salakhutdinov, R. R. (2006). Reducing the dimensionality of data with neural networks. Science, 313(5786), 504–507. DOI.
 Holtzman, S. R. (1981). Using Generative Grammars for Music Composition. Computer Music Journal, 5(1), 51–64. DOI.
 Hornby, G. S., & Pollack, J. B. (2001). The advantages of generative grammatical encodings for physical design. In Proceedings of the 2001 Congress on Evolutionary Computation, 2001 (Vol. 1, pp. 600–607–1). DOI.
 Jackendoff, R., & Lerdahl, F. (1981). Generative Music Theory and Its Relation to Psychology. Journal of Music Theory, 25(1), 45–90.
 Kowaliw, T., McCormack, J., & Dorin, A. (2011). An interactive electronic art system based on artificial ecosystemics (pp. 162–169). Presented at the Artificial Life (ALIFE), 2011 IEEE Symposium on.
 Lazaridou, A., Nguyen, D. T., Bernardi, R., & Baroni, M. (2015). Unveiling the Dreams of Word Embeddings: Towards Language-Driven Image Generation. arXiv:1506.03500 [cs].
 Lutton, E. (2006). Evolution of fractal shapes for artists and designers. International Journal on Artificial Intelligence Tools, 15(04), 651–672. DOI.
 Martin, A., & Bown, O. (2013). The agent designer toolkit. In Proceedings of the 9th ACM Conference on Creativity & Cognition (pp. 386–387). ACM.
 McCormack, J. (2004). Generative modelling with timed L-systems. In Proceedings of the First International Conference on Design, Computing and Cognition (Vol. 4, pp. 157–175). Cambridge, USA: Kluwer Academic Publishers.
 McCormack, J. (2009). The Evolution of Sonic Ecosystems. In Artificial life models in software. Springer-Verlag New York Inc.
 McCormack, J. (n.d.). Enhancing Creativity with Niche Construction.
 McCormack, J., & Bown, O. (2009). Life’s what you make: Niche construction and evolutionary art. Applications of Evolutionary Computing, 528–537.
 McCormack, J., Bown, O., Dorin, A., McCabe, J., Monro, G., & Whitelaw, M. (2013). Ten Questions Concerning Generative Computer Art. Leonardo, 47(2), 135–141. DOI.
 Monro, G. (2009). Emergence and Generative Art. Leonardo, 42(5), 476–477. DOI.
 Rohrmeier, M. (2011). Towards a generative syntax of tonal harmony. Journal of Mathematics and Music, 5(1), 35–53. DOI.
 Sorensen, A., & Gardner, H. (2010). Programming with time: cyber-physical programming with impromptu. In ACM Sigplan Notices (Vol. 45, p. 822). ACM Press. DOI.
 Steedman, M. J. (1984). A Generative Grammar for Jazz Chord Sequences. Music Perception: An Interdisciplinary Journal, 2(1), 52–77. DOI.
 Stiny, G., & Gips, J. (1971). Shape Grammars and the Generative Specification of Painting and Sculpture. In Proceedings of the Workshop on generalisation and multiple representation. Leicester: CiteSeerX.
 Theis, L., & Bethge, M. (2015). Generative Image Modeling Using Spatial LSTMs. arXiv:1506.03478 [cs, Stat].
 Whitelaw, B. (2011, April 20). Almost all YouTube views come from just 30% of films.
 Whitelaw, M. (2003). Morphogenetics: generative processes in the work of Driessens and Verstappen. Digital Creativity, 14(1), 43–53. DOI.
 Whitelaw, M. (2005). System stories and model worlds: A critical approach to generative art. Readme, 100, 135–154.
 Whitelaw, M. (2006). Metacreation: Art and Artificial Life. The MIT Press.
 Whitelaw, M. (2010). Space Filling and Self-Constraint: Critical Case Studies in Generative Design. Architectural Theory Review, 15(2), 157–165. DOI.
 Whitelaw, M., Guglielmetti, M., & Innocent, T. (2009). Strange ontologies in digital culture. Computers in Entertainment (CIE), 7(1), 4.
 Wu, Q., Shen, C., Hengel, A. van den, Liu, L., & Dick, A. (2015). What value high level concepts in vision to language problems?. arXiv:1506.01144 [cs].
Boosting, bagging, voting
Tue, 08/12/2015 2:47pm by dan mackinlay
Ensemble methods.
Fast to train, fast to use. Get you results. May not get you answers.
So, like neural networks but you don’t need a server farm.
Jeremy Kun, Why Boosting Doesn’t Overfit:
Boosting, which we covered in gruesome detail previously, has a natural
measure of complexity represented by the number of rounds you run the
algorithm for.
Each round adds one additional “weak learner” weighted vote.
So running for a thousand rounds gives a vote of a thousand weak learners.
Despite this, boosting doesn’t overfit on many datasets.
In fact, and this is a shocking fact, researchers observed that Boosting
would hit zero training error, they kept running it for more rounds, and the
generalization error kept going down!
It seemed like the complexity could grow arbitrarily without penalty.
[…] this phenomenon is a fact about voting schemes,
not boosting in particular.
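To make “each round adds one weighted weak-learner vote” concrete, here is a minimal sketch of discrete AdaBoost with decision stumps. It is a textbook version, not Kun’s code, and it assumes NumPy, labels in {−1, +1}, and a hypothetical six-point toy dataset:

```python
import numpy as np


def fit_stump(X, y, w):
    """Exhaustively pick the (feature, threshold, sign) stump with the lowest weighted error."""
    best = (np.inf, 0, 0.0, 1)
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j]):
            for sign in (1, -1):
                pred = np.where(X[:, j] <= thr, sign, -sign)
                err = w[pred != y].sum()
                if err < best[0]:
                    best = (err, j, thr, sign)
    return best


def stump_predict(X, j, thr, sign):
    return np.where(X[:, j] <= thr, sign, -sign)


def adaboost(X, y, rounds):
    """Return a list of (alpha, stump) pairs: one weighted vote per round."""
    n = len(y)
    w = np.full(n, 1.0 / n)  # start with uniform weights on the training points
    ensemble = []
    for _ in range(rounds):
        err, j, thr, sign = fit_stump(X, y, w)
        err = np.clip(err, 1e-12, 1 - 1e-12)  # guard against log(0) / division by zero
        alpha = 0.5 * np.log((1 - err) / err)
        pred = stump_predict(X, j, thr, sign)
        # Up-weight the points this stump got wrong, down-weight the rest.
        w *= np.exp(-alpha * y * pred)
        w /= w.sum()
        ensemble.append((alpha, (j, thr, sign)))
    return ensemble


def predict(ensemble, X):
    """Sign of the weighted vote of all weak learners."""
    votes = sum(alpha * stump_predict(X, *stump) for alpha, stump in ensemble)
    return np.sign(votes)


if __name__ == "__main__":
    # Hypothetical 1-D toy data that no single stump can separate.
    X = np.arange(1, 7, dtype=float).reshape(-1, 1)
    y = np.array([1, 1, -1, -1, 1, 1])
    for rounds in (1, 2, 3):
        print(rounds, np.mean(predict(adaboost(X, y, rounds), X) != y))
```

On this toy set a single stump misclassifies two points, and the three-round vote gets every training point right. Kun’s observation is about what happens to test error if you keep adding rounds past that point.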
Random trees, forests, jungles
 Awesome Random Forests
 How to do machine vision using random forests, brought to you by the folks behind Kinect.
 Balog, M., & Teh, Y. W. (2015). The Mondrian Process for Machine Learning. arXiv:1507.05181 [cs, Stat].
 Bickel, P. J., Li, B., Tsybakov, A. B., van de Geer, S. A., Yu, B., Valdés, T., … Vaart, A. van der. (2006). Regularization in statistics. Test, 15(2), 271–344. DOI.
 Breiman, L. (1996). Bagging predictors. Machine Learning, 24(2), 123–140. DOI.
 Bühlmann, P. (2011). Statistics for High-Dimensional Data: Methods, Theory and Applications (2011 edition). Heidelberg ; New York: Springer.
 Criminisi, A., Shotton, J., & Konukoglu, E. (2011). Decision Forests for Classification, Regression, Density Estimation, Manifold Learning and Semi-Supervised Learning (No. MSR-TR-2011-114). Microsoft Research.
 Criminisi, A., Shotton, J., & Konukoglu, E. (2012). Decision Forests: A Unified Framework for Classification, Regression, Density Estimation, Manifold Learning and Semi-Supervised Learning. Foundations and Trends® in Computer Graphics and Vision, 7(2–3), 81–227.
 Díaz-Avalos, C., Juan, P., & Mateu, J. (2012). Similarity measures of conditional intensity functions to test separability in multidimensional point processes. Stochastic Environmental Research and Risk Assessment, 27(5), 1193–1205. DOI.
 Fernández-Delgado, M., Cernadas, E., Barro, S., & Amorim, D. (2014). Do We Need Hundreds of Classifiers to Solve Real World Classification Problems? Journal of Machine Learning Research, 15(1), 3133–3181.
 Friedman, J. H. (2001). Greedy Function Approximation: A Gradient Boosting Machine. The Annals of Statistics, 29(5), 1189–1232.
 Friedman, J. H. (2002). Stochastic gradient boosting. Computational Statistics & Data Analysis, 38(4), 367–378. DOI.
 Friedman, J., Hastie, T., & Tibshirani, R. (2000). Additive logistic regression: a statistical view of boosting (With discussion and a rejoinder by the authors). The Annals of Statistics, 28(2), 337–407. DOI.
 Gall, J., & Lempitsky, V. (2013). Class-Specific Hough Forests for Object Detection. In A. Criminisi & J. Shotton (Eds.), Decision Forests for Computer Vision and Medical Image Analysis (pp. 143–157). Springer London.
 Johnson, R., & Zhang, T. (2014). Learning Nonlinear Functions Using Regularized Greedy Forest. IEEE Transactions on Pattern Analysis and Machine Intelligence, 36(5), 942–954. DOI.
 Lakshminarayanan, B., Roy, D. M., & Teh, Y. W. (2014). Mondrian Forests: Efficient Online Random Forests. In Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, & K. Q. Weinberger (Eds.), Advances in Neural Information Processing Systems 27 (pp. 3140–3148). Curran Associates, Inc.
 Lubinski, D., & Humphreys, L. G. (1996). Seeing the forest from the trees: When predicting the behavior or status of groups, correlate means. Psychology, Public Policy, and Law, 2(2), 363–376. DOI.
 Schapire, R. E., Freund, Y., Bartlett, P., & Lee, W. S. (1998). Boosting the margin: a new explanation for the effectiveness of voting methods. The Annals of Statistics, 26(5), 1651–1686. DOI.
 Scornet, E. (2014). On the asymptotics of random forests. arXiv:1409.2090 [math, Stat].
 Scornet, E., Biau, G., & Vert, J.-P. (2014). Consistency of Random Forests. arXiv:1405.2881 [math, Stat].
 Shotton, J., Sharp, T., Kohli, P., Nowozin, S., Winn, J., & Criminisi, A. (2013). Decision Jungles: Compact and Rich Models for Classification. In Proc. NIPS.
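For contrast with boosting, bagging (Breiman 1996, above) fits each weak learner independently on a bootstrap resample and takes an unweighted majority vote. Random forests additionally randomise which features each split may consider, which this sketch does not do. A minimal sketch, assuming NumPy, decision stumps as the base learner, and ±1 labels:

```python
import numpy as np


def fit_stump(X, y):
    """Unweighted decision stump: the (feature, threshold, sign) with lowest training error."""
    best = (np.inf, 0, 0.0, 1)
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j]):
            for sign in (1, -1):
                pred = np.where(X[:, j] <= thr, sign, -sign)
                err = np.mean(pred != y)
                if err < best[0]:
                    best = (err, j, thr, sign)
    return best[1:]


def bagging(X, y, n_models=25, seed=0):
    """Fit each stump on its own bootstrap resample (n draws with replacement)."""
    rng = np.random.default_rng(seed)
    n = len(y)
    models = []
    for _ in range(n_models):
        idx = rng.integers(0, n, size=n)
        models.append(fit_stump(X[idx], y[idx]))
    return models


def vote(models, X):
    """Unweighted majority vote; an odd n_models avoids ties."""
    preds = [np.where(X[:, j] <= thr, s, -s) for j, thr, s in models]
    return np.sign(np.sum(preds, axis=0))


if __name__ == "__main__":
    X = np.arange(1, 11, dtype=float).reshape(-1, 1)
    y = np.where(X[:, 0] <= 5, -1, 1)  # hypothetical separable toy labels
    print(vote(bagging(X, y), X))
```

Because every model sees a different resample, the members can be trained in parallel, unlike the sequential reweighting in boosting.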
Ecological fallacies
Tue, 08/12/2015 5:04am by dan mackinlay
“With great spreadsheets comes great responsibility.”
The danger of folk statistics.
The problems of excluded variables.
Avoiding the ecological fallacy in mean-field approximations.
Simpson’s paradox.
Spurious correlation induced by sampling bias.
See also graphical models,
hierarchical models.
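Simpson’s paradox is the excluded-variable problem in miniature. A comparison that holds within every stratum reverses once the strata are pooled, because the stratum is associated with both the treatment and the outcome. Here is a minimal sketch in Python with illustrative counts (not data from this post):

```python
# Illustrative (successes, trials) per treatment and stratum.
counts = {
    "A": {"small": (81, 87), "large": (192, 263)},
    "B": {"small": (234, 270), "large": (55, 80)},
}


def rate(successes: int, trials: int) -> float:
    return successes / trials


def pooled(treatment: str) -> float:
    """Success rate after ignoring (summing over) the stratum variable."""
    s = sum(c[0] for c in counts[treatment].values())
    n = sum(c[1] for c in counts[treatment].values())
    return s / n


if __name__ == "__main__":
    for stratum in ("small", "large"):
        a, b = rate(*counts["A"][stratum]), rate(*counts["B"][stratum])
        print(f"{stratum}: A={a:.3f} B={b:.3f}")  # A is better within each stratum
    print(f"pooled: A={pooled('A'):.3f} B={pooled('B'):.3f}")  # B looks better pooled
```

The reversal happens because A was mostly applied to the hard “large” stratum. Pooling hides that, which is exactly the excluded-variable trap.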
Bandit problems, reinforcement learning, and stochastic control
Mon, 07/12/2015 3:35pm by dan mackinlay
Bandit problems, Markov decision problems, a smattering of dynamic programming,
game theory, and online learning approaches to solving such problems.
Clickbait bandit problems
On the science of treating consumers of modern news media like what they are,
nearly passive objects of surveillance and control.
Because relying on people’s rationality and agency to get things done has
a poor track record in recent history.
Practically, the state of the art here is AFAICT a class of bandit problems.
New tool by Microsoft: MultiWorld Testing (MWT)
… is a toolbox of machine learning technology for principled and efficient
experimentation, plausibly applicable to most Microsoft services that
interact with customers. In many scenarios, this technology is exponentially
more efficient than the traditional A/B testing. The underlying research
area, mature and yet very active, is known under many names: “multi-armed
bandits”, “contextual bandits”, “associative reinforcement learning”, and
“counterfactual evaluation”, among others.
To take an example, suppose one wants to optimize clicks on suggested news
stories. To discover what works, one needs to explore over the possible news
stories. Further, if the suggested news story can be chosen depending on the
visitor’s profile, then one needs to explore over the possible “policies”
that map profiles to news stories (and there are exponentially more
“policies” than news stories!). Traditional ML fails at this because it does
not explore. Whereas MWT allows you to explore continuously, and optimize
your decisions using this exploration data.
(partial MWT source code)
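The blurb doesn’t spell out MWT’s internals. One standard estimator in this literature for the “counterfactual evaluation” part is inverse propensity scoring. You keep only the logged rows where the new policy agrees with the logged action, and reweight each of their rewards by 1/p, where p is the probability the logging policy gave that action. Here is a minimal sketch, assuming a deterministic target policy and a hypothetical log format of (context, action, reward, propensity) tuples:

```python
def ips_value(logs, policy):
    """Inverse-propensity-score estimate of a target policy's mean reward.

    logs: iterable of (context, action, reward, propensity), where propensity is
          the probability the *logging* policy assigned to the logged action.
    policy: function context -> action (deterministic target policy).
    """
    total, n = 0.0, 0
    for context, action, reward, propensity in logs:
        n += 1
        if policy(context) == action:
            total += reward / propensity  # up-weight rarely logged choices
    return total / n


if __name__ == "__main__":
    # Hypothetical log from a uniformly random logger over two stories.
    logs = [("a", 0, 1, 0.5), ("a", 1, 0, 0.5), ("b", 0, 0, 0.5), ("b", 1, 1, 0.5)]
    print(ips_value(logs, lambda ctx: 0 if ctx == "a" else 1))
    print(ips_value(logs, lambda ctx: 0))
```

The estimate is unbiased provided the logging policy gave a nonzero propensity to every action the target policy might pick. Its variance blows up as those propensities get small, which is one reason continuous exploration matters.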
The phrase “bandit problems” comes, by the way, from the “one-armed
bandit”, i.e. the poker machine, extended into a mathematical model for exploring the
world by pulling on the arms of a poker machine.
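For the flavour of the exploration/exploitation trade-off, here is ε-greedy, about the simplest bandit strategy there is (as far as this post says, not what MWT does). With probability ε it pulls a random arm; otherwise it pulls the arm with the best running mean. Here is a minimal sketch, assuming Bernoulli payouts with made-up probabilities:

```python
import random


def epsilon_greedy(arm_probs, steps, epsilon=0.1, seed=0):
    """Play a Bernoulli multi-armed bandit with the epsilon-greedy rule.

    arm_probs: true payout probability of each arm (unknown to the player).
    Returns (pull counts per arm, total reward).
    """
    rng = random.Random(seed)
    k = len(arm_probs)
    counts = [0] * k
    means = [0.0] * k  # running mean reward of each arm
    total = 0
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(k)  # explore: pull a random arm
        else:
            arm = max(range(k), key=lambda a: means[a])  # exploit the best-looking arm
        reward = 1 if rng.random() < arm_probs[arm] else 0
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # incremental mean update
        total += reward
    return counts, total


if __name__ == "__main__":
    print(epsilon_greedy([0.1, 0.5, 0.9], steps=5000))
```

A fixed ε keeps exploring forever, which wastes pulls once the best arm is obvious. Decaying ε, UCB and Thompson sampling are the usual refinements.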
There is a pleasing symmetry in that modern poker machines, and indeed the
internet in general, model the customer as a machine whose arm they pull
to get a reward, and that reward comes from addicting the customer to pulling on
the arms of their poker machine.
Before you next blame someone
(especially a millennial, especially if you are not a millennial)
for having no attention span, read this, then take a deep look into your soul:
Michael Schulson, If the internet is addictive, why don’t we regulate it?
As a consultant to Silicon Valley startups, Eyal helps his clients mimic what
he calls the ‘narcoticlike properties’ of sites such as Facebook and
Pinterest.
His goal, Eyal told Business Insider, is to get users ‘continuing through the
same basic cycle.
Forever and ever.’[…]
There are differences between a slot machine and a website, of course.
With the former, the longer you’re engaged by variable rewards, the more
money you lose.
For a tech company in the attention economy, the longer you’re engaged by
variable rewards, the more time you spend online, and the more money they
make through ad revenue.
Yet we keep blaming people.
As Schüll puts it:
‘It just seems very duplicitous to design with the goal of capturing
attention, and then to put the whole burden onto the individual.’
Stupid rats, running the mazes we set them instead of dotcom startups.
Also, there’s interesting mathematics!
Social graphs!
Self-exciting point processes! And all the bandit problem literature!
 Sergey Feldman, Bandits for Recommendation Systems, is an easy introduction.
Markov decision problems
Bellman and Howard’s classic discrete-time stochastic control problem:
* http://www.castlelab.princeton.edu/ORF569papers/Powell_ADP_2ndEdition_Chapter%203.pdf
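Bellman’s optimality equation, V(s) = max_a [r(s, a) + γ Σ_s' P(s' | s, a) V(s')], can be solved for small finite problems by simply iterating it, since the right-hand side is a γ-contraction. Here is a minimal sketch, assuming NumPy and an MDP given as arrays. The two-state example (stay in state 1 to earn 1 per step) is made up:

```python
import numpy as np


def value_iteration(P, R, gamma=0.9, tol=1e-10, max_iter=10_000):
    """Solve a finite discounted MDP by iterating the Bellman optimality operator.

    P[a, s, t]: probability of moving s -> t under action a.
    R[a, s]: expected immediate reward for taking action a in state s.
    Returns (optimal values V[s], greedy policy pi[s]).
    """
    V = np.zeros(P.shape[1])
    for _ in range(max_iter):
        Q = R + gamma * P @ V  # Q[a, s] = R[a, s] + gamma * sum_t P[a, s, t] V[t]
        V_new = Q.max(axis=0)
        if np.max(np.abs(V_new - V)) < tol:
            V = V_new
            break
        V = V_new
    return V, Q.argmax(axis=0)


if __name__ == "__main__":
    P = np.array([
        [[1.0, 0.0], [0.0, 1.0]],  # action 0: stay put
        [[0.0, 1.0], [1.0, 0.0]],  # action 1: switch state
    ])
    R = np.array([
        [0.0, 1.0],  # staying in state 1 pays 1
        [0.0, 0.0],  # switching pays nothing
    ])
    print(value_iteration(P, R, gamma=0.9))
```

Here V(1) = 1 / (1 − 0.9) = 10 and V(0) = 0.9 × 10 = 9, with the policy “switch from 0, then stay in 1”. Real problems blow up the state space, which is where Powell’s approximate dynamic programming comes in.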
POMDP
Too many CPU cycles?
“A POMDP is a partially observable Markov decision process. It is a model, originating in the operations research (OR) literature, for describing planning tasks in which the decision maker does not have complete information as to its current state. The POMDP model provides a convenient way of reasoning about tradeoffs between actions to gain reward and actions to gain information.”
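The “actions to gain information” part runs on the belief state. This is a posterior over the hidden state, updated by Bayes’ rule after each action and observation: b'(s') ∝ O(o | s', a) Σ_s T(s' | s, a) b(s). Here is a minimal sketch, assuming NumPy and array-valued models. The “listen” example with 85%-accurate observations is hypothetical:

```python
import numpy as np


def belief_update(b, T, O, a, o):
    """Bayes-filter update of a POMDP belief state.

    b[s]: current probability of being in hidden state s.
    T[a, s, t]: transition probability s -> t under action a.
    O[a, t, o]: probability of observing o after landing in t under action a.
    Returns the posterior belief over the next hidden state.
    """
    predicted = b @ T[a]  # sum_s b[s] T[a, s, t]
    unnormalised = O[a, :, o] * predicted
    return unnormalised / unnormalised.sum()


if __name__ == "__main__":
    T = np.array([np.eye(2)])  # one action, "listen", which leaves the state alone
    O = np.array([[[0.85, 0.15], [0.15, 0.85]]])  # hypothetical 85%-accurate signal
    b = np.array([0.5, 0.5])
    for _ in range(2):
        b = belief_update(b, T, O, a=0, o=0)
        print(b)
```

Planning then happens over beliefs rather than states, which turns a finite POMDP into a continuous-state MDP. That is where the CPU cycles go.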
Reinforcement learning
To read
 Anders, T., & Miranda, E. R. (2009). A computational model that generalises Schoenberg’s guidelines for favourable chord progressions. In Proceedings of the Sound and Music Computing Conference. Citeseer.
 Bellman, R. (1957). A Markovian decision process. DTIC Document.
 Bellman, R. (1957b). Dynamic Programming. Princeton University Press.
 Bellman, R., & Kalaba, R. (1961). A note on interrupted stochastic control processes. Information and Control, 4(4), 346–349. DOI.
 Bottou, L., Peters, J., Quiñonero-Candela, J., Charles, D. X., Chickering, D. M., Portugaly, E., … Snelson, E. (2013). Counterfactual Reasoning and Learning Systems: The Example of Computational Advertising. Journal of Machine Learning Research, 14, 3207–3260.
 Cesa-Bianchi, N., & Lugosi, G. (2006). Prediction, learning, and games. Cambridge ; New York: Cambridge University Press.
 Dayan, P., & Watkins, C. J. (n.d.). Reinforcement Learning. In Encyclopedia of Cognitive Science.
 Drummond, C. (2011). Accelerating Reinforcement Learning by Composing Solutions of Automatically Identified Subtasks. Journal Of Artificial Intelligence Research. DOI.
 Howard, R. A. (1960). Dynamic Programming and Markov Processes.
 Kaelbling, L. P., Littman, M. L., & Moore, A. W. (1996). Reinforcement Learning: A Survey. Journal of Artificial Intelligence Research, 4.
 Li, L., Chen, S., Kleban, J., & Gupta, A. (2015). Counterfactual Estimation and Optimization of Click Metrics in Search Engines: A Case Study. In Proceedings of the 24th International World Wide Web Conference (WWW’14), Companion Volume. ACM – Association for Computing Machinery.
 Li, L., Chu, W., Langford, J., & Wang, X. (2011). Unbiased Offline Evaluation of Contextual-bandit-based News Article Recommendation Algorithms. In Proceedings of the Fourth International Conference on Web Search and Web Data Mining (WSDM-11) (pp. 297–306).
 Powell, W. B. (2009). What you should know about approximate dynamic programming. Naval Research Logistics (NRL), 56(3), 239–249.
 Shibata, T., Yoshinaka, R., & Chikayama, T. (2006). Probabilistic Generalization of Simple Grammars and Its Application to Reinforcement Learning. In J. L. Balcázar, P. M. Long, & F. Stephan (Eds.), Algorithmic Learning Theory (pp. 348–362). Springer Berlin Heidelberg.
 Si, J. (2004). Handbook of learning and approximate dynamic programming (Vol. 2). John Wiley & Sons.
 Song, R., Xie, Y., & Pokutta, S. (2015). Sequential Information Guided Sensing. arXiv:1509.00130 [cs, Math, Stat].
 Strehl, A. L., Langford, J., Li, L., & Kakade, S. M. (2011). Learning from Logged Implicit Exploration Data. In Advances in Neural Information Processing Systems 23 (NIPS-10) (pp. 2217–2225).
 Sutton, R. S., & Barto, A. G. (1998). Reinforcement learning. Cambridge, Mass.: MIT Press.
 Thrun, S. B. (1992). Efficient Exploration In Reinforcement Learning.
Random fields
Mon, 07/12/2015 8:10am by dan mackinlay
An area so broad it’s not so much a research field as a way of life.
See also point processes,
time series,
graphical models,
spatial statistics …
To investigate: Random tree fields, Markov random fields, conditional random fields…
Branching processes
Mon, 07/12/2015 8:07am by dan mackinlay
A class of stochastic models,
certain types of generalisations of the Galton–Watson process,
that I am mildly obsessed with.
There seem to be various subspecies.
TODO: notes on processes defined on a multidimensional index set, i.e.
space-time processes and branching random fields (“cluster processes”).
(Taster over at spatial statistics
or random fields)
Discrete index, discrete state: The Galton–Watson process and friends
There are many standard expositions of this; I won’t write another here.
Two good ones:
 Gesine Reinert’s Introduction to Branching Processes: Part 1, Part 2.
 Steven Lalley’s intro
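For orientation, here is a minimal simulation sketch that assumes Poisson(m) offspring (an illustrative choice; any offspring law works). It also includes the standard extinction-probability calculation. Iterating the offspring probability generating function f(s) = exp(m(s − 1)) from 0 gives P(Z_n = 0), and this converges to the smallest root of q = f(q), which is 1 exactly when m ≤ 1:

```python
import math
import random


def poisson(rng: random.Random, lam: float) -> int:
    """Knuth's multiplication method; fine for small lam."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def simulate_gw(mean_offspring, generations, seed=0):
    """Generation sizes Z_0..Z_n of a Galton-Watson process with Poisson offspring."""
    rng = random.Random(seed)
    sizes = [1]
    for _ in range(generations):
        z = sizes[-1]
        sizes.append(sum(poisson(rng, mean_offspring) for _ in range(z)))
    return sizes


def extinction_probability(mean_offspring, iterations=1000):
    """Iterate q <- f(q) from q = 0, where f(s) = exp(m (s - 1)) is the offspring pgf.

    After n steps q equals P(Z_n = 0); the limit is the extinction probability.
    Convergence is slow at the critical point m = 1.
    """
    q = 0.0
    for _ in range(iterations):
        q = math.exp(mean_offspring * (q - 1.0))
    return q


if __name__ == "__main__":
    print(simulate_gw(1.5, 10, seed=1))
    print(extinction_probability(2.0))  # about 0.203
```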
Generalised Galton–Watson process
This section got long enough to break out separately.
See my notes on some generalisations of the Galton–Watson process.
Continuous index, discrete state: the Hawkes Process
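If you have an integer-valued state space but a continuous time
index, then one example is the Hawkes point process, whose cluster representation is a Galton–Watson tree of events spread out in continuous time (HaOa74).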
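Hawkes and Oakes’ cluster picture makes the branching explicit. Immigrants arrive at rate μ, and each event independently begets a Poisson number of children (mean α, the branching ratio) at delays drawn from the kernel. For simulation, the usual route is Ogata’s thinning algorithm. Here is a minimal sketch, assuming an exponential kernel, so that λ(t) = μ + Σ_{t_i < t} α β exp(−β (t − t_i)), with the parameter names μ, α, β as my own labels:

```python
import math
import random


def simulate_hawkes(mu, alpha, beta, horizon, seed=0):
    """Ogata thinning for an exponential-kernel Hawkes process on [0, horizon].

    mu: immigrant (background) rate; alpha: branching ratio, need alpha < 1;
    beta: decay rate of the kernel alpha * beta * exp(-beta * t).
    """
    rng = random.Random(seed)
    events = []
    t = 0.0
    excitation = 0.0  # sum over past events of alpha*beta*exp(-beta*(t - t_i)) at time t
    while True:
        # Between events the intensity only decays, so its current value bounds it.
        lam_bar = mu + excitation
        w = rng.expovariate(lam_bar)
        t += w
        if t > horizon:
            return events
        excitation *= math.exp(-beta * w)  # decay the self-excitation to the candidate time
        if rng.random() * lam_bar <= mu + excitation:  # accept with prob lambda(t) / lam_bar
            events.append(t)
            excitation += alpha * beta  # jump contributed by the new event


if __name__ == "__main__":
    ev = simulate_hawkes(mu=1.0, alpha=0.5, beta=1.0, horizon=1000.0)
    # The stationary mean rate is mu / (1 - alpha), so expect roughly 2000 events.
    print(len(ev))
```

The exponential kernel is chosen because it lets the intensity be updated recursively in O(1) per step. A general kernel needs a sum over the whole history at each candidate.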
The cluster point process
See my master’s thesis.
Continuous index, continuous state: the continuous-state branching process (CSBP), a time-changed Lévy process
Aldous does a no-nonsense exposition of these.
Super trendy at the moment:
it turns out that growing trees is connected in a
deep but purportedly simple way to “glueing together” excursions of random
processes, oh,
and a bunch of trippy fractals and random trees and stuff.
Too sleepy to explain THAT right now;
how about I pass out with the seminar still in my head, then forget it instantly?
Lee and Hopcraft (LeHJ08) also found an analogous result for discrete-state
branching processes.
Lamperti representation
Need to see if I can get my head around the forms of Lamperti representations.
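Basically, an a.s. nondecreasing process (for a point process, the compensator) gives us a time change, and the Lamperti representation gives us incredible universality for that: every continuous-state branching process Z is a spectrally positive Lévy process run on the random clock ∫_0^t Z_s ds (its own integrated population), stopped when it hits zero.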
The Lamperti representation goes for very general Lévy processes;
I can make do with much simpler ones for count data.
This is an example of a change-of-time result, also popular in point processes.
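For point processes, the change-of-time result is the time-rescaling theorem. If a simple point process has a continuous compensator Λ, then the transformed event times Λ(t_1), Λ(t_2), … form a unit-rate Poisson process, so the rescaled gaps should look i.i.d. Exp(1) when the model is right. Here is a minimal sketch for an exponential-kernel Hawkes process, assuming baseline μ, branching ratio α and decay β (my own labels). Its compensator has the closed form Λ(t) = μ t + Σ_{t_i < t} α (1 − exp(−β (t − t_i))):

```python
import math


def compensator(times, mu, alpha, beta):
    """Lambda(t_k) = mu t_k + sum_{t_i < t_k} alpha (1 - exp(-beta (t_k - t_i)))
    evaluated at each event time t_k (times sorted and distinct)."""
    out = []
    for k, tk in enumerate(times):
        excited = sum(alpha * (1.0 - math.exp(-beta * (tk - ti))) for ti in times[:k])
        out.append(mu * tk + excited)
    return out


def rescaled_gaps(times, mu, alpha, beta):
    """Time-rescaled inter-event gaps; i.i.d. Exp(1) if the model is correct."""
    lam = compensator(times, mu, alpha, beta)
    return [b - a for a, b in zip([0.0] + lam[:-1], lam)]


if __name__ == "__main__":
    gaps = rescaled_gaps([0.5, 1.0, 2.0], mu=2.0, alpha=0.0, beta=1.0)
    print(gaps)  # [1.0, 1.0, 2.0]: with alpha = 0 each gap is just mu * (t_k - t_{k-1})
```

In practice you would feed the gaps to a Q-Q plot or a Kolmogorov-Smirnov test against Exp(1) as a goodness-of-fit check.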
Discrete index, continuous state
Umm. Is this well-defined? I suppose so.
Can’t find any literature references though.
It surely has a fancy name.
“Marked Galton–Watson Process”?
Some kind of compound Poisson, I imagine.
Superprocesses
Measure-valued state or something?
Can’t recall, must investigate later.
 Dynkin, E. B. (1991). Branching Particle Systems and Superprocesses. The Annals of Probability, 19(3), 1157–1194. DOI.
 Dynkin, E. B. (2004). Superdiffusions and positive solutions of nonlinear partial differential equations. Providence, R.I: American Mathematical Society.
 Etheridge, A. (2000). An introduction to superprocesses. Providence, RI: American Mathematical Society.
To read
Aldo91: Aldous, D. (1991). The Continuum Random Tree. I. The Annals of Probability, 19(1), 1–28. DOI.
Aldo93: Aldous, D. (1993). The Continuum Random Tree III. The Annals of Probability, 21(1), 248–289. DOI.
Appl04: Applebaum, D. (2004). Lévy processes: from probability to finance and quantum groups. Notices of the AMS, 51(11), 1336–1347.
AtKe77: Athreya, K. B., & Keiding, N. (1977). Estimation Theory for Continuous-Time Branching Processes. Sankhyā: The Indian Journal of Statistics, Series A (1961-2002), 39(2), 101–123.
AtVi97: Athreya, K. B., & Vidyashankar, A. N. (1997). Large Deviation Rates for Supercritical and Critical Branching Processes. In K. B. Athreya & P. Jagers (Eds.), Classical and Modern Branching Processes (pp. 1–18). Springer New York.
BaDM12: Bacry, E., Dayri, K., & Muzy, J. F. (2012). Non-parametric kernel estimation for symmetric Hawkes processes. Application to high frequency financial data. The European Physical Journal B, 85(5), 1–12. DOI.
BDHM13a: Bacry, E., Delattre, S., Hoffmann, M., & Muzy, J. F. (2013a). Modelling microstructure noise with mutually exciting point processes. Quantitative Finance, 13(1), 65–77. DOI.
BDHM13b: Bacry, E., Delattre, S., Hoffmann, M., & Muzy, J. F. (2013b). Some limit theorems for Hawkes processes and application to financial statistics. Stochastic Processes and Their Applications, 123(7), 2475–2499. DOI.
BaJM14: Bacry, E., Jaisson, T., & Muzy, J.-F. (2014). Estimation of slowly decreasing Hawkes kernels: Application to high frequency order book modelling. arXiv:1412.7096 [q-fin, stat].
BaMu14a: Bacry, E., & Muzy, J.-F. (2014a). Hawkes model for price and trades high-frequency dynamics. Quantitative Finance, 14(7), 1147–1166. DOI.
BaMu14b: Bacry, E., & Muzy, J.-F. (2014b). Second order statistics characterization of Hawkes processes and non-parametric estimation. arXiv:1401.0903 [physics, q-fin, stat].
Badd07: Baddeley, A. (2007). Spatial Point Processes and their Applications. In W. Weil (Ed.), Stochastic Geometry (pp. 1–75). Springer Berlin Heidelberg.
BhAd81: Bhat, B. R., & Adke, S. R. (1981). Maximum Likelihood Estimation for Branching Processes with Immigration. Advances in Applied Probability, 13(3), 498–509. DOI.
BiSø95: Bibby, B. M., & Sørensen, M. (1995). Martingale Estimation Functions for Discretely Observed Diffusion Processes. Bernoulli, 1(1/2), 17–39. DOI.
Bött13: Böttcher, B. (2013). Feller evolution systems: Generators and approximation. Stochastics and Dynamics, 14(03), 1350025. DOI.
BrHe75: Brown, B. M., & Hewitt, J. I. (1975). Inference for the Diffusion Branching Process. Journal of Applied Probability, 12(3), 588–594. DOI.
CaCh06: Caballero, M. E., & Chaumont, L. (2006). Conditioned Stable Lévy Processes and the Lamperti Representation. Journal of Applied Probability, 43(4), 967–983.
CaGB13: Caballero, M. E., Garmendia, J. L. P., & Bravo, G. U. (2013). A Lamperti-type representation of continuous-state branching processes with immigration. The Annals of Probability, 41(3A), 1585–1627. DOI.
CaLB09: Caballero, M.-E., Lambert, A., & Bravo, G. U. (2009). Proof(s) of the Lamperti representation of Continuous-State Branching Processes. Probability Surveys, 6, 62–89. DOI.
Chis64: Chistyakov, V. (1964). A Theorem on Sums of Independent Positive Random Variables and Its Applications to Branching Random Processes. Theory of Probability & Its Applications, 9(4), 640–648. DOI.
Çinl75: Çinlar, E. (1975). Exceptional Paper: Markov Renewal Theory: A Survey. Management Science, 21(7), 727–752. DOI.
Cohn97: Cohn, H. (1997). Stochastic Monotonicity and Branching Processes. In K. B. Athreya & P. Jagers (Eds.), Classical and Modern Branching Processes (pp. 51–56). Springer New York.
CrSS10: Crane, R., Schweitzer, F., & Sornette, D. (2010). Power law signature of media exposure in human response waiting time distributions. Physical Review E, 81(5), 056101. DOI.
CrDL99: Crisan, D., Del Moral, P., & Lyons, T. (1999). Discrete filtering using branching and interacting particle systems. Markov Processes and Related Fields, 5(3), 293–318.
CuLe13: Curien, N., & Le Gall, J.-F. (2013). The Brownian Plane. Journal of Theoretical Probability, 27(4), 1249–1291. DOI.
DaVe03: Daley, D. J., & Vere-Jones, D. (2003). An introduction to the theory of point processes (2nd ed., Vol. 1. Elementary theory and methods). New York: Springer.
DaVe08: Daley, D. J., & Vere-Jones, D. (2008). An introduction to the theory of point processes (2nd ed., Vol. 2. General theory and structure). New York: Springer.
DaZh11: Dassios, A., & Zhao, H. (2011). A dynamic contagion process. Advances in Applied Probability, 43(3), 814–846. DOI.
DeSp97: Dekking, F. M., & Speer, E. R. (1997). On the Shape of the Wavefront of Branching Random Walk. In K. B. Athreya & P. Jagers (Eds.), Classical and Modern Branching Processes (pp. 73–88). Springer New York.
DeMi00: Del Moral, P., & Miclo, L. (2000). Branching and interacting particle systems approximations of Feynman-Kac formulae with applications to non-linear filtering. In Séminaire de Probabilités XXXIV (pp. 1–145). Springer.
DeSo05: Deschâtres, F., & Sornette, D. (2005). Dynamics of book sales: Endogenous versus exogenous shocks in complex networks. Physical Review E, 72(1), 016112. DOI.
DoKy06: Doney, R. A., & Kyprianou, A. E. (2006). Overshoots and undershoots of Lévy processes. The Annals of Applied Probability, 16(1), 91–106. DOI.
DuPo15: Duembgen, M., & Podolskij, M. (2015). High-frequency asymptotics for path-dependent functionals of Itô semimartingales. Stochastic Processes and Their Applications, 125(4), 1195–1217. DOI.
Dynk91: Dynkin, E. B. (1991). Branching Particle Systems and Superprocesses. The Annals of Probability, 19(3), 1157–1194. DOI.
Dynk04: Dynkin, E. B. (2004). Superdiffusions and positive solutions of nonlinear partial differential equations. Providence, R.I: American Mathematical Society.
EmLL11: Embrechts, P., Liniger, T., & Lin, L. (2011). Multivariate Hawkes processes: an application to financial data. Journal of Applied Probability, 48A, 367–378. DOI.
Ethe00: Etheridge, A. (2000). An introduction to superprocesses. Providence, RI: American Mathematical Society.
Evan08: Evans, S. N. (2008). Probability and real trees (Vol. 1920). Berlin: Springer.
FaTe12: Falkner, N., & Teschl, G. (2012). On the substitution rule for Lebesgue–Stieltjes integrals. Expositiones Mathematicae, 30(4), 412–418. DOI.
Feig76: Feigin, P. D. (1976). Maximum Likelihood Estimation for Continuous-Time Stochastic Processes. Advances in Applied Probability, 8(4), 712–736. DOI.
FBMS14: Filimonov, V., Bicchetti, D., Maystre, N., & Sornette, D. (2014). Quantification of the high level of endogeneity and of structural regime shifts in commodity markets. Journal of International Money and Finance, 42, 174–192. DOI.
Flee14: Fleet, L. (2014). Networks: Improve your virality. Nature Physics, 10(6), 415–415. DOI.
Gutt91: Guttorp, P. (1991). Statistical inference for branching processes. New York: Wiley.
HaJV05: Haccou, P., Jagers, P., & Vatutin, V. A. (2005). Branching Processes: Variation, Growth, and Extinction of Populations. Cambridge: Cambridge University Press.
HaBo13: Halpin, P. F., & Boeck, P. D. (2013). Modelling dyadic Interaction with Hawkes Processes. Psychometrika, 78(4), 793–814. DOI.
HaBB13: Hardiman, S. J., Bercot, N., & Bouchaud, J.-P. (2013). Critical reflexivity in financial markets: a Hawkes process analysis. The European Physical Journal B, 86(10), 1–9. DOI.
HaBo14: Hardiman, S. J., & Bouchaud, J.-P. (2014). Branching-ratio approximation for the self-exciting Hawkes process. Physical Review E, 90(6), 062807. DOI.
Hawk71: Hawkes, A. G. (1971). Spectra of some self-exciting and mutually exciting point processes. Biometrika, 58(1), 83–90. DOI.
HaOa74: Hawkes, A. G., & Oakes, D. (1974). A cluster process representation of a self-exciting process. Journal of Applied Probability, 11(3), 493. DOI.
HeSe10: Heyde, C. C., & Seneta, E. (2010). Estimation Theory for Growth and Immigration Rates in a Multiplicative Process. In R. Maller, I. Basawa, P. Hall, & E. Seneta (Eds.), Selected Works of C.C. Heyde (pp. 214–235). Springer New York.
IrMo11: Iribarren, J. L., & Moro, E. (2011). Branching dynamics of viral information spreading. Physical Review E, 84(4), 046116. DOI.
Jaco97: Jacod, J. (1997). On continuous conditional Gaussian martingales and stable convergence in law. In J. Azéma, M. Yor, & M. Emery (Eds.), Séminaire de Probabilités XXXI (pp. 232–246). Springer Berlin Heidelberg.
JaPV10: Jacod, J., Podolskij, M., & Vetter, M. (2010). Limit theorems for moving averages of discretized processes plus noise. The Annals of Statistics, 38(3), 1478–1545. DOI.
Jage69: Jagers, P. (1969). Renewal theory and the almost sure convergence of branching processes. Arkiv För Matematik, 7(6), 495–504. DOI.
Jage97: Jagers, P. (1997). Towards Dependence in General Branching Processes. In K. B. Athreya & P. Jagers (Eds.), Classical and Modern Branching Processes (pp. 127–139). Springer New York.
Jáno07: Engländer, J. (2007). Branching diffusions, superdiffusions and random media. Probability Surveys, 4, 303–364. DOI.
Kest73: Kesten, H. (1973). Random difference equations and Renewal theory for products of random matrices. Acta Mathematica, 131(1), 207–248. DOI.
KrPa14: Kraus, A., & Panaretos, V. M. (2014). Frequentist estimation of an epidemic’s spreading potential when observations are scarce. Biometrika, 101(1), 141–154. DOI.
KvPa11: Kvitkovičová, A., & Panaretos, V. M. (2011). Asymptotic inference for partially observed branching processes. Advances in Applied Probability, 43(4), 1166–1190. DOI.
LSTB15: Lakshmanan, K. C., Sadtler, P. T., Tyler-Kabara, E. C., Batista, A. P., & Yu, B. M. (2015). Extracting Low-Dimensional Latent Structure from Time Series in the Presence of Delays. Neural Computation, 27(9), 1825–1856. DOI.
Lamp67a: Lamperti, J. (1967a). Continuous-state branching processes. Bull. Amer. Math. Soc., 73(3), 382–386.
Lamp67b: Lamperti, J. (1967b). The Limit of a Sequence of Branching Processes. Zeitschrift Für Wahrscheinlichkeitstheorie Und Verwandte Gebiete, 7(4), 271–288. DOI.
LaDG09: Laredo, C., David, O., & Garnier, A. (2009). Inference for Partially Observed Multitype Branching Processes and Ecological Applications. arXiv:0902.4520 [stat].
LaTP15: Laub, P. J., Taimre, T., & Pollett, P. K. (2015). Hawkes Processes. arXiv:1507.02822 [math, q-fin, stat].
LeHJ08: Lee, W. H., Hopcraft, K. I., & Jakeman, E. (2008). Continuous and discrete stable processes. Physical Review E, 77(1), 011109. DOI.
Lega05: Le Gall, J.-F. (2005). Random trees and applications. Probability Surveys, 2, 245–311. DOI.
Lega13: Le Gall, J.-F. (2013). Uniqueness and universality of the Brownian map. The Annals of Probability, 41(4), 2880–2960. DOI.
LeMi12: Le Gall, J.-F., & Miermont, G. (2012). Scaling limits of random trees and planar maps. Probability and Statistical Physics in Two and More Dimensions, 15, 155–211.
LeHe13: Levina, A., & Herrmann, J. M. (2013). The Abelian distribution. Stochastics and Dynamics, 14(03), 1450001. DOI.
LeMo11: Lewis, E., & Mohler, G. (2011). A nonparametric EM algorithm for multiscale Hawkes processes. Preprint.
Lini09: Liniger, T. J. (2009). Multivariate Hawkes processes. Diss., Eidgenössische Technische Hochschule ETH Zürich, Nr. 18403.
LiMy07: Li, Y., & Mykland, P. A. (2007). Are volatility estimators robust with respect to modeling assumptions? Bernoulli, 13(3), 601–622. DOI.
Li12: Li, Z. (2012). Continuous-state branching processes. arXiv:1202.3223 [math].
Li14: Li, Z. (2014). Path-valued branching processes and nonlocal branching superprocesses. The Annals of Probability, 42(1), 41–79. DOI.
Li00: Li, Z.-H. (2000). Asymptotic Behaviour of Continuous Time and State Branching Processes. Journal of the Australian Mathematical Society (Series A), 68(01), 68–84. DOI.
Lyon90: Lyons, R. (1990). Random Walks and Percolation on Trees. The Annals of Probability, 18(3), 931–958. DOI.
Lyon11: Lyons, R. (2011). Probability on trees and networks.
MaLe08: Marsan, D., & Lengliné, O. (2008). Extending earthquakes’ reach through cascading. Science, 319(5866), 1076–1079. DOI.
Mein09: Meiners, M. (2009). Weighted branching and a pathwise renewal equation. Stochastic Processes and Their Applications, 119(8), 2579–2597. DOI.
MSBS11: Mohler, G. O., Short, M. B., Brantingham, P. J., Schoenberg, F. P., & Tita, G. E. (2011). Self-exciting point process modeling of crime. Journal of the American Statistical Association, 106(493), 100–108. DOI.
MoIm10: Motoike, I. N., & Imamura, H. T. (2010). Branching pattern formation that reflects the history of signal propagation. Physical Review E, 82(4), 046205. DOI.
NaWa84: Nanthi, K., & Wasan, M. T. (1984). Branching processes. Stochastic Processes and Their Applications, 18(2), 189. DOI.
Neut78: Neuts, M. F. (1978). Renewal processes of phase type. Naval Research Logistics Quarterly, 25(3), 445–454. DOI.
Oake75: Oakes, D. (1975). The Markovian self-exciting process. Journal of Applied Probability, 12(1), 69. DOI.
Ogat78: Ogata, Y. (1978). The asymptotic behaviour of maximum likelihood estimators for stationary point processes. Annals of the Institute of Statistical Mathematics, 30(1), 243–261. DOI.
Ogat88: Ogata, Y. (1988). Statistical models for earthquake occurrences and residual analysis for point processes. Journal of the American Statistical Association, 83(401), 9–27. DOI.
Ogat99: Ogata, Y. (1999). Seismicity analysis through point-process modeling: a review. Pure and Applied Geophysics, 155(2–4), 471–507. DOI.
OgAk82: Ogata, Y., & Akaike, H. (1982). On linear intensity models for mixed doubly stochastic Poisson and self-exciting point processes. Journal of the Royal Statistical Society, Series B, 44, 269–274. DOI.
Olof05: Olofsson, P. (2005). Probability, statistics, and stochastic processes. Hoboken, N.J.: Wiley-Interscience.
Over98: Overbeck, L. (1998). Estimation for Continuous Branching Processes. Scandinavian Journal of Statistics, 25(1), 111–126. DOI.
Ozak79: Ozaki, T. (1979). Maximum likelihood estimation of Hawkes’ self-exciting point processes. Annals of the Institute of Statistical Mathematics, 31(1), 145–155. DOI.
PoVe10: Podolskij, M., & Vetter, M. (2010). Understanding limit theorems for semimartingales: a short survey. Statistica Neerlandica, 64(3), 329–351. DOI.
ReSc10: Reynaud-Bouret, P., & Schbath, S. (2010). Adaptive estimation for Hawkes processes; application to genome analysis. The Annals of Statistics, 38(5), 2781–2822. DOI.
RIKK15: Ruan, Z., Iniguez, G., Karsai, M., & Kertesz, J. (2015). Kinetics of Social Contagion. arXiv:1506.00251 [physics].
SaHS05: Saichev, A., Helmstetter, A., & Sornette, D. (2005). Power-law Distributions of Offspring and Generation Numbers in Branching Models of Earthquake Triggering. Pure and Applied Geophysics, 162(6–7), 1113–1134. DOI.
SaSo10: Saichev, A. I., & Sornette, D. (2010). Generation-by-generation dissection of the response function in long memory epidemic processes. The European Physical Journal B, 75(3), 343–355. DOI.
SaMS08: Saichev, A., Malevergne, Y., & Sornette, D. (2008). Theory of Zipf’s law and of general power law distributions with Gibrat’s law of proportional growth. arXiv:0808.1828 [physics, q-fin].
SaSo11a: Saichev, A., & Sornette, D. (2011a). Generating functions and stability study of multivariate self-excited epidemic processes. arXiv:1101.5564 [cond-mat, physics:physics].
SaSo11b: Saichev, A., & Sornette, D. (2011b). Hierarchy of temporal responses of multivariate self-excited epidemic processes. arXiv:1101.1611 [cond-mat, physics:physics].
Seva68: Sevast’yanov, B. A. (1968). Renewal equations and moments of branching processes. Mathematical Notes of the Academy of Sciences of the USSR, 3(1), 3–10. DOI.
SMSG10: Sood, V., Mathieu, M., Shreim, A., Grassberger, P., & Paczuski, M. (2010). Interacting branching process as a simple model of innovation. Physical Review Letters, 105(17), 178701. DOI.
Sorn06: Sornette, D. (2006). Endogenous versus exogenous origins of crises. In Extreme events in nature and society (pp. 95–119). Springer.
SDGA04: Sornette, D., Deschâtres, F., Gilbert, T., & Ageon, Y. (2004). Endogenous versus exogenous shocks in complex networks: An empirical test using book sale rankings. Physical Review Letters, 93(22), 228701. DOI.
SoHe03: Sornette, D., & Helmstetter, A. (2003). Endogenous versus exogenous shocks in systems with memory. Physica A: Statistical Mechanics and Its Applications, 318(3–4), 577–591. DOI.
SoMM02: Sornette, D., Malevergne, Y., & Muzy, J.-F. (2002). Volatility fingerprints of large shocks: Endogeneous versus exogeneous. arXiv:cond-mat/0204626.
SoMM04: Sornette, D., Malevergne, Y., & Muzy, J.-F. (2004). Volatility fingerprints of large shocks: endogenous versus exogenous. In H. Takayasu (Ed.), The Application of Econophysics (pp. 91–102). Springer Japan.
SoUt09: Sornette, D., & Utkin, S. (2009). Limits of declustering methods for disentangling exogenous from endogenous events in time series with foreshocks, main shocks, and aftershocks. Physical Review E, 79(6), 061110. DOI.
VeSc08: Veen, A., & Schoenberg, F. P. (2008). Estimation of Space–Time Branching Process Models in Seismology Using an EM-Type Algorithm. Journal of the American Statistical Association, 103(482), 614–624. DOI.
Wata68: Watanabe, S. (1968). A limit theorem of branching processes and continuous state branching processes. Journal of Mathematics of Kyoto University, 8(1), 141–167.
Wein65: Weiner, H. J. (1965). An Integral Equation in Age Dependent Branching Processes. The Annals of Mathematical Statistics, 36(5), 1569–1573. DOI.
YNRS08: Yaari, G., Nowak, A., Rakocy, K., & Solomon, S. (2008). Microscopic study reveals the singular origins of growth. The European Physical Journal B, 62(4), 505–513. DOI.
ZhSi13: Zhao, Z., & Singer, A. (2013). Fourier–Bessel rotational invariant eigenimages. Journal of the Optical Society of America A, 30(5), 871. DOI.
See original: Branching processes
Javascript visualisations
Fri, 04/12/2015  8:26am  by dan mackinlay
Javascript statistical graphing

the mothership, d3.js.

animation using velocity.js.

plot.ly is statistics-oriented charting.

flot is also statistics-oriented charting.

waveform graphs audio files for you.

vega …
is a declarative format for creating, saving, and sharing visualization designs. With Vega, visualizations are described in JSON, and generate interactive views using either HTML5 Canvas or SVG.
vega-lite claims to be a ggplot-like layer atop it.
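To make the “described in JSON” idea concrete, here is a minimal sketch that builds a Vega-Lite bar-chart spec as a plain Python dict and serialises it. The field names `a` and `b`, the data values and the v5 schema URL are assumptions for illustration; the resulting JSON would then be handed to a Vega-Lite renderer (e.g. vega-embed in a web page), which is not shown.

```python
import json


def bar_chart_spec(rows, x_field, y_field):
    """Return a minimal Vega-Lite spec (as a dict) for a bar chart of `rows`."""
    return {
        # Assumed schema version; older Vega-Lite releases used other URLs.
        "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
        "data": {"values": rows},  # inline data, one dict per row
        "mark": "bar",
        "encoding": {
            "x": {"field": x_field, "type": "nominal"},
            "y": {"field": y_field, "type": "quantitative"},
        },
    }


# Hypothetical example data.
rows = [{"a": "A", "b": 28}, {"a": "B", "b": 55}, {"a": "C", "b": 43}]
spec_json = json.dumps(bar_chart_spec(rows, "a", "b"), indent=2)
print(spec_json)
```

The point of the declarative style is that the chart is just data: it can be saved, diffed and shared without any plotting code.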
3D visualisation
Yes, 3D in the browser is performant and convenient.
More so, IMO, than Processing.
Two common options use OpenGL ES (via WebGL), the mobile- and browser-friendly option.

Scenejs seems to specialise in loading up geometries and shapes and physics for realistic scene modelling

three.js does the same things, but does more abstract stuff with them

is a WebGL framework for data visualization, creative coding and game development. It includes modules to manage scenes, cameras and textures and modules to work with effects, web workers and more.
However, it looks unmaintained.
Everything supports lens flare.
For desktop apps, and a larger OpenGL subset, there is a desktop option:
Plask, which seems to be some kind of particle-system-friendly OS X app, with spurty development but spectacular potential.
See original: Javascript visualisations
Privacy (notes on how to have it)
Fri, 04/12/2015  5:36am  by dan mackinlay
Techno-privacy is difficult and tedious for our monkey minds to get a handle on.
However, it’s not too hard.
The trick is, don’t get hung up on thinking you are some kind of secret agent who needs
to hide from the NSA.
You are no Osama bin Laden.
To be fair, these days, even Osama bin Laden is not Osama bin Laden.
Deal with state surveillance through political means if you are worried about the state stealing your information.
(Or at least, work up gradually to truly paranoid privacy attitudes, and research more widely than the tips here.)
Instead, for us normal people, the rule should be:
Start by not giving your information away for free to everyone.
And don’t simply give up because it’s too hard:
That’s just doing what big business wants you to do.
That said, just because I’m talking about what our attitude should be as
informed consumers of the addictive drug of single-serve online socialising
doesn’t mean I’m blaming Jane/Joe Public for not getting it right.
As long as corporate social networks are permitted to harness their heady blend
of plausibly-deniable social engineering on the vulnerable, we are all put at
greater risk.
Case in point:
A friend of mine just showed me his facebook profile’s public link before
friending me.
On open display were pictures of his children, his home, his friends, and dying
relatives in hospital with confidential medical information and records in the
background.
With his well-intentioned handphone wielding he has voluntarily compromised the
privacy, insurance and loan-worthiness of everyone he knows who has confided in
him.
Privacy is a weakest-link kind of concept, and as long as Facebook can rely on
a reasonable fraction of the population voluntarily and unconsciously selling
the rest out, we are all compromised.
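The weakest-link point can be put in numbers. A minimal sketch, assuming (unrealistically) that each of your n contacts independently overshares about you with the same probability p; under that assumption the chance that at least one of them leaks is 1 - (1 - p)^n, which climbs quickly with n. The values of p and n below are made up for illustration.

```python
def prob_any_leak(p: float, n: int) -> float:
    """Probability that at least one of n contacts leaks, assuming each
    leaks independently with the same probability p (a simplification)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability")
    # Complement of "nobody leaks": 1 - P(all n stay quiet).
    return 1.0 - (1.0 - p) ** n


# Hypothetical: a 2% chance per friend, over 10, 100 and 300 friends.
for n in (10, 100, 300):
    print(n, round(prob_any_leak(0.02, n), 3))
```

With p = 0.02, a hundred friends already give roughly an 87% chance that something leaks; your own carefulness only controls one factor in that product.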
I know that everything I do in front of this guy will be obediently tagged and
put on public display for the use of not only facebook but any passing mobster,
data miner or insurance company.
The thing is, it is not enough that experts could, in principle, avoid some of
the pitfalls; that does not excuse privacy-violating companies getting away with it.
Social media is a habit-forming drug that transmits contagious ailments, and we
shouldn’t let companies get away with pretending they don’t know, any more than
we should let hospitals dispense unmedicated addictive drugs with dirty
syringes, or put poker machines in school playgrounds.
And companies, such as phone companies, that sell your information no matter what you do must be punished by the laws we haven’t enacted yet.
Anyway, with blame for the abuse appropriately apportioned to people other than the victims,
let’s get back to what we, the victims, can do by taking the responsibility available to us,
which is not so very hard,
for all that it should not be required of us.
Right now, if you are a typical internet user, you are walking around with no
pants on online.
Everyone can see your junk.
You don’t need to wear a tinfoil hat to hide your junk,
not if your anatomy is anything typical;
you just need to put some pants on.
This enpantsing will be more tedious than we’d like,
because the world is badly designed,
but let’s start with what’s achievable,
and work towards making it easier next
time, eh?
You do too have something to hide.
You commit three felonies a day
A statistical problem with “nothing to hide”:
How we could do it better now
So, some baby steps towards a healthier privacy regime.
I am going to list some
techniques that have aroused my attention.
Later I will triage them according to how urgent the privacy leak they plug is
and how onerous they are to handle; e.g. something like:
 first keep my credit card details out of the hands of the mafia, then
 keep gratuitous personal data out of the hands of unscrupulous corporations, next
 keep nude selfies and pony tail pics out of the hands of potential employers
 …
 keep personal data out of the hands of prying foreign security agencies
 keep personal data out of the hands of prying local security agencies
These reflect my personal needs;
if you are actually a person of specific
interest to state security agencies, or a mafia credit card thief, you will
probably have different ones.
General
 Prism break is a chaotic list of solutions.
Excellent reference, although it really needs to incorporate some idea of how
popular their suggested solutions are;
after all, most of these things are only of any damn use if your friends also
use ’em.
 quick guide to the basics of encryption (or how about one with stick figures)
 VPNs
 password managers
 tcpcrypt is a protocol that
attempts to encrypt (almost) all of your network traffic.
Unlike other
security mechanisms, Tcpcrypt works out of the box: it requires no
configuration, no changes to applications, and your network connections will
continue to work even if the remote end does not support Tcpcrypt, in which
case connections will gracefully fall back to standard cleartext TCP.
Install Tcpcrypt and you’ll feel no difference in your every day user
experience, but yet your traffic will be more secure and you’ll have made
life much harder for hackers.
Search engines and browsing
 search engines in general
 duckduckgo
 disconnect anonymises your queries to other search engines
 Advanced: run your own search anonymiser:
 mysearch: “Local search engine portal designed to anonymate your search requests and have a better display of search results”
A public instance is available at https://search.jesuislibre.net/
 searx is the same, I think
 see secure servers for more options for VPN, P2P etc.
 firefox
 Privacy Badger is an open-source, non-profit, low-configuration blocker aimed specifically at targeted advertising
 torbrowser
 Adblock Edge, Ghostery, Disconnect, DoNotTrackMe, RequestPolicy
 chrome
 Privacy Badger (see above) works in chrome as well as firefox
 Ghostery disables most of the social media spyware.
 scriptsafe
 HTTPS everywhere is vexing.
Every browser should implement this functionality,
of being secure by default instead of writing your passwords on the lawn in
big letters anytime someone asks.
That’s why it’s annoying that you have to install a plugin to make it work.
And, worse, a horribly memory-hungry plugin.
 adblock plus
 safari
 …?
 Smartphones
 Which android phones do not leave gaping unpatched security holes? tl;dr Google, LG, then everyone else.
 Running your own server?
See secure web servers.
Other tracking
Social networks
 don’t use them
 OK, in fact, not using them is harder than you’d like, because
 The No network effect means that all your
friends have forgotten how to manage their life without Facebook all up in
their shit, and anyway, if you log in to one of these damn things even once
you are surveilled in perpetuity by their ubiquitous browser tracking bullshit.
 so, given that you are using social networks, minimise the risk
 and oh god if your friends start sharing pictures of you publicly for any
reason, block them. We need to set up a new social norm around not selling
each other downstream, until we can fix this clusterfuck.
Logins. Don’t log in with facebook and google.
There might be better alternatives in the future (e.g. persona).
But for now, just don’t.
Synchronising files
See Synchronising files.
Chat
See chat.
Email
See email.
Money
Sick of your financial data being used to find out things about you that even you didn’t know?
Try to pay cash or bitcoins. (Other alternatives?)
Bitcoins have a thriving,
socially-awkward, tinfoil-hat-totin’ community, but are useful. (For example, they are the cheapest way to get money across the border in many places, which is reasonably essential if you move as often as I do.)
They have lots of how-to guides.
e.g.
Miscellany
 I went to the same school as Julian Assange but we learned different lessons
 http://irevolution.net/2011/02/10/facebookforrepressiveregimes/
 http://ori.scs.stanford.edu/
 http://www.wired.com/2014/01/howtheusalmostkilledtheinternet/2/
 https://keybase.io/
 GNU privacy handbook
 I2P seems to be hot right now
 freenet is somewhat hot
How we could do it better later
OK, anyway, we shouldn’t all have to be digital privacy experts to survive in the 21st century; how could we change the rules so that we can focus on our day jobs?
(I give you permission to despair if you can do it amusingly;
I’d prefer amusingly with hope.)
Slamming PGP and the model of human behaviour it assumes is a cottage industry:

GPG and HTTPS (X509) are broken in usability terms because the conceptual
model of trust embedded in each network does not correspond to how people
actually experience the world.
As a result, there is a constant grind between people and these systems,
mainly showing up as a series of user interface disasters.
The GPG web of trust results in absurd social constructs like signing parties
because it does not work and creating social constructs that weird to support
it is a sign of that:
stand in a line and show 50 strangers your government ID to prove you exist?
Really?
Likewise, anybody who’s tried to buy an X509 certificate (HTTPS cert) knows
the process is absurd:
anybody who’s really determined can probably figure out how to fake your
details if they happen to be doing this before you do it for yourself, and of
the 1500 or so Certificate Authorities issuing trust credentials at least one
is weak or compromised by a State, and all your browser will tell you is
“yes, I trust this credential absolutely.”
You just don’t get any say in the matter at all.[…]
The best explanation of this in more detail is the Ode to the Granovetter
Diagram which shows how this different trust model maps cleanly to the
networks of human communication found by Mark Granovetter in his sociological
research.
We’re talking about building trust systems which correspond to actual trust
systems as they are found in the real world, not the broken military
abstractions of X509 or the flawed crypto-anarchy of GPG.
When someone says “assume that a public key cryptosystem
exists,” this is roughly equivalent to saying “assume
that you could clone dinosaurs, and that you could fill a park
with these dinosaurs, and that you could get a ticket to this
‘Jurassic Park,’ and that you could stroll throughout this
park without getting eaten, clawed, or otherwise quantum
entangled with a macroscopic dinosaur particle.” 
http://blog.cryptographyengineering.com/2014/08/whatsmatterwithpgp.html
Getting old school
Academic stuff to read to stay paranoid

Genkin, D., Shamir, A., & Tromer, E. (2013). RSA Key Extraction via Low-Bandwidth Acoustic Cryptanalysis. Cryptology ePrint Archive, Report 2013/857, 2013. http://eprint.iacr.org. Online.
Yes, that’s right, deducing your secret keys by listening to your computer.
But it gets worse:
Beyond acoustics, we demonstrate that a similar low-bandwidth attack can be
performed by measuring the electric potential of a computer chassis.
A suitably-equipped attacker need merely touch the target computer with his
bare hand, or get the required leakage information from the ground wires at
the remote end of VGA, USB or Ethernet cables.
Maybe don’t read this if you are working on reducing your background paranoia.

Dwork, C., & Roth, A. (2014). The Algorithmic Foundations of Differential Privacy. Now Publishers. Online.
The mathematical foundations of doing stuff privately.
I hope someone else is reading this so that I don’t have to. 
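For a taste of what that book formalises, here is a minimal sketch of the Laplace mechanism, one of its basic building blocks: to release a count with ε-differential privacy, add Laplace noise with scale sensitivity/ε (a counting query has sensitivity 1, since one person changes it by at most 1). The function names and parameters are hypothetical, and this is an illustration only: Python's `random` module is not a cryptographically secure source, and floating-point subtleties that matter in real deployments are ignored.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw from Laplace(0, scale) as the difference of two i.i.d.
    exponential variables with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(true_count: int, epsilon: float, rng: random.Random,
                  sensitivity: float = 1.0) -> float:
    """Laplace mechanism (hypothetical helper): adding or removing one person
    changes the count by at most `sensitivity`, so noise of scale
    sensitivity / epsilon gives epsilon-differential privacy."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return true_count + laplace_noise(sensitivity / epsilon, rng)


rng = random.Random(0)  # seeded for reproducibility; NOT secure randomness
print(private_count(42, epsilon=0.5, rng=rng))
```

Smaller ε means more noise and more privacy; any single released number is fuzzy, but averages over many people stay accurate.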
Sarigol, E., Garcia, D., & Schweitzer, F. (2014). Online Privacy as a Collective Phenomenon. arXiv:1409.6197 [cs]. Online.
Your friends have already disclosed secrets about you by disclosing they know
you on social media, secrets that will be further disseminated by random grad
students in Switzerland when the social media company goes bust.
Politics of privacy
Here’s a trick to get yourself to the correct level of consternation:
…a simple metric I use to assess the claims put forth by wannabe
surveillers:
simply relocate the argument from cyber to meatspace, and see how it holds
up.
For example, Leslie Caldwell’s forebodings about online
“zones of lawlessness”
would be rendered thusly:
Caldwell also raised fresh alarms about curtains on windows and locks on
bathroom doors, both of which officials say make it easier for criminals to
hide their activity. “Bathroom doors obviously were created with good
intentions, but are a huge problem for law enforcement. There are a lot of
windowless basements and bathrooms where you can do anything from purchase
heroin to buy guns to hire somebody to kill somebody”
Practically, first step, I would like to minimise the
amount of information complete strangers get about me for free.
For example, I would prefer the mafia not to be able to buy stuff with my
credit cards, I’d prefer my personal relationships are not used to sell crap to
me, I’d prefer not to release those awkward photos from when I had a pony tail.
Broadly, some
stuff I’d like to keep private, some stuff I’d like to share, and some stuff,
I’m happy to share, but only for the right price or with the right organisation;
I want to assign my personal information to the correct publicness categories,
and at a better price point.
And by “better”, I mean, “not selling off the foundations of functional
democracy for all future times to unaccountable interests for a few
dollars a year right now”, which seems a little steep for kitten pictures.
No-opt-out gamified citizenship: China builds the mother of all online reputation systems
China is proposing to assess its citizens’ behavior over a totality of
commercial and social activities, creating an uber-scoring system.
When completed, the model could encompass everything from a person’s
chatroom comments to their performance at work, while the score could be
used to determine eligibility for jobs, mortgages, and social services.
“They’ve been working on the credit system for the financial industry for a
while now,” says Rogier Creemers, a China expert at Oxford University.
“But, in recent years, the idea started growing that if you’re going to
assess people’s financial status, you should equally be able to do that with
other modes of trustworthiness.”
The document talks about the “construction of credibility” (the ability to
give and take away credits) across more than 30 areas of life, from energy
saving to advertising.
Why we live in a dystopia even Orwell couldn’t have envisioned
See original: Privacy (notes on how to have it)
Running a secure server
Fri, 04/12/2015  1:13am  by dan mackinlay
Or at least a somewhat more secure server.
So many parts to this, and I care so little about any of them.
SSL
Nonetheless, one important baseline for modern web services is SSL, which is notoriously tedious to set up.
This recently got easier and cheaper with
Let’s Encrypt
and client software such as
letsencrypt-nosudo
or simp_le.
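Let’s Encrypt certificates are short-lived (90 days), so renewal has to be automated or at least monitored. A minimal sketch of the date arithmetic, using the standard library's `ssl.cert_time_to_seconds` to parse a certificate's `notAfter` field in the format Python's `ssl` module reports it. The helper names are hypothetical, the 30-day renewal threshold is an assumption (roughly what common clients use), and fetching the certificate from a live server is not shown.

```python
import ssl
import time
from typing import Optional

SECONDS_PER_DAY = 24 * 60 * 60


def days_until_expiry(not_after: str, now: Optional[float] = None) -> float:
    """Days left before a certificate's notAfter time, given in the form
    'Jan 15 00:00:00 2016 GMT' (as reported by ssl.getpeercert())."""
    expires = ssl.cert_time_to_seconds(not_after)  # seconds since epoch, UTC
    if now is None:
        now = time.time()
    return (expires - now) / SECONDS_PER_DAY


def needs_renewal(not_after: str, now: Optional[float] = None,
                  threshold_days: float = 30.0) -> bool:
    """True once fewer than `threshold_days` remain (assumed policy)."""
    return days_until_expiry(not_after, now) < threshold_days
```

For instance, checked at midnight GMT on Dec 16 2015, a certificate expiring Jan 15 00:00:00 2016 GMT has exactly 30 days left, so under this threshold it does not yet need renewing.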
Proxy/privacy/anonymisation servers
Run your own search server?
 mysearch: “Local search engine portal designed to anonymate your search requests and have a better display of search results”
A public instance is available at https://search.jesuislibre.net/
 searx is the same, I think
For other stuff, running your own VPN/proxy/anonymizing/p2p etc. servers can
make life less convenient for the panopticon.
Note, however, that virtual machines on someone else’s cloud can never be
especially secure from determined nasty persons or state actors.
See original: Running a secure server
Taking time spans into account to manage the diffuse impacts of major land transport infrastructure (ITT) on biodiversity
Fri, 04/12/2015  1:00am  by Eric Duchemin
Major land transport infrastructures (ITT) generate multiple impacts on biodiversity, from the first transformations of the landscape ahead of construction work to the effects of managing the green verges during the operating phase. As research has progressed, scientific work in road ecology has made it possible to map spatially most of the impacts of ITT on natural habitats and biodiversity. In the French territorial context, the 1976 law on impact assessments, followed by the Grenelle laws in the 2000s, established an increasingly demanding regulatory framework. This framework fostered the emergence of new engineering practices aimed at the ecological transparency of ITT, following the Avoid-Reduce-Compensate (ERC) doctrine. Developers and researchers now also question the continuity of impacts throughout the life phases of ITT, as well as cumulative impacts (combining direct, indir...
R (the language)
Thu, 03/12/2015  5:11am  by dan mackinlay
R is the current hotness in statistics. I may as well use it, as 2/3 of all
statistical algorithms I’ve run into of late are implemented in it. Of those
that remain, most of the rest are written for MATLAB, which is, IMO, some kind
of weird con job pulled on the maths community by disgruntled scientific
computation graduates who want to double bill you for the use of your own
floating point unit. C, Python and Java seem to vie for 3rd spot,
or possibly one of the commercial alternatives such as S, SPSS or Stata.
There are some disconcertingly enthusiastic persons advocating Julia,
and a few super old school command-line thingies.
Pros and cons
Good
 combines unparalleled breadth and community, at least as pertains to
statisticians, data miners, machine learners and other such
assorted folk as I am pleased to call my colleagues. To get some sense of
this thriving scene, check out R-bloggers. This community alone is
enough to sell R, whatever you think of the language
(cf “Your community
is your best asset”).
And believe me, I have reservations about everything else.
 amazing, statistically-useful plotting (cf., e.g., the awful battle to
get error bars in mayavi)
 online web-app visualization: shiny
Bad
 Seems, from my personal aesthetic, to have been written by a team who
prioritise delivering statistical functionality right now over making an
elegant, fast or consistent language to access that functionality.
(“Elegant”, “fast”, “consistent”; you can choose… uh…
Oh look, it’s lunch break! So what are you doing this weekend?)
I’d rather access those same beautiful libraries through a language which has
had as many computer scientists winnowing its ugly bits as Python or Ruby
have had.
Or indeed Go or Julia; even JavaScript has managed to drag itself out of hell
these days.
And, for that matter, I’d like as many amazing third-party
libraries for non-statistical things as these other languages promise.
 Poetically, R has random scope, amongst other
parser and syntax weirdness.
 Call-by-value semantics (in a “big data” processing language?)
 …ameliorated not even by array views,
 …exacerbated by bloaty design
 Object model tacked on after the fact… in fact, several object models,
which is fine? I guess? maybe, but…
 …if the object model stuff is a multi-standard compatibility disaster,
I’d like the trade-off to be speed, or functional design features, or some
other such modern convenience. Nah.
 One of the worst names to google for ever (cf. Processing, Pure)
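Two of those gripes, made concrete in a minimal sketch (the function and class names are hypothetical). First, value semantics: R implements them as copy-on-modify, so changing an argument inside a function never touches the caller's object, however big it is. Second, the same toy “area” idea spelled in three object systems that ship with R: S3, S4 and Reference Classes.

```r
library(methods)  # S4 and Reference Classes live here

# Value semantics: the function modifies its own copy of the argument
bump_first <- function(v) {
  v[1] <- 99  # copy-on-modify: the caller's vector is untouched
  v
}
x <- c(1, 2, 3)
y <- bump_first(x)
print(x)  # 1 2 3
print(y)  # 99 2 3

# Object system 1, S3: a class is just an attribute; dispatch is by naming convention
area <- function(shape) UseMethod("area")
area.square <- function(shape) shape$side^2
sq <- structure(list(side = 3), class = "square")

# Object system 2, S4: formal classes, typed slots, explicit generics
setClass("Circle", slots = c(r = "numeric"))
setGeneric("area4", function(shape) standardGeneric("area4"))
setMethod("area4", "Circle", function(shape) pi * shape@r^2)
circ <- new("Circle", r = 1)

# Object system 3, Reference Classes: mutable objects, closer to Python classes
Counter <- setRefClass("Counter",
  fields = list(n = "numeric"),
  methods = list(inc = function() {
    n <<- n + 1  # mutates the field in place
    invisible(.self)
  }))
cnt <- Counter$new(n = 0)
cnt$inc()
cnt$inc()

print(area(sq))     # 9
print(area4(circ))  # 3.141593
print(cnt$n)        # 2
```

Three systems, three syntaxes for “call a method”: area(sq), area4(circ) and cnt$inc().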
Tips
Easy project reload
Make a folder called MyCode with a DESCRIPTION file.
Make a subfolder called R.
Put R code in .R files in there.
Edit, run devtools::load_all("MyCode"), use the functions.
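The steps above, sketched in base R. The package name MyCode comes from the text; the function hello() and the DESCRIPTION field values are hypothetical placeholders. The load_all() call needs devtools (or pkgload) installed, so it is left as a comment; sourcing the files directly is a cruder base-R stand-in without load_all()'s package semantics.

```r
# Build the skeleton in a temporary directory (use a real path in practice)
pkg <- file.path(tempdir(), "MyCode")
dir.create(file.path(pkg, "R"), recursive = TRUE, showWarnings = FALSE)

# Minimal DESCRIPTION; the field values are placeholders
writeLines(c(
  "Package: MyCode",
  "Title: Scratch Helpers",
  "Version: 0.0.1",
  "Description: Functions I am iterating on."
), file.path(pkg, "DESCRIPTION"))

# One hypothetical function in R/hello.R
writeLines('hello <- function(who = "world") paste("hello", who)',
           file.path(pkg, "R", "hello.R"))

# With devtools installed, the edit-reload loop is just:
# devtools::load_all(pkg)

# Crude base-R stand-in: source every .R file under R/
for (f in list.files(file.path(pkg, "R"), pattern = "\\.[Rr]$", full.names = TRUE)) {
  source(f)
}
hello()  # "hello world"
```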
Functional prog hacks
split/apply
useful functions: semi_join etc
plyr and dplyr are the essential packages.
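A sketch of split/apply/combine in base R, plus a hand-rolled equivalent of dplyr's semi_join (keep the rows of one table whose key appears in another). The data frame is made up; the dplyr calls at the end only run if dplyr is installed.

```r
df <- data.frame(g = c("a", "b", "a", "b", "c"),
                 v = c(1, 2, 3, 4, 5))

# split: one vector of v per group; apply + combine: one mean per group
pieces <- split(df$v, df$g)
means <- sapply(pieces, mean)
print(means)  # a: 2, b: 3, c: 5

# semi-join by hand: rows of df whose key g appears in keys$g
keys <- data.frame(g = c("a", "c"))
semi <- df[df$g %in% keys$g, ]

# The dplyr versions of the same two operations
if (requireNamespace("dplyr", quietly = TRUE)) {
  library(dplyr)
  print(df %>% group_by(g) %>% summarise(mean_v = mean(v)))
  print(semi_join(df, keys, by = "g"))
}
```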
subsetting hell
To subset a list-based object:
x[1]
to subset and optionally downcast the same:
x[[1]]
to subset a matrix-based object:
x[1, , drop=FALSE]
to subset and optionally downcast the same:
x[1, ]
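The list and matrix rules above, side by side, on made-up data. Note the last line: a matrix is a vector underneath, so a single index without a comma picks out one element rather than a row.

```r
l <- list(a = 1:3, b = "x")
str(l[1])    # a list of length 1, still wrapping 'a'
str(l[[1]])  # the element itself: int [1:3] 1 2 3

m <- matrix(1:6, nrow = 2)  # filled column by column
m[1, , drop = FALSE]        # still a 1 x 3 matrix
m[1, ]                      # dropped to a plain vector: 1 3 5
m[1]                        # linear indexing: just the element 1
```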
plotting
ggvis is the latest iteration of the ggplot family, AFAICT.
Pro tip:
It’s worth having an install of R around just for the grammar of graphics packages.
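A small grammar-of-graphics sketch with ggplot2, using error bars since they came up above as a mayavi pain point. The data are made up and the output file name is arbitrary; the plotting only runs if ggplot2 is installed.

```r
df <- data.frame(x = 1:3, y = c(2, 3, 5), se = c(0.3, 0.5, 0.4))
out <- file.path(tempdir(), "errorbars.pdf")

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  # Map columns to aesthetics, then add layers: points plus +/- 1 s.e. bars
  p <- ggplot(df, aes(x = x, y = y)) +
    geom_point() +
    geom_errorbar(aes(ymin = y - se, ymax = y + se), width = 0.2)
  ggsave(out, p, width = 4, height = 3)
}
```

The error bars are one extra layer with two extra aesthetics, which is roughly the whole pitch.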
How to pass sparse matrices between R and Python
https://gist.github.com/howthebodyworks/9e89e65bfc58fded46ae
This FS-backed method was a couple of orders of magnitude faster than rpy2 last time I tried to pass more than a few MB of data.
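For comparison, another filesystem-backed route, not necessarily what the gist above does: write the matrix in Matrix Market format with the Matrix package (which ships with R) and read it in Python with scipy.io.mmread. The matrix here is made up.

```r
library(Matrix)

# A 3 x 4 sparse matrix with three non-zero entries (row i, column j, value x)
m <- sparseMatrix(i = c(1, 3, 2), j = c(1, 2, 4), x = c(10, 20, 30), dims = c(3, 4))

path <- file.path(tempdir(), "m.mtx")
writeMM(m, path)  # plain-text Matrix Market coordinate format

# Python side, roughly:
#   from scipy.io import mmread
#   m = mmread(path).tocsr()

# Round-trip check on the R side
m2 <- readMM(path)
```

Matrix Market is text, so it is portable but not the fastest option for very large matrices.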
Upgrading R breaks the installed packages
This is the fix:
update.packages(checkBuilt=TRUE, ask=FALSE)
Bioconductor’s horrifyingly pwnable install
In fact, the default package management might not be much better, but the
secondary R package repository makes it terrifyingly clear:
What, you’d like to install some biostatistics software on
your campus supercomputing cluster? Easy! Simply download and run this
unverifiable, unavoidably unencrypted, unsigned script from a webserver of unknown provenance!
source("http://bioconductor.org/biocLite.R")
biocLite("RBGL")
It is probably usually not often script kiddies spoofing you so as to trojan
your campus computing cluster to steal CPU cycles. After all,
who would do that?
On an unrelated note, I am looking for investors in a distributed bitcoin
mining operation. Contact me privately.
There are step debuggers and other such modern conveniences

inspecting frames post hoc: recover
In fact, pro tip, you can have it invoked gracefully whenever an error occurs, even deep in 3rd-party code:
options(error = utils::recover)

Interactive debugger: browser

Graphical, interactive, optionally web-based debugger available in RStudio; if it had any more buzzwords in it, it would socially tag your Instagram and upload into the NSA’s Internet Of Things to be 3D printed.

easy command-line invocation: Rio, which loads CSV from stdin into R as a data.frame, executes given commands, and gets the output as CSV or PNG on stdout
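A sketch of the two debuggers in use. The function is hypothetical, and the browser() call is guarded by interactive() so the same code still runs unattended under Rscript. At the browser prompt, n steps to the next line, c continues and Q quits.

```r
# In ~/.Rprofile: drop into recover() on any uncaught error, interactive sessions only
if (interactive()) options(error = utils::recover)

# Hypothetical function: pauses mid-loop so you can inspect 'i' and 'total'
mean_by_loop <- function(xs) {
  total <- 0
  for (i in seq_along(xs)) {
    total <- total + xs[i]
    if (interactive() && i == 2) browser()  # only pause when a human is watching
  }
  total / length(xs)
}

mean_by_loop(c(1, 2, 3, 6))  # 3
# debugonce(mean_by_loop) would instead step through the next call from its first line
```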
R for Pythonistas
Many things about R are surprising to me, coming as I do most recently from
Python. I’m documenting my perpetual surprise here, in order that it may save
someone else the inconvenience of going to all that trouble to be personally surprised.
Opaque imports
Importing an R package, unlike importing a Python module, brings in random
cruft that may have little to do with the names of the thing you just imported.
That is, IMO, poor planning, although history indicates that most language
designers don’t agree with me on that:
> npreg
Error: object 'npreg' not found
> library("np")
Nonparametric Kernel Methods for Mixed Datatypes (version 0.404)
> npreg
function (bws, ...)
#etc
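The nearest R equivalent of Python's qualified module.name access is pkg::name, which loads a package's namespace without attaching its exports to the search path. A sketch with the tools package, which ships with R but is not attached in a standard session; np::npreg would be the analogue for the example above.

```r
# pkg::name loads the namespace but does not attach it
tools::file_ext("data.csv")    # "csv"
"package:tools" %in% search()  # typically FALSE: nothing was attached

# library() attaches every exported name, like Python's `from tools import *`
library(tools)
"package:tools" %in% search()  # TRUE
file_ext("data.csv")           # now works unqualified
```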
Further, data structures in R can, and are intended to, provide first-class scopes
for looking up names. In your explorations of data you are as apt to
bring the names of columns in a data set into scope as the names
of functions in a library. This is kind of useful, although the scoping
rules do make my eyes water when this intersects with function definition.
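A small sketch of data frames acting as scopes, on made-up numbers: with() and subset() both look names up among the columns first, which is handy right up until a column quietly shadows a global variable.

```r
df <- data.frame(height = c(1.6, 1.8), weight = c(60, 81))

# with() evaluates an expression using the columns of df as variables
bmi <- with(df, weight / height^2)  # about 23.44 and 25

# subset() does the same for its condition: 'height' here is the column
tall <- subset(df, height > 1.7)

# The catch: a column silently shadows a global of the same name
height <- 100
with(df, height)  # 1.6 1.8, not 100
```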
Formulas are cool and ugly, like Adult Swim, and intimately bound up in the
prior point.
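Why formulas are bound up in this: a formula is an unevaluated expression plus the environment it was written in, and modelling functions such as lm() resolve its symbols against the data argument first. A sketch with made-up data:

```r
df <- data.frame(x = 1:5, y = c(2.1, 3.9, 6.2, 7.8, 10.1))

f <- y ~ x               # unevaluated: x and y are just symbols so far
all.vars(f)              # "y" "x"
environment(f)           # the formula also captures where it was written

fit <- lm(f, data = df)  # lm() looks the symbols up as columns of df
coef(fit)                # intercept and slope (slope 1.99 here)
```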
assignment to function calls
The R term for these is replacement functions.
R fosters a style of programming where attributes and metadata of data objects
are set by using accessor functions, e.g. in matrix column naming:
> m=matrix(0, nrow=2,ncol=2)
> m
     [,1] [,2]
[1,]    0    0
[2,]    0    0
> colnames(m)
NULL
> colnames(m)=c('a','b')
> colnames(m)
[1] "a" "b"
> m
     a b
[1,] 0 0
[2,] 0 0
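If you want to know, by observing its effects, whether an apparent function
returns some massaged product of its argument or whether it decorates the
argument, well, check the manual. When such a call sits on the left of an assignment,
R rewrites colnames(m) = c('a','b') as m = `colnames<-`(m, value = c('a','b')):
the replacement function returns a modified copy, which is assigned back to m.
Functions called only for their side effects, such as plotting functions, typically return NULL.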
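Writing a replacement function yourself makes the mechanism visible. A sketch with a hypothetical `second<-`; the only requirements are the name ending in <-, an argument called value, and returning the modified object.

```r
# Hypothetical replacement function: sets the second element of a vector
`second<-` <- function(x, value) {
  x[2] <- value
  x  # replacement functions must return the modified object
}

v <- c(1, 2, 3)
second(v) <- 20  # evaluated as: v <- `second<-`(v, value = 20)
v                # 1 20 3

# colnames(m) = ... is the same mechanism, spelled out
m <- matrix(0, nrow = 2, ncol = 2)
m <- `colnames<-`(m, value = c("a", "b"))
colnames(m)      # "a" "b"
```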
No scalar types…
A float is a float vector of size 1:
> 5
[1] 5
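A few checks confirming that a “scalar” really is a length-1 vector (R calls its float type “double”):

```r
x <- 5
length(x)           # 1
is.vector(x)        # TRUE: there is no separate scalar type
typeof(x)           # "double"
x[1]                # indexing a "scalar" works, since it is a vector
identical(x, c(5))  # TRUE
```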
…yet verbose vector literal syntax
You make vectors by calling a function called c. Witness:
> c('a', 'b', 'c', 'd')
[1] "a" "b" "c" "d"
If you type the bare comma-separated elements in, though, it will throw an error:
> 'a', 'b', 'c', 'd'
Error: unexpected ',' in "'a',"
I’m sure there are Reasons for this;
it’s just that they are reasons that I don’t care about.
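For what it's worth, c() is not the only way in. A few base-R shortcuts for common vectors:

```r
c("a", "b", "c", "d")  # the general-purpose way
letters[1:4]           # built-in constant: the same character vector
1:4                    # integer sequence, no c() needed
seq(0, 1, by = 0.25)   # 0.00 0.25 0.50 0.75 1.00
rep("x", 3)            # "x" "x" "x"
```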
In short,
A powerful, effective, diverse, well-supported nightmare.
OTOH, as far as statistical languages go, this is wonderful;
the others are less supported, less diverse,
and R is now the de facto standard,
so I count my blessings.
To read
 Drew Conway’s strata bootcamp
 Jeremy Howard of Kaggle gives a virtuous and improving presentation
Coarse graining
Wed, 02/12/2015  7:36am  by dan mackinlay
AFAICT, this is the question ‘how much worse do your predictions get as you discard
information in some orderly fashion?’, as framed by physicists.
Do “renormalisation groups”, whatever they are, fit in here?
How about Scholtes and his time-respecting networks?
Where the coarse graining is itself a stochastic process,
is this just a
hierarchical model,
in the statistical sense?
To consider: the algorithmic statistics angle,
the pseudorandomness angle,
the probabilistic angle, as exemplified by the suggestive utility of
sigma-algebras and filtrations here.
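One way to put a number on the opening question, as a sketch: take a hypothetical 4-state Markov chain and compare how uncertain the next state is given the current state, versus given only a coarse-grained label that lumps states {1, 2} and {3, 4}. Conditional entropy is my choice of yardstick here, not the only one; conditioning on less information can only raise it or leave it unchanged.

```r
# Hypothetical transition matrix: P[i, j] = Pr(next = j | current = i)
P <- matrix(c(0.9, 0.1, 0.0, 0.0,
              0.1, 0.0, 0.9, 0.0,
              0.0, 0.0, 0.1, 0.9,
              0.9, 0.0, 0.0, 0.1), nrow = 4, byrow = TRUE)

# Stationary distribution: left eigenvector of P for eigenvalue 1, normalised
ev <- eigen(t(P))
p_stat <- Re(ev$vectors[, which.min(Mod(ev$values - 1))])
p_stat <- p_stat / sum(p_stat)

# Shannon entropy in bits, ignoring zero-probability outcomes
H <- function(p) -sum(p[p > 0] * log2(p[p > 0]))

# Uncertainty about the next fine state, given the current fine state
H_fine <- sum(p_stat * apply(P, 1, H))

# Coarse-graining map: states 1, 2 -> block 1; states 3, 4 -> block 2
g <- c(1, 1, 2, 2)

# Uncertainty about the next fine state, given only the current block
H_coarse <- sum(sapply(unique(g), function(k) {
  w <- p_stat[g == k]                                      # stationary weights in block k
  pred <- colSums(w * P[g == k, , drop = FALSE]) / sum(w)  # mixture of their rows
  sum(w) * H(pred)
}))

c(fine = H_fine, coarse = H_coarse)  # coarse is never smaller
```

The gap H_coarse - H_fine is then “how much worse” in bits per step, for this particular lumping.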
To read, classics
 Bar-Yam, Y. (2003). Dynamics Of Complex Systems. Westview Press.
 Castiglione, P., & Falcioni, M. (2008). Chaos and Coarse Graining in Statistical Mechanics. Cambridge, UK ; New York: Cambridge University Press.
 NECSI’s multiscale methods page
To read, actually want to

Petri, G., Expert, P., Turkheimer, F., Carhart-Harris, R., Nutt, D., Hellyer, P. J., & Vaccarino, F. (2014). Homological scaffolds of brain functional networks. Journal of The Royal Society Interface, 11(101), 20140873.
Talks about a fun-sounding “persistent homology” idea, which sounds a little
like some kind of topological measure theory to my analytics-biased perspective:
Persistent homology is a recent technique in computational topology developed for shape recognition and the analysis of high dimensional datasets [36,37].
It has been used in very diverse fields, ranging from biology [38,39] and sensor network coverage [40] to cosmology [41].
Similar approaches to brain data [42,43], collaboration data [44] and network structure [45] also exist.
The central idea is the construction of a sequence of successive approximations of the original dataset seen as a topological space X.
This sequence of topological spaces \(X_0, X_1, \dots{}, X_N = X\) is such that \(X_i \subseteq X_j\) whenever \(i < j\) and is called the filtration.
Choosing how to construct a filtration from the data is equivalent to choosing the type of goggles one wears to analyse the data.
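A toy illustration of such a filtration, restricted to dimension 0 (connected components), where persistent homology reduces to something familiar: if \(X_\epsilon\) joins two points by an edge whenever their distance is at most \(\epsilon\), components merge exactly at the merge heights of single-linkage clustering. The two-cluster point cloud is made up; loops, voids and the higher homology the paper cares about need dedicated software.

```r
set.seed(1)
# Two tight clusters of 10 points each, centred at (0, 0) and (3, 3)
pts <- rbind(matrix(rnorm(20, mean = 0, sd = 0.1), ncol = 2),
             matrix(rnorm(20, mean = 3, sd = 0.1), ncol = 2))

# Components of X_eps merge exactly at the single-linkage heights
hc <- hclust(dist(pts), method = "single")
deaths <- hc$height  # every component is born at eps = 0; one dies at each height

# Number of connected components (0th Betti number) at scale eps
betti0 <- function(eps) 1 + sum(deaths > eps)

sapply(c(0, 1, 10), betti0)  # 20 2 1: points, then the two clusters, then one blob
```

The long-lived pair of components, persisting from tiny scales up to roughly the distance between the cluster centres, is the “real” structure; short-lived components are noise.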