Generative art

If you want the lowbrow version of this header, try “creative code”.

Either way, it means, more or less, “using algorithms to make pretty things.”
If you’ve seen a CGI film in the last 20 years, you’ve seen this.
Flocking, L-systems,
agents, evolutionary systems,
pattern formation
and so on.
My interest here reflects my High Art, pontifical sensibility.
But video games are totes sick too, if that’s your bag.
Also 3d printing, augmented reality blah blah.
But you can google all that stuff without my help.
Here is stuff I frequently refer to.
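
To make “using algorithms to make pretty things” slightly more concrete, here is a minimal sketch of one of the techniques named above, an L-system. The function name expand and the choice of the classic two-symbol “algae” rewriting rules are my illustration, not something these notes depend on; drawing the resulting string (e.g. with turtle graphics) is left out.

```python
def expand(axiom, rules, iterations):
    """Rewrite every symbol of the string in parallel, `iterations` times."""
    state = axiom
    for _ in range(iterations):
        # Symbols without a rule are copied unchanged.
        state = "".join(rules.get(symbol, symbol) for symbol in state)
    return state


# The textbook "algae" system: A -> AB, B -> A.
ALGAE_RULES = {"A": "AB", "B": "A"}

if __name__ == "__main__":
    for n in range(6):
        print(n, expand("A", ALGAE_RULES, n))
```

The string lengths grow like the Fibonacci numbers, which is about as much structure as you can get out of two rules.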

Missing from here: prehistory of such art, early software art and pre-computer algorithmic art.
Later, I will raid Neil Jenkins’ excellent
garden of forking paths for some pointers.

Examples of praxis

Praxis yourself, why don’t you?

I praxis myself

Here’s how you might do that with neural networks

Alex Graves on RNN predictive synthesis: https://www.youtube.com/watch?v=-yX1SYeDHbg

Matt Vitelli on music generation https://www.youtube.com/watch?v=0VTI1BBLydE https://github.com/MattVitelli/GRUV

Adversarial generation is a cool hack if you hate boring stuff like labelling data sets https://github.com/goodfeli/adversarial
chair generation http://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Dosovitskiy_Learning_to_Generate_2015_CVPR_paper.pdf

General reading

See original: The Living Thing / Notebooks Generative art

Boosting, bagging, voting

Ensemble methods.
Fast to train, fast to use. Get you results. May not get you answers.
So, like neural networks but you don’t need a server farm.

Jeremy Kun: Why Boosting Doesn’t Overfit:

Boosting, which we covered in gruesome detail previously, has a natural
measure of complexity represented by the number of rounds you run the
algorithm for.
Each round adds one additional “weak learner” weighted vote.
So running for a thousand rounds gives a vote of a thousand weak learners.
Despite this, boosting doesn’t overfit on many datasets.
In fact, and this is a shocking fact, researchers observed that Boosting
would hit zero training error, they kept running it for more rounds, and the
generalization error kept going down!
It seemed like the complexity could grow arbitrarily without penalty.
[…] this phenomenon is a fact about voting schemes,
not boosting in particular.
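
To see the “each round adds one weighted weak-learner vote” mechanics, here is a minimal AdaBoost-style sketch with one-dimensional decision stumps. It is a toy under my own assumptions (threshold stumps, labels in {-1, +1}, the function names below), not the code from the post quoted above.

```python
import math


def stump_predict(x, threshold, polarity):
    """A weak learner: vote `polarity` at or above the threshold, else the opposite."""
    return polarity if x >= threshold else -polarity


def train_stump(xs, ys, weights):
    """Exhaustively pick the stump with the lowest weighted training error."""
    best = None
    for threshold in sorted(set(xs)):
        for polarity in (1, -1):
            err = sum(w for w, x, y in zip(weights, xs, ys)
                      if stump_predict(x, threshold, polarity) != y)
            if best is None or err < best[0]:
                best = (err, threshold, polarity)
    return best


def adaboost(xs, ys, rounds):
    """Return a list of (vote weight, threshold, polarity), one per round."""
    n = len(xs)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        err, threshold, polarity = train_stump(xs, ys, weights)
        err = min(max(err, 1e-10), 1 - 1e-10)    # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)  # this round's vote weight
        ensemble.append((alpha, threshold, polarity))
        # Up-weight the points this stump got wrong, then renormalise.
        weights = [w * math.exp(-alpha * y * stump_predict(x, threshold, polarity))
                   for w, x, y in zip(weights, xs, ys)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def predict(ensemble, x):
    """Weighted vote of all stumps collected so far."""
    score = sum(alpha * stump_predict(x, threshold, polarity)
                for alpha, threshold, polarity in ensemble)
    return 1 if score >= 0 else -1


if __name__ == "__main__":
    xs = list(range(10))
    ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]  # no single stump fits this
    for rounds in (1, 2, 3):
        model = adaboost(xs, ys, rounds)
        print(rounds, sum(predict(model, x) != y for x, y in zip(xs, ys)))
```

The number of rounds is the only complexity knob here, which is the point of the quote: the ensemble is just a longer and longer weighted vote.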

Random trees, forests, jungles

See original: The Living Thing / Notebooks Boosting, bagging, voting

Ecological fallacies

“With great spreadsheets comes great responsibility.”

The danger of folk statistics.
The problems of excluded variables.

Avoidance of Ecological fallacy in mean-field approximation.
Simpson’s paradox.
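
A worked Simpson’s paradox, as a sketch. The counts are purely illustrative (they are not taken from any source cited in these notes), and the stratum and treatment names are made up: treatment A wins inside every stratum yet loses once the strata are pooled, because A was mostly applied to the hard cases.

```python
from fractions import Fraction

# Illustrative (successes, trials) per treatment and per stratum.
COUNTS = {
    "A": {"easy": (81, 87), "hard": (192, 263)},
    "B": {"easy": (234, 270), "hard": (55, 80)},
}


def stratum_rate(treatment, stratum):
    """Success rate within one stratum."""
    successes, trials = COUNTS[treatment][stratum]
    return Fraction(successes, trials)


def pooled_rate(treatment):
    """Success rate after throwing away the stratum label."""
    successes = sum(s for s, _ in COUNTS[treatment].values())
    trials = sum(t for _, t in COUNTS[treatment].values())
    return Fraction(successes, trials)


if __name__ == "__main__":
    for stratum in ("easy", "hard"):
        print(stratum,
              float(stratum_rate("A", stratum)),
              float(stratum_rate("B", stratum)))
    print("pooled", float(pooled_rate("A")), float(pooled_rate("B")))
```

The aggregate comparison reverses the within-group comparison, which is exactly the trap of reading individual-level conclusions off pooled data.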

Spurious correlation induced by sampling bias

See also graphical models,
hierarchical models.

See original: The Living Thing / Notebooks Ecological fallacies

Bandit problems, reinforcement learning, and stochastic control

Bandit problems, Markov decision problems, a smattering of dynamic programming,
game theory, and online learning of the solutions to such problems.

Clickbait bandit problems

On the science of treating consumers of modern news media like what they are,
nearly passive objects of surveillance and control.
Because trying to rely on people’s rationality and agency to get things done has
a poor track record in recent history.

Practically, the state of the art here is AFAICT a class of bandit problems.

New tool by Microsoft: Multi-World Testing (MWT)

… is a toolbox of machine learning technology for principled and efficient
experimentation, plausibly applicable to most Microsoft services that
interact with customers. In many scenarios, this technology is exponentially
more efficient than the traditional A/B testing. The underlying research
area, mature and yet very active, is known under many names: “multi-armed
bandits”, “contextual bandits”, “associative reinforcement learning”, and
“counterfactual evaluation”, among others.

To take an example, suppose one wants to optimize clicks on suggested news
stories. To discover what works, one needs to explore over the possible news
stories. Further, if the suggested news story can be chosen depending on the
visitor’s profile, then one needs to explore over the possible “policies”
that map profiles to news stories (and there are exponentially more
“policies” than news stories!). Traditional ML fails at this because it does
not explore. Whereas MWT allows you to explore continuously, and optimize
your decisions using this exploration data.

(partial MWT source code)

The “bandit problems” phrase comes, by the way, from an extension of the “one
armed bandit”, the poker machine, into a mathematical model for exploring the
world through pulling on the arms of a poker machine.
There is a pleasing symmetry in that modern poker machines, and indeed the
internet in general, model the customer as a machine upon whose arm they pull
to get a reward, and that this reward is addicting the customer to pulling on
the arms of their poker machine.
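
The arm-pulling model is easy to sketch. Below is an epsilon-greedy strategy on Bernoulli arms, which is about the simplest bandit algorithm there is; it is my stand-in for illustration and has nothing to do with how MWT is implemented. The payout probabilities, the function name and the default epsilon are all assumptions.

```python
import random


def epsilon_greedy(arm_probs, steps, epsilon=0.1, seed=0):
    """Play `steps` rounds against Bernoulli arms with hidden payout rates."""
    rng = random.Random(seed)
    n_arms = len(arm_probs)
    pulls = [0] * n_arms
    estimates = [0.0] * n_arms  # running mean reward per arm
    total_reward = 0
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)  # explore: pick an arm at random
        else:
            arm = max(range(n_arms), key=lambda a: estimates[a])  # exploit
        reward = 1 if rng.random() < arm_probs[arm] else 0
        pulls[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / pulls[arm]
        total_reward += reward
    return pulls, estimates, total_reward


if __name__ == "__main__":
    pulls, estimates, total = epsilon_greedy([0.1, 0.9], steps=5000)
    print(pulls, [round(e, 2) for e in estimates], total)
```

Swap “arm” for “headline” and “reward” for “click” and you have the clickbait version; the contextual variants additionally condition the choice on the visitor’s profile.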

You should read this before you next blame someone
(especially a millennial, especially if you are not a millennial)
for having no attention span, then take a deep look into your soul;
Michael Schulson, if the internet is addictive, why don’t we regulate it?

As a consultant to Silicon Valley startups, Eyal helps his clients mimic what
he calls the ‘narcotic-like properties’ of sites such as Facebook and
Pinterest.
His goal, Eyal told Business Insider, is to get users ‘continuing through the
same basic cycle.
Forever and ever.’

[…]
There are differences between a slot machine and a website, of course.
With the former, the longer you’re engaged by variable rewards, the more
money you lose.
For a tech company in the attention economy, the longer you’re engaged by
variable rewards, the more time you spend online, and the more money they
make through ad revenue.

Yet we keep blaming people.
As Schüll puts it:
‘It just seems very duplicitous to design with the goal of capturing
attention, and then to put the whole burden onto the individual.’

Stupid rats, running the mazes we set them instead of dotcom startups.

Also, there’s interesting mathematics!
social graphs!
self-exciting point processes! And all the bandit problem literature!

Markov decision problems

Bellman and Howard’s classic discrete-time stochastic control problem
* http://www.castlelab.princeton.edu/ORF569papers/Powell_ADP_2ndEdition_Chapter%203.pdf
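
For a finite problem the dynamic-programming solution fits in a few lines. This is a minimal value-iteration sketch under my own assumptions: a made-up two-state, two-action problem, a discount factor of 0.9, and a nested-list encoding of transitions and rewards that is not taken from the chapter linked above.

```python
def q_values(state, values, transitions, rewards, gamma):
    """Expected discounted return of each action in `state`."""
    return [
        rewards[state][action]
        + gamma * sum(prob * values[nxt] for prob, nxt in outcomes)
        for action, outcomes in enumerate(transitions[state])
    ]


def value_iteration(transitions, rewards, gamma=0.9, tol=1e-10):
    """Iterate the Bellman optimality update until the values stop moving."""
    n_states = len(transitions)
    values = [0.0] * n_states
    while True:
        new_values = [max(q_values(s, values, transitions, rewards, gamma))
                      for s in range(n_states)]
        gap = max(abs(new - old) for new, old in zip(new_values, values))
        values = new_values
        if gap < tol:
            break
    # Greedy policy with respect to the converged values.
    policy = []
    for s in range(n_states):
        q = q_values(s, values, transitions, rewards, gamma)
        policy.append(q.index(max(q)))
    return values, policy


# Hypothetical toy problem. transitions[s][a] is a list of (prob, next_state).
# Action 0 = stay, action 1 = move; only staying in state 1 pays.
TRANSITIONS = [
    [[(1.0, 0)], [(1.0, 1)]],
    [[(1.0, 1)], [(1.0, 0)]],
]
REWARDS = [
    [0.0, 0.0],
    [1.0, 0.0],
]

if __name__ == "__main__":
    print(value_iteration(TRANSITIONS, REWARDS))
```

Here the optimal policy is the obvious one (move to state 1, then stay), which makes it easy to check the output by hand.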

POMDP

Too many CPU cycles?

“A POMDP is a partially observable Markov decision process. It is a model, originating in the operations research (OR) literature, for describing planning tasks in which the decision maker does not have complete information as to its current state. The POMDP model provides a convenient way of reasoning about tradeoffs between actions to gain reward and actions to gain information.”
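
The “incomplete information as to its current state” part is usually handled by tracking a belief, a probability distribution over states, and updating it by Bayes’ rule after each action and observation. A minimal sketch, assuming finite state and observation sets and my own array layout (the two-state “listen” example is hypothetical):

```python
def belief_update(belief, action, observation, transition, observe):
    """Bayes-filter step for a finite POMDP.

    transition[a][s][s2] = P(next state s2 | state s, action a)
    observe[a][s2][o]    = P(observation o | next state s2, action a)
    """
    n = len(belief)
    unnormalised = [
        observe[action][s2][observation]
        * sum(transition[action][s][s2] * belief[s] for s in range(n))
        for s2 in range(n)
    ]
    total = sum(unnormalised)
    if total == 0:
        raise ValueError("observation has zero probability under this belief")
    return [u / total for u in unnormalised]


# Hypothetical example: one "listen" action that leaves the hidden state
# alone and reports it correctly 85% of the time.
LISTEN = 0
TRANSITION = [[[1.0, 0.0], [0.0, 1.0]]]
OBSERVE = [[[0.85, 0.15], [0.15, 0.85]]]

if __name__ == "__main__":
    belief = [0.5, 0.5]
    for _ in range(2):
        belief = belief_update(belief, LISTEN, 0, TRANSITION, OBSERVE)
        print(belief)
```

Planning then happens over beliefs rather than states, which is where all the CPU cycles go.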

Reinforcement learning

To read

See original: The Living Thing / Notebooks Bandit problems, reinforcement learning, and stochastic control

Random fields

An area so broad it’s not so much a research field as a way of life.

See also point processes,
time series,
graphical models,
spatial statistics …

To investigate: Random tree fields, Markov random fields, conditional random fields…

See original: The Living Thing / Notebooks Random fields

Branching processes

A class of stochastic models,
certain types of generalisations of the Galton Watson process,
that I am mildly obsessed with.
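
For orientation, the basic Galton-Watson process itself can be simulated in a few lines: every individual in a generation independently draws a number of children from one fixed offspring distribution. The function names and the particular offspring law below are my own illustrative choices.

```python
import random


def galton_watson(offspring, generations, z0=1, seed=None):
    """Return the generation sizes Z_0, ..., Z_generations.

    `offspring(rng)` draws one individual's number of children.
    """
    rng = random.Random(seed)
    sizes = [z0]
    for _ in range(generations):
        # Each current individual reproduces independently.
        sizes.append(sum(offspring(rng) for _ in range(sizes[-1])))
    return sizes


def critical_offspring(rng):
    """Hypothetical offspring law: 0, 1 or 2 children, mean exactly 1."""
    return rng.choices([0, 1, 2], weights=[0.25, 0.5, 0.25])[0]


if __name__ == "__main__":
    print(galton_watson(critical_offspring, generations=20, seed=1))
```

Once a generation has size zero it stays at zero; extinction is absorbing.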

There seem to be various subspecies.

TODO: notes on processes defined on a multidimensional index set, i.e.
space-time processes and branching random fields. (“cluster processes”)
(Taster over at spatial statistics
or random fields)

Discrete index, discrete state: The Galton-Watson process and friends

There are many standard expositions of this; I won’t write another here.

Two good ones:

Generalised Galton Watson process

This section got long enough to break out separately.
See my notes on some generalisations of Galton-Watson process.

Continuous index, discrete state: the Hawkes Process

If you have an integer-valued state space, but a continuous time
index, then this is a Hawkes point process,
which also has a cluster point process representation (HaOa74).
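
A minimal simulation sketch, assuming an exponential excitation kernel (my assumption; nothing above fixes the kernel) and using a thinning scheme of the kind usually credited to Ogata. The intensity is taken to be mu plus the sum over past events of alpha * beta * exp(-beta * (t - t_i)), so that alpha is the mean number of direct offspring per event; the parameter names are mine.

```python
import math
import random


def simulate_hawkes(mu, alpha, beta, horizon, seed=None):
    """Event times of a self-exciting process on (0, horizon], by thinning.

    Requires mu > 0; alpha < 1 keeps the process from exploding.
    """
    rng = random.Random(seed)
    events = []
    t = 0.0
    excitation = 0.0  # self-excited part of the intensity at time t
    while True:
        # Between events the intensity only decays, so its current value
        # is a valid upper bound until the next accepted event.
        upper = mu + excitation
        wait = rng.expovariate(upper)
        if t + wait > horizon:
            break
        t += wait
        excitation *= math.exp(-beta * wait)
        # Accept the candidate with probability intensity(t) / upper.
        if rng.random() * upper <= mu + excitation:
            events.append(t)
            excitation += alpha * beta  # each event kicks the intensity up
    return events


if __name__ == "__main__":
    times = simulate_hawkes(mu=1.0, alpha=0.5, beta=2.0, horizon=200.0, seed=42)
    print(len(times), times[:5])
```

With alpha below 1 the long-run event rate is roughly mu / (1 - alpha), which gives a quick sanity check on the output.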

See my masters thesis.

Continuous index, continuous state: The CSBP type of Lévy process

Aldous does a no-nonsense expo on these.
Super trendy at the moment:
It turns out that growing trees is connected in a
deep but purportedly simple way to “glueing together” excursions of random
processes, oh,
and a bunch of trippy fractals and random trees and stuff.
Too sleepy to explain THAT right now;
How about I pass out with the seminar still in my head then forget it instantly?

Lee, Hopcraft and Jakeman (LeHJ08) also found an analogous result for discrete state
branching processes.

Lamperti representation

Need to see if I can get my head around the forms of Lamperti representations.
Basically, the compensator is an a.s. positive process which gives us a time change, and the Lamperti representation gives us incredible universality for that.
The Lamperti representation goes for very general Lévy processes;
I can make do with much simpler ones for count data.
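
For the record, the time change I have in mind is usually written as below. This is my paraphrase of the standard statement (the kind proved in CaLB09), in my own notation, so check it against a reference before relying on it. Here X is a spectrally positive Lévy process started at x > 0 and stopped at T_0, its first hitting time of 0, and Z is the continuous-state branching process started at x.

```latex
Z_t = X_{\theta_t \wedge T_0},
\qquad
\theta_t = \inf\Big\{ s \ge 0 : \int_0^s \frac{\mathrm{d}u}{X_u} > t \Big\},
\qquad \text{equivalently} \qquad
Z_t = X\Big( \int_0^t Z_s \,\mathrm{d}s \Big).
```

In words: run the Lévy process on a clock that ticks at rate proportional to the current population size.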

This is an example of a change-of-time result, also popular in point processes.

Discrete index, continuous state

Umm. Is this well-defined? I suppose so.
Can’t find any literature references though.
It surely has a fancy name.
“Marked Galton-Watson Process”?
Some kind of compound Poisson, I imagine.

Superprocesses

Measure-valued state or something?
Can’t recall, must investigate later.

  • Dynkin, E. B.(1991). Branching Particle Systems and Superprocesses. The Annals of Probability, 19(3), 1157–1194. DOI.
  • Dynkin, E. B.(2004). Superdiffusions and positive solutions of nonlinear partial differential equations. Providence, R.I: American Mathematical Society.
  • Etheridge, A. (2000). An introduction to superprocesses. Providence, RI: American Mathematical Society.

To read

Aldo91: Aldous, D. (1991). The Annals of Probability The Continuum Random Tree. I, 19(1), 1–28. DOI.

Aldo93: Aldous, D. (1993). The Annals of Probability The Continuum Random Tree III, 21(1), 248–289. DOI.

Appl04: Applebaum, D. (2004). Notices of the AMS Lévy processes-from probability to finance and quantum groups, 51(11), 1336–1347.

AtKe77: Athreya, K. B., & Keiding, N. (1977). Sankhyā: The Indian Journal of Statistics, Series A (1961-2002) Estimation Theory for Continuous-Time Branching Processes, 39(2), 101–123.

AtVi97: Athreya, K. B., & Vidyashankar, A. N.(1997). In K. B. Athreya & P. Jagers (Eds.), Classical and Modern Branching Processes Large Deviation Rates for Supercritical and Critical Branching Processes (pp. 1–18). Springer New York

BaDM12: Bacry, E., Dayri, K., & Muzy, J. F.(2012). The European Physical Journal B Non-parametric kernel estimation for symmetric Hawkes processes. Application to high frequency financial data, 85(5), 1–12. DOI.

BDHM13a: Bacry, E., Delattre, S., Hoffmann, M., & Muzy, J. F.(2013a). Quantitative Finance Modelling microstructure noise with mutually exciting point processes, 13(1), 65–77. DOI.

BDHM13b: Bacry, E., Delattre, S., Hoffmann, M., & Muzy, J. F.(2013b). Stochastic Processes and Their Applications Some limit theorems for Hawkes processes and application to financial statistics, 123(7), 2475–2499. DOI.

BaJM14: Bacry, E., Jaisson, T., & Muzy, J.-F. (2014). arXiv:1412.7096 [q-Fin, Stat] Estimation of slowly decreasing Hawkes kernels: Application to high frequency order book modelling

BaMu14a: Bacry, E., & Muzy, J.-F. (2014a). Quantitative Finance Hawkes model for price and trades high-frequency dynamics, 14(7), 1147–1166. DOI.

BaMu14b: Bacry, E., & Muzy, J.-F. (2014b). arXiv:1401.0903 [physics, Q-Fin, Stat] Second order statistics characterization of Hawkes processes and non-parametric estimation

Badd07: Baddeley, A. (2007). In W. Weil (Ed.), Stochastic Geometry Spatial Point Processes and their Applications (pp. 1–75). Springer Berlin Heidelberg

BhAd81: Bhat, B. R., & Adke, S. R.(1981). Advances in Applied Probability Maximum Likelihood Estimation for Branching Processes with Immigration, 13(3), 498–509. DOI.

BiSø95: Bibby, B. M., & Sørensen, M. (1995). Bernoulli Martingale Estimation Functions for Discretely Observed Diffusion Processes, 1(1/2), 17–39. DOI.

Bött13: Böttcher, B. (2013). Stochastics and Dynamics Feller evolution systems: Generators and approximation, 14(03), 1350025. DOI.

BrHe75: Brown, B. M., & Hewitt, J. I.(1975). Journal of Applied Probability Inference for the Diffusion Branching Process, 12(3), 588–594. DOI.

CaCh06: Caballero, M. E., & Chaumont, L. (2006). Journal of Applied Probability Conditioned Stable Lévy Processes and the Lamperti Representation, 43(4), 967–983.

CaGB13: Caballero, M. E., Garmendia, J. L. P., & Bravo, G. U.(2013). The Annals of Probability A Lamperti-type representation of continuous-state branching processes with immigration, 41(3A), 1585–1627. DOI.

CaLB09: Caballero, M.-E., Lambert, A., & Bravo, G. U.(2009). Probability Surveys Proof(s) of the Lamperti representation of Continuous-State Branching Processes, 6, 62–89. DOI.

Chis64: Chistyakov, V. (1964). Theory of Probability & Its Applications A Theorem on Sums of Independent Positive Random Variables and Its Applications to Branching Random Processes, 9(4), 640–648. DOI.

Çinl75: Çinlar, E. (1975). Management Science Exceptional Paper—Markov Renewal Theory: A Survey, 21(7), 727–752. DOI.

Cohn97: Cohn, H. (1997). In K. B. Athreya & P. Jagers (Eds.), Classical and Modern Branching Processes Stochastic Monotonicity and Branching Processes (pp. 51–56). Springer New York

CrSS10: Crane, R., Schweitzer, F., & Sornette, D. (2010). Physical Review E Power law signature of media exposure in human response waiting time distributions, 81(5), 056101. DOI.

CrDL99: Crisan, D., Del Moral, P., & Lyons, T. (1999). Markov Processes and Related Fields Discrete filtering using branching and interacting particle systems, 5(3), 293–318.

CuLe13: Curien, N., & Le Gall, J.-F. (2013). Journal of Theoretical Probability The Brownian Plane, 27(4), 1249–1291. DOI.

DaVe03: Daley, D. J., & Vere-Jones, D. (2003) An introduction to the theory of point processes (2nd ed., Vol. 1. Elementary theory and methods). New York: Springer

DaVe08: Daley, D. J., & Vere-Jones, D. (2008) An introduction to the theory of point processes (2nd ed., Vol. 2. General theory and structure). New York: Springer

DaZh11: Dassios, A., & Zhao, H. (2011). Advances in Applied Probability A dynamic contagion process, 43(3), 814–846. DOI.

DeSp97: Dekking, F. M., & Speer, E. R.(1997). In K. B. Athreya & P. Jagers (Eds.), Classical and Modern Branching Processes On the Shape of the Wavefront of Branching Random Walk (pp. 73–88). Springer New York

DeMi00: Del Moral, P., & Miclo, L. (2000). In Séminaire de Probabilités XXXIV Branching and interacting particle systems approximations of Feynman-Kac formulae with applications to non-linear filtering (pp. 1–145). Springer

DeSo05: Deschâtres, F., & Sornette, D. (2005). Physical Review E Dynamics of book sales: Endogenous versus exogenous shocks in complex networks, 72(1), 016112. DOI.

DoKy06: Doney, R. A., & Kyprianou, A. E.(2006). The Annals of Applied Probability Overshoots and undershoots of Lévy processes, 16(1), 91–106. DOI.

DuPo15: Duembgen, M., & Podolskij, M. (2015). Stochastic Processes and Their Applications High-frequency asymptotics for path-dependent functionals of Itô semimartingales, 125(4), 1195–1217. DOI.

Dynk91: Dynkin, E. B.(1991). The Annals of Probability Branching Particle Systems and Superprocesses, 19(3), 1157–1194. DOI.

Dynk04: Dynkin, E. B.(2004) Superdiffusions and positive solutions of nonlinear partial differential equations. Providence, R.I: American Mathematical Society

EmLL11: Embrechts, P., Liniger, T., & Lin, L. (2011). Journal of Applied Probability Multivariate Hawkes processes: an application to financial data, 48A, 367–378. DOI.

Ethe00: Etheridge, A. (2000) An introduction to superprocesses. Providence, RI: American Mathematical Society

Evan08: Evans, S. N.(2008) Probability and real trees (Vol. 1920). Berlin: Springer

FaTe12: Falkner, N., & Teschl, G. (2012). Expositiones Mathematicae On the substitution rule for Lebesgue–Stieltjes integrals, 30(4), 412–418. DOI.

Feig76: Feigin, P. D.(1976). Advances in Applied Probability Maximum Likelihood Estimation for Continuous-Time Stochastic Processes, 8(4), 712–736. DOI.

FBMS14: Filimonov, V., Bicchetti, D., Maystre, N., & Sornette, D. (2014). Journal of International Money and Finance Quantification of the high level of endogeneity and of structural regime shifts in commodity markets, 42, 174–192. DOI.

Flee14: Fleet, L. (2014). Nature Physics Networks: Improve your virality, 10(6), 415–415. DOI.

Gutt91: Guttorp, P. (1991) Statistical inference for branching processes. New York: Wiley

HaJV05: Haccou, P., Jagers, P., & Vatutin, V. A.(2005) Branching Processes: Variation, Growth, and Extinction of Populations. Cambridge: Cambridge University Press

HaBo13: Halpin, P. F., & Boeck, P. D.(2013). Psychometrika Modelling dyadic Interaction with Hawkes Processes, 78(4), 793–814. DOI.

HaBB13: Hardiman, S. J., Bercot, N., & Bouchaud, J.-P. (2013). The European Physical Journal B Critical reflexivity in financial markets: a Hawkes process analysis, 86(10), 1–9. DOI.

HaBo14: Hardiman, S. J., & Bouchaud, J.-P. (2014). Physical Review E Branching-ratio approximation for the self-exciting Hawkes process, 90(6), 062807. DOI.

Hawk71: Hawkes, A. G.(1971). Biometrika Spectra of some self-exciting and mutually exciting point processes, 58(1), 83–90. DOI.

HaOa74: Hawkes, A. G., & Oakes, D. (1974). Journal of Applied Probability A cluster process representation of a self-exciting process, 11(3), 493. DOI.

HeSe10: Heyde, C. C., & Seneta, E. (2010). In R. Maller, I. Basawa, P. Hall, & E. Seneta (Eds.), Selected Works of C.C. Heyde Estimation Theory for Growth and Immigration Rates in a Multiplicative Process (pp. 214–235). Springer New York

IrMo11: Iribarren, J. L., & Moro, E. (2011). Physical Review E Branching dynamics of viral information spreading, 84(4), 046116. DOI.

Jaco97: Jacod, J. (1997). In J. Azéma, M. Yor, & M. Emery (Eds.), Séminaire de Probabilités XXXI On continuous conditional Gaussian martingales and stable convergence in law (pp. 232–246). Springer Berlin Heidelberg

JaPV10: Jacod, J., Podolskij, M., & Vetter, M. (2010). The Annals of Statistics Limit theorems for moving averages of discretized processes plus noise, 38(3), 1478–1545. DOI.

Jage69: Jagers, P. (1969). Arkiv För Matematik Renewal theory and the almost sure convergence of branching processes, 7(6), 495–504. DOI.

Jage97: Jagers, P. (1997). In K. B. Athreya & P. Jagers (Eds.), Classical and Modern Branching Processes Towards Dependence in General Branching Processes (pp. 127–139). Springer New York

Engl07: Engländer, J. (2007). Probability Surveys Branching diffusions, superdiffusions and random media, 4, 303–364. DOI.

Kest73: Kesten, H. (1973). Acta Mathematica Random difference equations and Renewal theory for products of random matrices, 131(1), 207–248. DOI.

KrPa14: Kraus, A., & Panaretos, V. M.(2014). Biometrika Frequentist estimation of an epidemic’s spreading potential when observations are scarce, 101(1), 141–154. DOI.

KvPa11: Kvitkovičová, A., & Panaretos, V. M.(2011). Advances in Applied Probability Asymptotic inference for partially observed branching processes, 43(4), 1166–1190. DOI.

LSTB15: Lakshmanan, K. C., Sadtler, P. T., Tyler-Kabara, E. C., Batista, A. P., & Yu, B. M.(2015). Neural Computation Extracting Low-Dimensional Latent Structure from Time Series in the Presence of Delays, 27(9), 1825–1856. DOI.

Lamp67a: Lamperti, J. (1967a). Bull. Amer. Math. Soc Continuous-state branching processes, 73(3), 382–386.

Lamp67b: Lamperti, J. (1967b). Zeitschrift Für Wahrscheinlichkeitstheorie Und Verwandte Gebiete The Limit of a Sequence of Branching Processes, 7(4), 271–288. DOI.

LaDG09: Laredo, C., David, O., & Garnier, A. (2009). arXiv:0902.4520 [stat] Inference for Partially Observed Multitype Branching Processes and Ecological Applications

LaTP15: Laub, P. J., Taimre, T., & Pollett, P. K.(2015). arXiv:1507.02822 [math, Q-Fin, Stat] Hawkes Processes

LeHJ08: Lee, W. H., Hopcraft, K. I., & Jakeman, E. (2008). Physical Review E Continuous and discrete stable processes, 77(1), 011109. DOI.

Lega05: Le Gall, J.-F. (2005). Probability Surveys Random trees and applications, 2, 245–311. DOI.

Lega13: Le Gall, J.-F. (2013). The Annals of Probability Uniqueness and universality of the Brownian map, 41(4), 2880–2960. DOI.

LeMi12: Le Gall, J.-F., & Miermont, G. (2012). Probability and Statistical Physics in Two and More Dimensions Scaling limits of random trees and planar maps, 15, 155–211.

LeHe13: Levina, A., & Herrmann, J. M.(2013). Stochastics and Dynamics The Abelian distribution, 14(03), 1450001. DOI.

LeMo11: Lewis, E., & Mohler, G. (2011). Preprint A nonparametric EM algorithm for multiscale Hawkes processes

Lini09: Liniger, T. J.(2009) Multivariate Hawkes processes. Diss., Eidgenössische Technische Hochschule ETH Zürich, Nr. 18403, 2009

LiMy07: Li, Y., & Mykland, P. A.(2007). Bernoulli Are volatility estimators robust with respect to modeling assumptions?, 13(3), 601–622. DOI.

Li12: Li, Z. (2012). arXiv:1202.3223 [math] Continuous-state branching processes

Li14: Li, Z. (2014). The Annals of Probability Path-valued branching processes and nonlocal branching superprocesses, 42(1), 41–79. DOI.

Li00: Li, Z.-H. (2000). Journal of the Australian Mathematical Society (Series A) Asymptotic Behaviour of Continuous Time and State Branching Processes, 68(01), 68–84. DOI.

Lyon90: Lyons, R. (1990). The Annals of Probability Random Walks and Percolation on Trees, 18(3), 931–958. DOI.

Lyon11: Lyons, R. (2011) Probability on trees and networks

MaLe08: Marsan, D., & Lengliné, O. (2008). Science Extending earthquakes’ reach through cascading, 319(5866), 1076–1079. DOI.

Mein09: Meiners, M. (2009). Stochastic Processes and Their Applications Weighted branching and a pathwise renewal equation, 119(8), 2579–2597. DOI.

MSBS11: Mohler, G. O., Short, M. B., Brantingham, P. J., Schoenberg, F. P., & Tita, G. E.(2011). Journal of the American Statistical Association Self-exciting point process modeling of crime, 106(493), 100–108. DOI.

MoIm10: Motoike, I. N., & Imamura, H. T.(2010). Physical Review E Branching pattern formation that reflects the history of signal propagation, 82(4), 046205. DOI.

NaWa84: Nanthi, K., & Wasan, M. T.(1984). Stochastic Processes and Their Applications Branching processes, 18(2), 189. DOI.

Neut78: Neuts, M. F.(1978). Naval Research Logistics Quarterly Renewal processes of phase type, 25(3), 445–454. DOI.

Oake75: Oakes, D. (1975). Journal of Applied Probability The Markovian self-exciting process, 12(1), 69. DOI.

Ogat78: Ogata, Y. (1978). Annals of the Institute of Statistical Mathematics The asymptotic behaviour of maximum likelihood estimators for stationary point processes, 30(1), 243–261. DOI.

Ogat88: Ogata, Y. (1988). Journal of the American Statistical Association Statistical models for earthquake occurrences and residual analysis for point processes, 83(401), 9–27. DOI.

Ogat99: Ogata, Y. (1999). Pure and Applied Geophysics Seismicity analysis through point-process modeling: a review, 155(2-4), 471–507. DOI.

OgAk82: Ogata, Y., & Akaike, H. (1982). Journal of the Royal Statistical Society, Series B On linear intensity models for mixed doubly stochastic Poisson and self-exciting point processes, 44, 269–274. DOI.

Olof05: Olofsson, P. (2005) Probability, statistics, and stochastic processes. Hoboken, N.J.: Wiley-Interscience

Over98: Overbeck, L. (1998). Scandinavian Journal of Statistics Estimation for Continuous Branching Processes, 25(1), 111–126. DOI.

Ozak79: Ozaki, T. (1979). Annals of the Institute of Statistical Mathematics Maximum likelihood estimation of Hawkes’ self-exciting point processes, 31(1), 145–155. DOI.

PoVe10: Podolskij, M., & Vetter, M. (2010). Statistica Neerlandica Understanding limit theorems for semimartingales: a short survey: Limit theorems for semimartingales, 64(3), 329–351. DOI.

ReSc10: Reynaud-Bouret, P., & Schbath, S. (2010). The Annals of Statistics Adaptive estimation for Hawkes processes; application to genome analysis, 38(5), 2781–2822. DOI.

RIKK15: Ruan, Z., Iniguez, G., Karsai, M., & Kertesz, J. (2015). arXiv:1506.00251 [physics] Kinetics of Social Contagion

SaHS05: Saichev, A., Helmstetter, A., & Sornette, D. (2005). Pure and Applied Geophysics Power-law Distributions of Offspring and Generation Numbers in Branching Models of Earthquake Triggering, 162(6-7), 1113–1134. DOI.

SaSo10: Saichev, A. I., & Sornette, D. (2010). The European Physical Journal B Generation-by-generation dissection of the response function in long memory epidemic processes, 75(3), 343–355. DOI.

SaMS08: Saichev, A., Malevergne, Y., & Sornette, D. (2008). arXiv:0808.1828 [physics, Q-Fin] Theory of Zipf’s law and of general power law distributions with gibrat’s law of proportional growth

SaSo11a: Saichev, A., & Sornette, D. (2011a). arXiv:1101.5564 [cond-Mat, Physics:physics] Generating functions and stability study of multivariate self-excited epidemic processes

SaSo11b: Saichev, A., & Sornette, D. (2011b). arXiv:1101.1611 [cond-Mat, Physics:physics] Hierarchy of temporal responses of multivariate self-excited epidemic processes

Seva68: Sevast’yanov, B. A.(1968). Mathematical Notes of the Academy of Sciences of the USSR Renewal equations and moments of branching processes, 3(1), 3–10. DOI.

SMSG10: Sood, V., Mathieu, M., Shreim, A., Grassberger, P., & Paczuski, M. (2010). Physical Review Letters Interacting branching process as a simple model of innovation, 105(17), 178701. DOI.

Sorn06: Sornette, D. (2006). In Extreme events in nature and society Endogenous versus exogenous origins of crises (pp. 95–119). Springer

SDGA04: Sornette, D., Deschâtres, F., Gilbert, T., & Ageon, Y. (2004). Physical Review Letters Endogenous versus exogenous shocks in complex networks: An empirical test using book sale rankings, 93(22), 228701. DOI.

SoHe03: Sornette, D., & Helmstetter, A. (2003). Physica A: Statistical Mechanics and Its Applications Endogenous versus exogenous shocks in systems with memory, 318(3–4), 577–591. DOI.

SoMM02: Sornette, D., Malevergne, Y., & Muzy, J. F.(2002). arXiv:cond-mat/0204626 Volatility fingerprints of large shocks: Endogeneous versus exogeneous

SoMM04: Sornette, D., Malevergne, Y., & Muzy, J.-F. (2004). In H. Takayasu (Ed.), The Application of Econophysics Volatility fingerprints of large shocks: endogenous versus exogenous (pp. 91–102). Springer Japan

SoUt09: Sornette, D., & Utkin, S. (2009). Physical Review E Limits of declustering methods for disentangling exogenous from endogenous events in time series with foreshocks, main shocks, and aftershocks, 79(6), 061110. DOI.

VeSc08: Veen, A., & Schoenberg, F. P.(2008). Journal of the American Statistical Association Estimation of Space–Time Branching Process Models in Seismology Using an EM–Type Algorithm, 103(482), 614–624. DOI.

Wata68: Watanabe, S. (1968). Journal of Mathematics of Kyoto University A limit theorem of branching processes and continuous state branching processes, 8(1), 141–167.

Wein65: Weiner, H. J.(1965). The Annals of Mathematical Statistics An Integral Equation in Age Dependent Branching Processes, 36(5), 1569–1573. DOI.

YNRS08: Yaari, G., Nowak, A., Rakocy, K., & Solomon, S. (2008). The European Physical Journal B Microscopic study reveals the singular origins of growth, 62(4), 505–513. DOI.

ZhSi13: Zhao, Z., & Singer, A. (2013). Journal of the Optical Society of America A Fourier–Bessel rotational invariant eigenimages, 30(5), 871. DOI.

See original: The Living Thing / Notebooks Branching processes

Javascript visualisations

Javascript statistical graphing

  • the mothership, d3.js.

  • animation using velocity.js.

  • plot.ly is statistics oriented charting.

  • flot is also statistics oriented charting.

  • waveform graphs audio files for you

  • vega …

    is a declarative format for creating, saving, and sharing visualization designs. With Vega, visualizations are described in JSON, and generate interactive views using either HTML5 Canvas or SVG.

    vega-lite claims to be a ggplot-like layer atop it.

3D visualisation

Yes, 3D in the browser is performant and convenient.
More so, IMO, than Processing.

Two common options use OpenGL ES, the mobile- and browser- friendly option.

  • Scenejs seems to specialise in loading up geometries and shapes and physics for realistic scene modelling

  • three.js does the same things, but does more abstract stuff with them

  • philoGL

    is a WebGL framework for data visualization, creative coding and game development. It includes modules to manage scenes, cameras and textures and modules to work with effects, web workers and more.

    However, it looks unmaintained.

Everything supports lens flare.

For desktop apps and a larger OpenGL subset there is a desktop option,
Plask, which seems to be some kind of particle-system-friendly OS X app, with spurty development but spectacular potential.

See original: The Living Thing / Notebooks Javascript visualisations

Privacy (notes on how to have it)

Technoprivacy is difficult and tedious for our monkey minds to get a handle on.
However, it’s not too hard.
The trick is, don’t get hung up on thinking you are some kind of secret agent who needs
to hide from the NSA.
You are no Osama bin Laden.
To be fair, these days, even Osama bin Laden is not Osama bin Laden.
Deal with state surveillance through political means if you are worried about the state stealing your information.
(Or at least, work up gradually to truly paranoid privacy attitudes, and research more widely the tips here.)

Instead, for us normal people, the rule should be:
Start by not giving your information away for free to everyone.
And don’t simply give up because it’s too hard:
That’s just doing what big business wants you to do.

That said, just because I’m talking about what our attitude should be as
informed consumers of the addictive drug of single-serve online socialising,
doesn’t mean I’m blaming Jane/Joe Public for not getting it right.
As long as corporate social networks are permitted to harness their heady blend
of plausibly-deniable social engineering on the vulnerable, we are all put at
greater risk.

Case in point:
A friend of mine just showed me his facebook profile public link before
friending me;
on open display were pictures of his children, his home, his friends, dying
relatives in hospital with confidential medical information and records in the
background;
With his well-intentioned handphone wielding he has voluntarily compromised the
privacy, insurance and loan-worthiness of everyone he knows who has confided in
him;
privacy is a weakest-link kind of concept, and as long as Facebook can rely on
a reasonable fraction of the population voluntarily and unconsciously selling
the rest out, we are all compromised.
I know that everything I do in front of this guy will be obediently tagged and
put on public display for the use of not only facebook but any passing mobster,
data miner or insurance company.
The thing is, it is not good enough for privacy-violating companies to
get away with it just because experts could, in principle, avoid some of the pitfalls.
Social media is a habit-forming drug that transmits contagious ailments, and
we shouldn’t let companies get away with pretending they don’t know, any more than
we should let hospitals dispense unmedicated addictive drugs with dirty
syringes, or put poker machines in school playgrounds.

And companies, such as phone companies, that sell your information no matter what you do must be punished by the laws we haven’t enacted yet.

Anyway, with blame for the abuse appropriately apportioned to people other than the victims,
let’s get back to what we, the victims, can do by taking the responsibility available to us,
which is not so very hard,
for all that it should not be required of us.

Right now, if you are a typical internet user, you are walking around with no
pants on online.
Everyone can see your junk.
You don’t need to wear a tinfoil hat to hide your junk,
not if your anatomy is anything typical;
you just need to put some pants on.

This enpantsing will be more tedious than we’d like,
because the world is badly designed,
but let’s start with what’s achievable,
and work towards making it easier next
time, eh?

You do too have something to hide.

You commit three felonies a day

A statistical problem with “nothing to hide”:

How we could do it better now

So, some baby steps towards a healthier privacy regime.
I am going to list some
techniques that have aroused my attention.
Later I will triage them according to how urgent the privacy leak they plug is
and how onerous they are to handle; e.g. something like:

  1. first keep my credit card details out of the hands of the mafia, then
  2. keep gratuitous personal data out of the hands of unscrupulous corporations, next
  3. keep nude selfies and pony tail pics out of the hands of potential employers
  4. keep personal data out of the hands of prying foreign security agencies
  5. keep personal data out of the hands of prying local security agencies

These reflect my personal needs;
if you are actually a person of specific
interest to state security agencies, or a mafia credit card thief, you will
probably have different ones.

General

  • Prism break is a chaotic list of solutions.
    Excellent reference, although it really needs to incorporate some idea of how
    popular their suggested solutions are;
    after all, most of these things are only of any damn use if your friends also
    use ‘em.
  • quick guide to the basics of encryption (or how about one with stick figures)
  • VPNs
  • password managers
  • tcpcrypt is a protocol that
    attempts to encrypt (almost) all of your network traffic.
    Unlike other
    security mechanisms, Tcpcrypt works out of the box: it requires no
    configuration, no changes to applications, and your network connections will
    continue to work even if the remote end does not support Tcpcrypt, in which
    case connections will gracefully fall back to standard clear-text TCP.
    Install Tcpcrypt and you’ll feel no difference in your every day user
    experience, but yet your traffic will be more secure and you’ll have made
    life much harder for hackers.

Search engines and browsing

Social networks

  • don’t use them
  • OK, in fact, not using them is harder than you’d like, because
    • The No network effect means that all your
      friends have forgotten how to manage their life without Facebook all up in
      their shit, and anyway
    • if you log in to one of these damn things even once
      you are surveilled in perpetuity by their ubiquitous browser tracking bullshit.
  • so, given that you are using social networks, minimise the risk
  • and oh god if your friends start sharing pictures of you publicly for any
    reason, block them. We need to set up a new social norm around not selling
    each other downstream, until we can fix this clusterfuck.
  • Logins. Don’t login with facebook and google.
    There might be better alternatives in the future (e.g. persona).
    But for now, just don’t.

Synchronising files

See Synchronising files.

Chat

See chat.

Email

See email.

Money

Sick of your financial data being used to find out things about you that even you didn’t know?
Try to pay cash or bitcoins. (Other alternatives?)

Bitcoins have a thriving,
socially-awkward, tin-foil-hat-totin’ community, but are useful. (For example, they are the cheapest way to get money across the border in many places, which is reasonably essential if you move as often as I do.)

They have lots of howto guides.
e.g.

Miscellany

How we could do it better later

OK, anyway, we shouldn’t all have to be digital privacy experts to survive in the 21st century. How could we change the rules so that we can focus on our day jobs?

(I give you permission to despair if you can do it amusingly;
I’d prefer amusingly with hope.)

Slamming PGP and the model of human behaviour it assumes is a cottage industry:

  • vinay gupta:

    GPG and HTTPS (X509) are broken in usability terms because the conceptual
    model of trust embedded in each network does not correspond to how people
    actually experience the world.
    As a result, there is a constant grind between people and these systems,
    mainly showing up as a series of user interface disasters.
    The GPG web of trust results in absurd social constructs like signing parties
    because it does not work and creating social constructs that weird to support
    it is a sign of that:
    stand in a line and show 50 strangers your government ID to prove you exist?
    Really?
    Likewise, anybody who’s tried to buy an X509 certificate (HTTPS cert) knows
    the process is absurd:
    anybody who’s really determined can probably figure out how to fake your
    details if they happen to be doing this before you do it for yourself, and of
    the 1500 or so Certificate Authorities issuing trust credentials at least one
    is weak or compromised by a State, and all your browser will tell you is
    “yes, I trust this credential absolutely.”
    You just don’t get any say in the matter at all.

    […]

    The best explanation of this in more detail is the Ode to the Granovetter
    Diagram which shows how this different trust model maps cleanly to the
    networks of human communication found by Mark Granovetter in his sociological
    research.
    We’re talking about building trust systems which correspond to actual trust
    systems as they are found in the real world, not the broken military
    abstractions of X509 or the flawed cryptoanarchy of GPG.

  • This World of Ours

    When someone says “assume that a public key cryptosystem
    exists,” this is roughly equivalent to saying “assume
    that you could clone dinosaurs, and that you could fill a park
    with these dinosaurs, and that you could get a ticket to this
    ‘Jurassic Park,’ and that you could stroll throughout this
    park without getting eaten, clawed, or otherwise quantum
    entangled with a macroscopic dinosaur particle.”

  • http://blog.cryptographyengineering.com/2014/08/whats-matter-with-pgp.html

  • http://secushare.org/PGP

  • https://hashicorp.com/blog/vault.html

Getting old school

Academic stuff to read to stay paranoid

  • Genkin, D., Shamir, A., & Tromer, E. (2013). RSA Key Extraction via Low-Bandwidth Acoustic Cryptanalysis. Cryptology ePrint Archive, Report 2013/857, 2013. http://eprint.iacr.org. Online.

    Yes, that’s right, deducing your password by listening to your computer.
    But it gets worse:

    Beyond acoustics, we demonstrate that a similar low-bandwidth attack can be
    performed by measuring the electric potential of a computer chassis.
    A suitably-equipped attacker need merely touch the target computer with his
    bare hand, or get the required leakage information from the ground wires at
    the remote end of VGA, USB or Ethernet cables.

    Maybe don’t read this if you are working on reducing your background paranoia.

  • Roth, A. (2014). The Algorithmic Foundations of Differential Privacy. Now Publishers. Online.

    The mathematical foundations of doing stuff privately.
    I hope someone else is reading this so that I don’t have to.

  • Sarigol, E., Garcia, D., & Schweitzer, F. (2014). Online Privacy as a Collective Phenomenon. arXiv:1409.6197 [cs]. Online.

    Your friends have already disclosed secrets about you by disclosing they know
    you on social media, secrets that will be further disseminated by random grad
    students in Switzerland when the social media company goes bust.

Politics of privacy

Here’s a trick to get yourself to the correct level of consternation:

The Watts Test:

…a simple metric I use to assess the claims put forth by wannabe
surveillers:
simply relocate the argument from cyber- to meatspace, and see how it holds
up.
For example, Leslie Caldwell’s forebodings about online
zones of lawlessness
would be rendered thusly:

Caldwell also raised fresh alarms about curtains on windows and locks on
bathroom doors, both of which officials say make it easier for criminals to
hide their activity. “Bathroom doors obviously were created with good
intentions, but are a huge problem for law enforcement. There are a lot of
windowless basements and bathrooms where you can do anything from purchase
heroin to buy guns to hire somebody to kill somebody”

Practically, first step, I would like to minimise the
amount of information complete strangers get about me for free.
For example, I would prefer the mafia not to be able to buy stuff with my
credit cards, I’d prefer my personal relationships are not used to sell crap to
me, I’d prefer not to release those awkward photos from when I had a pony tail.

Broadly, some
stuff I’d like to keep private, some stuff I’d like to share, and some stuff,
I’m happy to share, but only for the right price or with the right organisation;
I want to assign my personal information to the correct publicness categories,
and at a better price point.
And by “better”, I mean, “not selling off the foundations of functional
democracy for all future times to unaccountable interests for a few
dollars a year right now.” which seems a little steep for kitten pictures.

No-opt-out-gamified citizenship: China builds the mother of all online reputation systems

China is proposing to assess its citizens’ behavior over a totality of
commercial and social activities, creating an uber-scoring system.
When completed, the model could encompass everything from a person’s
chat-room comments to their performance at work, while the score could be
used to determine eligibility for jobs, mortgages, and social services.

“They’ve been working on the credit system for the financial industry for a
while now,” says Rogier Creemers, a China expert at Oxford University.
“But, in recent years, the idea started growing that if you’re going to
assess people’s financial status, you should equally be able to do that with
other modes of trustworthiness.”

The document talks about the “construction of credibility”—the ability to
give and take away credits—across more than 30 areas of life, from energy
saving to advertising.

Why we live in a dystopia even Orwell couldn’t have envisioned

http://www.alexaobrien.com/secondsight/wb/binney.html

See original: The Living Thing / Notebooks Privacy (notes on how to have it)

Running a secure server

Or at least a somewhat more secure server.

So many parts to this, and I care so little about any of them.

SSL

Nonetheless, a baseline requirement for running modern web services is SSL, which is notoriously tedious to set up.
This recently got easier and cheaper with
Let’s Encrypt
and their client software
letsencrypt-nosudo
or simp_le

Proxy/privacy/anonymisation servers

Run your own search server?

  • mysearch - Local search engine portal designed to anonymate your search requests and have a better display of search results
    A public instance is available at https://search.jesuislibre.net/
  • searx is the same, I think

Running your own VPN/proxy/anonymising/p2p etc servers can
make other kinds of traffic less convenient for the panopticon too.

Note, however, that virtual machines on someone else’s cloud can never be
especially secure from determined nasty persons or state actors.

See original: The Living Thing / Notebooks Running a secure server

Taking spans of time into account to control the diffuse impacts generated by large land transport infrastructure (ITT) on biodiversity

Large land transport infrastructure (ITT, Infrastructures de Transport Terrestre) generates multiple impacts on biodiversity, from the first transformations of the landscape ahead of construction work through to the effects of managing green verges during the operating phase. Scientific work in road ecology has, study by study, made it possible to map most of the impacts of ITT on natural environments and biodiversity. In the French context, the 1976 law on impact studies, followed by the Grenelle laws in the 2000s, established an increasingly demanding regulatory framework. This framework has encouraged the emergence of new engineering practices aiming at the ecological transparency of ITT, following the Avoid-Reduce-Compensate doctrine (ERC, Evitement-Réduction-Compensation). Developers and researchers are now also asking about the continuity of impacts throughout the life phases of ITT, as well as about cumulative impacts (combining direct, indir...

See original: VertigO - la revue électronique en sciences de l'environnement Prendre les espaces de temps pour maîtriser les impacts diffus générés par les grandes infrastructures de transport terrestre (ITT) sur la biodiversité

R (the language)

R is the current hotness in statistics. I may as well use it, as 2/3 of all
statistical algorithms I’ve run into of late are implemented in it. Of those
that remain, most of the rest are written for MATLAB, which is, IMO, some kind
of weird con job pulled on the maths community by disgruntled scientific
computation graduates who want to double bill you for the use of your own
floating point unit. C, Python and Java seem to vie for 3rd spot,
or possibly one of the commercial alternatives such as S, SPSS or Stata.
There are some disconcertingly enthusiastic persons advocating Julia,
and a few super old school command-line thingies.

Pros and cons

Good

  • combines unparalleled breadth and community, at least as pertains to
    statisticians, data miners, machine learners and other such
    assorted folk as I am pleased to call my colleagues. To get some sense of
    this thriving scene, check out R-bloggers. This community alone is
    enough to sell R, whatever you think of the language
    (cf “Your community is your best asset”).
    And believe me, I have reservations about everything else.
  • amazing, statistically-useful plotting (cf, e.g., the awful battle to
    get error bars in mayavi)
  • online web-app visualization: shiny

Bad

  • Seems, from my personal aesthetic, to have been written by a team who
    prioritise delivering statistical functionality right now over making an
    elegant, fast or consistent language to access that functionality.
    (“Elegant”, “fast”, “consistent”; you can choose… uh…
    Oh look, it’s lunch break! So what are you doing this weekend?)
    I’d rather access those same beautiful libraries through a language which has
    had as many computer scientists winnowing its ugly bits as Python or Ruby
    has had.
    Or indeed Go, or Julia; even javascript has managed to drag itself out of hell
    these days.
    And, for that matter, I’d like as many amazing third-party
    libraries for non-statistical things as these other languages promise.
  • Poetically, R has random scope amongst other
    parser and syntax weirdness.
  • Call-by-value semantics (in a “big-data” processing language?)
  • …ameliorated not even by array views,
  • …exacerbated by bloaty design
  • Object model tacked on after the fact… in fact, several object models,
    which is fine? I guess? maybe, but…
  • …if the object model stuff is a multi-standard compatibility disaster,
    I’d like the trade-off to be speed, or functional design features, or some
    other such modern convenience. Nah.
  • One of the worst names to google for ever (cf Processing, Pure)

Tips

Easy project reload

devtools for lazy people:

Make a folder called MyCode with a DESCRIPTION file.
Make a subfolder called R.
Put R code in .R files in there.
Edit, load_all("MyCode"), use the functions.
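
The recipe above can be sketched as a script. This is a minimal sketch, assuming that load_all is the one from the devtools package and that devtools is installed. The folder name MyCode comes from the text; the temporary directory, the DESCRIPTION contents and the hello function are made up for illustration.

```r
# Build the skeleton described above in a temporary directory.
pkg_dir <- file.path(tempdir(), "MyCode")
dir.create(file.path(pkg_dir, "R"), recursive = TRUE, showWarnings = FALSE)

# A minimal DESCRIPTION; real packages carry more fields.
writeLines(c("Package: MyCode",
             "Title: Scratch Code",
             "Version: 0.0.1"),
           file.path(pkg_dir, "DESCRIPTION"))

# Any .R file under R/ is picked up; hello() is a hypothetical example.
writeLines('hello <- function() "hello from MyCode"',
           file.path(pkg_dir, "R", "hello.R"))

# Re-run this after every edit; skipped quietly if devtools is missing.
if (requireNamespace("devtools", quietly = TRUE)) {
  devtools::load_all(pkg_dir)
  print(hello())
}
```

The point of the layout is that load_all re-sources everything under R/ in one go, so there is no need to track which files changed.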

Functional prog hacks

  • purrr “A FP package for R in the spirit of underscore.js”

  • magrittr brings a compose (“pipe”) operator to R:

    %>%
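
    For a feel of what the pipe buys you, here is a small sketch comparing nested calls with piped ones. The numbers are arbitrary, and the pipe part only runs if magrittr happens to be installed.

```r
# Nested calls read inside-out ...
nested <- sum(sqrt(c(1, 4, 9)))

# ... while the pipe reads left to right. Guarded so the script still runs
# where magrittr is not installed.
if (requireNamespace("magrittr", quietly = TRUE)) {
  `%>%` <- magrittr::`%>%`
  piped <- c(1, 4, 9) %>% sqrt() %>% sum()
  stopifnot(identical(piped, nested))
}
```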
    

split/apply

useful functions: semi_join etc
plyr and dplyr are the essential packages.
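
To make the vocabulary concrete, here is a base-R sketch of the two ideas, with no packages needed. The data frames are invented for illustration. In dplyr the same steps would be spelled with group_by/summarise and semi_join respectively.

```r
# Toy data, made up for this example.
sales <- data.frame(
  shop  = c("a", "a", "b", "b", "c"),
  units = c(1, 2, 10, 20, 5)
)
open_shops <- data.frame(shop = c("a", "c"))

# split -> apply -> combine: one total per shop.
totals <- sapply(split(sales$units, sales$shop), sum)

# What a semi-join does: keep the rows of `sales` that have a match in
# `open_shops`, without adding any columns from it.
kept <- sales[sales$shop %in% open_shops$shop, ]

print(totals)
print(kept)
```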

subsetting hell

To subset a list based object:

x[1]

to subset and optionally downcast the same:

x[[1]]

to subset a matrix-based object:

x[1, , drop=FALSE]

to subset and optionally downcast the same:

x[1, ]
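
A quick way to see which of these forms keeps the container and which silently downcasts is to inspect the results. A small sketch, with made-up objects lst and mat:

```r
lst <- list(a = 1:3, b = "text")
str(lst[1])      # still a list, of length 1
str(lst[[1]])    # the element itself: an integer vector

mat <- matrix(1:6, nrow = 2)
dim(mat[1, , drop = FALSE])  # still a matrix: 1 row, 3 columns
dim(mat[1, ])                # NULL: downcast to a plain vector
mat[1]                       # a single element, counted column-major
```

Note that a single index on a matrix treats it as one long vector, which is rarely what you want when you meant "first row".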

plotting

ggvis is the latest iteration of the ggplot family, AFAICT.

Pro tip:
It’s worth having an install of R around just for the grammar of graphics packages.

How to pass sparse matrices between R and Python

https://gist.github.com/howthebodyworks/9e89e65bfc58fded46ae

This FS-backed method was a couple of orders of magnitude faster than rpy2 last time I tried to pass more than a few MB of data.
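
I have not reproduced the gist here. As a sketch of what a filesystem-backed hand-off can look like (which may or may not be the route the gist takes), one option is the Matrix Market text format, which the Matrix package can write and which SciPy can read. The matrix values and the temporary file are arbitrary.

```r
library(Matrix)  # ships with standard R installs as a recommended package

# A small sparse matrix; the values are arbitrary illustration data.
m <- sparseMatrix(i = c(1, 3, 2), j = c(1, 2, 4), x = c(1.5, -2, 7),
                  dims = c(3, 4))

# Write it to disk in Matrix Market coordinate format.
path <- tempfile(fileext = ".mtx")
writeMM(m, path)

# On the Python side, scipy.io.mmread(path) reads the same file.
# Here we only check that the round trip within R is lossless.
m_back <- readMM(path)
round_trip_ok <- all(as.matrix(m) == as.matrix(m_back))
print(round_trip_ok)
```

A text format is not the fastest possible choice, but it avoids holding two copies of the data in one process.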

Upgrading R breaks the installed packages

This is the fix:

update.packages(checkBuilt=TRUE, ask=FALSE)

Bioconductor’s horrifyingly pwnable install

In fact, the default package management might not be much better, but the
secondary R package repository makes it terrifyingly clear:

What, you’d like to install some biostatistics software on
your campus supercomputing cluster? Easy! Simply download and run this
unverifiable, obligatorily unencrypted, unsigned script from a webserver of unknown provenance!

source("http://bioconductor.org/biocLite.R")
biocLite("RBGL")

It is probably not often that script kiddies spoof you so as to trojan
your campus computing cluster to steal CPU cycles. After all,
who would do that?

On an unrelated note, I am looking for investors in a distributed bitcoin
mining operation. Contact me privately.

There are step debuggers and other such modern conveniences

  • inspecting frames post hoc: recover
    In fact, pro-tip, you can invoke it in 3rd party code gracefully:

    options(error = utils::recover)
    
  • Interactive debugger: browser

  • Graphical interactive optionally-web-based debugger available in RStudio, and if it had any more buzzwords in it, it would socially tag your instagram and upload it to the NSA’s Internet Of Things to be 3D printed.

  • easy command-line invocation: Rio: loads CSV from stdin into R as a data.frame, executes given commands, and gets the output as CSV or PNG on stdout

R for Pythonistas

Many things about R are surprising to me, coming as I do most recently from
Python. I’m documenting my perpetual surprise here, in order that it may save
someone else the inconvenience of going to all that trouble to be personally surprised.

Opaque imports

Importing an R package, unlike importing a python module, brings in random
cruft that may have little to do with the names of the thing you just imported.
That is, IMO, poor planning, although history indicates that most language
designers don’t agree with me on that:

> npreg
Error: object 'npreg' not found
> library("np")
Nonparametric Kernel Methods for Mixed Datatypes (version 0.40-4)
> npreg
function (bws, ...) #etc
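
For comparison, R does also offer something closer to a Python-style qualified import. A small sketch using the tools package, which comes with R; it stands in here for any package and is my choice, not the author's:

```r
# Qualified access: use one function without attaching the package,
# roughly like `import tools` in Python.
ext <- tools::file_ext("data.csv")

# Attaching: every exported name lands on the search path at once,
# roughly like `from tools import *`.
library(tools)
attached <- "package:tools" %in% search()
n_exported <- length(ls("package:tools"))

print(ext)
print(n_exported)
```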

Further, data structures in R can, and are intended to, provide first-class scopes
for looking up names. In your explorations into data you are as apt to
bring the names of columns in a data set into scope as the names
of functions in a library. This is kind of useful, although the scoping
proceedings do make my eyes water when this intersects with function definition.

Formulas are cool and ugly, like Adult Swim, and intimately bound up in the
prior point.
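
A small sketch of both ideas, using a made-up data frame df whose columns follow y = 2x + 1 exactly:

```r
df <- data.frame(x = c(1, 2, 3, 4), y = c(3, 5, 7, 9))

# with() evaluates an expression with the columns of df in scope.
mean_y <- with(df, mean(y))

# A formula is an unevaluated expression; lm() looks up x and y in `data`
# rather than in the calling environment.
fit <- lm(y ~ x, data = df)

print(mean_y)
print(coef(fit))
```

The formula never mentions df; the names are resolved against the data frame at fitting time, which is the scoping trick described above.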

assignment to function calls

I need to learn the R terminology to describe this.

R fosters a style of programming where attributes and metadata of data objects
are set by using accessor functions, e.g. in matrix column naming:

> m=matrix(0, nrow=2,ncol=2)
> m
     [,1] [,2]
[1,]    0    0
[2,]    0    0
> colnames(m)
NULL
> colnames(m)=c('a','b')
> colnames(m)
[1] "a" "b"
> m
     a b
[1,] 0 0
[2,] 0 0

If you want to know by observing its effects whether an apparent function
returns some massaged product of its argument, or whether it decorates the
argument, well, check the manual. As a rule, the accessor functions operate on
one object and appear to return nothing, although so can, e.g., plotting functions.
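
The R name for this pattern is a "replacement function". As a sketch of the mechanics, you can write your own: second<- below is a made-up example, not something from base R.

```r
# A replacement function is an ordinary function whose name ends in `<-`
# and whose last argument is called `value`.
`second<-` <- function(x, value) {
  x[2] <- value
  x            # return the modified copy
}

v <- c(10, 20, 30)
second(v) <- 99          # sugar for: v <- `second<-`(v, value = 99)

# The built-in accessors work the same way when called explicitly.
m <- matrix(0, nrow = 2, ncol = 2)
m2 <- `colnames<-`(m, c("a", "b"))

print(v)
print(colnames(m2))
```

So nothing is mutated in place: the assignment form calls the function and rebinds the variable to the result, which is why m itself still has no column names after the explicit call.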

No scalar types…

A float is a float vector of size 1:

> 5
[1] 5
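
A few checks that make the same point from different directions; x is just an illustrative name:

```r
x <- 5
length(x)            # a "scalar" has a length
is.vector(x)         # and is a vector
identical(x, c(5))   # c() of one thing is that same thing
x[1]                 # and it can be indexed like any other vector
```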

…yet verbose vector literal syntax

You make vectors by using a call to a function called c. Witness:

> c('a', 'b', 'c', 'd')
[1] "a" "b" "c" "d"

If you type a literal vector in though, it will throw an error:

> 'a', 'b', 'c', 'd'
Error: unexpected ',' in "'a',"

I’m sure there are Reasons for this;
it’s just that they are reasons that I don’t care about.

In short,

A powerful, effective, diverse, well-supported nightmare.

OTOH, as far as statistical languages go, this is wonderful;
the others are less supported, less diverse,
and R is now the de facto standard,
so I count my blessings.

To read

See original: The Living Thing / Notebooks R (the language)

Coarse graining

AFAICT, this is the question ‘how much worse do your predictions get as you discard
information in some orderly fashion?’, as framed by physicists.

Do “renormalisation groups”, whatever they are, fit in here?
How about Scholtes and his time-respecting networks?

Where the coarse graining is itself a stochastic process,
is this just a
hierarchical model,
in the statistical sense?

To consider: the algorithmic statistics angle,
the pseudorandomness angle,
the probabilistic angle as exemplified by the suggestive utility of
sigma-algebras and filtrations here.

To read, classics

  • Bar-Yam, Y. (2003). Dynamics Of Complex Systems. Westview Press.
  • Castiglione, P., & Falcioni, M. (2008). Chaos and Coarse Graining in Statistical Mechanics. Cambridge, UK ; New York: Cambridge University Press.
  • NECSI’s multiscale methods page

To read, actually want to

  • Petri, G., Expert, P., Turkheimer, F., Carhart-Harris, R., Nutt, D., Hellyer, P. J., & Vaccarino, F. (2014). Homological scaffolds of brain functional networks. Journal of The Royal Society Interface, 11(101), 20140873. DOI. Online.

    Talks about a fun-sounding “persistent homology” idea, which sounds a little
    like some kind of topological measure theory to my analytics-biassed perspective:

    Persistent homology is a recent technique in computational topology developed for shape recognition and the analysis of high dimensional datasets [36,37].
    It has been used in very diverse fields, ranging from biology [38,39] and sensor network coverage [40] to cosmology [41].
    Similar approaches to brain data [42,43], collaboration data [44] and network structure [45] also exist.
    The central idea is the construction of a sequence of successive approximations of the original dataset seen as a topological space X.
    This sequence of topological spaces \(X_0, X_1, \dots{}, X_N = X\) is such that \(X_i \subseteq X_j\) whenever \(i < j\) and is called the filtration.
    Choosing how to construct a filtration from the data is equivalent to choosing the type of goggles one wears to analyse the data.
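
    To make "filtration" less abstract, here is a toy sketch in base R. It is my own illustration, not the construction used in the paper. Points on a line are joined whenever they are at most eps apart, and eps is grown step by step. The edge sets are nested, and the number of connected components (the simplest topological summary) can only fall as the scale coarsens. The points and scales are invented.

```r
# Six points on a line; hypothetical toy data.
pts <- c(0, 1, 1.5, 5, 5.2, 9)
d <- dist(pts)

# Increasing scales: each one is a different pair of "goggles".
scales <- c(0.1, 0.3, 0.6, 1.2, 4, 5)

# X_i is the graph joining every pair of points at most scales[i] apart.
edge_sets <- lapply(scales, function(eps) which(as.vector(d) <= eps))

# Nestedness: every edge present at a finer scale survives at coarser ones.
nested <- all(mapply(function(a, b) all(a %in% b),
                     head(edge_sets, -1), tail(edge_sets, -1)))

# Connected components of X_i are the single-linkage clusters cut at eps.
hc <- hclust(d, method = "single")
n_components <- sapply(scales,
                       function(eps) length(unique(cutree(hc, h = eps))))

print(nested)
print(n_components)
```

    Persistent homology proper also tracks loops and higher-dimensional holes across the filtration; this sketch only follows the components.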

See original: The Living Thing / Notebooks Coarse graining