Breakbeat cuts
Wed, 15/07/2015  7:21am  by dan mackinlay
Slicing up your percussion line into mad junglist syncopations is a whole world of its own.
Aside from selling a lot of vinyl, it has attracted significant academic interest.
Think of the group-theory angle, like a Rubik’s cube.
Is it a purely group-theoretic problem?
Or are there additional constraints on a breakbeat cut such that it is still considered rhythmic?
Nick Collins has done a whole lot of work here.
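To make the group-theory angle concrete, here is a minimal sketch that treats a cut as a permutation of equal-length slices of the loop. The function names and the eight-slice example are my own illustration, not taken from Collins or anyone else cited here. Real cuts often repeat or drop slices, in which case they are not permutations at all, which is one way the "additional constraints" question bites.

```python
def apply_cut(slices, order):
    """Rearrange loop slices; order[i] is the index of the source slice played at position i."""
    return [slices[i] for i in order]


def compose(first, second):
    """Single permutation equivalent to applying `first`, then `second` to the result."""
    return [first[i] for i in second]


def inverse(order):
    """Permutation that undoes `order`."""
    inv = [0] * len(order)
    for position, source in enumerate(order):
        inv[source] = position
    return inv


# Hypothetical example: an 8-slice loop with two pairs of slices swapped.
loop = list("abcdefgh")
swap = [0, 1, 3, 2, 4, 5, 7, 6]
cut = apply_cut(loop, swap)
restored = apply_cut(cut, inverse(swap))
```

Composition and inverses are what make the set of such cuts a group; whether every element of that group still sounds rhythmic is a separate matter.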
 The “Amen Break”: The Most Famous 6-Second Drum Loop & How It Spawned a Sampling Revolution
 Thebreaks is a collaborative archive of who-sampled-whom, which is not always breakbeat cuts, but often enough, dammit.
 Adamo, M. (2010). The Breakbeat Bible: The Fundamentals of Breakbeat Drumming (Pap/Com edition.). S.l.: Hudson Music.
 Collins, N. (2002). Interactive Evolution of Breakbeat Cut Sequences. In Proceedings of Cybersonica. London.
 Hockman, J. (2014). An ethnographic and technological study of breakbeats in hardcore, jungle and drum & bass. McGill University.
See original: Breakbeat cuts
Earthquakes
Tue, 14/07/2015  1:44pm  by dan mackinlay
A passing interest of mine, caught from Didier Sornette when he was my supervisor.
I’m mostly interested in the self-exciting process model of Ogata and Ozaki et al, but I’ll also accept notes on human tragedy and normal accidents.
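As a rough illustration of what "self-exciting" means, here is a sketch of a Hawkes-type process simulated by thinning (accept/reject against an upper bound on the intensity). The exponential kernel and the parameter names `mu`, `alpha` and `beta` are assumptions of mine for brevity; seismological models in the Ogata tradition typically use a power-law (Omori-type) kernel instead.

```python
import math
import random


def intensity(t, history, mu, alpha, beta):
    """Baseline mu plus an exponentially decaying kick from each event in `history` (all <= t)."""
    return mu + sum(alpha * beta * math.exp(-beta * (t - s)) for s in history)


def simulate_hawkes(mu, alpha, beta, horizon, seed=0):
    """Simulate event times on (0, horizon] by thinning; alpha < 1 keeps the process stable."""
    rng = random.Random(seed)
    events = []
    t = 0.0
    while True:
        # Between events the intensity only decays, so its current value is an upper bound.
        upper = intensity(t, events, mu, alpha, beta)
        t += rng.expovariate(upper)
        if t > horizon:
            break
        # Accept the candidate with probability (true intensity) / (upper bound).
        if rng.random() * upper <= intensity(t, events, mu, alpha, beta):
            events.append(t)
    return events


events = simulate_hawkes(mu=0.5, alpha=0.5, beta=1.0, horizon=50.0)
```

With this kernel each event spawns on average `alpha` direct offspring, so `alpha` plays the role of the branching ratio.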
Kathryn Schulz at the New Yorker, “The Really Big One”:
To see the full scale of the devastation when that tsunami recedes, you would
need to be in the international space station. The inundation zone will be
scoured of structures from California to Canada. The earthquake will have
wrought its worst havoc west of the Cascades but caused damage as far away as
Sacramento, California—as distant from the worst-hit areas as Fort Wayne,
Indiana, is from New York. FEMA expects to coördinate search-and-rescue
operations across a hundred thousand square miles and in the waters off four
hundred and fifty-three miles of coastline. As for casualties: the figures I
cited earlier—twenty-seven thousand injured, almost thirteen thousand
dead—are based on the agency’s official planning scenario, which has the
earthquake striking at 9:41 A.M. on February 6th. If, instead, it strikes in
the summer, when the beaches are full, those numbers could be off by a
horrifying margin.
See original: Earthquakes
Samuel(YGS)
Mon, 13/07/2015  1:20am  by Isuokochesamuel
Samuel is a top student at NDA University.
Astronomy
Mon, 13/07/2015  1:14am  by Isuokochesamuel
Studying the stars, planets and their movement. Become a planet and star master, study with perfect professionals and get a high degree.
Data sets
Sun, 12/07/2015  7:15pm  by dan mackinlay
See also musical corpora.

Zenodo, for example, documents many published scientific data sets.

SESHAT: The Seshat: Global History Databank brings together the most current and comprehensive body of knowledge about human history in one place. Our unique Databank systematically collects what is currently known about the social and political organization of human societies and how civilizations have evolved over time.

UCI datasets are diverse. Here’s a nice one:
 Buzz prediction in online social media

This dataset contains two different social networks: Twitter, a microblogging platform with exponential growth and extremely fast dynamics, and Tom’s Hardware, a worldwide forum network focusing on new technology with more conservative dynamics but distinctive features.


Leskovec lab

 J. Yang, J. Leskovec. Temporal Variation in Online Media. ACM International Conference on Web Search and Data Mining (WSDM ’11), 2011.

467 million Twitter posts from 20 million users covering a 7 month period from June 1 2009 to December 31 2009. We estimate this is about 20-30% of all public tweets published on Twitter during the particular time frame.
As per request from Twitter the data is no longer available.

The Higgs dataset has been built after monitoring the spreading processes on Twitter before, during and after the announcement of the discovery of a new particle with the features of the elusive Higgs boson on 4th July 2012. The messages posted in Twitter about this discovery between 1st and 7th July 2012 are considered.


Quandl has some databases.

CRSP has some too? Perhaps accessible to me via Wharton?
See original: Data sets
About YGS
Fri, 10/07/2015  10:27pm  by Isuokochesamuel
I am a young growing scientist from Nigeria, in the western part of the African continent. I have a group called Young Growing Scientists (YGS), which I created to bring people together to study science. I was very happy when I found this association; it is the kind of thing I like: studying with my fellow scientists and talking about science positively. Please, my co-scientists, let us join hands and make the YGS group good and standard in this association, so that more people will know it, want to join, and make the world better. Isu Okoche Samuel (YGS) says: with science all things are easy.
Academic publishing
Tue, 07/07/2015  12:17pm  by dan mackinlay
Some practical notes on the connection between reproducibility, academic publishing and… whatever.

Zenodo “is an open dependable home for the long-tail of science, enabling researchers to share and preserve any research outputs in any size, any format and from any science.”
 Research. Shared. — all research outputs from across all fields of science are welcome!
 Citeable. Discoverable. — uploads gets a Digital Object Identifier (DOI) to make them easily and uniquely citeable…
 Flexible licensing — because not everything is under Creative Commons.
 Safe — your research output is stored safely for the future in same cloud infrastructure as research data from CERN’s Large Hadron Collider.
A major win is the easy, free DOI-linking of data and code for reproducible research.

Cameron Neylon (spelling corrections are mine):
[…]another way to look at engaging with peer review is as costly signalling. The purpose of submitting work to peer review is to signal that the underlying content is “honest” in some sense. In the mating dance between researchers and funders […]the peer review process is intended to make the pure signalling of publication and […]harder to fake. Taking Fisher’s view of mutual selection, authors on one side, funders and [institutions] on the other, we can see, at least as analogy, a reason for the [runaway] selection for publishing in prestigious journals[:] A runaway process where the signalling [bears] a tenuous relationship with the underlying qualities being sought, in the same way as the size of the peacock’s tail has a [tenuous] link with its health and fitness.
(I think I have successfully reconstructed the intended quote through the typographical errors.)

Open Conference Systems (OCS)
“is a free Web publishing tool that will create a complete Web presence for your scholarly conference. OCS will allow you to:
 create a conference Web site
 compose and send a call for papers
 electronically accept paper and abstract submissions
 allow paper submitters to edit their work
 post conference proceedings and papers in a searchable format
 post, if you wish, the original data sets
 register participants
 integrate post-conference online discussions”
See original: Academic publishing
Why are cancer cases increasing?
Mon, 06/07/2015  11:23am  by Nkumbu Nawc Sikaona
Our changing world and social structure are making us pay. Global warming and GMO products are among the causes. If we don't act together as one then we are doomed. I have just invented the first ever cancer treatment tablet machine maker and liquid syrup; if this research can be published we will surely head towards development.
Point processes
Fri, 03/07/2015  2:33pm  by dan mackinlay
Another current obsession, tentatively placemarked.
I’ve just spent 6 months thinking about nothing else, so I won’t write much here.
Reading
 Adelfio, G., & Schoenberg, F. P. (2009). Point process diagnostics based on weighted second-order statistics and their asymptotic properties. Annals of the Institute of Statistical Mathematics, 61(4), 929–948. DOI.
 Al-Osh, M. A., & Alzaid, A. A. (1987). First-Order Integer-Valued Autoregressive (INAR(1)) Process. Journal of Time Series Analysis, 8(3), 261–275. DOI.
 Anselin, L., Cohen, J., Cook, D., Gorr, W., & Tita, G. (n.d.). Spatial analyses of crime.
 Bacry, E., & Muzy, J.-F. (2014). Hawkes model for price and trades high-frequency dynamics. Quantitative Finance, 14(7), 1147–1166. DOI.
 Baddeley, A. (2007). Spatial Point Processes and their Applications. In W. Weil (Ed.), Stochastic Geometry (pp. 1–75). Springer Berlin Heidelberg.
 Baddeley, A. J., & Lieshout, M. N. M. van. (1995). Area-interaction point processes. Annals of the Institute of Statistical Mathematics, 47(4), 601–619. DOI.
 Baddeley, A. J., Lieshout, M. N. M. V., & Møller, J. (1996). Markov Properties of Cluster Processes. Advances in Applied Probability, 28(2), 346–355. DOI.
 Baddeley, A. J., Møller, J., & Waagepetersen, R. (2000). Non- and semi-parametric estimation of interaction in inhomogeneous point patterns. Statistica Neerlandica, 54(3), 329–350. DOI.
 Baddeley, A., & Møller, J. (1989). Nearest-Neighbour Markov Point Processes and Random Sets. International Statistical Review / Revue Internationale de Statistique, 57(2), 89–121. DOI.
 Baddeley, A., & Turner, R. (2000). Practical Maximum Pseudolikelihood for Spatial Point Patterns. Australian & New Zealand Journal of Statistics, 42(3), 283–322. DOI.
 Baddeley, A., & Turner, R. (2006). Modelling Spatial Point Patterns in R. In A. Baddeley, P. Gregori, J. Mateu, R. Stoica, & D. Stoyan (Eds.), Case Studies in Spatial Point Process Modeling (pp. 23–74). Springer New York.
 Baddeley, A., Gregori, P., Mateu, J., Stoica, R., & Stoyan, D. (2006). Case studies in spatial point process modeling (Vol. 185). Springer.
 Baddeley, A., Møller, J., & Pakes, A. G.(2008). Properties of residuals for spatial point processes. Annals of the Institute of Statistical Mathematics, 60(3), 627–649.
 Baddeley, A., Turner, R., Møller, J., & Hazelton, M. (2005). Residual analysis for spatial point processes (with discussion). Journal of the Royal Statistical Society: Series B (Statistical Methodology), 67(5), 617–666. DOI.
 Brix, A., & Kendall, W. S.(2002). Simulation of cluster point processes without edge effects. Advances in Applied Probability, 34(2), 267–280. DOI.
 Brown, B. M., & Hewitt, J. I.(1975). Inference for the Diffusion Branching Process. Journal of Applied Probability, 12(3), 588–594. DOI.
 Brown, E., Barbieri, R., Ventura, V., Kass, R., & Frank, L. (2002). The time-rescaling theorem and its application to neural spike train data analysis. Neural Computation, 14(2), 325–346. DOI.
 Böckenholt, U. (1998). Mixed INAR(1) Poisson regression models: Analyzing heterogeneity and serial dependencies in longitudinal count data. Journal of Econometrics, 89(1–2), 317–338. DOI.
 Caballero, M. E., & Chaumont, L. (2006). Conditioned Stable Lévy Processes and the Lamperti Representation. Journal of Applied Probability, 43(4), 967–983.
 Chang, C., & Schoenberg, F. P.(2008). Testing separability in multidimensional point processes with covariates. Annals of the Institute of Statistical Mathematics.
 Chang, Y.-P. (2001). Estimation of Parameters for Nonhomogeneous Poisson Process: Software Reliability with Change-Point Model. Communications in Statistics - Simulation and Computation, 30(3), 623–635. DOI.
 Chen, L. H. Y., & Xia, A. (2011). Poisson process approximation for dependent superposition of point processes. Bernoulli, 17(2), 530–544. DOI.
 Cheng, T., & Wicks, T. (2014). Event Detection using Twitter: A Spatio-Temporal Approach. PLoS ONE, 9(6), e97807. DOI.
 Cui, Y., & Lund, R. (2009). A new look at time series of counts. Biometrika, 96(4), 781–792. DOI.
 Daley, D. J., & Vere-Jones, D. (2003). An introduction to the theory of point processes (2nd ed., Vol. 1. Elementary theory and methods). New York: Springer.
 Daley, D. J., & Vere-Jones, D. (2008). An introduction to the theory of point processes (2nd ed., Vol. 2. General theory and structure). New York: Springer.
 Daneshmand, H., Gomez-Rodriguez, M., Song, L., & Schoelkopf, B. (2014). Estimating Diffusion Network Structures: Recovery Conditions, Sample Complexity & Soft-thresholding Algorithm. arXiv:1405.2936 [physics, Stat].
 Dassios, A., & Zhao, H. (2011). A dynamic contagion process. Advances in Applied Probability, 43(3), 814–846. DOI.
 Díaz-Avalos, C., Juan, P., & Mateu, J. (2012). Similarity measures of conditional intensity functions to test separability in multidimensional point processes. Stochastic Environmental Research and Risk Assessment, 27(5), 1193–1205. DOI.
 Embrechts, P., Liniger, T., & Lin, L. (2011). Multivariate Hawkes processes: an application to financial data. Journal of Applied Probability, 48A, 367–378. DOI.
 Feigin, P. D. (1976). Maximum Likelihood Estimation for Continuous-Time Stochastic Processes. Advances in Applied Probability, 8(4), 712–736. DOI.
 Filimonov, V., & Sornette, D. (2013). Apparent criticality and calibration issues in the Hawkes self-excited point process model: application to high-frequency financial data (SSRN Scholarly Paper No. ID 2371284). Rochester, NY: Social Science Research Network.
 Gallo, S., & Leonardi, F. G.(2014). Nonparametric statistical inference for the context tree of a stationary ergodic process. arXiv:1411.7650 [math, Stat].
 Garcia, J. M. G. (2011). A fixed-point algorithm to estimate the Yule–Simon distribution parameter. Applied Mathematics and Computation, 217(21), 8560–8566. DOI.
 Geyer, C. J., & Møller, J. (1994). Simulation procedures and likelihood inference for spatial point processes. Scandinavian Journal of Statistics, 359–373.
 Giesecke, K., & Schwenkler, G. (2011). Filtered Likelihood for Point Processes (SSRN Scholarly Paper No. ID 1898344). Rochester, NY: Social Science Research Network.
 Giesecke, K., Kakavand, H., & Mousavi, M. (2008). Simulating point processes by intensity projection. In Simulation Conference, 2008. WSC 2008. Winter (pp. 560–568). DOI.
 Giesecke, K., Kakavand, H., & Mousavi, M. (2011). Exact Simulation of Point Processes with Stochastic Intensities. Operations Research, 59(5), 1233–1245. DOI.
 Goulard, M., Särkkä, A., & Grabarnik, P. (1996). Parameter estimation for marked Gibbs point processes through the maximum pseudolikelihood method. Scandinavian Journal of Statistics, 365–379.
 Hardiman, S. J., & Bouchaud, J.-P. (2014). Branching-ratio approximation for the self-exciting Hawkes process. Physical Review E, 90(6), 062807. DOI.
 Harte, D. (2010). PtProcess: an R package for modelling marked point processes indexed by time. Journal of Statistical Software, 35(8), 1–32.
 Haslinger, R., Pipa, G., & Brown, E. (2010). Discrete Time Rescaling Theorem: Determining Goodness of Fit for Discrete Time Statistical Models of Neural Spiking. Neural Computation, 22(10), 2477–2506. DOI.
 Hawkes, A. G.(1971). Point spectra of some mutually exciting point processes. Journal of the Royal Statistical Society. Series B (Methodological), 33(3), 438–443.
 Hawkes, A. G. (1971). Spectra of some self-exciting and mutually exciting point processes. Biometrika, 58(1), 83–90. DOI.
 Horváth, L. (2001). Change-point detection in long-memory processes. Journal of Multivariate Analysis, 78(2), 218–234.
 Huang, F., & Ogata, Y. (1999). Improvements of the Maximum Pseudo-Likelihood Estimators in Various Spatial Statistical Models. Journal of Computational and Graphical Statistics, 8(3), 510–530. DOI.
 Häggström, O., Van Lieshout, M.-C. N., & Møller, J. (1999). Characterization results and Markov chain Monte Carlo algorithms including exact simulation for some spatial point processes. Bernoulli, 5(4), 641–658.
 Iribarren, J. L., & Moro, E. (2011). Branching dynamics of viral information spreading. Physical Review E, 84(4), 046116. DOI.
 Jensen, J. L., & Møller, J. (1991). Pseudolikelihood for Exponential Family Models of Spatial Point Processes. The Annals of Applied Probability, 1(3), 445–461.
 Kaulakys, B., Ruseckas, J., Gontis, V., & Alaburda, M. (2006). Nonlinear stochastic models of noise and power-law distributions. Physica A: Statistical Mechanics and Its Applications, 365(1), 217–221. DOI.
 Kroese, D. P., & Botev, Z. I.(2013). Spatial process generation. arXiv:1308.0399 [stat].
 Kwieciński, A., & Szekli, R. (1996). Some Monotonicity and Dependence Properties of Self-Exciting Point Processes. The Annals of Applied Probability, 6(4), 1211–1231.
 Latour, A. (1998). Existence and Stochastic Structure of a Non-negative Integer-valued Autoregressive Process. Journal of Time Series Analysis, 19(4), 439–455. DOI.
 Lewis, E., Mohler, G., Brantingham, P. J., & Bertozzi, A. L. (2012). Self-exciting point process models of civilian deaths in Iraq. Security Journal, 25(3), 244–264. DOI.
 Li, Z. (2012). Continuous-state branching processes. arXiv:1202.3223 [math].
 Martin, J. S., Jasra, A., & McCoy, E. (2013). Inference for a class of partially observed point process models. Annals of the Institute of Statistical Mathematics, 65(3), 413–437. DOI.
 Massey, W. A., Parker, G. A., & Whitt, W. (1996). Estimating the parameters of a nonhomogeneous Poisson process with linear rate. Telecommunication Systems, 5(2), 361–388. DOI.
 McCauley, J. L., Bassler, K. E., & Gunaratne, G. H.(2008). Martingales, nonstationary increments, and the efficient market hypothesis. Physica A: Statistical and Theoretical Physics, 387(15), 3916–3920. DOI.
 McKenzie, E. (1986). Autoregressive Moving-Average Processes with Negative-Binomial and Geometric Marginal Distributions. Advances in Applied Probability, 18(3), 679–705. DOI.
 McKenzie, E. (1988). Some ARMA Models for Dependent Sequences of Poisson Counts. Advances in Applied Probability, 20(4), 822–835. DOI.
 Mohler, G. O., Short, M. B., Brantingham, P. J., Schoenberg, F. P., & Tita, G. E. (2011). Self-exciting point process modeling of crime. Journal of the American Statistical Association, 106(493), 100–108. DOI.
 Morimoto, T. (1963). Markov Processes and the HTheorem. Journal of the Physical Society of Japan, 18(3), 328–331. DOI.
 Møller, J., & Berthelsen, K. K.(2012). Transforming spatial point processes into Poisson processes using random superposition. Advances in Applied Probability, 44(1), 42–62. DOI.
 Møller, J., & Rasmussen, J. G.(2006). Approximate Simulation of Hawkes Processes. Methodology and Computing in Applied Probability, 8(1), 53–64. DOI.
 Møller, J., & Waagepetersen, R. P.(2007). Modern Statistics for Spatial Point Processes. Scandinavian Journal of Statistics, 34(4), 643–684. DOI.
 Nastić, A. S., Ristić, M. M., & Bakouch, H. S.(2012). A combined geometric INAR(p) model based on negative binomial thinning. Mathematical and Computer Modelling, 55(5–6), 1665–1672. DOI.
 Neustifter, B., Rathbun, S. L., & Shiffman, S. (2012). Mixed-Poisson Point Process with Partially-Observed Covariates: Ecological Momentary Assessment of Smoking. Journal of Applied Statistics, 39(4), 883–899. DOI.
 Ogata, Y. (1978). The asymptotic behaviour of maximum likelihood estimators for stationary point processes. Annals of the Institute of Statistical Mathematics, 30(1), 243–261. DOI.
 Ogata, Y. (1988). Statistical models for earthquake occurrences and residual analysis for point processes. Journal of the American Statistical Association, 83(401), 9–27. DOI.
 Ogata, Y. (1999). Seismicity analysis through point-process modeling: a review. Pure and Applied Geophysics, 155(2–4), 471–507. DOI.
 Ogata, Y., & Akaike, H. (1982). On linear intensity models for mixed doubly stochastic Poisson and self-exciting point processes. Journal of the Royal Statistical Society, Series B, 44, 269–274. DOI.
 Ogata, Y., Matsu’ura, R. S., & Katsura, K. (1993). Fast likelihood computation of epidemic type aftershock-sequence model. Geophysical Research Letters, 20(19), 2143–2146. DOI.
 Ozaki, T. (1979). Maximum likelihood estimation of Hawkes’ self-exciting point processes. Annals of the Institute of Statistical Mathematics, 31(1), 145–155. DOI.
 Paninski, L. (2004). Maximum likelihood estimation of cascade point-process neural encoding models. Network: Computation in Neural Systems, 15(4), 243–262. DOI.
 Pouget-Abadie, J., & Horel, T. (2015). Inferring Graphs from Cascades: A Sparse Recovery Framework. In Proceedings of The 32nd International Conference on Machine Learning.
 Priesemann, V., Munk, M. H., & Wibral, M. (2009). Subsampling effects in neuronal avalanche distributions recorded in vivo. BMC Neuroscience, 10(1), 40. DOI.
 Rasmussen, J. G. (2011, January). Temporal point processes: the conditional intensity function.
 Rasmussen, J. G.(2013). Bayesian inference for Hawkes processes. Methodology and Computing in Applied Probability, 15(3), 623–642. DOI.
 Rasmussen, J. G., Møller, J., Aukema, B. H., Raffa, K. F., & Zhu, J. (2006). Bayesian inference for multivariate point processes observed at sparsely distributed times. Department of Mathematical Sciences, Aalborg University.
 Rasmussen, J. G., Møller, J., Aukema, B. H., Raffa, K. F., & Zhu, J. (2007). Continuous time modelling of dynamical spatial lattice data observed at sparsely distributed times. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 69(4), 701–713. DOI.
 Ripley, B. D., & Kelly, F. P. (1977). Markov Point Processes. Journal of the London Mathematical Society, s2-15(1), 188–192. DOI.
 Ristić, M. M., Nastić, A. S., & Bakouch, H. S. (2012). Estimation in an Integer-Valued Autoregressive Process with Negative Binomial Marginals (NBINAR(1)). Communications in Statistics - Theory and Methods, 41(4), 606–618. DOI.
 Rosser, G., & Cheng, T. (2014). A self-exciting point process model for predictive policing: implementation and evaluation.
 Rubin, I. (1972). Regular point processes and their detection. IEEE Transactions on Information Theory, 18(5), 547–557. DOI.
 Saichev, A., & Sornette, D. (2011). Generating functions and stability study of multivariate self-excited epidemic processes. arXiv:1101.5564 [cond-Mat, Physics:physics].
 Schoenberg, F. (1999). Transforming spatial point processes into Poisson processes. Stochastic Processes and Their Applications, 81(2), 155–164. DOI.
 Schoenberg, F. P.(2002). On Rescaled Poisson Processes and the Brownian Bridge. Annals of the Institute of Statistical Mathematics, 54(2), 445–457. DOI.
 Schoenberg, F. P. (2004). Testing Separability in Spatial-Temporal Marked Point Processes. Biometrics, 60(2), 471–481.
 Schoenberg, F. P.(n.d.). Testing separability in multidimensional point processes. Biometrics, 60, 471–481.
 Silva, I., & Silva, M. E.(2006). Asymptotic distribution of the Yule–Walker estimator for INAR processes. Statistics & Probability Letters, 76(15), 1655–1663. DOI.
 Smith, A., & Brown, E. (2003). Estimating a state-space model from point process observations. Neural Computation, 15(5), 965–991. DOI.
 Tria, F., Loreto, V., Servedio, V. D. P., & Strogatz, S. H.(2013). The dynamics of correlated novelties. arXiv:1310.1953 [physics], 4. DOI.
 Veen, A., & Schoenberg, F. P.(2008). Estimation of Space–Time Branching Process Models in Seismology Using an EM–Type Algorithm. Journal of the American Statistical Association, 103(482), 614–624. DOI.
 Vere-Jones, D., & Schoenberg, F. P. (2004). Rescaling Marked Point Processes. Australian & New Zealand Journal of Statistics, 46(1), 133–143. DOI.
 Wheatley, S. (2013, July). Quantifying endogeneity in market prices with point processes: methods & applications. Masters Thesis, ETH Zürich.
See original: Point processes
Parallel computing
Wed, 01/07/2015  6:28pm  by dan mackinlay
Fashion dictates this should be “cloud” computing, although I’m interested in using the same methods without a cloud.
Let’s say I need to take notes on “easy shared-nothing parallel computing”.
I get lost in all the options for parallel computing on the cheap.
I’m gonna summarise for myself here.
Additional material on this theme is under scientific computation workflow and stream processing.
Emphasis for now is on embarrassingly parallel computation, which is what I as a statistician mostly do. Mostly in python, sometimes in other things.
That is, I run many calculations/simulations with absolutely no shared state and aggregate them in some way at the end.
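The pattern described above can be sketched with nothing but the Python standard library. This is only an illustration under my own assumptions: `simulate` is a stand-in for a real calculation, and the names `run_all` and `run_parallel` are hypothetical. Because the jobs share no state, the serial built-in `map` and a process pool's `map` are interchangeable.

```python
import random
import statistics
from concurrent.futures import ProcessPoolExecutor


def simulate(seed):
    """One independent job: the mean of 10,000 uniform draws from its own seeded generator."""
    rng = random.Random(seed)
    return statistics.fmean(rng.random() for _ in range(10_000))


def run_all(seeds, mapper=map):
    """Run every job through `mapper`, then aggregate the results at the end."""
    results = list(mapper(simulate, seeds))
    return statistics.fmean(results)


def run_parallel(seeds, workers=None):
    """Same computation, with the jobs farmed out to local worker processes."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return run_all(seeds, mapper=pool.map)
```

Calling `run_parallel(range(32))` from a script should sit under an `if __name__ == "__main__":` guard, since some platforms start worker processes by re-importing the main module. The same shape carries over to cluster tools: only the `mapper` changes.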
Good scientific python VM images: (To work out: should I be listing Docker container images instead?)
 Continuum Analytics has conda python images
 StarCluster is an academically-targeted AWS-compatible multicomputing library. Their VMs are a little bit dated for my purposes, lacking LLVM 3.5 etc.
But how to use ’em?
 Dato (formerly GraphLab) claims to automate this stuff.
 They support Hadoop too
 Spark distributes over clusters automagically. It has several cluster modes:
Standalone – a simple cluster manager included with Spark that makes it easy to set up a private cluster.
Apache Mesos – a general cluster manager that can also run Hadoop MapReduce and service applications.
Hadoop YARN – the resource manager in Hadoop 2.
basic EC2 launch scripts make it easy to launch a standalone cluster on Amazon EC2.
Interesting application: see Communication-Efficient Distributed Dual Coordinate Ascent
(CoCoA):
By leveraging the primal-dual structure of these optimization problems,
COCOA effectively combines partial results from local computation while
avoiding conflict with updates simultaneously computed on other machines.
In each round, COCOA employs steps of an arbitrary dual optimization
method on the local data on each machine, in parallel. A single update
vector is then communicated to the master node.
Uses cunning optimisation stunts to distribute
optimisation problems efficiently over various machines.
See original: Parallel computing
Randomised algorithms
Tue, 30/06/2015  10:36am  by dan mackinlay
Sacrificing precision/certainty for speed, using randomness.
See also the related and/or special cases: compressed sensing, Monte Carlo methods, particle filters, random features, and stochastic gradient descent.
 BlinkDB: BlinkDB is a massively parallel, approximate query engine for running interactive SQL queries on large volumes of data. It allows users to trade off query accuracy for response time, enabling interactive queries over massive data by running queries on data samples and presenting results annotated with meaningful error bars. To achieve this, BlinkDB uses two key ideas: (1) An adaptive optimization framework that builds and maintains a set of multidimensional samples from original data over time, and (2) A dynamic sample selection strategy that selects an appropriately sized sample based on a query’s accuracy and/or response time requirements[…]
 Probabilistic data structures, e.g.
 Bloom filters
 Count-min sketch
 for Go
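For a feel of how the probabilistic data structures listed above trade certainty for space, here is a minimal Bloom filter sketch. The sizes, the SHA-256-based hashing scheme and the class layout are my own illustrative choices rather than any particular library's API. A lookup can return a false positive, but never a false negative.

```python
import hashlib


class BloomFilter:
    """Approximate set membership: may report false positives, never false negatives."""

    def __init__(self, n_bits=1024, n_hashes=3):
        self.n_bits = n_bits
        self.n_hashes = n_hashes
        # One byte per bit, for clarity rather than compactness.
        self.bits = bytearray(n_bits)

    def _positions(self, item):
        # Derive n_hashes positions by salting a single hash function with the hash index.
        for i in range(self.n_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.n_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item):
        return all(self.bits[pos] for pos in self._positions(item))


seen = BloomFilter()
seen.add("amen break")
```

After this, `"amen break" in seen` is guaranteed to be true, while a string that was never added is only probably reported absent; the false-positive rate grows as the filter fills up.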
See original: Randomised algorithms
How to do academia
Sun, 28/06/2015  11:41am  by dan mackinlay
Stupid Git Tricks
Fri, 26/06/2015  11:18am  by dan mackinlay
publishing to github
ghp-import -p _build/html/
subtrees
creatin’:
git fetch remote branch
git subtree add --prefix=subdir remote branch
updatin’:
git fetch remote branch
git subtree pull --prefix=subdir remote branch
git subtree push --prefix=subdir remote branch
garbage collecting
In brief, this will purge a lot of stuff from a constipated repo in emergencies:
git reflog expire --expire=now --all
git gc --prune=now
See original: Stupid Git Tricks
Calendars and contact databases, digital, use thereof
Thu, 25/06/2015  9:36am  by dan mackinlay
Google calendar, iCloud etc exist.
I don’t use them.
Gifting your entire personal schedule
and confidential contact data to third parties of dubious motivation
demonstrates a touching sense of the general benevolence of the world toward you in
particular.
However, sadly, this rainbows-n-unicorns worldview entails
a disregard for the personal privacy
and beliefs of those of
your contacts who are not convinced that the apparatus of state and capital is
at their personal disposal.
Jargon to know here: CalDAV and CardDAV
are the de facto standards to sync your calendar and contact information,
respectively.
All you need is a server which talks those standards and
you can use whatever client you’d like.
So! Running your own.
 OSX Server.app (closed source) will install Apple’s open-source
Calendar Server. (Does contacts too)
 OSX/Windows/Linux: Radicale seems to be much easier if you don’t
want to pay for the magic installer.
See original: Calendars and contact databases, digital, use thereof