Machine listening (musical)

See also
musical corpora,
musical metrics,
synchronisation,
speech recognition.

Machine listening: machine learning, for music.
Everything from that damn shazam app, to teaching computers to recognise speech, to doing artsy shit with sound.
I’m mostly concerned with the third one.

Here are some options for doing it:

  • LibROSA I have been using a lot recently, and I highly recommend it,
    especially if your pipeline already includes python.
    Sleek minimal design, with a curated set of algorithms
    (compare and contrast with the chaos of the vamp plugins ecosystem).
    Python-based, but fast because it leans on the numpy numerical libraries.
    The API design meshes well with Scikit-learn,
    the de facto python statistics standard, and it’s flexible and hackable.

    • see also talkbox for a nice-looking but abandoned (?) alternative,
      which is nonetheless worth it for Alexander Schindler’s
      lovely MIR lecture based around it.
  • echonest was used to generate the Million Song Dataset.
    It’s nice, but partially proprietary.

  • SonicAnnotator is under active development;
    it’s mostly about cobbling together “vamp plugins” for batch analysis.
    That is more steps than I want in the already clunky workflow of my current projects.
    It’s also more about RDF ontologies where I want matrices of floats.

  • If you use a lot of Supercollider, you might like SCMIR, a native supercollider thingy.
    It has the virtues that

    • it can run in realtime, which is lovely.
    • comes with lots of neato bells and whistles, like the author’s quirky breakbeat cut library.

    It has the vices that

    • It runs in Supercollider, which is a bit of a backwater language unserviced by modern development infrastructure, or decent machine learning libraries, and
    • a fraught development process;
      I can’t even link directly to it because the author doesn’t provide it its own anchor tag, let alone a whole web page or source code repository.
      Release schedule is opaque and sporadic.
      Consequently, it is effectively a lone guy’s pet project, rather than an active community endeavour.
      That is to say, if this code were a sweater, it’s the kind you would get from Etsy.

    If on balance this sounds like a good deal to you, you can download SCMIR
    from somewhere or other on Nick Collins’ homepage.

  • If you are feeling the need of realtime support you might want to look at
    wekinator, a realtime
    interface to the classic machine learning tool weka, or GRT, a modern
    toolkit. Both integrate with music software via OSC and support gesture
    recognition as well as sound (I think?).
    Wekinator seems to have fallen into disrepair of late, though,
    and is a bit quirky.

  • For C++ and Python there is Essentia, as seen in Freesound,
    which is a high recommendation IMO.
    (Watch out, the source download is enormous; just shy of half a gigabyte.)
    Features python and vamp integration, and a great many algorithms.
    I haven’t given it a fair chance because LibROSA has been such a joy to use.
    However, the intriguing
    Dunya
    project is based off it.

  • echonest provides machine listening as a service,
    and enables very sophisticated hacks such as autocanonisation.

  • Hey, here is a neat hard-to-classify project: Keyfinder. Very plush lookin’
    student project that classifies things by musical key, not to mention all the
    steps along the way - it can visualise chord structures, melodies and key
    changes too.

  • John Glover, soundcloud staffer,
    has several analysis libraries culminating in Metamorph,

    a new open source library for performing high-level sound transformations based on a sinusoids plus noise plus transients model. It is written in C++, can be built as both a Python extension module and a Csound opcode, and currently runs on Mac OS X and Linux.

    It is designed to work primarily on monophonic, quasi-harmonic sound sources and can be used in a non-real-time context to process pre-recorded sound files or can operate in a real-time (streaming) mode.

    See also the related spectral modelling and synthesis package, smstools.

For my part, I find it congenial to use python and supercollider together: python for my tricky offline computations, and supercollider for the more “live”, realtime stuff;
this feels like it gets me the best of each of those worlds, and especially of the development communities. YMMV.
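
The python and supercollider halves need some way to talk to each other. One option is OSC, the same protocol mentioned above for Wekinator and GRT. What follows is a minimal, standard-library-only sketch of hand-packing an OSC message of floats; it is an illustration, not the way any particular library does it. The helper names and the /centroid address are made up, and port 57120 is only sclang's usual default (check NetAddr.langPort on your own setup).

```python
import socket
import struct


def _pad(raw: bytes) -> bytes:
    """Null-terminate and pad to a multiple of 4 bytes, as OSC strings require."""
    return raw + b"\0" * (4 - len(raw) % 4)


def osc_message(address: str, floats) -> bytes:
    """Encode one OSC message whose arguments are all 32-bit floats.

    Hypothetical helper for illustration: no bundles, strings or timetags.
    """
    floats = list(floats)
    type_tags = "," + "f" * len(floats)  # e.g. ",ff" for two floats
    payload = b"".join(struct.pack(">f", x) for x in floats)  # big-endian float32
    return _pad(address.encode("ascii")) + _pad(type_tags.encode("ascii")) + payload


def send_osc(message: bytes, host: str = "127.0.0.1", port: int = 57120) -> None:
    """Fire-and-forget the encoded message over UDP.

    The port is an assumption: 57120 is sclang's usual default.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(message, (host, port))
```

Something like send_osc(osc_message("/centroid", [440.0])) should then arrive at an OSCdef listening on that address in sclang. For anything beyond flat lists of floats, a proper OSC library is the saner choice.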

Spectral peak tracking
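
The idea, roughly: take a short-time Fourier transform, pick the local maxima in each frame's magnitude spectrum, then link peaks in consecutive frames whose frequencies are close. That gives sinusoidal tracks of the kind that sinusoids-plus-noise models start from. Here is a minimal numpy sketch under those assumptions. The function names, the parabolic interpolation on dB magnitudes and the greedy nearest-frequency linking with a 50 Hz jump limit are illustrative choices, not the method of Metamorph, smstools or any other library mentioned here.

```python
import numpy as np


def stft_mag(y, n_fft=1024, hop=256):
    """Hann-windowed magnitude spectrogram, shape (n_frames, n_fft // 2 + 1)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(y) - n_fft) // hop
    frames = np.stack([y[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))


def pick_peaks(mag, sr, n_fft, n_peaks=5, floor_db=-60.0):
    """Frequencies (Hz) of the strongest local maxima in one frame, sorted ascending."""
    db = 20 * np.log10(mag + 1e-12)
    # Interior bins that are higher than the left neighbour and not lower than the right.
    idx = np.flatnonzero((db[1:-1] > db[:-2]) & (db[1:-1] >= db[2:])) + 1
    idx = idx[db[idx] > db.max() + floor_db]        # drop peaks far below the loudest bin
    idx = idx[np.argsort(db[idx])[::-1][:n_peaks]]  # keep the n_peaks loudest
    # Parabolic interpolation through each peak bin and its two neighbours.
    a, b, c = db[idx - 1], db[idx], db[idx + 1]
    offset = 0.5 * (a - c) / (a - 2 * b + c)
    return np.sort((idx + offset) * sr / n_fft)


def track_peaks(frames_of_peaks, max_jump_hz=50.0):
    """Greedily link each live track to its nearest peak in the next frame."""
    tracks, active = [], []  # a track is a list of (frame_index, freq_hz)
    for t, peaks in enumerate(frames_of_peaks):
        unused = list(peaks)
        still_active = []
        for track in active:
            if not unused:
                continue  # nothing left to link to: this track ends
            last = track[-1][1]
            j = min(range(len(unused)), key=lambda k: abs(unused[k] - last))
            if abs(unused[j] - last) <= max_jump_hz:
                track.append((t, unused.pop(j)))
                still_active.append(track)
        for f in unused:  # leftover peaks start new tracks
            track = [(t, f)]
            tracks.append(track)
            still_active.append(track)
        active = still_active
    return tracks


# Synthetic test tone (an assumption for the demo): two steady partials.
sr, n_fft = 8000, 1024
t = np.arange(sr) / sr
y = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 1000 * t)
spec = stft_mag(y, n_fft=n_fft)
tracks = track_peaks([pick_peaks(frame, sr, n_fft, n_peaks=2) for frame in spec])
```

On real recordings the greedy linking will happily swap tracks when partials cross, and peaks come and go with noise. Serious implementations add things like track birth and death rules and amplitude continuity.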

See original: The Living Thing / Notebooks Machine listening (musical)