Reading list: excellent papers for birdsong and machine learning

I'm happy to say I'm now supervising two PhD students, Pablo and Veronica. Veronica is working on my project all about birdsong and machine learning - so I've got some notes here about recommended reading for someone starting on this topic. It's a niche topic but it's fascinating: sound in general is fascinating, and birdsong in particular is full of many mysteries, and it's amazing to explore these mysteries through the craft of trying to get machines to understand things on our behalf.

If you're thinking of starting in this area, you need to get acquainted with: (a) birds and bird sounds; (b) sound/audio and signal processing; (c) machine learning methods. You don't need to be an expert in all of those - a little naivety can go a long way!

But here are some recommended reads. I don't want to give a big exhaustive bibliography of everything that's relevant. Instead, some choice reading that I have selected because I think it satisfies all of these criteria: each paper is readable, is relevant, and is representative of a different idea/method that I think you should know. They're all journal papers, which is good because they're quite short and focused, but if you want a more complete intro I'll mention some textbooks at the end.

  • Briggs et al (2012) "Acoustic classification of multiple simultaneous bird species: A multi-instance multi-label approach"

    • This paper describes quite a complex method but it has various interesting aspects, such as how they detect individual bird sounds and how they modify the classifier so that it handles multiple simultaneous birds. To my mind this is one of the first papers that really gave the task of bird sound classification a thorough treatment using modern machine learning.
  • Lasseck (2014) "Large-scale identification of birds in audio recordings: Notes on the winning solution of the LifeCLEF 2014 Bird Task"

    • A clear description of one of the modern cross-correlation classifiers. Many people in the past have tried to identify bird sounds by template cross-correlation - basically, taking known examples and trying to detect if the shape matches well. The simple approach to cross-correlation fails in various situations such as organic variation of sound. The modern approach, introduced to bird classification by Gabor Fodor in 2013 and developed further by Lasseck and others, still uses cross-correlation, but not to guess the answer directly; instead it uses the correlation responses to generate features that get fed into a classifier. At time of writing (2015), this type of classifier is the type that tends to win bird classification contests.
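
To make the idea concrete, here is a minimal sketch of that "correlation responses as features" step - my own toy illustration, not Lasseck's actual pipeline. It assumes the spectrogram and the stored templates are 2-D numpy arrays, and keeps only the peak response per template:

```python
import numpy as np
from scipy.signal import correlate2d

def template_features(spec, templates):
    """One feature per template: the peak response when sliding a
    zero-mean, unit-norm version of the template over the spectrogram.
    The peaks are features for a classifier, not decisions themselves."""
    feats = []
    for tpl in templates:
        t = tpl - tpl.mean()
        t = t / (np.linalg.norm(t) + 1e-12)
        feats.append(float(correlate2d(spec, t, mode="valid").max()))
    return np.array(feats)
```

A downstream classifier (a tree ensemble, say) is then trained on these features, typically alongside other summary statistics of the recording.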
  • Wang (2003), "An industrial-strength audio search algorithm"

    • This paper tells you how the well-known "Shazam" music recognition system works. It uses a clever idea about what is informative and invariant about a music recording. The method is not appropriate for natural sounds but it's interesting and elegant.

      Bonus question: Take some time to think about why this method is not appropriate for natural sounds, and whether you could modify it so that it is.
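
To get a feel for the mechanism (this is a toy sketch, nothing like the production system), here is the landmark idea, assuming the input is a magnitude spectrogram as a 2-D array: pick local peaks, then hash pairs of nearby peaks as (freq1, freq2, time-delta). That combination is what buys the invariance to noise and absolute level:

```python
import numpy as np

def landmark_hashes(spec, fan_out=3):
    """Shazam-style sketch: pick local spectrogram peaks, then pair each
    peak with a few later peaks; the triple (f1, f2, dt) survives noise
    and level changes, which is what makes the scheme work."""
    F, T = spec.shape
    peaks = []
    for t in range(1, T - 1):
        for f in range(1, F - 1):
            patch = spec[f - 1:f + 2, t - 1:t + 2]
            if spec[f, t] == patch.max() and spec[f, t] > spec.mean():
                peaks.append((f, t))
    peaks.sort(key=lambda p: p[1])  # time order
    hashes = []
    for i, (f1, t1) in enumerate(peaks):
        for f2, t2 in peaks[i + 1:i + 1 + fan_out]:
            hashes.append(((f1, f2, t2 - t1), t1))
    return hashes
```

Matching then reduces to looking up these hashes in a database and checking that the anchor times line up - which is also a hint towards the bonus question, since natural sounds don't repeat with the sample-exact fixity of a studio recording.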

  • Stowell and Plumbley (2014), "Automatic large-scale classification of bird sounds is strongly improved by unsupervised feature learning"

    • This is our paper about large-scale bird species classification. In particular, a "feature-learning" method which seems to work well. There are some analogies between our feature-learning method and deep learning, and also between our method and template cross-correlation. These analogies are useful to think about.
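
As a rough sketch of what "feature learning" means here (heavily simplified - the paper works on PCA-whitened spectral frames and a much larger dictionary), the spherical-k-means idea is: cluster unit-normalised frames, then use dot products against the learned bases as the new feature representation:

```python
import numpy as np

def learn_features(frames, k=16, iters=20, seed=0):
    """Toy spherical k-means: cluster unit-normalised rows of `frames`
    by cosine similarity, keeping each centroid at unit norm."""
    rng = np.random.default_rng(seed)
    X = frames / (np.linalg.norm(frames, axis=1, keepdims=True) + 1e-12)
    D = X[rng.choice(len(X), size=k, replace=False)]  # init from data
    for _ in range(iters):
        assign = np.argmax(X @ D.T, axis=1)  # nearest basis by cosine
        for j in range(k):
            members = X[assign == j]
            if len(members):
                v = members.sum(axis=0)
                D[j] = v / (np.linalg.norm(v) + 1e-12)
    return D

def encode(frames, D):
    """New features: one activation per learned basis, per frame."""
    return frames @ D.T
```

The analogy to template cross-correlation is visible here: each learned basis acts like a small template, and the activation is its matched response - except the templates are learned from data rather than hand-picked.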
  • Lots of powerful machine learning right now uses deep learning. There's lots to read on the topic. Here's a blog post that I think gives a good introduction to deep learning. Also, for this article DO read the comments! The comments contain useful discussion from some experts such as Yoshua Bengio. Then after that, this recent Nature paper is a good introduction to deep learning from some leading experts, which goes into more detail while still at the conceptual level. When you come to do practical application of deep learning, the book "Neural Networks: Tricks of the Trade" is full of good practical advice about training and experimental setup, and you'll probably get a lot out of the tutorials for the tool you use (for example I used Theano's deep learning tutorials).

    • I would strongly recommend NOT diving in with deep learning until you have spent at least a couple of months reading around different methods. The reason for this is that there's a lot of "craft" to deep learning, and a lot of current-best-practice that changes literally month by month, and anyone who gets started could easily spend three years tweaking parameters.
  • Theunissen and Shaevitz (2006), "Auditory processing of vocal sounds in birds"

    • This one is not computer science but neuroscience - it tells you how birds recognise sounds!

      A question for you: should machines listen to bird sounds in the same way that birds listen to bird sounds?

  • O'Grady and Pearlmutter (2006), "Convolutive non-negative matrix factorisation with a sparseness constraint"

    • An example of analysing a spectrogram using "non-negative matrix factorisation" (NMF), which is an interesting and popular technique for identifying repeated components in a spectrogram. NMF is not widely used for bird sound, but it certainly could be useful, maybe for feature learning, or for decoding, who knows - it's a tool that anyone analysing audio spectrograms should be aware of.
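
If you want to see how compact the core algorithm is, here is plain NMF with the standard Lee & Seung multiplicative updates (the Euclidean-distance version; the paper's convolutive, sparse variant builds on this base). A non-negative spectrogram V (frequency x time) is approximated as W @ H, where the columns of W are the repeated spectral components and the rows of H say when each one is active:

```python
import numpy as np

def nmf(V, rank=5, iters=200, seed=0):
    """Basic NMF by multiplicative updates: V (non-negative) ~ W @ H.
    The updates preserve non-negativity automatically."""
    rng = np.random.default_rng(seed)
    F, T = V.shape
    W = rng.random((F, rank)) + 1e-6
    H = rng.random((rank, T)) + 1e-6
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + 1e-12)
        W *= (V @ H.T) / (W @ H @ H.T + 1e-12)
    return W, H
```

For bird sound you might imagine the columns of W converging towards recurring syllable spectra - whether that actually happens on real recordings is exactly the kind of thing worth trying.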
  • Kershenbaum et al (2014), "Acoustic sequences in non-human animals: a tutorial review and prospectus"

    • A good overview from a zoologist's perspective on animal sound considered as sequences of units. Note, while you read this, that sequences-of-units is not the only way to think about these things. It's common to analyse animal vocalisations as if they were items from an alphabet "A B A BBBB B A B C", but that way of thinking ignores the continuous (as opposed to discrete) variation of the units, as well as any ambiguity in what constitutes a unit. (Ambiguity is not just failure to understand: it's used constructively by humans, and probably by animals too!)
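
Part of the appeal of the discrete-alphabet view is that it's trivially easy to compute with - for example, a first-order Markov analysis is just a table of bigram counts:

```python
from collections import Counter

def transition_counts(units):
    """Count bigram transitions in a sequence of discrete units --
    the simplest form of the 'sequences of units' analysis."""
    return Counter(zip(units, units[1:]))

# The alphabet-style toy sequence from the text: "A B A BBBB B A B C"
song = list("ABABBBBBABC")
counts = transition_counts(song)
```

And the caveat above applies immediately: the moment you binned continuously-varying sounds into "A" and "B", that variation was gone from the analysis.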
  • Benetos et al (2013), "Automatic music transcription: challenges and future directions"

    • This is a good overview of methods used for music transcription. In some ways it's a similar task to identifying all the bird sounds in a recording, but there are some really significant differences (e.g. the existence of tempo and rhythmic structure, the fact that musical instruments usually synchronise in pitch and timing whereas animal sounds usually do not). A big difference from "speech recognition" research is that speech recognition generally starts from the idea of there just being one voice. The field of music transcription has spent more time addressing problems of polyphony.
  • Domingos (2012), "A few useful things to know about machine learning"

    • Lots of sensible, clearly-written advice for anyone getting involved in machine learning.

And finally, the textbooks I promised:


  • "Machine learning: a probabilistic perspective" by Murphy
  • "Nature's Music: the Science of Birdsong" by Marler and Slabbekoorn - a great comprehensive textbook about bird vocalisations.