Recommended reading & tools for MSc students (AI/audio/etc)

If you are working with me, e.g. for your MSc project, here are some starting points for reading and for tooling up:

Recommended reading:

Computational Analysis of Sound Scenes and Events - a good textbook from 2018. Chapter 2 is a very good intro to many of the fundamentals in audio processing for machine learning.
Computational bioacoustics with deep learning: a review and roadmap - a very up-to-date review paper by me, for animal sound in particular.
The Good Research Code Handbook - Read this!
Suggested reading: getting going with deep learning - a list of useful reading that our lab members rely on (from 2019).
Probabilistic Machine Learning: An Introduction by Kevin Murphy - a very good comprehensive textbook (new edition 2022).

Useful software tools:

I'm assuming you will be using Python, as well as one of the standard deep learning frameworks and/or scikit-learn, and also git to keep track of your code. These are standard (and you'll see some of that in the "Good Research Code Handbook" above). Slightly more specialist:

librosa - a Python library for working with sound files
Sonic Visualiser - a great desktop app to explore/annotate sound files interactively
PyTorch Lightning - you can use plain PyTorch, but Lightning makes a lot of deep learning easier.
Hydra - this helps you manage the situation where you have multiple variants of a DL model to evaluate. See e.g. this blog.
PyTorch Hub - you might use a pretrained model from here
Weights and Biases - a tool for keeping track of the outcomes of your experiments, and visualising them nicely
DVC - for keeping track of datasets and/or machine learning experiments (ideally you already know git for this). Though it seems not many people are keeping track of their datasets formally.
Audio data augmentation:
- audiomentations is a good modern Python library for this. Also look at its README: it has lots of detail, as well as (at the end) a list of alternative tools!
- Scaper is an alternative to "ordinary" data augmentation, for the special case when you have short isolated "foreground" sounds, and you want to combine them with longer "background" sound recordings, to create synthetic soundscapes
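
To make the "foreground plus background" idea concrete, here is a minimal sketch in plain NumPy. It is not Scaper's API (Scaper does much more, and may define levels differently). The helper names `rms` and `mix_at_snr` are made up for illustration, the signals are synthetic stand-ins for real recordings, and the signal-to-noise ratio is assumed to be a simple RMS ratio measured over the region where the foreground lands.

```python
import numpy as np


def rms(x):
    """Root-mean-square level of a 1-D signal."""
    return float(np.sqrt(np.mean(np.square(x))))


def mix_at_snr(background, foreground, start, snr_db):
    """Add `foreground` into a copy of `background`, beginning at sample `start`.

    The foreground is scaled so that its RMS is `snr_db` decibels above the
    RMS of the background segment it overlaps (hypothetical helper, RMS-based).
    Returns the mixture and the gain that was applied to the foreground.
    """
    end = start + len(foreground)
    if start < 0 or end > len(background):
        raise ValueError("foreground does not fit inside background")
    bg_level = rms(background[start:end])
    fg_level = rms(foreground)
    if bg_level == 0.0 or fg_level == 0.0:
        raise ValueError("cannot set an SNR against a silent signal")
    gain = bg_level * 10.0 ** (snr_db / 20.0) / fg_level
    mixture = np.array(background, dtype=float)  # copy, leave the input untouched
    mixture[start:end] += gain * foreground
    return mixture, gain


# Synthetic stand-ins: 2 s of low-level noise, and a 0.25 s tone as the "event".
rng = np.random.default_rng(0)
sr = 16000
background = 0.01 * rng.standard_normal(2 * sr)
t = np.arange(int(0.25 * sr)) / sr
foreground = np.sin(2 * np.pi * 440.0 * t)

# Place the event half a second in, 6 dB above the local background level.
mixture, gain = mix_at_snr(background, foreground, start=sr // 2, snr_db=6.0)
```

Because you choose `start` and `snr_db` yourself, you get the annotation (onset time, level) for free with each synthetic soundscape, which is the main attraction of this approach over ordinary augmentation.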

Thanks to my PhD and MSc students for top tips added to this list!

Fri 21 January 2022 | science | Permalink

mcld.co.uk
