I'm pleased to announce that we have two 2-year postdoctoral positions available.
We're looking for early-career researchers who would like to develop AI for sound and image recognition of nature. The positions are here in the beautiful Dutch city of Leiden, as part of projects funded by Horizon Europe. At Naturalis, you will work in a team with experts in both AI and biodiversity, and you will also collaborate with project partners in other European institutions.
Deadline: Sunday 26th June
Please feel free to contact me with any questions! And please forward this to anyone who may be interested.
I'm really excited about the developments going on around me. I'm working with great people on AI for biodiversity and sustainability, in two lovely academic departments in the Netherlands (Tilburg University and Naturalis Biodiversity Centre). You can join! That's part of what's exciting. We have job opportunities!
- Assistant Professor in Artificial Intelligence (Tilburg University). Deadline: April 15, 2022
- Software developer working with the Arise "Digital Species Identification" team (Naturalis)
- Postdoctoral researcher in AI for biodiversity monitoring -- Get in touch with me if you're interested.
To see a bit more about what we're up to, look at the Arise project (Naturalis and others), the Cognitive Science and AI department at Tilburg (here's a sample of the recent research published from CSAI), and the Evolutionary Ecology research group at Naturalis.
Send me a message! Happy to give a little bit of advice or answer questions.
If you are working with me e.g. for your MSc project, here are some starting points for reading, and for tooling up:
- Computational Analysis of Sound Scenes and Events - a good textbook from 2018. Chapter 2 is a very good intro to many of the fundamentals in audio processing for machine learning.
- Computational bioacoustics with deep learning: a review and roadmap - a very up-to-date review paper by me, for animal sound in particular.
- The Good Research Code Handbook - Read this!
- Suggested reading: getting going with deep learning - a list of useful reading that our lab members rely on (from 2019).
- Probabilistic Machine Learning: An introduction by Kevin Murphy - a very good comprehensive textbook (new edition 2022).
Useful software tools:
I'm assuming you will be using Python, as well as one of the standard deep learning frameworks and/or scikit-learn, and also git to keep track of your code. These are standard (and you'll see some of that in the "Good Research Code Handbook" above). Slightly more specialist:
- librosa - a Python library for working with sound files
- Sonic Visualiser - a great desktop app to explore/annotate sound files interactively
- Pytorch Lightning - you can use plain Pytorch, but Lightning makes a lot of deep learning easier.
- Hydra - this helps you to manage the situation when you have multiple variants of a DL model to evaluate. See e.g. this blog post.
- Pytorch Hub - you might use a pretrained model from here
- Weights and Biases - a tool for keeping track of the outcomes of your experiment, and visualising them nicely
- DVC - for keeping track of datasets and/or machine learning experiments (ideally you already know git, for this) ... though it seems not many people keep formal track of their datasets.
- Audio data augmentation:
- audiomentations is a good modern Python library for that. Also look at its README: it has lots of detail, as well as a list of alternative tools at the end!
- Scaper is an alternative to "ordinary" data augmentation, for the special case when you have short isolated "foreground" sounds, and you want to combine them with longer "background" sound recordings, to create synthetic soundscapes
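To give a flavour of what these augmentation tools do under the hood, here's a minimal numpy-only sketch of two common waveform augmentations (additive Gaussian noise and a random time shift). In practice you'd use audiomentations rather than rolling your own; the noise level and shift fraction here are illustrative defaults, not taken from any of the tools above.

```python
import numpy as np

def add_gaussian_noise(samples, noise_level=0.005, rng=None):
    """Add white Gaussian noise, scaled relative to the signal's RMS level."""
    rng = rng if rng is not None else np.random.default_rng()
    rms = np.sqrt(np.mean(samples ** 2))
    noise = rng.normal(0.0, noise_level * max(rms, 1e-9), size=samples.shape)
    return samples + noise

def random_time_shift(samples, max_shift=0.1, rng=None):
    """Circularly shift the waveform by up to max_shift of its length."""
    rng = rng if rng is not None else np.random.default_rng()
    limit = int(max_shift * len(samples))
    shift = rng.integers(-limit, limit + 1)
    return np.roll(samples, shift)

# Example: augment a 1-second 440 Hz sine tone at 16 kHz
sr = 16000
t = np.arange(sr) / sr
clean = 0.5 * np.sin(2 * np.pi * 440 * t)
augmented = random_time_shift(add_gaussian_noise(clean))
```

Each augmented copy keeps its original label, so one labelled recording yields many slightly-different training examples.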
Thanks to my PhD and MSc students for top tips added to this list!
This week was the DCASE 2021 workshop, a great workshop with lots of interesting research activity on Detection and Classification of Acoustic Scenes and Events.
Some observations from me:
- The development of "SED" (sound event detection) into "SELD" (sound event localisation and detection) is really welcome. There are lots of applications in which we want to infer the spatial location (or the direction) of the sound sources: robotics, bioacoustic surveying, etc. I saw some high-quality performance, and good development of synthetic training datasets etc.
- There will always be tasks with no spatial information (lots of them!), so it seems likely that both SED and SELD should continue to be refined, in parallel.
- The addition of spatial localisation brings the subject matter even closer to that of our underwater cousin DCLDE (Detection, Classification, Localization, and Density Estimation of Marine Mammals using Passive Acoustics). There's no need to consider "merging" workshops, but perhaps we should have more exchange between these communities.
- I appreciated the focus on small-footprint neural networks created by Task 1a's requirement for submitted systems to have a limited number of parameters (limited to 128 kB of nonzero parameters). I remain unsure whether this specific constraint is the best one - what about the total size of the model, for example? It could be nice to try something such as applying a total RAM constraint on the entire process. But, still, the challenge encouraged the production of good small-footprint classifiers.
- I am proud of our work on Task 5, "few-shot bioacoustic event detection", of the large team that put it together, and of the submitted works! I'm particularly proud because the way we designed the task is extremely closely linked to problems that practitioners in bioacoustics or animal behaviour face, and I think that with a little more development, we can hand them some good useful tools. I believe we have a very good balance: a task that is needed in practice, while also being conceptually interesting for algorithm development. (Here's a quick video overview of the task by Veronica.)
In the "town hall" plenary we discussed some interesting opinions about how to organise DCASE going forward. There was also a very interesting discussion, emerging from the "industry panel" plenary, of privacy and GDPR issues in using sound sensors in public. I'd like to thank the contributors to that discussion - it's a non-trivial issue and so it's very good to hear some well-considered perspectives on this.
You can watch the videos from DCASE 2021 here.
I'm looking forward to DCASE 2022 - in Nancy, France, in November. See you there!
I'm an academic working on AI and Biodiversity - my research is described here. If you're an MSc student at Tilburg University CSAI department (or elsewhere), you could do your project with me. In most cases you will need some deep learning skills, and in most cases you'll be working with natural sounds such as wildlife sound monitoring. Here are some specific topics of interest right now, that you could study:
- Novelty detection for biodiversity images/audio (with an industry partner)
- We use images and sound recordings to detect birds, insects and plants all across the Netherlands. But what if we receive an upload with a species we don’t know about, or an unusual file that needs expert attention? In this project you will work with machine learning methods for anomaly detection, and apply them to biodiversity data. You will help to improve automatic monitoring of biodiversity, in the Netherlands and beyond.
- Detecting birdsong on-device
- Automatic detection of sounds is useful for "wake-up" functionality in a smartphone; it's already used for keyword-spotting in mobile devices. Can we use it to automatically detect a particular birdsong? In this project we would like to develop a phone app that can listen continuously and react when a particular bird species is detected. We know this is possible, and you can solve it using standard deep learning toolkits - but the big challenge will be to run it on the phone itself (Android, iOS): a smart but low-power algorithm that can run on-device. In this project you might use existing "keyword spotting" tools, or toolkits such as TFLite to port an algorithm onto the device.
- Classifying rare fish from photos (with an industry partner)
- Our partner runs an app for anglers, who submit photos of the fish they catch, for automatic species ID. Can we use this unique dataset to help monitor the fish biodiversity of the Netherlands? Organisations such as the Waterschaps are obliged to make biodiversity reports to the EU, so they would benefit from automatic monitoring technology. However, there is an interesting research question: photos of fish taken out of water are very different from those from automatic underwater cameras. This is an extreme example of "domain adaptation" in machine learning. You could also investigate how to classify rare fish species, for which we only have 1 or 2 live images – can image synthesis, or collections of drawings, help?
- Build a better bird classifier (with an industry partner)
- Warblr is a mobile app for automatic birdsong classification. The company uses machine learning. It has its training dataset, but now it also has many thousands of user-contributed sound recordings (unlabelled). Can you use these data to train a better classifier for the app? The classifier must also fit within the constraints of the running service: no more than 10 million parameters, and no more than 4 seconds to produce a decision. To work on this problem, you might use methods such as self-supervised learning, semi-supervised learning, pretraining, model distillation, or model pruning.
- Birdsong automatic transcription
- For images we have "object detection", and the equivalent for audio is "sound event detection" (SED). But can we successfully detect all the sound events in a dawn chorus, when many birds are singing early in the morning? We have a dataset of annotated birdsong recordings. SED has been studied using deep learning, but we don't know if it works well enough for dense sound scenes of multiple birds. If we could get this working, we could better understand animal behaviour (for example, are the birds taking turns?) and also improve biodiversity monitoring.
- Perceiver as a model of bat echolocation foraging
- Some bats can recognise their favourite flowers by the sound of the echolocation reflected back to them. We replicated this process using a neural network, in a recent research paper. In the paper, we commented that there may be better ways to handle the sequential analysis of multiple echoes. A recent new neural network called Perceiver seems to be appropriate for this. Can you use it to analyse spectrograms of reflected bat echolocations?
- Optimal updating of a deep learning classifier service
- We deploy deep learning (DL) classifiers, often using convolutional neural networks (CNNs), to recognise animal images and sounds. We also continuously receive new data – some of it labelled, some of it unlabelled. What’s the optimal way to “update” our classifier for best results? We could train it again from scratch; we could "fine tune" it using the new data; we could keep part of the model "frozen" and re-train part of it. And how would we verify that the model was not worse than the previous one? In this project you will design and validate an approach for updating a classifier for use in a live deployed web service, considering theoretical and practical aspects of how to maintain and improve the quality of service.
- Detecting animal sounds to improve animal welfare (with an industry partner)
- We work with a small company that creates a device placed on farms to monitor the health and wellbeing of animals. They already monitor climate conditions and visual changes - but sound could also be a key indicator of animal health issues. Coughs, screams, or "alarm calls" could be detected (in cows, pigs, chickens, and more) so that any welfare problems can be flagged rapidly. In this project you will train an AI algorithm to detect specific animal sounds, using data from a real commercial product, and implement the algorithm to run efficiently on a device (Raspberry Pi-based). You will validate the performance of the detector and try out improvements - and there is potential to deploy the system in a live setting, if the results are good.
- Birdsong classification with spectrogram patches
- A recent deep learning paper found that using “patch embeddings” is a powerful method for image classification. It has some similarities with older methods that used patches of spectrograms for sound classification. In this project you will adapt the recent image-classification work to birdsong classification, and find out whether you can create the next generation of powerful birdsong classifiers.
- Sound event detection using transformers/perceivers
- There has been recent interest in the "Transformer" and "Perceiver" neural network architectures, originally used for text data but now also for images and audio. Can they be used for sound event detection and classification? In this project you will run an audio detection (or classification) task and compare their performance against standard CNN models.
- A simulated dataset for wildlife Sound Event Localisation and Detection
- It is very useful to automatically localise and detect sounds – for example, multiple people speaking in a room. It would also be highly useful for outdoor wildlife monitoring – but there’s a problem: in order to train machine learning, we need a well-annotated training dataset, but it’s very hard to do this for outdoor natural sound recordings, because the sounds are complex and hard to annotate exactly. In this project, you will follow a recent method for creating synthetic SELD datasets but adapt it for outdoor sound. The challenges will be to obtain good sound source material, as well as “impulse responses” for natural reverberation, and to evaluate the naturalness of the synthesised sound recordings. Your work will enable the next generation of intelligent wildlife monitoring.
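To make the “patch embeddings” idea from the spectrogram-patches topic above concrete: a spectrogram (frequency × time) is cut into small non-overlapping patches, each flattened into a vector, much as Vision-Transformer-style models do for images. A minimal numpy sketch (the 8×8 patch size is just illustrative; a real system would follow it with a learned linear projection):

```python
import numpy as np

def patchify(spectrogram, patch_h=8, patch_w=8):
    """Cut a (freq, time) spectrogram into non-overlapping patches,
    each flattened into a vector - the input format for a
    patch-embedding style classifier."""
    F, T = spectrogram.shape
    F, T = F - F % patch_h, T - T % patch_w  # trim to a whole number of patches
    patches = (spectrogram[:F, :T]
               .reshape(F // patch_h, patch_h, T // patch_w, patch_w)
               .transpose(0, 2, 1, 3)
               .reshape(-1, patch_h * patch_w))
    return patches  # shape: (num_patches, patch_h * patch_w)

# Toy example: a fake 64-bin x 100-frame mel spectrogram
spec = np.random.default_rng(0).random((64, 100))
tokens = patchify(spec)  # 8 x 12 = 96 patches of 64 values each
```

Each row of `tokens` is one time-frequency tile; the sequence of tiles is what a transformer-style model would then attend over.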
Also we have INTERNSHIP ideas:
- Internship: “Human perception of bird sounds”
- We have conducted a research study in which we played bird song “syllables” to birds, and asked the birds which sounds were similar to each other. But what do humans think? Would they make the same decisions as birds? In this study you will reproduce our bird song comparison study with human volunteers. More specifically, volunteers hear 3 sounds, let’s call them A/X/B, and they are asked: does X sound more similar to A or to B? From this study, we will be able to explore how similar or different human sound perception is to birds’ sound perception.
- Internship: "Adding annotation ability to the world’s largest open birdsong database"
- Xeno Canto is the world’s biggest database of birdsong, with 60,000 recordings submitted each year. For data analysis and animal behaviour research, it’s useful to have annotations of the recordings (“who sang when”), but the site doesn’t have a facility for that. In this project you will add this feature. You’ll work with the existing PHP/MySQL website, adding a database table to upload, validate and store CSV annotations, and perhaps also visualise these annotations or give users direct editing ability.
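For the perception internship above, the core of the analysis is simple to state in code. Here's a minimal numpy sketch of an ABX decision made from (hypothetical) acoustic feature vectors, plus a function scoring how often human choices agree with the birds' choices; the feature vectors and distance measure are illustrative stand-ins, not the actual stimuli or analysis from our study.

```python
import numpy as np

def abx_choice(a, x, b):
    """Return 'A' if X is closer to A than to B (Euclidean distance), else 'B'."""
    return 'A' if np.linalg.norm(x - a) < np.linalg.norm(x - b) else 'B'

def agreement(human_choices, bird_choices):
    """Fraction of ABX trials where humans and birds made the same choice."""
    human = np.asarray(human_choices)
    bird = np.asarray(bird_choices)
    return float(np.mean(human == bird))

# Toy trial: X's (made-up) features resemble A's more than B's
a = np.array([1.0, 0.0])
b = np.array([0.0, 1.0])
x = np.array([0.9, 0.1])
choice = abx_choice(a, x, b)                          # 'A'
score = agreement(['A', 'B', 'A'], ['A', 'A', 'A'])   # 2 of 3 trials agree
```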
Here are some PAST projects I've supervised:
- Voice anonymisation in wildlife sound recordings
- Insect sound classification using deep learning
- Detecting animal sounds to improve animal welfare
- Wildlife sound source separation using deep learning
- Efficient bird sound detection on the Bela embedded system [Paper]
- Short-term Prediction of Power Generated from Photovoltaic Systems using Gaussian Process Regression [Paper]
- Listen like a bat: plant classification using echolocation and deep learning
- Evaluating the impact of Full Spectrum Temporal Convolutional Neural Networks on Bird Species Sound Classification
- Detecting and classifying animal calls
- Estimating & Mitigating the Impact of Acoustic Environments on Digital Audio Signalling [Paper]
Check out the published papers to see some details from the kind of work we do!
Get in touch with me by email, info here. You're welcome to suggest a topic of your own, though to work with me it should concern new deep learning methods and/or animal sounds.
We're pleased to announce a new data challenge: "Few-shot Bioacoustic Event Detection", a new task within the "DCASE 2021" data challenge event.
We challenge YOU to create a system to detect the calls of birds, hyenas, meerkats and more.
This is a "few shot" task, meaning we only ever have a small number of examples of the sound to be detected. This is a great challenge for machine-learning students and researchers: it is not yet solved, and it is of great practical utility for scientists and conservationists monitoring animals in the wild.
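To give a feel for the few-shot setting, here's a minimal numpy sketch of one standard approach, nearest-prototype classification: average the embeddings of the few labelled examples per class, then label new sounds by their nearest prototype. The embeddings here are random stand-ins for features from a trained network; this is an illustration, not the challenge's baseline system.

```python
import numpy as np

def prototypes(support_embeddings, support_labels):
    """Mean embedding per class, from the few labelled 'support' examples."""
    classes = np.unique(support_labels)
    protos = np.stack([support_embeddings[support_labels == c].mean(axis=0)
                       for c in classes])
    return classes, protos

def classify(query_embeddings, classes, protos):
    """Assign each query to the class of its nearest prototype (Euclidean)."""
    dists = np.linalg.norm(query_embeddings[:, None, :] - protos[None, :, :],
                           axis=-1)
    return classes[np.argmin(dists, axis=1)]

# Toy example: 2 classes, 3 support examples each, in a 4-d embedding space
rng = np.random.default_rng(0)
support = np.concatenate([rng.normal(0, 0.1, (3, 4)),   # class 0 near origin
                          rng.normal(1, 0.1, (3, 4))])  # class 1 near (1,1,1,1)
labels = np.array([0, 0, 0, 1, 1, 1])
classes, protos = prototypes(support, labels)
queries = np.array([[0.05, 0.0, 0.1, -0.05],
                    [0.95, 1.0, 1.1, 0.9]])
pred = classify(queries, classes, protos)  # expected: class 0, then class 1
```

The hard part of the task is, of course, learning an embedding in which three examples of a hyena call really do cluster together.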
We are able to launch this task thanks to a great collaboration of people who contributed data from their own projects. These newly-curated datasets come from projects recorded in Germany, the USA, Kenya and Poland.
The training and validation datasets are available now to download. You can use them to develop new recognition systems. In June, the test sets will be made available, and participants will submit the results from their systems for official scoring.
Much more information on the Few-shot Bioacoustic Event Detection DCASE 2021 page.
Within TDWG Audubon Core, we are considering what is a good standard to label information in sub-regions of sound recordings, images, etc. For example, I can draw a rectangular box in an image or a spectrogram, and give it a species label. This happens a lot! How can we exchange …
Ever since the immersive experience of the fantastic Biodiversity_Next conference 2019, I've been getting to grips with biodiversity data frameworks such as GBIF and TDWG. So I'm very pleased to tell you that I've been contributing to the Audubon Core standard, which is an open standard of vocabularies to be …
I'm extremely pleased to announce this publication, edited by Jérôme Sueur and myself: Ecoacoustics and Biodiversity Monitoring - a special issue in the journal "Remote Sensing in Ecology & Conservation".
It features 2 reviews and 6 original research articles, from research studies around the globe.
You can also read a brief introduction …
I had a great time at the Biodiversity_next conference, meeting a lot of people involved in Biodiversity informatics. (I was part of an "AI" session discussing the state of the art in deep learning and bioacoustic audio.)
I was glad to get more familiar with the biodiversity information frameworks. GBIF …