I've been struggling with the tension between academia and flying for a long time. The vast majority of my holidays I've done by train and the occasional boat - for example the train from London to southern Germany is a lovely ride, as is London to Edinburgh or Glasgow. But in academia the big issue is conferences and invited seminars - much of the time you don't get to choose where they are, and much of the time there are specific conferences that you "must" be publishing at, or your students "must" be at for their career, or you're invited to give a talk.
What can you do? Well, you can't give up. So here's what I've done, for the past five years at least:
- I've declined various opportunities to fly (e.g. to North America and Australia - I'm Europe based). Sometimes this hurts - there are great meetings that you'd like to be at. In general, though, you usually find there are similar opportunities nearer by. You'll probably meet most of the people in one of those events anyway. In the big picture, it's probably better for academia to be structured as an overlapping patchwork network, rather than having single-point-of-groupthink.
- I've taken the train to many conferences and meetings. From the UK I've taken the train to France, Spain, Germany, Netherlands, and I'm happy to go further. If you haven't done long train journeys for work then maybe you don't realise: with a laptop, many long-distance train journeys are ideal peaceful office days, with a reserved seat and beautiful views scrolling past. (UK folks: ask The Man In Seat 61 for the best train trips.) If your concern is making time for the journey, don't worry! You'll be much more productive than when you fly!
- When invited to fly somewhere, I always discuss lower-carbon ways of doing it. Rome2Rio is a handy site to compare how to get anywhere by different means. If flying is the only way and I'm tempted to accept the invitation, I ask the inviters to pay for carbon offsetting too.
Many university administrations don't want to pay for carbon offsets - why? This needs to change. If they're paying for flights they should be paying for the negative externalities of them. I'm not worried here about whether carbon offsetting is a good excuse or not - I'm concerned about research being more aware of its responsibilities.
- If travelling somewhere (even by train), always try to make the most of the journey by finding other opportunities while out there - e.g. a new research group to say hello to (even if just a cuppa), a company or NGO. It's good to make face-to-face contact because that makes it much easier to do remote collaboration or coordination at other times (with the same people, I mean), reducing the need for extra trips.
(Talking to my German colleagues, I learn that the German finance rules mean you have to travel home as soon as possible after the event, i.e. not roll multiple things into one trip - that's an unfortunate rule, we should change that.)
- And of course I've done plenty of video-conferencing and audio-conferencing. It doesn't replace face-to-face meetings and we should be realistic about that, but it's a tool to use.
There's a cost implication which I haven't mentioned: flights are unfortunately often cheaper than trains and stopovers. This needs to change, of course - and can be a bit tricky when you're invited to speak somewhere and the cost ends up more than the organisers expected. However, I've been managing a funded research project for the past five years and I've noticed that in fact I've spent much less money on travel than I had projected. Why? Well back when I wrote the budget I costed for international flights and so on. But my adapted approach to travel means I take fewer big long-distance trips, but I get more out of them because I combine things into one trip, and I've skipped certain distant meetings in favour of ones closer to home - all of which means the cost is less than it would have been.
By the way, this handy flight CO2 calculator can help to work out the impact of specific trips, including multi-stop trips, so you can calculate if combining flights into a round-trip is sensible.
None of these are absolute rules. We can't carry all the burden solo, and we have to make compromises between different priorities. But if we all make some changes we can adapt academia to current realities. We can do this together - which is why I've signed my name on No Fly Climate Sci, a place for academics collectively to pledge to fly less. As I said, you don't have to be absolute about this, and the No Fly Climate Sci pledge acknowledges that. Join me?
ICEI 2018 special session "Analysis of ecoacoustic recordings: detection, segmentation and classification" - full programme
I'm really pleased about the selection of presentations we have for our special session at ICEI2018 in Jena (Germany) 24th-28th September. The session is chaired by Jérôme Sueur and me, and is titled "Analysis of ecoacoustic recordings: detection, segmentation and classification".
Our session is special session S1.2 in the programme and here's a list of the accepted talks:
- AUREAS: a tool for recognition of anuran vocalizations
William E. Gómez, Claudia V. Isaza, Sergio Gómez, Juan M. Daza and Carol Bedoya
- Content description of very-long-duration recordings of the environment
Michael Towsey, Aniek Roelofs, Yvonne Phillips, Anthony Truskinger and Paul Roe
- What male humpback whale song chorusing can and cannot tell us about their ecology: strengths and limitations of passive acoustic monitoring of a vocally active baleen whale
Anke Kügler and Marc Lammers
- Improving acoustic monitoring of biodiversity using deep learning-based source separation algorithms
Mao-Ning Tuanmu, Tzu-Hao Lin, Joe Chun-Chia Huang, Yu Tsao and Chia-Yun Lee
- Acoustic sensor networks and machine learning: scalable ecological data to advance evidence-based conservation
Matthew McKown and David Klein
- Extracting information on bat activities from long-term ultrasonic recordings through sound separation
Chia-Yun Lee, Tzu-Hao Lin and Mao-Ning Tuanmu
- Information retrieval from marine soundscape by using machine learning-based source separation
Tzu-Hao Lin, Tomonari Akamatsu, Yu Tsao and Katsunori Fujikura
- A Novel Set of Acoustic Features for the Categorization of Stridulatory Sounds in Beetles
Carol Bedoya, Eckehard Brockerhoff, Michael Hayes, Richard Hofstetter, Daniel Miller and Ximena Nelson
- Noise robust 2D bird localization via sound using microphone arrays
Daniel Gabriel, Ryosuke Kojima, Kotaro Hoshiba, Katsutoshi Itoyama, Kenji Nishida and Kazuhiro Nakadai
- Fine-scale observations of spatiotemporal dynamics and vocalization type of birdsongs using microphone arrays and unsupervised feature mapping
Reiji Suzuki, Shinji Sumitani, Naoaki Chiba, Shiho Matsubayashi, Takaya Arita, Kazuhiro Nakadai and Hiroshi Okuno
- Articulating citizen science, automatic classification and free web services for long-term acoustic monitoring: examples from bat monitoring schemes in France and UK
Yves Bas, Kevin Barre, Christian Kerbiriou, Jean-Francois Julien and Stuart Newson
We also have poster presentations on related topics:
- Towards truly automatic bird audio detection: an international challenge
- Assessing Ecosystem Change using Soundscape Analysis
Diana C. Duque-Montoya, Claudia Isaza and Juan M. Daza
- MatlabHTK: a simple interface for bioacoustic analyses using hidden Markov models
- MAAD, a rational unsupervised method to estimate diversity in ecoacoustic recordings
Juan Sebastian Ulloa, Thierry Aubin, Sylvain Haupert, Chloé Huetz, Diego Llusia, Charles Bouveyron and Jerome Sueur
- Underwater acoustic habitats: towards a toolkit to assess acoustic habitat quality
Irene Roca and Ilse Van Opzeeland
- Focus on geophony: what weather sounds can tell
Roberta Righini and Gianni Pavan
- Reverse Wavelet Interference Algorithm for Detection of Avian Species and Characterization of Biodiversity
Sajeev C Rajan, Athira K and Jaishanker R
- Automatic Bird Sound Detection: Logistic Regression Based Acoustic Occupancy Model
Yi-Chin Tseng, Bianca Eskelson and Kathy Martin
- A software detector for monitoring endangered common spadefoot toad populations
Guillaume Dutilleux and Charlotte Curé
- PylotWhale a python package for automatically annotating bioacoustic recordings
Maria Florencia Noriega Romero Vargas, Heike Vester and Marc Timme
You can register for the conference here - early discount until 15th Sep. See you there!
This week we've been at the LVA-ICA 2018 conference, at the University of Surrey. A lot of papers presented on source separation. Here are some notes:
- Evrim Acar gave a great tutorial on tensor factorisation. Slides here
- Hiroshi Sawada described a nice extension of "joint diagonalisation", applying it in synchronised fashion across all frequency bands at once. He also illustrated well how this method reduces to some existing well-known methods, in certain limiting cases.
- Ryan Corey showed his work on helping smart-speaker devices (such as Alexa or whatever) to estimate the relative transfer function which helps with multi-microphone sound processing. He made use of the wake-up keywords that are used for such devices ("Hi Marvin" etc), taking advantage of the known content to estimate the RTF for "free" i.e. with no extra interaction. He DTW-aligned the spoken keyword against a dictionary, then used that to mask the recorded sound and estimate the RTF.
- Stefan Uhlich presented their (Sony group's) strongly-performing SiSEC sound separation method. Interestingly, they use a variant of DenseNet, as well as a BLSTM, to estimate a tf mask. Stefan also said that once the estimates have been made, a crucial improvement was to re-estimate them by putting the estimated masks together through a multichannel Wiener filtering stage.
- Ama Marina Kreme presented her new task of "phase inpainting" and methods to solve it - estimating a missing portion of phases in a spectrogram, when all of the magnitudes and some of the phases are known. I can see this being useful in post-processing of source separation outputs, though her application was in engine noise analysis with an industrial collaborator.
- Lucas Rencker presented some very nice ideas in "consistent dictionary learning" for signal declipping. Here, "consistent" means that the reconstructed signal should be painting the missing regions in a way that matches the clipping - if some part of the signal was clipped at a maximum of X, then its reconstruction should take values greater than or equal to X. Here's his Python code of the declipping method. Apparently also the state-of-the-art in this task is a method called "A-SPADE" by Kitic (2015). Pavel Zaviska presented an analysis of A-SPADE and S-SPADE, improving the latter but not beating A-SPADE.
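To make the "consistent" constraint from the declipping item concrete, here is a minimal sketch in plain Python. It only illustrates the constraint itself, not the dictionary-learning method presented at the conference, and it assumes symmetric clipping at a known level. The function name `project_onto_consistent_set` and its arguments are hypothetical, chosen for this example.

```python
def project_onto_consistent_set(clipped, estimate, clip_level):
    """Force a reconstruction to agree with what the clipped signal tells us.

    Assumption for this sketch: clipping is symmetric at +/- clip_level.
    - Samples that were not clipped are reliable, so we keep the observation.
    - Samples clipped at +clip_level must be reconstructed at or above it.
    - Samples clipped at -clip_level must be reconstructed at or below it.
    """
    consistent = []
    for observed, guess in zip(clipped, estimate):
        if observed >= clip_level:
            consistent.append(max(guess, clip_level))
        elif observed <= -clip_level:
            consistent.append(min(guess, -clip_level))
        else:
            consistent.append(observed)
    return consistent


# Hypothetical toy signal clipped at 1.0, and a rough reconstruction of it.
clipped = [0.2, 1.0, -1.0, 0.5]
estimate = [0.3, 0.7, -1.4, 0.5]
print(project_onto_consistent_set(clipped, estimate, 1.0))
```

In the toy call, the second sample is pushed back up to the clipping level because a value of 0.7 would contradict the observation that the signal was clipped there.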
An interesting feature of the week was the "SiSEC" Signal Separation Evaluation Challenge. We saw posters of some of the methods used to separate musical recordings into their component stems, but even better, we were used as guinea-pigs, doing a quick listening test to see which methods we thought were giving the best results. In most SiSEC work this is evaluated using computational measures such as signal-to-distortion ratio (SDR), but there's quite a lot of dissatisfaction with these "objective" measures since there's plenty that they get wrong. At the end of LVA-ICA the organisers announced the results of the listening test: surprisingly or not, the results of the listening test had broadly a strong correlation with the SDR measures, though there were some tracks for which this didn't hold. More analysis of the data to come, apparently.
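For readers who haven't met SDR, here is a minimal sketch of the simplest form of the measure: the energy of the reference signal divided by the energy of the error, in decibels. This is an assumption-laden simplification. The SDR used in SiSEC evaluations comes from a more elaborate decomposition of the error, which this sketch does not attempt. The name `simple_sdr_db` is made up for the example.

```python
import math


def simple_sdr_db(reference, estimate):
    """Plain signal-to-distortion ratio in dB.

    Simplifying assumption: all of (reference - estimate) counts as
    distortion, with no allowance for gain or filtering differences.
    """
    signal_energy = sum(s * s for s in reference)
    error_energy = sum((s - e) ** 2 for s, e in zip(reference, estimate))
    if error_energy == 0:
        return math.inf  # perfect reconstruction
    if signal_energy == 0:
        return -math.inf  # silent reference, non-silent estimate
    return 10 * math.log10(signal_energy / error_energy)


# Hypothetical two-sample signals, just to show the call.
print(simple_sdr_db([1.0, 1.0], [0.9, 1.1]))
```

A single number like this cannot say whether the error sounds like a faint hiss or an obvious artefact, which is one reason listening tests are a useful complement.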
From our gang, my students Will and Delia presented their posters and both went really well. Here's the photographic evidence:
- Delia Fano Yela's poster about source separation using graph theory and Kernel Additive Modelling - read the preprint here
- Will Wilkinson's poster "A Generative Model for Natural Sounds Based on Latent Force Modelling" - read the preprint here
Also from our research group (though not working with me) Daniel Stoller presented a poster as well as a talk, getting plenty of interest for his deep learning methods for source separation - preprint here.
The paper "Wasserstein Learning of Deep Generative Point Process Models" published at the NIPS 2017 conference has some interesting ideas in it, connecting generative deep learning - which is mostly used for dense data such as pixels - together with point processes, which are useful for "spiky" timestamp events.
They use the Wasserstein distance (aka the "earth-mover's distance") to compare sequences of spikes, and they do acknowledge that this has advantages and disadvantages. It's all about pushing things around until they match up - e.g. move a spike a few seconds earlier in one sequence, so that it lines up with a spike in the other sequence. It doesn't nicely account for insertions or deletions, which is tricky because it's quite common to have "missing" spikes or added "clutter" in data coming from detectors, for example. It'd be better if this method could incorporate more general "edit distances", though that's non-trivial.
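To illustrate the "pushing things around" idea, here is a minimal sketch of the one-dimensional Wasserstein distance between two spike sequences with the same number of events. In that special case the best matching simply pairs the events in sorted order. The sketch deliberately refuses sequences of different lengths, which is exactly where insertions and deletions become awkward. It does not reproduce how the paper handles unequal counts, and the function name `wasserstein_1d` is an assumption of this example.

```python
def wasserstein_1d(seq_a, seq_b):
    """Mean distance each spike must move to turn seq_a into seq_b.

    Only defined here for equal-length sequences: with equal counts, the
    optimal transport in one dimension matches events in sorted order.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("this sketch needs the same number of events in both sequences")
    if not seq_a:
        return 0.0
    pairs = zip(sorted(seq_a), sorted(seq_b))
    return sum(abs(a - b) for a, b in pairs) / len(seq_a)


# Hypothetical spike times in seconds: only the first spike has moved.
print(wasserstein_1d([1.0, 2.0, 3.0], [1.5, 2.0, 3.0]))
```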
So I was thinking about distances between point processes. More reading to be done. But a classic idea, and a good way to think about insertions/deletions, is called "thinning". It's where you take some data from a point process and randomly delete some of the events, to create a new event sequence. If you're using Poisson processes then thinning can be used for example to sample from a non-stationary Poisson process, essentially by "rejection sampling" from a stationary one.
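As a rough illustration of that last point, here is a minimal sketch of sampling a non-stationary Poisson process by thinning a stationary one. It assumes you can supply an upper bound `rate_max` on the time-varying rate. The names `sample_inhomogeneous_poisson`, `rate_fn` and `rate_max` are invented for this example.

```python
import math
import random


def sample_inhomogeneous_poisson(rate_fn, rate_max, t_end, rng):
    """Sample event times on [0, t_end) with time-varying rate rate_fn(t).

    Candidates are drawn from a stationary process of rate rate_max, and
    each one is kept with probability rate_fn(t) / rate_max.
    """
    events = []
    t = 0.0
    while True:
        t += rng.expovariate(rate_max)  # waiting time to the next candidate
        if t >= t_end:
            return events
        rate = rate_fn(t)
        if rate > rate_max:
            raise ValueError("rate_fn exceeded rate_max, so thinning is invalid")
        if rng.random() < rate / rate_max:  # the coin flip
            events.append(t)


# Hypothetical rate that oscillates between 0 and 2 events per second.
rng = random.Random(0)
events = sample_inhomogeneous_poisson(lambda t: 1.0 + math.sin(t), 2.0, 50.0, rng)
print(len(events), "events")
```

The tighter the bound `rate_max`, the fewer candidates get thrown away.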
Thinning is a probabilistic procedure: in the simplest case, take each event, flip a coin, and keep the event only if the coin says heads. So if we are given one event sequence, and a specification of the thinning procedure, we can define the likelihood that this would have produced any given "thinned" subset of events. Thus, if we take two arbitrary event sequences, we can imagine their union was the "parent" from which they were both derived, and calculate a likelihood that the two were generated from it. (Does it matter if the parent process actually generated this union list, or if there were unseen "extra" parent events that were actually deleted from both? In simple models where the thinning is independent for each event, no: the deletion process can happen in any order, and so we can assume those common deletions happened first to take us to some "common ancestor". However, this does make it tricky to compare distances across different datasets, because the unseen deletions are constant multiplicative factors on the true likelihood.)
We can thus define a "thinning distance" between two point process realisations as the negative log-likelihood under this thinning model. Clearly, the distance depends entirely on the number of events the two sequences have in common, and the numbers of events that are unique to them - the actual time positions of the events have no effect in this simple model; it's just whether they line up or not. It's one of the simplest comparisons we can make. It's complementary to the Wasserstein distance which is all about time-position and not about insertions/deletions.
This distance boils down to:
NLL = -( n1 * log(n1/nu) + n2 * log(n2/nu) + (nu-n1) * log(1 - n1/nu) + (nu-n2) * log(1 - n2/nu) )
where "n1" is the number of events in seq 1, "n2" in seq 2, and "nu" in their union.
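Here is a minimal sketch of that formula in Python, assuming events count as "the same" only when their timestamps are exactly equal. The function names are made up for this example, and the convention 0 * log(0) = 0 is used so that identical sequences get a distance of zero.

```python
import math


def _xlogy(x, y):
    """x * log(y), using the convention that 0 * log(0) is 0."""
    return 0.0 if x == 0 else x * math.log(y)


def thinning_distance(seq1, seq2):
    """Negative log-likelihood that both sequences are thinnings of their union.

    Each sequence's keep-probability is estimated as n / nu, where nu is
    the number of events in the union.
    """
    events1, events2 = set(seq1), set(seq2)
    n1, n2 = len(events1), len(events2)
    nu = len(events1 | events2)
    if nu == 0:
        return 0.0  # two empty sequences
    p1, p2 = n1 / nu, n2 / nu
    log_lik = (_xlogy(n1, p1) + _xlogy(n2, p2)
               + _xlogy(nu - n1, 1 - p1) + _xlogy(nu - n2, 1 - p2))
    return -log_lik


# Hypothetical event times: one shared event versus none.
print(thinning_distance([1, 2], [2, 3]))
print(thinning_distance([1, 2], [3, 4]))
```

The second call returns the larger value, since sequences with no events in common are less plausible as children of the same parent.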
Does this distance measure work? Yes, at least in limited toy cases. I generated two "parent" sequences (using the same rate for each) and separately thinned each one ten times. I then measured the thinning distance between all pairs of the child sequences, and there's a clear separation between related and unrelated sequences:
Distances between distinct children of same process: Min 75.2, Mean 93.3, Median 93.2, Max 106.4
Distances between children of different processes: Min 117.3, Mean 137.7, Median 138.0, Max 167.3
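A toy experiment along these lines can be sketched as follows. This is not the script behind the numbers above: the rate, the duration, the keep-probability of 0.5 and the random seed are all assumptions, so the figures it prints will differ. The distance function is redefined compactly so that the snippet runs on its own.

```python
import math
import random
from itertools import combinations
from statistics import mean


def thinning_distance(seq1, seq2):
    """Negative log-likelihood of two sequences as thinnings of their union."""
    s1, s2 = set(seq1), set(seq2)
    nu = len(s1 | s2)
    nll = 0.0
    for n in (len(s1), len(s2)):
        if 0 < n < nu:  # terms with n == 0 or n == nu contribute zero
            nll -= n * math.log(n / nu) + (nu - n) * math.log(1 - n / nu)
    return nll


def poisson_events(rate, duration, rng):
    """Event times of a stationary Poisson process on [0, duration)."""
    events, t = [], rng.expovariate(rate)
    while t < duration:
        events.append(t)
        t += rng.expovariate(rate)
    return events


def thin(events, keep_prob, rng):
    """Keep each event independently with probability keep_prob."""
    return [t for t in events if rng.random() < keep_prob]


rng = random.Random(42)  # assumed seed, for repeatability only
parents = [poisson_events(1.0, 100.0, rng) for _ in range(2)]
children = [[thin(parent, 0.5, rng) for _ in range(10)] for parent in parents]

# All pairs of distinct children within a family, then all cross-family pairs.
related = [thinning_distance(a, b)
           for family in children for a, b in combinations(family, 2)]
unrelated = [thinning_distance(a, b) for a in children[0] for b in children[1]]

print("related:   mean %.1f" % mean(related))
print("unrelated: mean %.1f" % mean(unrelated))
```

Children of different parents almost never share an exact timestamp, so their union is as large as it can be and the distance comes out larger.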
This is nice because it's easy to calculate, etc. To be able to do work like in the paper I cited above, we'd need to be able to optimise against something like this, and even better, to be able to combine it into a full edit distance, one which we can parameterise according to situation (e.g. to balance the relative cost of moves vs. deletions).
This idea of distance based on how often the spikes coincide relates to "co-occurrence metrics" previously described in the literature. So far, I haven't found a co-occurrence metric that takes this form. To relax the strict requirement of events hitting at the exact same time, there's often some sort of quantisation or binning involved in practice, and I'm sure that'd help for direct application to data. Ideally we'd generalise over the possible quantisations, or use a jitter model to allow for the fact that spikes might move.
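The binning idea can be sketched very simply: map each timestamp to the index of the bin it falls in, and then count coincidences between bin indices rather than between exact times. The bin width and the function names below are assumptions for illustration. Note that two events sitting either side of a bin edge still count as different, which is the weakness a jitter model would address.

```python
import math


def quantise(events, bin_width):
    """Map each event time to the index of the bin that contains it."""
    return {math.floor(t / bin_width) for t in events}


def coincidence_counts(seq1, seq2, bin_width):
    """Return (n1, n2, nu): occupied bins in each sequence and in their union."""
    bins1 = quantise(seq1, bin_width)
    bins2 = quantise(seq2, bin_width)
    return len(bins1), len(bins2), len(bins1 | bins2)


# Hypothetical event times in seconds, compared using half-second bins.
print(coincidence_counts([0.1, 1.2, 3.4], [0.3, 1.6, 3.45], 0.5))
```

The three counts are the quantities that the thinning distance needs, so the binned version can be dropped straight into the formula.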
I'm lucky to be working with a great set of PhD students on a whole range of exciting topics about sound and computation. (We're based in C4DM and the Machine Listening Lab.) Let me give you a quick snapshot of what my students are up to!
I'm primary supervisor for Veronica and Pablo:
Veronica is working on deep learning techniques for jointly identifying the species and the time-location of bird sounds in audio recordings. A particular challenge is the relatively small amount of labelled data available for each species, which forces us to pay attention to how the network architecture can make best use of the data and the problem structure.
- A paper by Veronica (not deep learning, that paper; it's on its way)
Pablo is using a mathematical framework called Gaussian processes as a new paradigm for automatic music transcription - the idea is that it can perform high-resolution transcription and source separation at the same time, while also making use of some sophisticated "priors" i.e. information about the structure of the problem domain. A big challenge here is how to scale it up to run over large datasets.
I'm joint-primary supervisor for Will and Delia:
Will is developing a general framework for analysing sounds and generating new sounds, combining subband/sinusoidal analysis with probabilistic generative modelling. The aim is that the same model can be used for sound types as diverse as footsteps, cymbals, dog barks...
Delia is working on source separation and audio enhancement, using a lightweight framework based on nonlocal median filtering, which works without the need for large training datasets or long computation times. The challenge is to adapt and configure this so it makes best use of the structure of the data that's implicitly there within an audio recording.
I'm secondary supervisor for Jiajie and Sophie:
Jiajie is studying how singers' pitch tuning is affected when they sing together. She has designed and conducted experiments with two or four singers, in which sometimes they can all hear each other, sometimes only one can hear the other, etc. Many singers or choir conductors have their own folk-theories about what affects people's tuning, but Jiajie's experiments are making scientific measurements of it.
Sophie is exploring how to enhance a sense of community (e.g. for a group of people living together in a housing estate) through technological interventions that provide a kind of mediated community awareness. Should inhabitants gather around the village square or around a Facebook group? Those aren't the only two ways!
I'm just flying from the International Bioacoustics Congress 2017, held in Haridwar in the north of India. It was a really interesting time. I'm glad that IBAC was successfully brought to India, i.e. to a developing country with a more fragmented bioacoustics community (I think!) than in the west. For me, getting to know some of the Indian countryside, the people, and the food was ace. Let me make a few notes about research themes that were salient to me:
- "The music is not in the notes, but in the silence between" - this Mozart quote which Anshul Thakur used is a lovely byline for his studies - as well as some studies by others - on using the durations of the gaps between units in a birdsong, for the purposes of classification/analysis. Here's who investigated gaps:
- Anshul Thakur showed how he used the gap duration as an auxiliary feature, alongside the more standard acoustic classification, to improve quality.
- Florencia Noriega discussed her own use of gap durations, in which she fitted a Gaussian mixture model to the histogram of log-gap-durations between parrot vocalisation units, and then used this to look for outliers. One use that she suggested was that this could be a good way to look for unusual vocalisation sequences that could be checked out in more detail by further fieldwork.
- (I have to note here, although I didn't present it at IBAC, that I've also been advocating the use of gap durations. The clearest example is in our 2013 paper in JMLR in which we used them to disentangle sequences of bird sounds.)
- Tomoko Mizuhara presented evidence from a perceptual study in zebra finches that the duration of the gap preceding a syllable plays some role in the perception of syllable identity. The gap before? Why? - Well, one connection is that the gap before an event might be relevant if it's the time the bird takes to breathe in, and thus there's an empirical correlation, whether the bird is using that purely empirically or for some more innate reason.
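The general recipe of fitting a Gaussian mixture to log-gap-durations and looking for low-density outliers can be sketched in plain Python. This is a generic illustration, not the code or settings used in any of the talks: the two-component default, the quantile initialisation, the toy data and all function names are assumptions.

```python
import math


def log_gaps(onsets, offsets):
    """Log-durations of the silences between consecutive vocal units."""
    gaps = [on - off for on, off in zip(onsets[1:], offsets[:-1])]
    return [math.log(g) for g in gaps if g > 0]


def normal_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def fit_gmm_1d(xs, n_components=2, n_iter=100, min_var=1e-6):
    """Fit a one-dimensional Gaussian mixture with a basic EM loop."""
    n = len(xs)
    ordered = sorted(xs)
    # Start the means at evenly spaced quantiles of the data.
    means = [ordered[int((k + 0.5) * n / n_components)] for k in range(n_components)]
    centre = sum(xs) / n
    spread = max(sum((x - centre) ** 2 for x in xs) / n, min_var)
    variances = [spread] * n_components
    weights = [1.0 / n_components] * n_components
    for _ in range(n_iter):
        # E-step: responsibility of each component for each data point.
        resp = []
        for x in xs:
            p = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(p) or 1e-300
            resp.append([pk / total for pk in p])
        # M-step: re-estimate weights, means and variances.
        for k in range(n_components):
            nk = sum(r[k] for r in resp)
            if nk < 1e-12:
                continue  # leave an empty component untouched
            weights[k] = nk / n
            means[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, xs)) / nk
            variances[k] = max(var, min_var)
    return weights, means, variances


def mixture_density(x, weights, means, variances):
    return sum(w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances))


# Hypothetical log-gap values: short within-phrase gaps and longer pauses.
data = [-2.1, -2.05, -2.0, -1.95, -1.9, 0.9, 0.95, 1.0, 1.05, 1.1]
weights, means, variances = fit_gmm_1d(data)
print(sorted(means))
```

A gap whose `mixture_density` is very low under the fitted model is a candidate "unusual" gap, worth checking by ear or in the field.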
Machine learning methods in bioacoustics - this is the session that I organised, and I think it went well - I hope people found it useful. I won't go into loads of detail here since I'm mostly making notes about things that are new to me. One notable thing though was Vincent Lostanlen announcing a dataset "BirdVox-70k" (flight calls of birds recorded in the USA, annotated with the time and frequency of occurrence) - I always like it when a dataset that might be useful for bird sound analysis is published under an open licence! No link yet - I think that's to come soon. (They've also done other things such as this neat in-browser audio annotation tool.)
Software platforms for bioacoustics. When I do my research I'm often coding my own Python scripts or suchlike, but that's not a vernacular that most bioacousticians speak. It's tricky to think what the ideal platform for bioacoustics would be, since there are quite some demands to meet: for example ideally it could handle ten seconds of audio as well as one year of audio, yet also provide an interface suitable for non-programmers. A few items on that theme:
- Phil Eichinski (standing in for Paul Roe) presented QUT's Eco-Sounds platform. They've put effort into making it work for big audio data, managing terabytes of audio and optimising whether to analyse the sound proactively or on-demand. The false-colour "long duration spectrograms" developed by Michael Towsey et al are used to visualise long audio recordings. (I'll say a bit more about that below.)
- Yves Bas presented his Tadarida toolbox for detection and classification.
- Ed Baker presented his BioAcoustica platform for archiving and analysing sounds, with a focus on connecting deposits to museum specimens and doing audio query-by-example.
- Anton Gradisek, in our machine-learning session, presented "JSI Sound: a machine learning tool in Orange for classification of diverse biosounds" - this is a kind of "machine-learning as a service" idea.
- Then a few others that might or might not be described as full-blown "platforms":
- Tomas Honaiser wasn't describing a new platform, but his monitoring work - I noted that he was using the existing AMIBIO project to host and analyse his recordings.
- Sandor Zsebok presented his Ficedula matlab toolbox which he's used for segmenting and clustering etc to look at cultural evolution in the Collared flycatcher.
- Julie Elie mentioned her lab's SoundSig Python tools for audio analysis.
- Oh by the way, what infrastructure should these projects be built upon? The QUT platform is built using Ruby, which is great for web developers but strikes me as an odd choice because very few people in bioacoustics or signal processing have even heard of it - so how is the team / the community going to find the people to maintain it in future? (EDIT: here's a blog article with background information that the QUT team wrote in response to this question.) Yves Bas' is C++ and R which makes sense for R users (fairly common in this field). BioAcoustica - not sure if it's open-source but there's an R package that connects to it. --- I'm not an R user, I much prefer Python, because of its good language design, its really wide user base, and its big range of uses, though I recognise that it doesn't have the solid stats base that R does. People will debate the merits of these tools for ever onwards - we're not going to all come down on one decision - but it's a question that I often come back to, how best to build software tools to ensure they're useable and maintainable and solid.
So about those false-colour "long duration spectrograms". I've been advocating this visualisation method ever since I saw Michael Towsey present it (I think at the Ecoacoustics meeting in Paris). Just a couple of months ago I was at a workshop at the University of Sussex and Alice Eldridge and colleagues had been playing around with it too. At IBAC this week, ecologist Liz Znidersic talked really interestingly about how she had used them to detect a cryptic (i.e. hard-to-find) bird species. It shows that the tool helps with "needle in a haystack" problems, including those where you might not have a good idea of what needle you're looking for.
In Liz's case she looked at the long-duration spectrograms manually, to spot calling activity patterns. We could imagine automating this, i.e. using the long-dur spectrogram as a "feature set" to make inferences about diurnal activity. But even without automation it's still really neat.
Anyway back to the thematic listings...
- Trills in bird sounds are fascinating. These rapidly-frequency-modulated sounds are often difficult and energetic to do, and this seems to lead to them being used for specific functions.
- Tereza Petruskova presented a poster of her work on tree pipits, arguing for different roles for the "loud trill" and the "soft trill" in their song.
- Christina Masco spoke about trills in splendid fairywrens (cute-looking birds those!). They can be used as a call but can also be included at the end of a song, which raises the question of why are they getting "co-opted" in this way. Christina argued that the good propagation properties of the trill could be a reason - there was some discussion about differential propagation and the "ranging hypothesis" etc.
- Ole Larsen gave a nice opening talk about signal design for private vs public messages. It was all well-founded, though I quibbled with his comment that strongly frequency-modulated sounds would be for "private" communication because if they cross multiple critical bands they might not accumulate enough energy in a "temporal integration window" to trigger detection. This seems intuitively wrong to me (e.g.: sirens!) but I need to find some good literature to work this one through.
- Hybridisation zones are interesting for studying birdsong, since they're zones where two species coexist and individuals of one species might or might not breed with individuals of the other species. For birds, song recognition might play a part in whether this happens. It's quite a "strong" concept of perceptual similarity, to ask the question "Is that song similar enough to breed with?"!
- Alex Kirschel showed evidence from a suboscine (and so not a vocal learner) which in some parts of Africa seems to hybridise and in some parts seems not to - and there could be some interplay with the similarity of the two species' song in that region.
- Irina Marova also talked about hybridisation, in songbirds, but I failed to make a note of what she said!
- Duetting in birdsong was discussed by a few people, including Pedro Diniz and Tomasz Osiejuk. Michal Budka argued that his playback studies with Chubb's cisticola showed they use duet for territory defence and signalling commitment but not for "mate-guarding".
- Oh and before the conference, I was really taken by the duetting song of the grey treepie, a bird we heard up in the Himalayan hills. Check it out if you can!
As usual, my apologies to anyone I've misrepresented. IBAC has long days and lots of short talks (often 15 minutes), so it can all be a bit of a whirlwind! Also of course this is just a terribly partial list.
(PS: from the archives, here's my previous blog about IBAC 2015 in Murnau, Germany.)
In the early twentieth century when the equations of quantum physics were born, physicists found themselves in a difficult position. They needed to interpret what the quantum equations meant in terms of their real-world consequences, and yet they were faced with paradoxes such as wave-particle duality and "spooky action at …
This season, I'm lead organiser for two special conference sessions on machine listening for bird/animal sound: EUSIPCO 2017 in Kos, Greece, and IBAC 2017 in Haridwar, India. I'm very happy to see the diverse selection of work that has been accepted for presentation - the diversity of the research itself …
People love to take the vegans down a peg or two. I guess they must unconsciously agree that the vegans are basically correct and doing the right thing, hence the defensive mud-slinging.
There's a bullshit article "Being vegan isn’t as good for humanity as you think". Like many bullshit …
People who do technical work with sound use spectrograms a heck of a lot. This standard way of visualising sound becomes second nature to us.
As you can see from these photos, I like to point at spectrograms all the time:
(Our research group even makes some really nice software …