I am a research fellow, conducting research into automatic analysis of bird sounds using machine learning.
People who do technical work with sound use spectrograms a heck of a lot. This standard way of visualising sound becomes second nature to us.
As you can see from these photos, I like to point at spectrograms all the time:
(Our research group even makes some really nice software for visualising sound which you can download for free.)
It's helpful to transform sound into something visual. You can point at it, you can discuss tiny details, etc. But sometimes, the spectrogram becomes a stand-in for listening. When we're labelling data, for example, we often listen and look at the same time. There's a rich tradition in bioacoustics of presenting and discussing spectrograms while trying to tease apart the intricacies of some bird or animal's vocal repertoire.
But there's a question of validity. If I look at two spectrograms and they look the same, does that mean the sounds actually sound the same?
In a strict sense, we already know that the answer is "No". We audio people can construct counterexamples pretty easily, in which there's a subtle audio difference that's not visually obvious (e.g. phase coherence). But it could perhaps be even worse than that: similarities might not just be easier or harder to spot, they might actually be different. If we have a particular sound X, it could audibly be more similar to A than B, while visually it could be more similar to B than A. If this were indeed true, we'd need to be very careful about performing tasks such as clustering sounds or labelling sounds while staring at their spectrograms.
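To make the phase point concrete, here is a minimal Python sketch (my own toy illustration, not taken from any of the studies discussed). It builds two short signals that differ only in the phase of one component, and checks that their magnitude spectra are the same even though the waveforms are not. The signal length, the bin numbers and the helper name `dft_magnitudes` are all arbitrary choices for the example. A real spectrogram is a sequence of such magnitude frames, and whether a given phase change is actually audible depends on the sound.

```python
import cmath
import math

N = 64  # number of samples in the toy signal (arbitrary choice)

def dft_magnitudes(signal):
    """Return the magnitude of each DFT bin, using a naive O(N^2) transform."""
    n = len(signal)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal)))
        for k in range(n)
    ]

# Two components at bins 4 and 8; only the phase of the second component differs.
sig_a = [math.sin(2 * math.pi * 4 * t / N) + math.sin(2 * math.pi * 8 * t / N) for t in range(N)]
sig_b = [math.sin(2 * math.pi * 4 * t / N) + math.cos(2 * math.pi * 8 * t / N) for t in range(N)]

mag_a = dft_magnitudes(sig_a)
mag_b = dft_magnitudes(sig_b)

# The waveforms clearly differ, but the magnitude view cannot tell them apart.
max_waveform_diff = max(abs(a - b) for a, b in zip(sig_a, sig_b))
max_magnitude_diff = max(abs(a - b) for a, b in zip(mag_a, mag_b))
print(max_waveform_diff, max_magnitude_diff)
```

The first printed number is large and the second is zero up to rounding error: a magnitude-only picture throws the phase information away.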
So - what does the research literature say? Does it give us guidance on how far we can trust our eyes as a proxy for our ears? Well, it gives us hints but so far not a complete answer. Here are a few relevant factoids which dance around the issue:
Really, what does this all tell us? It tells us that looking at spectrograms and listening to sounds are different in so many ways that we definitely shouldn't expect the fine details to match up. We can probably trust our eyes for broad-brush tasks such as labelling sounds that are quite distinct, but for the fine-grained comparisons (which we often need in research) one should definitely be careful, and use actual auditory perception as the judge when it really matters. How to know when this is needed? Still a question of judgment, in most cases.
My thanks go to Trevor Agus, Michael Mandel, Rob Lachlan, Anto Creo and Tony Stockman for examples quoted here, plus all the other researchers who kindly responded with suggestions.
Know any research literature on the topic? If so do email me - NB there's plenty of literature on the accuracy of looking or of listening in various situations; here the question is specifically about comparisons between the two modalities.
My blog has been running for more than a decade, using the same cute-but-creaky old software made by my chum Sam. It was a lo-fi PHP and MySQL blog, and it did everything I needed. (Oh and it suited my stupid lo-fi blog aesthetics too, the clunky visuals are entirely my fault.)
Now, if you were starting such a project today you wouldn't use PHP and you wouldn't use MySQL (just search the web for all the rants about those technologies). But if it isn't broken, don't fix it. So it ran for 10 years. Then my annoying web provider TalkTalk messed up and lost all the databases. They lost all ten years of my articles. SO. What to do?
Well, one thing you can do is simply drop it and move on. Make a fresh start. Forget all those silly old articles. Sure. But I have archivistic tendencies. And the web's supposed to be a repository for all this crap anyway! The web's not just a medium for serving you with Facebook memes, it's meant to be a stable network of stuff. So the ideal would be to preserve the articles, and also to prevent link rot, i.e. make sure that the URLs people have been using for years will still work...
So, job number one, find your backups. Oh dear. I have a MySQL database dump from 2013. Four years out of date. And anyway, I'm not going back to MySQL and PHP, I'm going to go to something clean and modern and ideally Python-based... in other words Pelican. So even if I use that database I'm going to have to translate it. So in the end I found three different sources for all my articles:
That got me almost everything. I think the only thing missing is one blog article from a month ago.
Next step: once you've rescued your data, build a new blog. This was easy because Pelican is really nice and well-documented too. I even recreated my silly old theme in their templating system. I thought I'd have problems configuring Pelican to reproduce my old site, but it's basically all done, even the weird stuff like my separate "recipes" page which steals one category from my blog and reformats it.
Now how to prevent linkrot? The Pelican pages have URLs like "/blog/category/science.html" instead of the old "/blog/blog.php?category=science", and if I'm moving away from PHP then I don't really want those PHP-based links to be the ones used in future. I need to catch people who are going to one of those old links, and point them straight at the new URLs. The really neat thing is that I could use Pelican's templating system to output a little lookup table, a CSV file listing all the URL rewrites needed. Then I write a tiny little PHP script which uses that file and emits HTTP Redirect messages. ...and relax. A URL like http://www.mcld.co.uk/blog/blog.php?category=science is back online.
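For anyone wanting to do something similar, here is a rough sketch of the lookup idea. It is written in Python for illustration only: the real script is PHP and isn't reproduced here. The two-column CSV layout, the "recipes" row, the function names and the choice of a 301 (permanent) status are all assumptions made for the example.

```python
import csv
import io
from urllib.parse import urlsplit

# Hypothetical lookup table: old path-plus-query in column one, new path in column two.
EXAMPLE_CSV = """/blog/blog.php?category=science,/blog/category/science.html
/blog/blog.php?category=recipes,/blog/category/recipes.html
"""

def load_rewrites(csv_text):
    """Parse the CSV lookup table into a dict mapping old URL -> new URL."""
    return {row[0]: row[1] for row in csv.reader(io.StringIO(csv_text)) if len(row) >= 2}

def redirect_for(requested_url, rewrites):
    """Return (status, headers): a 301 pointing at the new URL, or a 404 if unknown."""
    parts = urlsplit(requested_url)
    # Ignore scheme and host, so full URLs and bare paths look up the same key.
    key = parts.path + ("?" + parts.query if parts.query else "")
    target = rewrites.get(key)
    if target is None:
        return 404, {}
    return 301, {"Location": target}

rewrites = load_rewrites(EXAMPLE_CSV)
print(redirect_for("http://www.mcld.co.uk/blog/blog.php?category=science", rewrites))
```

Keeping the table as a generated CSV means the redirect logic never needs editing: rebuilding the site regenerates the mapping.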
Last year I took part in the Dagstuhl seminar on Vocal Interactivity in-and-between Humans, Animals and Robots (VIHAR). Many fascinating discussions with phoneticians, roboticists, and animal behaviourists (ethologists).
One surprisingly difficult topic was to come up with a basic data model for describing multi-party interactions. It was so easy to pick a hole in any given model: for example, if we describe actors taking "turns" which have start-times and end-times, then are we really saying that the actor is not actively interacting when it's not their turn? Do conversation participants really flip discretely between an "on" mode and an "off" mode, or does that model ride roughshod over the phenomena we want to understand?
I was reminded of this modelling question when I read this very interesting new journal article by a Japanese research group: "HARKBird: Exploring Acoustic Interactions in Bird Communities Using a Microphone Array". They have developed this really neat setup with a portable microphone array attached to a laptop which does direction-estimation and decodes which birds are heard from which direction. In the paper they use this to help annotate the time-regions in which birds are active, a bit like the on/off model I mentioned above. Here's a quick sketch:
From this type of data, Suzuki et al calculate a measure called the transfer entropy which quantifies the extent to which one individual's vocalisation patterns contain information that predicts the patterns of another. It gives them a hypothesis test for whether one particular individual affects another, in a network: who is listening to whom?
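To give a feel for what transfer entropy measures, here is a minimal Python sketch of the textbook plug-in estimate with a history length of one time step, applied to binary on/off sequences. This is an illustration under my own assumptions, not a reproduction of the procedure in the HARKBird paper: the toy sequences, the history length and the function name `transfer_entropy` are all made up for the example, and it leaves out the hypothesis-testing step entirely.

```python
import math
from collections import Counter

def transfer_entropy(source, target):
    """Plug-in estimate (in bits) of transfer entropy source -> target, history length 1.

    Both arguments are equal-length sequences of discrete states (e.g. 0 = silent, 1 = singing).
    """
    triples = Counter(zip(target[1:], target[:-1], source[:-1]))  # (x_next, x_now, y_now)
    pairs_xy = Counter(zip(target[:-1], source[:-1]))              # (x_now, y_now)
    pairs_xx = Counter(zip(target[1:], target[:-1]))               # (x_next, x_now)
    singles = Counter(target[:-1])                                 # x_now
    total = len(target) - 1
    te = 0.0
    for (x_next, x_now, y_now), count in triples.items():
        p_joint = count / total
        p_given_both = count / pairs_xy[(x_now, y_now)]           # p(x_next | x_now, y_now)
        p_given_self = pairs_xx[(x_next, x_now)] / singles[x_now]  # p(x_next | x_now)
        te += p_joint * math.log2(p_given_both / p_given_self)
    return te

# Toy data: bird B simply repeats bird A's on/off state one time step later.
bird_a = [0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1]
bird_b = [0] + bird_a[:-1]
silent = [0] * len(bird_a)

te_a_to_b = transfer_entropy(bird_a, bird_b)
te_silent_to_b = transfer_entropy(silent, bird_b)
print(te_a_to_b, te_silent_to_b)
```

Knowing A's previous state helps predict B's next state, so the first value is positive. A permanently silent source carries no information, so the second is zero. With real data you would also need a significance test, since finite samples give small nonzero values even for unrelated birds.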
That's a very similar question to the question we were asking in our journal article last year, "Detailed temporal structure of communication networks in groups of songbirds". I talked about our model at the Dagstuhl event. Here I'll merely emphasise that our model doesn't use regions of time, but point-like events:
So our model works well for short calls, but is not appropriate for data that can't be well-described via single moments in time (e.g. extended sounds that aren't easily subdivided). The advantage of our model is that it's a generative probabilistic model: we're directly estimating the characteristics of a detailed temporal model of the communication. The transfer-entropy method, by contrast, doesn't model how the birds influence each other, just detects whether the influence has happened.
I'd love to get the best of both worlds: a generative and general model for extended sound events influencing one another. It's a tall order because for point-like events, we have point process theory; for extended events I don't think the theory is quite so well-developed. Markov models work OK but don't deal very neatly with multiple parallel streams. The search continues.
A colleague pointed out this new review paper in the journal "Animal Behaviour": Applications of machine learning in animal behaviour studies.
It's a useful introduction to machine learning for animal behaviour people. In particular, the distinction between machine learning (ML) and classical statistical modelling is nicely described (sometimes tricky to convey that without insulting one or other paradigm).
The use of illustrative case studies is good. Most introductions to machine learning base themselves around standard examples predicting "unstructured" outcomes such as house prices (i.e. predict a number) or image categories (i.e. predict a discrete label). Two of the three case studies (all of which are by the authors themselves) similarly are about predicting categorical labels, but couched in useful biological context. It was good to see the case study relating to social networks and jackdaws. Not only because it relates to my own recent work with colleagues (specifically: this on communication networks in songbirds and this on monitoring the daily activities of jackdaws - although in our case we're using audio as the data source), but also because it shows an example of using machine learning to help elucidate structured information about animal behaviour rather than just labels.
The paper is sometimes mathematically imprecise: it's incorrect that Gaussian mixture models "lack a global optimum solution", for example (it's just that the global optimum can be hard to find). But the biggest omission, given that the paper was written so recently, is any real mention of deep learning. Deep learning has been showing its strengths for years now, and is not yet widely used in animal behaviour but certainly will be in years to come; researchers reading a review of "machine learning" should really come away with at least a sense of what deep learning is, and how it sits alongside other methods such as random forests. I encourage animal behaviour researchers to look at the very readable overview by LeCun et al in Nature.
Last year, when I took part in the Dagstuhl workshop on Vocal Interactivity in-and-between Humans, Animals and Robots, we had a brainstorming session, fantasising about how advanced robots might help us with animal behaviour research. "Spy" animals, if you will. Imagine a robot bird or a robot chimp, living as part of an ecosystem, but giving us the ability to modify its behaviour and study what happens. If you could send a spy to live among a group of animals, sharing food, communicating, collaborating, imagine how much you could learn about those animals!
So it particularly makes me smile to see the BBC nature doc Spy in the Wild, in which they've... gone there and done it already.
--- Well, not quite. It's a great documentary, some really astounding footage that makes you think again about what animals' inner lives are like. They use animatronic "spy" animals with film cameras in, which let them get up very close, to film from the middle of an animal's social group. These aren't autonomous robots though, they're remotely operated, and they're not capable of the full range of an animal's behaviours. They're pretty capable though: in order both to blend in and to interact, the spies can do things such as adopt submissive body language - crouching, ear movements, mouth movements, etc. And...
...some of them vocalise too. Yes there's some vocal interaction between animals and (human-piloted) robots. The vocal interaction is at a pretty simple level, it seems some of the robots have one or two pre-recorded calls built in and triggered by the operator, but it's interesting to see some occasional vocal back-and-forth between the animals and their electrical counterparts.
There are obviously some limitations. The spies generally can't move fast or dramatically. The spy birds can't fly. But - maybe soon?
In the meantime, watch the programme, it has loads of great moments caught on film.
If you're looking for a New Year's resolution how about this one: make more eye contact with strangers.
I was reading this powerful little list of Twenty Lessons from the 20th Century by some Professor of History. One idea that struck me is a very simple one:
11: Make eye contact and small talk. This is not just polite. It is a way to stay in touch with your surroundings, break down unnecessary social barriers, and come to understand whom you should and should not trust.
In a large city like the one I live in, eye contact and small talk are rare. They're even rarer thanks to smartphones, of course - although, twenty years ago, Londoners were still avoiding each other, but using newspapers, novels and Gameboys instead. Anyway I do think smartphones create a mode of interaction which reduces incidental eye contact etc.
So I decided to take the advice. Over the past month or so I took those little opportunities - at the bus stop, at the pedestrian crossing, at the supermarket. A bit of eye contact, a few words about the traffic or whatever. I was surprised how many opportunities for effortless (and not awkward!) tiny bits of smalltalk there were and how worthwhile it was to take them. After the year we've had, this is a little tweak you can try, and who knows, it might help.
I've been cooking vegetarian in 2016. It's about climate change: meat-eating is a big part of our carbon footprint, and it's something we can change. So here I'm sharing some of the best veggie recipes I found this year. Most of them are not too complex, the point is everyday meals not dinner parties.
Note: you don't have to go full-vegan - phew. You can do meat-free Mondays, you can try Veganuary, you can give up beef, or whatever, it all makes a difference. It's true that vegans have the smallest carbon footprint but it's pretty unlikely we're all going to go that far, and a more vegetarian diet makes a big improvement. (Here's an article with some data about that...)
So here we go, the best vegetarian recipes of 2016 - as judged by a meat-eater! ;)
These are all ones that were new discoveries. Of course there's plenty of standard stuff too. Anyway - pick a recipe, give it a go.
The Twelvetrees Ramp is open! It's the "missing link" in the walk down the River Lea from the Olympic Park all the way down to Cody Dock. Previously, to complete the walk you had to come off the river at Three Mills and go on an ugly detour round the Tesco's and the Blackwall Tunnel Approach. This ramp links up two bits so you can go more-or-less continuously down the river paths.
It was supposed to be open in September but... well... you know. And finally today it's open! Here are my exciting first pictures of it, looking robust against the wintry fog:
A fun bit of ironwork on top there. In the evening, the old streetlamp on the bridge lights up, and the new ironwork and the old streetlamp work well together.
It would have been nice if it had been open for all those autumnal walks in the evening sun and the lengthening shadows. Instead, now you can walk all the way down to Cody Dock, except you won't find much going on down there in winter time. But hey ho, it's ready for 2017!
Oh and by the way here's Twelvetrees Ramp on the map