This weekend I was invited to take part in an event called Soundcamp. Let me quote their own description:
Soundcamp is a series of outdoor listening events on International Dawn Chorus Day, linked by Reveil: a 24 hour broadcast that tracks the sounds of daybreak, travelling West from microphone to microphone on sounds transmitted live by audio streamers around the globe.
Soundcamp / Reveil will be at Stave Hill Ecological Park in Rotherhithe from the 30th of April to the 1st of May 2016, and at soundcamps elsewhere in the UK and beyond.
So as an experience, it's two things: a worldwide live-streamed radio event that you can tune into online, and also if you're there in person it's a 24-hour outdoor event with camping, talks and workshops, with a focus on listening and on birdsong.
(Photos here are by @lemon_disco)
There was a great group of people involved. I was very happy to be on the bill with Geoff Sample the excellent bird recordist, and with Sarah Angliss the always-entertaining musician/roboticist/historian. We each spoke about bird sound from our own different angles and I think it was a really good mix of perspectives. There was also Jez Riley French the field recordist, who led a workshop on ultrasonic underwater sound, and Iain Bolton who took us on a bat walk: the immersive sound of multiple bat detectors clicking and squeaking away around the pond at dusk was much more of a sonic experience than I expected, quite memorable.
For myself - well, I talked about our work on automatic bird sound recognition, in particular our app Warblr: how it works, and how it has been used by people. But more than that, it was a great opportunity to think about how we listen to sound. Trying to get computers to make sense of sound is a good way to emphasise what's so strange and amazing about our own powers of listening.
It was also a perfect setting for the little collaboration Sarah Angliss and I put together last year. Sarah has built a robot carillon, a set of automated bells, and we worked together to transcribe birdsong automatically into musical scores for the bells. These bells, singing away in the corner of the park, with the warm spring weather and the real birdsong all around, were right at home.
At dawn on the Sunday we took a dawn chorus walk. It was an interesting thing to do, and the highlight was undoubtedly the moment when we reached the end of the walk, almost ready to go back, and a grasshopper warbler sang out loud and long and strange - an unfamiliar sound to me, and apparently the first time anyone had heard one around there! Is it an insect, a bird, a piece of machinery...?
The main Soundcamp organisers - Grant Smith, Maria Papadomanolaki, Dawn Scarfe - also brought their own great and really thoughtful approaches to listening. Grant and his son led a workshop on making a soundscape streaming device, built quite simply from a Raspberry Pi and a couple of microphones (based on Primo EM172 capsules). I've been really impressed by the quality of the sound field they get from a pair of mics stuck in a section of poster tube.
Here's Maria mixing the radio stream, in the temporary on-site studio:
There are more photos here from Dawn Scarfe
Here's an audio phenomenon you should know about: Schroeder-phase complexes.
These are harmonic series which are designed so that their amplitude envelope is maximally flat. When you synthesise a harmonic series of partials, you know what frequencies you should use for the component sinewaves: F, 2F, 3F, 4F, etc. But what phase should you use?
Often we stick with a simple default, such as starting every partial at zero phase. There's a problem with that, though, which can matter in perceptual tests: the amplitude envelope, within one pitch period, is quite bumpy, because there are moments when the component phases all line up to produce strong amplitude. Sometimes this bumpiness leads to experimental confounds.
One thing you could do to work around this is use random phases, but adding this extra randomness into an experiment is usually not that desirable.
In 1970 Schroeder published a formula for choosing the phases so that the resulting waveform has a minimal crest factor, i.e. no big amplitude peaks. The formula is pretty simple but my blog doesn't render equations yet so see e.g. this paper.
Let me prove this to you directly: here I've synthesised the same harmonic sound with five different choices of phase. The top row, "sine-phase" and "cosine-phase" correspond to two versions of the default phase-aligned choice, and look how spiky they are:
In the middle is random phase, and at the bottom are two plots from Schroeder-phase. Please note that the y-axis has different scales in each plot - the waveforms each have the same energy, and the same Fourier-transform magnitudes, despite looking very different!
There are TWO Schroeder plots because we have the option to flip the sign of the phases (which time-reverses the waveform) while preserving the waveform characteristics. The shorthand label that people sometimes use is that one of these is "Schroeder-plus" and one is "Schroeder-minus".
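To make the comparison concrete, here's a minimal Python sketch (not the SuperCollider implementation linked further down). It assumes the form of the Schroeder phases that is commonly quoted in the psychoacoustics literature, phase_n = ±pi * n * (n + 1) / N for partial n of N; check the paper for the exact formula. The function names, the 100 Hz fundamental, the 20 partials and the 48 kHz sample rate are all just illustrative choices.

```python
import numpy as np

def harmonic_complex(f0, phases, sr=48000):
    """One period of equal-amplitude cosine partials at f0, 2*f0, ..., N*f0."""
    period = int(round(sr / f0))  # assumes sr / f0 is a whole number of samples
    t = np.arange(period) / sr
    n = np.arange(1, len(phases) + 1)[:, None]
    return np.cos(2 * np.pi * f0 * n * t[None, :] + np.asarray(phases)[:, None]).sum(axis=0)

def schroeder_phases(n_partials, sign=+1):
    """Assumed form: sign * pi * n * (n + 1) / N, for n = 1..N."""
    n = np.arange(1, n_partials + 1)
    return sign * np.pi * n * (n + 1) / n_partials

def crest_factor(x):
    """Peak amplitude divided by RMS amplitude."""
    return np.max(np.abs(x)) / np.sqrt(np.mean(x ** 2))

N, f0 = 20, 100.0
rng = np.random.default_rng(0)
waves = {
    "cosine-phase": harmonic_complex(f0, np.zeros(N)),
    "sine-phase": harmonic_complex(f0, np.full(N, -np.pi / 2)),
    "random-phase": harmonic_complex(f0, rng.uniform(0, 2 * np.pi, N)),
    "Schroeder-plus": harmonic_complex(f0, schroeder_phases(N, +1)),
    "Schroeder-minus": harmonic_complex(f0, schroeder_phases(N, -1)),
}

for name, x in waves.items():
    print(f"{name:16s} crest factor {crest_factor(x):.2f}")

# Same Fourier magnitudes, whatever the phases
spectra = {name: np.abs(np.fft.rfft(x)) for name, x in waves.items()}

# With cosine partials, "minus" is "plus" played backwards (within one period)
plus_reversed = np.roll(waves["Schroeder-plus"][::-1], 1)
print(np.allclose(waves["Schroeder-minus"], plus_reversed))
```

Running this should show the two phase-aligned versions with the largest crest factors and the two Schroeder versions with clearly smaller ones, even though every waveform has the same magnitude spectrum.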
BUT WAIT there's one weird thing I haven't shown you yet, and it pops out when you listen to the examples. These stimuli can be used to find frequency thresholds - at low frequency we can tell the difference, but at high frequency they sound identical. And the weirdest thing is when you listen to them at very low frequencies, they don't sound like static harmonic complexes at all (even though that's definitely how we generated them), they sound like otherworldly down- or up-chirps.
Listen to this audio file where I play a plus and a minus, at different frequencies. First at 300 Hz, then at 65 Hz, then 16 Hz, then 2 Hz. At first you'll hear two essentially identical tones, but then the differences become noticeable, and then overwhelming:
It's a nice demonstration of the fact that any periodic signal can be conceived as a sum of stationary sinusoids - as in Fourier analysis. Here we synthesised a chirpy nonstationary-sounding (but periodic) signal, starting from scratch from the sinusoids.
My implementation is here as SuperCollider code, inspired by this paper: Phase effects on the perceived elevation of complex tones.
So, our Warblr bird sound recognition app has been out for almost a month, and we've had many thousands of people using it and submitting bird audio recordings (thanks!). We've also had lots of great reviews in the consumer press. (Listen to this evocative piece on BBC Radio Scotland, fast-forward to 1hr 43.)
One thing which we knew was going to happen was that some people would demo it by playing back sound recordings into the mic, rather than recording actual birds. After all, sound recordings are easier to grab... What I didn't realise, from my own perspective, is that people would think this was a good way to test the app.
Playing back recordings is usually a really bad way to test the app, or any sound recognition app really, because recorded sounds differ in many many ways:
- Often people test it with low-quality audio recordings (encoded badly or squished as MP3s or YouTube videos). There are lots of recordings out there on the web which are noticeably distorted or over-filtered.
- Usually people use low-quality speakers to play back (laptops, phones) which miss out some of the audio content, or again distort it.
- Usually the audio environment around the playback is inappropriate (e.g. a chiffchaff in the kitchen!) which means the sound contains misleading information.
All of these things make the audio drastically different from a genuine direct recording, even though our human ears are clever enough to understand the correspondence. Yes, ideally a system would be as clever as our human ears, but that's for the future. (Note the difference from a product like Shazam, which recognises recordings but does not recognise the real live musician... interesting eh!)
Plus there's yet another aspect to consider: we make use of your location to help determine what kind of bird is likely. This is thanks to the BTO whose amazing crowdsourced bird data helps us know which birds to expect where and when. So, if you're playing a sound file that isn't native to where you are, our system is doubtful that the bird is there... and quite rightly doubtful, perhaps.
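To illustrate the general idea of letting location inform the result, here's a minimal Python sketch. It is NOT how Warblr actually does it: it just assumes a simple Bayes-style rule, multiplying an acoustic score per species by a prior for how likely that species is at your place and time, then renormalising. The function name, the `floor` parameter and all the numbers are made up for illustration.

```python
def combine_with_location_prior(acoustic_scores, location_prior, floor=1e-3):
    """Weight each species' acoustic score by a location prior, then renormalise.

    acoustic_scores: dict of species -> score from the sound alone (hypothetical)
    location_prior:  dict of species -> how likely the species is here and now (hypothetical)
    floor:           small minimum prior, so an unexpected bird is doubted but never ruled out
    """
    weighted = {
        species: score * max(location_prior.get(species, 0.0), floor)
        for species, score in acoustic_scores.items()
    }
    total = sum(weighted.values())
    return {species: value / total for species, value in weighted.items()}

# Made-up example: the sound alone favours species A, but A is rarely seen here
acoustic = {"species A": 0.7, "species B": 0.3}
prior = {"species A": 0.01, "species B": 0.5}

posterior = combine_with_location_prior(acoustic, prior)
for species, p in posterior.items():
    print(f"{species}: {p:.3f}")
```

In this toy example the prior flips the ranking, which is roughly the situation when you play back a recording of a bird that doesn't occur where you are.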
I can't emphasise enough that playing back recorded sounds is not the best way to test. We can't prevent people from doing this, of course! That's fine, but always bear in mind that you didn't test it in proper field conditions, only at your desk. You're not testing a bird recognition app if you're not testing it against real wild birds...
Someone on the Linux Audio Users list asked how they could analyse a load of FLAC files to work out if it was true for their music collection, that bass frequencies below about 150 Hz (say) tended to be centre-panned. Here's my answer.
First of all, coincidentally I know that Pedro Pestana published a nice analysis of exactly this phenomenon, at the AES 53rd conference recently. He actually looked at hundreds of number-one singles to determine the relationship between panning and frequency in the habits of producers/engineers for popular tracks. The paper isn't open access unfortunately but there you go.
So anyway, here's a Python script I just wrote to analyse your audio files and plot the distribution of panning per frequency. And here's how it looks when I analyse the excellent Rumour Cubes album:
(Just to stress, this is a simple analysis. It simply looks at the spectral representation of the complete mix, it doesn't infer anything clever about the component parts of the mix.)
See any patterns? The pattern I was looking for is a bit subtle, but it's right down at the bottom below 100 Hz (i.e. 0.1 kHz on the scale): the bass tends to "pinch in" and not get panned around as much as the other stuff.
This analysis of Lotus Flower by Radiohead (by Daniel Jones) shows the effect more clearly.
This is what's generally observed, and widely known in mixing engineer "folklore": pan your bass to the centre, do what you like with the rest. Not everyone agrees on the reasons: some people say it's because the bass can cause the needle to skip out of vinyl records if it's off-centre, some people say it's because we can't really perceive the spatialisation very well at low frequencies, some people say it's just to maximise the energy in the mix. I have no comment on what the reasons might be, but it's certainly folk wisdom for various audio people, and empirically you can test it for yourself by analysing some of your music collection.
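If you want to try this kind of analysis on your own files, here's a rough idea of how it can be done with NumPy. It isn't the linked script, just a minimal sketch under my own assumptions: take short-time spectra of the left and right channels, compute a pan index (R - L) / (R + L) for each frequency bin, and summarise how far from centre each frequency tends to sit. The function names are hypothetical, and loading the audio is left out (the demo uses a synthetic signal instead).

```python
import numpy as np

def stft_magnitudes(x, n_fft, hop):
    """Magnitude spectra of Hann-windowed frames, shape (n_frames, n_fft // 2 + 1)."""
    window = np.hanning(n_fft)
    starts = range(0, len(x) - n_fft + 1, hop)
    frames = np.stack([x[s:s + n_fft] * window for s in starts])
    return np.abs(np.fft.rfft(frames, axis=1))

def pan_width_by_frequency(left, right, sr, n_fft=4096):
    """Magnitude-weighted mean of |pan index| per frequency bin.

    Pan index is (R - L) / (R + L): -1 is hard left, 0 is centre, +1 is hard right.
    """
    mag_l = stft_magnitudes(left, n_fft, n_fft // 2)
    mag_r = stft_magnitudes(right, n_fft, n_fft // 2)
    total = mag_l + mag_r
    pan = (mag_r - mag_l) / (total + 1e-12)
    width = (total * np.abs(pan)).sum(axis=0) / (total.sum(axis=0) + 1e-12)
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    return freqs, width

# Synthetic demo: a centred 60 Hz "bass" plus a 2 kHz tone in the left channel only
sr = 44100
t = np.arange(2 * sr) / sr
bass = np.sin(2 * np.pi * 60 * t)
lead = 0.5 * np.sin(2 * np.pi * 2000 * t)
left, right = bass + lead, bass.copy()

freqs, width = pan_width_by_frequency(left, right, sr)
low = np.argmin(np.abs(freqs - 60))
high = np.argmin(np.abs(freqs - 2000))
print(f"pan width near 60 Hz:   {width[low]:.3f}")
print(f"pan width near 2000 Hz: {width[high]:.3f}")
```

For a real track you'd replace the synthetic `left` and `right` with the two channels of your decoded audio file. A small width at the lowest frequencies is the "pinch" described above.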
NOTE: Code and image updated 2014-02-08, thanks to Daniel Jones (see comments below) for spotting an issue.
Truax has a pretty nice way of talking about acoustic structure at different scales. As a composer he's been an important proponent of granular synthesis, and as a teacher his way of talking about sound meshes rather neatly with the granular approach.
One issue he brought out in his keynote is how, over the past 100 years, our ways of listening have changed, and with them our sophistication as listeners. He's not just talking about professional or arty listeners, but all of us. In the past, our "acoustic environment" was pretty much synonymous with our immediate environment more generally. This (Truax argues) is one of the reasons that people in the 1910s seemed to be fooled by the sound of an opera singer on a phonograph record, a sound which to us comes across as a feeble imitation. But recording technologies have allowed us to abstract the acoustic environment from our immediate environment: we now have a facility with embedded acoustic environments that is so sophisticated as to be casual. We know how to relate to the person sitting next to us on the tube listening to headphones; we understand the voices in the radio, why they have different reverb from the room we're sitting in, and why they can't hear us; we understand what is being hinted at when the narrator in a radio play doesn't seem to be in the same room as the characters.
Later that night, at the concert, there was a great example of embedded acoustic environments. We were listening to a multi-channel electronic concert, in a huge ex-ship-building shed ("No. 3 slip") in a dockyard. This hangar allowed plenty of sound in from outdoors, and so as the music played, it was... ahem... "augmented" by various other sounds: the dockyard's big clock chiming the hours; the firework-like sounds of artillery fire in a naval training ground; and also a heavily-echoed "Call Me" by Blondie!
I don't believe any of this was deliberate ;) but it's a great example of an embedded acoustic environment - and furthermore, of the challenge that it presents to electronic composers. Composers need to be aware that the environment they're constructing will usually be played back over some speakers which don't form the entirety of the acoustic environment for the listeners, but only a sub-system of it. (Is this challenge equivalent to a demand to always be site-specific? Not quite, but related.) Some of the composers last night, I think, did not rise to this challenge, and it showed. But some of them did. Barry Truax was premiering a new piece called "Earth and Steel", written specifically for this place, and it worked great: it was very affecting.