Photos (c) Jan Trutzschler von Falkenstein (CC-BY-NC), Rain Rabbit, Samuel Craven, and Gregorio Karman (CC-BY-NC).


I am a research fellow, conducting research into automatic analysis of bird sounds using machine learning.
→ Click here for more about my research.



Older things: Evolutionary sound · MCLD software · Sponsored haircut · StepMania · Oddmusic · Knots · Tetrastar fractal generator · Cipher cracking · CV · Emancipation fanzine


This is my approximation of the lovely dry-fried paneer served at Tayyabs, the famous Punjabi Indian place in East London. These amounts are for 1 as a main, or more as a starter. Takes about ten minutes:

First put the cubed paneer into a bowl, add the curry powder and cumin and toss to get an even coating.

Get a frying pan nice and hot, with about 1 tbsp of veg oil in it. Add the onion and chilli (and cumin seed if using). You want the onion to end up crispy, so slice it finely and separate the slices (no big lumps), get the oil hot, and give the onion plenty of space in the pan. Fry it hot for about 4 minutes.

Add the paneer to the pan, along with any spice left in the bowl. Shuffle it all around - it's time to get the paneer browning too. It'll take maybe another 4 minutes, not too long. Stir it now and again - it'll get nice and brown on the sides. No need to get a very even colour on all sides, but do turn it all around a couple of times.

Near the end, e.g. with 30 seconds to go, add the squeeze of lemon juice to the pan, and stir around. You might also like to sprinkle some garam masala into the pan too.

Serve the paneer with chives sprinkled over the top. It's good to have some bread to eat it with (e.g. naan or roti) and salad, or maybe with other Indian dishes.

recipes · Permalink / Comment

I'm happy to say I'm now supervising two PhD students, Pablo and Veronica. Veronica is working on my project all about birdsong and machine learning - so I've got some notes here about recommended reading for someone starting on this topic. It's a niche topic but it's fascinating: sound in general is fascinating, and birdsong in particular is full of many mysteries, and it's amazing to explore these mysteries through the craft of trying to get machines to understand things on our behalf.

If you're thinking of starting in this area, you need to get acquainted with: (a) birds and bird sounds; (b) sound/audio and signal processing; (c) machine learning methods. You don't need to be an expert in all of those - a little naivete can go a long way!

But here are some recommended reads. I don't want to give a big exhaustive bibliography of everything that's relevant. Instead, some choice reading that I have selected because I think each paper is readable, relevant, and representative of a different idea/method that you should know. They're all journal papers, which is good because they're quite short and focused, but if you want a more complete intro I'll mention some textbooks at the end.


science · Permalink / Comment

I'm having problems understanding people. More specifically, I'm having problems now that people are using emoji in their messages. Is it just me?

OK so here's what just happened. I saw this tweet which has some text and then 3 emoji. Looking at the emoji I think to myself,

"Right, so that's: a hand, a beige square (is the icon missing?), and an evil scary face. Hmm, what does he mean by that?"

I know that I can mouseover the images to see text telling me what the actual icons are meant to be. So I mouseover the three images in turn and I get:

So it turns out I've completely misunderstood the emotion that was supposed to be on that face icon. Note that you probably see a different image than I do anyway, since different systems show different images for each glyph.

Clapping hands, OK fine, I can deal with that. Clapping hands and grinning face must mean that he's happy about the thing.

But "(white skin)"? WTF?

Is it just me? How do you manage to interpret these things?

technology · 1 comment

Lancashire hotpot is a classic dish where I come from. Lamb, onion, potatoes, slow-cooked.

There's a short version of this post: Felicity Cloake's "perfect Lancashire hotpot" article in the Guardian is correct. Read that article.

Really the main way you can mess up Lancashire hotpot is by trying to fancy it up. As Cloake says, don't pre-cook the potatoes or the onions, or the meat. With the meat, lamb neck is a good choice, easy to find in supermarkets and good for slow cooking. (I bet Cloake is right that mutton is more traditional and would suit it well, but I don't tend to find that in the shops.) Cut the meat into BIG pieces - not "bite-size" pieces as in many stews, and not the bite-size pieces you get in supermarket ready-diced meat. Bigger than that. At least an inch thick.

I'm pretty sure I remember there being carrots in the regular school hotpot, so I add carrot (in big chunks so it stands up to the long cooking). Floury potatoes (not waxy) are definitely the right way to do it - and for the reasons mentioned by Cloake: "the potatoes that have come into contact with the gravy dissolve into a rich, meaty mash, while those on top go crisp and golden – for which one needs a floury variety such as, indeed, a maris piper." I've got a standard recipe book here which says to put some potatoes on the bottom as well as the top; that seems a bit odd at first glance but it gives you a good ratio of crispy potato to melted potato...

In a sense this is basically just a stew/casserole and you can do what you like, so I'll try not to be too dogmatic, but it's one of those minimalist recipes where if you mess about with it too much you have "just another stew" rather than this particular flavour. It's traditional to use kidneys as well as the lamb (my grandma did that) but we didn't have that at school, and certainly when I'm cooking just for me I'm not going to bother. However, I'm shocked to see Jane Horrocks suggest putting black pudding in underneath the potatoes! It's also mentioned by commenters on the Guardian article, so I assume it must be a habit in some bits of Lancashire... but not my bit.

That aside, the recipe to look at is Felicity Cloake's "perfect Lancashire hotpot" article in the Guardian.

food · Permalink / Comment

I'm just back from a conference visit to the USA, to attend WASPAA and SANE. Lots of interesting presentations and discussions about intelligent audio analysis.

One of the interesting threads of discussion was about deep learning and modern neural networks, and how best to use them for audio signal processing. The deep learning revolution has already had a big impact on audio: famously, deep learning gets powerful results on speech recognition and is now used pervasively in industry for that task. It's also widely studied in image and video processing.

But that doesn't mean the job is done. Speech recognition is only one of many ways we get information out of audio, and other "tasks" are not direct analogies: they have different types of inputs and outputs. Secondly, there are many different neural net architectures, and still much lively research into which architectures are best for which purposes. Part of the reason that big companies get great results for speech recognition is that they have masses and masses of data. In cases where we have modest amounts of data, or data without labels, or data with fuzzy labels, getting the architecture just right is an important thing to focus on.

And audio signal processing insights are important for getting neural nets right for audio. This was one of the main themes of Emmanuel Vincent's WASPAA keynote, titled "Is audio signal processing still useful in the era of machine learning?" (Slides.) He mentioned, for example, that the intelligent application of data augmentation is a good way for audio insight to help train deep nets well. I agree, but in the long term I think the more important point is that our expertise should be used to help get the architectures right. There's also the thorny question (and hot topic in deep learning) of how to make sense of what deep nets are actually doing: in a sense this is the flip-side of the architecture question - making sense of an architecture once it's been found to work!
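To make the data augmentation point concrete, here's a toy sketch of my own (not from the keynote): two label-preserving transforms that audio knowledge suggests - a small circular time shift, and additive noise scaled relative to the clip's RMS level - used to multiply up a training set. The function name and parameters are illustrative.

```python
import numpy as np

def augment(clip, rng, max_shift=1000, noise_db=-30.0):
    """Return a randomly time-shifted, noise-corrupted copy of an audio clip.

    The class of a sound event usually survives a small time shift and a
    little added background noise, so these transforms are label-preserving.
    """
    shifted = np.roll(clip, rng.integers(-max_shift, max_shift + 1))
    # Scale the noise relative to the clip's RMS level, in decibels.
    rms = np.sqrt(np.mean(clip ** 2)) + 1e-12
    noise = rng.standard_normal(len(clip)) * rms * 10 ** (noise_db / 20)
    return shifted + noise

rng = np.random.default_rng(0)
clip = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)  # 1 s of 440 Hz tone
batch = np.stack([augment(clip, rng) for _ in range(8)])   # 8 augmented copies
```

Each augmented copy is a slightly different, equally valid example of the same sound, which is exactly the kind of domain knowledge a generic learner doesn't have for free.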

It's common knowledge that convolutional nets (ConvNets) and recurrent neural nets (specifically LSTMs) are powerful architectures, and in principle LSTMs should be particularly appropriate for time-series data such as audio. Lots of recent work confirms this. At the SANE workshop Tuomas Virtanen presented results showing strong performance at sound event detection (recovering a "transcript" of the events in an audio scene), and Ron Weiss presented impressive deep learning that could operate directly from raw waveforms to perform beamforming and speech recognition from multi-microphone audio. Weiss was using an architecture combining convolutional units (to create filters) and LSTM units (to handle temporal dependencies). Pablo Sprechmann discussed a few different architectures, including one "unfolded NMF"-type architecture. (The "deep unfolding" approach is certainly a fruitful idea for deep learning architectures. Introduced a couple of years ago by Hershey et al. [EDIT: It's been pointed out that the unfolding idea was first proposed by Gregor and LeCun in 2010, and unfolded NMF was described by Sprechmann et al. in 2012. The contribution of Hershey et al. comes from the curious step of untying the unfolded parameters, which turns a truncated iterative algorithm into something more like a deep network.])
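To illustrate the unfolding idea in miniature (my own sketch, not any of the speakers' actual models): take an iterative NMF algorithm, truncate it to a fixed number of multiplicative updates, and treat each update as a "layer". With the per-layer dictionaries tied this is just truncated iterative NMF; untying them, as Hershey et al. do, is what turns it into a trainable feed-forward network.

```python
import numpy as np

def unfolded_nmf_activations(V, Ws, eps=1e-9):
    """Estimate activations H for nonnegative matrix V via unfolded NMF.

    One "layer" per dictionary in Ws; each layer applies one Lee-Seung
    multiplicative update for H. Identical (tied) Ws = truncated iterative
    NMF; distinct (untied) Ws = a deep-unfolded network, one W per layer.
    """
    rank = Ws[0].shape[1]
    H = np.full((rank, V.shape[1]), 0.5)   # flat nonnegative initialisation
    for W in Ws:
        H = H * (W.T @ V) / (W.T @ W @ H + eps)
    return H

rng = np.random.default_rng(1)
W_true = rng.random((20, 3))          # known dictionary: 20 freq bins, 3 parts
V = W_true @ rng.random((3, 50))      # a 50-frame "spectrogram" it explains
Ws = [W_true] * 10                    # 10 tied layers = 10 truncated updates

H0 = np.full((3, 50), 0.5)
err_init = np.linalg.norm(V - W_true @ H0)
H_est = unfolded_nmf_activations(V, Ws)
err_final = np.linalg.norm(V - W_true @ H_est)
```

The multiplicative updates keep H nonnegative and monotonically reduce the reconstruction error, so even a handful of "layers" gets usefully close - which is why a truncated-and-untied version makes a plausible network architecture.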

I'd like to focus on a couple of talks at SANE that exemplified how domain issues inform architectural issues:

Happily, these discussions relate to some work I've been involved in this year. I spent some time visiting Rich Turner in Cambridge, and we had good debates and a small study about how to design a neural network well for audio. We have a submitted paper about denoising audio without access to clean data using a partitioned autoencoder which is the first fruit of that visit. The paper focuses on the "partitioning" issue but the design of the autoencoder itself has some similarities to what Paris Smaragdis was describing, and for similar reasons.
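As a cartoon of the partitioning idea - my own toy sketch, not the model from the paper, and with random weights standing in for trained ones - the point is structural: the latent layer is split into "signal" units and "noise" units, and at denoising time you reconstruct from the signal partition only.

```python
import numpy as np

rng = np.random.default_rng(2)
n_in, n_signal, n_noise = 16, 4, 2    # latent split: 4 signal units, 2 noise units

# Random weights stand in for trained ones; only the structure is the point.
W_enc = rng.standard_normal((n_signal + n_noise, n_in)) * 0.1
W_dec = rng.standard_normal((n_in, n_signal + n_noise)) * 0.1

def denoise(frame):
    z = np.maximum(W_enc @ frame, 0.0)   # ReLU encoder over the whole latent
    z[n_signal:] = 0.0                   # zero out the noise partition
    return W_dec @ z                     # reconstruct from signal units only

frame = rng.standard_normal(n_in)        # one spectral frame (illustrative)
out = denoise(frame)
```

In training, the trick is getting the noise partition to actually absorb the noise (without clean data to supervise it) - that's the "partitioning" issue the paper is about.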

There's sometimes a temptation to feel despondent about deep learning: the sense of foreboding that a generic deep network will always beat your clever insightful method, just because of the phenomenal amounts of data and compute-hours that some large company can throw at it. All of the above discussions feed into a more optimistic interpretation, that domain expertise is crucial for getting machine-learning systems to learn the right thing - as long as you can learn to jettison some of your cherished habits (e.g. MFCCs!) at the right moment.

science · Permalink / Comment

I live in Tower Hamlets, the London borough with the largest proportion of Muslims in the UK. I see plenty of women every day who wear a veil of one kind or another. I don't have any kind of Muslim background so what could I do to start understanding why they wear what they do?

I went on a book hunt and luckily I found a book that gives a really clear background: "A Quiet Revolution" by Leila Ahmed. It's a book that describes some of the twentieth-century back-and-forth of different Islamic traditions, trends and politics, and how they relate to veils. The book has a great mix of historical overview and individual voices.

So, while of course there's lots I still don't understand, this book gives a really great grounding in what's going on with Muslim women, veils, and Western society. It's compulsory reading before launching into any naive feminist critique of Islam and/or veils. I'm sure feminists within Islam still have a lot to work out, and I don't know what the balance of "progress" is like there - please don't mistake me for thinking all is rosy. (There are some obvious headline issues, such as those countries which legally enforce veiling. I think to some Western eyes those headlines can obscure the fact that there are feminist conversations happening within Islam, and good luck to them.)

A couple of things that the book didn't cover, that I'd still like to know more about:

  1. The UK/London perspective. The book is written by an Egyptian-American so its Western chapters are all about things happening in North America. I'm sure there are connections but I'm sure there are big differences too. (I am told that Deobandi Islam is pertinent in the UK, not mentioned in the book.)
  2. The full-covering face veils, those ones that hide all of the face apart from the eyes. Ahmed's book focuses mainly on the hijab style promoted by Islamists such as the Muslim Brotherhood (see the photo for an example of the style), so we don't hear much about where those full face-coverings come from or what the women who wear them think.

books · Permalink / Comment

The Long Mynd is a range of hills in Shropshire. Very beautiful area this time of year. Lots of birds too. People often comment on the birds of prey: the buzzard, red kite and kestrel, soaring silently above and occasionally plummeting to pounce on something. Of course I'm more interested in the birds making the sounds all around.

I was most taken by the meadow pipits - as you walk around on the Mynd, they often leap surprised out of the heather and flitter away making alarmed "peeppeeppeep" sounds (or maybe more whistly than that, "pssppssppssp"). I saw a skylark too, ascending from the ground about 20 metres in front of me. It's great to witness it when they do that: an unhurried circling ascent, all the while burbling out their famously complex melodious song, like a little enraptured fax machine going to heaven.

While hanging around in the forest I noticed how many non-vocal bird sounds you can hear. The most common example is wing flutter sounds, I heard them from lots of different species, and the sound can often be very deliberate. The most surprising sound of all was when I was walking past a tree and heard a knocking sound and I thought, "Oh, is that a woodpecker starting up?" - but it wasn't. I could see the little bird on a branch a few metres away and it was a coal tit, doing a bit of a woodpecker impression. It would peck at the branch hard, about four times in a row, repeatedly, giving me the impression it might have been trying to do some DIY of some sort. It also tried it on a second branch.

Lots of gangs of ravens around too - their curious adaptable calls reminding me of the ones I saw recently at Seewiesen. I often heard (from a distance) the song of the nuthatch - that nice simple ascending note that I first encountered when camping in Dorset. Now and again a jay, lovely orangey and cyan colouring contrasting with its raspy magpie-ish yell. The jays seem to be shy around here, unlike the one that used to hang around in our garden in London.

Of course all the usual gang was there too: lots of robins singing, jackdaw, magpie, wren, house sparrow, blackbird, stock pigeon, one wood pigeon, the occasional chiffchaff. I think I heard a goldcrest at one point but I'm unsure. One willow warbler down by the reservoir.

birds · Permalink / Comment

Our journal paper Detection and Classification of Acoustic Scenes and Events is now out in IEEE Transactions on Multimedia! It evaluates many different methods for detecting/classifying in everyday audio recordings.

I'm highlighting this paper because it covers the whole process of the IEEE DCASE evaluation challenge that we ran a little while ago, with many international research teams submitting systems either for audio event detection or audio scene classification.

It was a big team effort, with various people putting many months of time in, from 2012 through to 2015 (even though it was essentially an unfunded initiative!). Specific thanks to Dimitrios and Emmanouil, who I know put lots of manual effort in, repeatedly, to get this right.

science · Permalink / Comment
