InterSpeech 2016 was a very interesting conference. I have been to InterSpeech before, yes - but I'm not a speech-recognition person so it's not my "home" conference. I was there specifically for the birds/animals special session (organised by Naomi Harte and Peter Jancovic), but it was also a great opportunity to check in on what's going on in speech technology research.
Here are notes on some of the interesting papers I saw. I'll start with some of the birds/animals papers:
That's not all the bird/animal papers, sorry, just the ones I have comments about.
And now a sampling of the other papers that caught my interest:
One thing you won't realise from my own notes is that InterSpeech was heavily dominated by deep learning. Convolutional neural nets (ConvNets) and recurrent neural nets (RNNs) were everywhere. There was lots of discussion about connectionist temporal classification (CTC) - some people say it's the best, some say it requires too much data to train properly, and some say they have other tricks so they can get away without it. It will be interesting to see how that discussion evolves. Many of the other deep-learning papers, though, were much of a muchness: people apply a ConvNet or an RNN to one of the many tasks in speech technology and, as we all know, they often get good results - but in many cases it was application without a whole lot of insight. That's the way the state of the art is at the moment, I guess. So many of my most interesting moments at InterSpeech were deep-learning-less :) see above.
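For readers who haven't met CTC: it scores a label sequence against a network's frame-by-frame outputs by summing over every possible alignment of labels to frames, which is what lets you train without pre-segmented data. Below is a toy numpy sketch of that forward ("alpha") recursion - my own illustration for this post, not code from any of the papers; the variable names, shapes and toy data are made up for the example.

```python
import numpy as np

def ctc_forward_prob(y, labels, blank=0):
    """Probability of a label sequence under CTC, via the forward (alpha) recursion.
    y: (T, K) array of per-frame class probabilities (rows sum to 1).
    labels: list of label indices without blanks, e.g. [1, 2]."""
    # Extend the labels with blanks between and around them: [-, l1, -, l2, -, ...]
    ext = [blank]
    for l in labels:
        ext += [l, blank]
    S, T = len(ext), y.shape[0]

    alpha = np.zeros((T, S))
    alpha[0, 0] = y[0, ext[0]]   # start in the leading blank...
    alpha[0, 1] = y[0, ext[1]]   # ...or directly in the first label
    for t in range(1, T):
        for s in range(S):
            a = alpha[t - 1, s]                          # stay on the same symbol
            if s >= 1:
                a += alpha[t - 1, s - 1]                 # advance from the previous symbol
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                a += alpha[t - 1, s - 2]                 # skip the blank between two different labels
            alpha[t, s] = a * y[t, ext[s]]
    # A valid path ends on either the final label or the trailing blank
    return alpha[T - 1, S - 1] + alpha[T - 1, S - 2]

# Toy usage: 5 frames, 3 classes (0 = blank), label sequence [1, 2]
rng = np.random.default_rng(0)
y = rng.random((5, 3))
y /= y.sum(axis=1, keepdims=True)
print(ctc_forward_prob(y, [1, 2]))
```

In practice you'd compute this in the log domain and backpropagate its negative log through the acoustic model, but the alignment-marginalising recursion above is the core idea the conference debates were about.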
(Also, I had to miss the final day to catch my return flight. I wish I'd been able to go to the VAD and Audio Events session, for example.)
Another aspect of speech technology research is its emphasis on public data challenges - there are lots of them! Speech recognition, speaker recognition, language recognition, distant speech recognition, zero-resource speech recognition, de-reverberation... Some of these have been running for years, and the dedication of the organisers is worth praising. It was useful to check in on how these things are organised, as we develop similar initiatives in general and natural sound scene analysis.