Last year I took part in the Dagstuhl seminar on Vocal Interactivity in-and-between Humans, Animals and Robots (VIHAR). Many fascinating discussions with phoneticians, roboticists, and animal behaviourists (ethologists).
One surprisingly difficult topic was to come up with a basic data model for describing multi-party interactions. It was so easy to pick a hole in any given model: for example, if we describe actors taking "turns" which have start-times and end-times, then are we really saying that the actor is not actively interacting when it's not their turn? Do conversation participants really flip discretely between an "on" mode and an "off" mode, or does that model ride roughshod over the phenomena we want to understand?
I was reminded of this modelling question when I read this very interesting new journal article by a Japanese research group: "HARKBird: Exploring Acoustic Interactions in Bird Communities Using a Microphone Array". They have developed this really neat setup with a portable microphone array attached to a laptop, which does direction-estimation and decodes which birds are heard from which direction. In the paper they use this to help annotate the time-regions in which birds are active, a bit like the on/off model I mentioned above. Here's a quick sketch:
From this type of data, Suzuki et al. calculate a measure called the transfer entropy, which quantifies the extent to which one individual's vocalisation patterns contain information that predicts the patterns of another. It gives them a hypothesis test for whether one particular individual affects another, in a network: who is listening to whom?
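To give a feel for what transfer entropy measures, here is a minimal sketch in Python. It is not the HARKBird authors' implementation: it assumes each bird's activity has already been reduced to a binary on/off sequence on a common time grid, it uses a history length of one time step, and it estimates the probabilities by simple counting. The function name `transfer_entropy` and the toy sequences are my own illustrative choices.

```python
from collections import Counter
from math import log2


def transfer_entropy(source, target):
    """Plug-in estimate of transfer entropy (bits) from source to target.

    Assumes two equal-length discrete sequences and a history length of 1:
    TE = sum over (y1, y0, x0) of p(y1, y0, x0) * log2(p(y1|y0, x0) / p(y1|y0))
    where y1 is the target's next state, y0 its current state,
    and x0 the source's current state.
    """
    if len(source) != len(target):
        raise ValueError("source and target must have the same length")

    # Count every observed (next target, current target, current source) triple.
    triples = Counter(zip(target[1:], target[:-1], source[:-1]))
    n = sum(triples.values())
    if n == 0:
        return 0.0

    # Marginal counts needed for the two conditional probabilities.
    count_y0_x0 = Counter()
    count_y1_y0 = Counter()
    count_y0 = Counter()
    for (y1, y0, x0), c in triples.items():
        count_y0_x0[(y0, x0)] += c
        count_y1_y0[(y1, y0)] += c
        count_y0[y0] += c

    te = 0.0
    for (y1, y0, x0), c in triples.items():
        p_given_both = c / count_y0_x0[(y0, x0)]            # p(y1 | y0, x0)
        p_given_self = count_y1_y0[(y1, y0)] / count_y0[y0]  # p(y1 | y0)
        te += (c / n) * log2(p_given_both / p_given_self)
    return te


# Toy data: bird B simply echoes bird A one time step later.
bird_a = [0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1]
bird_b = [0] + bird_a[:-1]
silent = [0] * len(bird_a)

te_a_to_b = transfer_entropy(bird_a, bird_b)       # A's state helps predict B
te_a_to_silent = transfer_entropy(bird_a, silent)  # nothing to predict
```

In this toy case the estimate from A to B is positive, because knowing A's current state removes all uncertainty about B's next state, while the estimate towards a bird that never vocalises is zero. Turning such a number into a hypothesis test needs a further step, such as comparing against shuffled surrogate sequences, which this sketch leaves out.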
That's a very similar question to the question we were asking in our journal article last year, "Detailed temporal structure of communication networks in groups of songbirds". I talked about our model at the Dagstuhl event. Here I'll merely emphasise that our model doesn't use regions of time, but point-like events:
So our model works well for short calls, but is not appropriate for data that can't be well-described via single moments in time (e.g. extended sounds that aren't easily subdivided). The advantage of our model is that it's a generative probabilistic model: we're directly estimating the characteristics of a detailed temporal model of the communication. The transfer-entropy method, by contrast, doesn't model how the birds influence each other, just detects whether the influence has happened.
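The difference between the two data representations can be made concrete with a small Python sketch. Everything here is illustrative and assumed for the example: the class names `Interval` and `PointEvent`, the helper `rasterise` that turns time-regions into an on/off sequence on a fixed grid, and the helper `onsets` that reduces each region to its start time. Neither paper prescribes these structures, and using the onset as "the" moment of an extended sound is just one possible, lossy choice.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """A time-region in which one actor is vocally active (seconds)."""
    actor: str
    start: float
    end: float


@dataclass(frozen=True)
class PointEvent:
    """A single moment in time attributed to one actor (seconds)."""
    actor: str
    time: float


def rasterise(intervals, actor, step, duration):
    """Binary on/off sequence for one actor on a grid of `step`-second bins.

    A bin is marked 1 if any of the actor's intervals overlaps it.
    """
    n_bins = int(round(duration / step))
    seq = [0] * n_bins
    for iv in intervals:
        if iv.actor != actor:
            continue
        for i in range(n_bins):
            bin_start = i * step
            bin_end = bin_start + step
            if iv.start < bin_end and iv.end > bin_start:
                seq[i] = 1
    return seq


def onsets(intervals):
    """Reduce each interval to a point event at its start time (lossy)."""
    events = (PointEvent(iv.actor, iv.start) for iv in intervals)
    return sorted(events, key=lambda e: e.time)


annotations = [Interval("A", 0.0, 1.0), Interval("B", 0.5, 2.0)]
a_on_off = rasterise(annotations, "A", step=0.5, duration=2.0)
b_on_off = rasterise(annotations, "B", step=0.5, duration=2.0)
events = onsets(annotations)
```

The on/off sequences keep the durations but impose a discrete time grid; the point events keep exact timing but throw the durations away, which is exactly why the point-like view suits short calls better than extended sounds.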
I'd love to get the best of both worlds: a generative and general model for extended sound events influencing one another. It's a tall order because for point-like events, we have point process theory; for extended events I don't think the theory is quite so well-developed. Markov models work OK but don't deal very neatly with multiple parallel streams. The search continues.