Note: I drafted this a while back but didn't get round to putting it on the blog. Now I have published code and a published paper about the GM-PHD filter, I thought these practical insights might be useful:
I've been tweaking the GM-PHD filter which I blogged about recently. (Gaussian mixture PHD is a GM implementation of the Probability Hypothesis Density filter, for tracking multiple objects in a set of noisy observations.)
I think there are some subtleties to it which are not immediately obvious from the research articles.
Also, I've published my open source GM-PHD Python code so if anyone finds it useful (or has patches to contribute) I'd be happy. There's also a short research paper about using the GM-PHD filter for multi-pitch tracking.
In that original blog post I said the results were noisier than I was hoping. I think there are a couple of reasons for this:
The filter benefits from a high-entropy representation and a good model of the target's movement. I started off with a simple 1D collection of particles with fixed velocity, and in my modelling I didn't tell the GM-PHD about the velocity - I just said there was position with some process noise and observation noise. Well, if I update this so the model knows about velocity too, and I specify the correct linear model (i.e. position is updated by adding the velocity term on to it) the results improve a little. I was hoping that I coud be a bit more generic than that. It may also be that my 1D example is too low-complexity, and a 2D example would give it more to focus on. Whatever happened to "keep it simple"?!
The filter really benefits from knowing where targets are likely to come from. In the original paper, the simulation examples are of objects coming from a fixed small number of "air bases" and so they can be tracked as soon as they "take off". If I'm looking to model audio, then I don't know what frequency things will start from, there's no strong model for that. So, I can give it a general "things can come from anywhere" prior, but that leads to the burn-in problem that I mentioned in my first blog post - targets will not accumulate much evidence for themselves, until many frames have elapsed. (It also adds algorithmic complexity, see below.)
Cold-start problem: the model doesn't include anything about pre-existing targets that might already be in the space, before the first frame (i.e. when the thing is "turned on"). It's possible to account for this slightly hackily by using a boosted "birth" distribution when processing the first frame, but this can't answer the question of how many objects to expect in the first frame - so you'd have to add a user parameter. It would be nice to come up with a neat closed-form way to decide what the steady-state expectation should be. (You can't just burn it in by running the thing with no observations for a while before you start - "no observations" is expressed as "empty set", which the model takes to mean definitely nothing there rather than ignorance. Ignorance would be expressed as an equal distribution over all possible observation sets, which is not something you can just drop in to the existing machinery.)
One mild flaw I spotted is in the pruning algorithm. It's needed because without it the number of Gaussians would diverge exponentially, so to keep it manageable you want to reduce this to some maximum limit at each step. However, the pruning algorithm given in the paper is a bit arbitrary, and in particular it fails to maintain the total sum of weights. It chops off low-weight components, and doesn't assign their lost weight to any of the survivors. This is important because the sum of weights for a GMPHD filter is essentially the estimated number of tracked objects. If you have a strong clean signal then it'll get over this flaw, but if not, you'll be leaking away density from your model at every step. So in my own code I renormalise the total mass after simplification - a simple change, hopefully a good one.
And a note about runtime: the size of the birth GMM strongly affects the running speed of the model. If you read through the description of how it works, this might not be obvious because the "pruning" is supposed to keep the number of components within a fixed limit so you might think it allows it to scale fine. However, the if birth GMM has many components, then they all must be cross-fertilised with each observation point at every step, and then pruned afterwards, so even if they don't persist they are still in action for the CPU-heavy part of the process. (The complexity has a kind of dependence on number-of-observations * number-of-birth-Gaussians.) If like me you have a model where you don't know where tracks will be born from, then you need many components to represent a flat distribution. (In my tests, using a single very wide Gaussian led to unpleasant bias towards the Gaussian's centre, no matter how wide I spread it.)