Learn about

**The Molecules of HIV**

Maths can be magic sometimes. Here's a probability thing - it's very very niche, but it works out nicely.

I needed to sample K iid exponential variables given that I knew their sum was S. The exponential distribution has one parameter for the "rate" (usually written as the greek character lambda, but let me write it as L). We don't necessarily know the value of L but we do know S.

Let's start simply. Instead of saying we have K variables we'll start with just two, and we'll call them x and y. So we know S, and we know that x+y=S.

For comparison, let me see what happens when we condition on the maximum rather than the sum. So we know the maximum M, and we know that x<M and y<M. It's quite a similar problem, and in that case we find that x and y remain independent, and identically distributed: P(x|M) and P(y|M) are just the truncated exponential distribution. This distribution still involves L. The shape of the distribution still depends on this unknown rate parameter.

So back to the real problem. It's the sum x+y=S that we know. So let's write the joint distribution:

P(x,y|S) Î± L exp(-Lx) L exp(-Ly) I(x+y=S)

(where "I()" is the indicator function, and "Î±" means "is proportional to" - my blog doesn't do proper maths typesetting.)

and we can substitute y=S-x:

P(x,y|S) Î± L exp(-Lx) L exp(-LS + Lx) I(x+y=S)

P(x,y|S) Î± L exp(-Lx) L exp(-LS) exp(Lx) I(x+y=S)

P(x,y|S) Î± L^2 exp(-LS) I(x+y=S)

So the x and y vanish to leave constant terms (except for the indicator that limits us to the 2-simplex). The distribution is simply uniform on the 2-simplex! It doesn't even depend on L - **the lambda vanishes**. Magic.

In other words, to sample two iid exponentials conditioned on their sum S, you can do it by taking a *single* *uniform* sample in the range [0,S]. Much easier than we might have thought.

This generalises to more than two variables. Sampling K exponentials conditioned on their sum S is equivalent to sampling uniformly from the unit K-simplex and multiplying the result by S.

This is covered more formally in this free book Non-Uniform Random Variate Generation, which I found via someone's blog article Sampling from a simplex (where they're coming at the problem from the other way round).