Path: blob/master/deprecated/notebooks/vb_gmm_tfp.ipynb
Variational Bayes for Gaussian Mixture Models using TFP
The code was written by Dave Moore, with some tweaks by Kevin Murphy.
We use a diagonal Gaussian approximation to the posterior (after transforming the variables to an unconstrained space), fit by optimizing the stochastic variational inference (SVI) objective with full-batch gradient descent. See here for code that implements full-batch VBEM using a conjugate prior.
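Concretely, the SVI objective is the negative evidence lower bound (ELBO). Writing $q_\lambda(\theta)$ for the (transformed) diagonal Gaussian surrogate with variational parameters $\lambda$, and $\mathcal{D}$ for the observed data, we minimize

$$
\mathcal{L}(\lambda) = -\,\mathbb{E}_{q_\lambda(\theta)}\!\left[\log p(\mathcal{D}, \theta) - \log q_\lambda(\theta)\right],
$$

where the expectation is estimated with Monte Carlo samples from $q_\lambda$ and gradients are taken over the full dataset.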
Plotting code
Data
We use a dataset of eruption times from the "Old Faithful" geyser in Yellowstone National Park.
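A minimal sketch of the data-preparation step (the file name is hypothetical and the notebook's own loading code may differ); we assume the usual two-column version of the dataset, with eruption duration and waiting time in minutes:

```python
import numpy as np
import pandas as pd

# Hypothetical local copy of the Old Faithful data with columns
# 'eruptions' (duration, minutes) and 'waiting' (waiting time, minutes).
df = pd.read_csv('faithful.csv')
observations = df[['eruptions', 'waiting']].to_numpy(dtype=np.float32)

# Standardize each column so that weakly informative priors are reasonable.
observations = (observations - observations.mean(axis=0)) / observations.std(axis=0)
```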
Model
We put a Gaussian prior on each mean vector, an LKJ prior on each correlation matrix, and a half-normal prior on each scale vector. (This is not a conjugate prior.)
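A minimal sketch of this kind of model in TFP (not necessarily the notebook's exact code: the Dirichlet prior on the mixture weights, the number of components, and the hyperparameters are assumptions; the LKJ prior is expressed in its Cholesky-factor parameterization):

```python
import tensorflow as tf
import tensorflow_probability as tfp

tfd = tfp.distributions

num_components = 3  # assumed number of mixture components
dims = 2            # dimensionality of the (standardized) observations

def make_model(num_obs):
  """GMM with Gaussian priors on the means, half-normal priors on the scales,
  and Cholesky-LKJ priors on the per-component correlation matrices."""
  @tfd.JointDistributionCoroutineAutoBatched
  def model():
    mix_probs = yield tfd.Dirichlet(tf.ones(num_components), name='mix_probs')
    locs = yield tfd.Independent(
        tfd.Normal(loc=tf.zeros([num_components, dims]), scale=3.),
        reinterpreted_batch_ndims=2, name='locs')
    scales = yield tfd.Independent(
        tfd.HalfNormal(scale=tf.ones([num_components, dims])),
        reinterpreted_batch_ndims=2, name='scales')
    corr_chol = yield tfd.CholeskyLKJ(
        dimension=dims, concentration=tf.ones([num_components]),
        name='corr_chol')
    # Combine scales and correlation Cholesky factors: scale_tril = diag(scales) @ L.
    scale_tril = scales[..., :, tf.newaxis] * corr_chol
    yield tfd.Sample(
        tfd.MixtureSameFamily(
            mixture_distribution=tfd.Categorical(probs=mix_probs),
            components_distribution=tfd.MultivariateNormalTriL(
                loc=locs, scale_tril=scale_tril)),
        sample_shape=num_obs, name='obs')
  return model
```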
Fitting a point mass posterior (MAP estimate)
This marginalizes over the discrete latent indicators (as part of the MixtureSameFamily log_prob computation), but uses point estimates for the model parameters, similar to standard EM. Thus there is no "Bayesian Occam's razor" penalty when we choose too many mixture components.
Samples drawn from this posterior give the same posterior predictive distribution on every run, since the posterior over the parameters is a point mass.
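A minimal sketch of such a MAP fit, continuing the hypothetical `make_model` and `observations` from the sketches above (the notebook's own code may differ): pin the observations, map unconstrained trainable variables into the parameters' support with the pinned model's default event-space bijector, and maximize the unnormalized posterior log density.

```python
import tensorflow as tf
import tensorflow_probability as tfp

# Pin the observed data; the remaining (unpinned) parts are the model parameters.
target = make_model(num_obs=observations.shape[0]).experimental_pin(obs=observations)
bijector = target.experimental_default_event_space_bijector()

# One unconstrained trainable variable per latent parameter block.
unconstrained = tf.nest.map_structure(
    lambda shape: tf.Variable(0.1 * tf.random.normal(shape)),
    bijector.inverse_event_shape(target.event_shape))

def negative_unnormalized_log_prob():
  params = bijector.forward(unconstrained)
  return -target.unnormalized_log_prob(params)

losses = tfp.math.minimize(
    negative_unnormalized_log_prob,
    num_steps=2000,
    optimizer=tf.optimizers.Adam(learning_rate=0.05))

map_params = bijector.forward(unconstrained)  # point estimates in constrained space
```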
Fitting a diagonal Gaussian posterior
Construct and fit a surrogate posterior using stochastic gradient VI. The surrogate is a diagonal Gaussian that is transformed into the support of the model's parameters using appropriate bijectors. (The transformed vector is split into tensors, one for each of the model's random variables, and these are pushed through constraining bijectors as needed.) For details, see https://www.tensorflow.org/probability/api_docs/python/tfp/experimental/vi/build_affine_surrogate_posterior
The event space for this distribution is derived from the pinned distribution. For details, see https://www.tensorflow.org/probability/api_docs/python/tfp/experimental/distributions/JointDistributionPinned
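A minimal sketch of this step, reusing the hypothetical `make_model` and `observations` from the sketches above; the keyword arguments follow the linked API docs, while the optimizer settings and sample sizes are assumptions:

```python
import tensorflow as tf
import tensorflow_probability as tfp

# Pin the observations to obtain the (unnormalized) posterior target.
target = make_model(num_obs=observations.shape[0]).experimental_pin(obs=observations)

# Diagonal Gaussian surrogate over the unconstrained space, pushed into the
# parameters' support by the pinned model's default event-space bijector.
surrogate_posterior = tfp.experimental.vi.build_affine_surrogate_posterior(
    event_shape=target.event_shape,
    operators='diag',
    bijector=target.experimental_default_event_space_bijector())

# Minimize the negative ELBO with a Monte Carlo estimate of the expectation
# and full-batch gradients over the data.
losses = tfp.vi.fit_surrogate_posterior(
    target_log_prob_fn=target.unnormalized_log_prob,
    surrogate_posterior=surrogate_posterior,
    optimizer=tf.optimizers.Adam(learning_rate=0.05),
    num_steps=1000,
    sample_size=10)

posterior_samples = surrogate_posterior.sample(1000)  # parameter draws for plotting
```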