HMM with Poisson observations for detecting changepoints in the rate of a signal
This notebook is based on the Multiple Changepoint Detection and Bayesian Model Selection notebook from TensorFlow Probability.
Data
The synthetic data corresponds to a single time series of counts, where the rate of the underlying generative process changes at certain points in time.
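A minimal sketch of generating such a series in JAX is shown below; the specific rates, segment durations, and seed are illustrative assumptions, not necessarily the values used in the notebook.

```python
import jax
import jax.numpy as jnp

# Hypothetical segment rates and durations; each segment draws Poisson
# counts at a constant rate, and the rate jumps at the segment boundaries.
true_rates = [40, 3, 20, 50]
true_durations = [10, 20, 5, 35]

keys = jax.random.split(jax.random.PRNGKey(0), len(true_rates))
observed_counts = jnp.concatenate([
    jax.random.poisson(key, rate, shape=(num_steps,))
    for key, rate, num_steps in zip(keys, true_rates, true_durations)
]).astype(jnp.float32)
```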
Model with fixed $K$
To model the changing Poisson rate, we use an HMM. We initially assume the number of states $K$ is known. Later we will compare HMMs with different values of $K$.
We fix the initial state distribution to be uniform, and fix the transition matrix so that the chain stays in its current state with probability $1-p$ and otherwise jumps uniformly to one of the other $K-1$ states:

$$
A_{ij} = \begin{cases} 1 - p & \text{if } i = j \\ \dfrac{p}{K-1} & \text{if } i \neq j \end{cases}
$$

where we set $p$ to a small change probability.
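A sketch of this construction in JAX, with assumed values $K=4$ and $p=0.05$:

```python
import jax.numpy as jnp

num_states = 4   # K (assumed value)
p_change = 0.05  # p (assumed value)

# Uniform initial distribution and a "sticky" transition matrix.
initial_state_probs = jnp.ones(num_states) / num_states
transition_probs = jnp.full((num_states, num_states),
                            p_change / (num_states - 1))
transition_probs = jnp.where(jnp.eye(num_states, dtype=bool),
                             1.0 - p_change, transition_probs)
```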
Now we create an HMM where the observation distribution is a Poisson with learnable parameters. We specify the parameters in log space and initialize them to random values around the log of the overall mean count (to set the scale).
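One way to set this up, sketched with the JAX substrate of TensorFlow Probability (which provides tfd.HiddenMarkovModel, as used in the TFP notebook this is based on):

```python
import jax
import jax.numpy as jnp
import tensorflow_probability.substrates.jax as tfp

tfd = tfp.distributions

# Trainable log-rates, initialized randomly around log(mean count).
key = jax.random.PRNGKey(42)
trainable_log_rates = (jnp.log(jnp.mean(observed_counts))
                       + jax.random.normal(key, (num_states,)))

def make_hmm(log_rates):
    # One Poisson emission distribution per hidden state.
    return tfd.HiddenMarkovModel(
        initial_distribution=tfd.Categorical(probs=initial_state_probs),
        transition_distribution=tfd.Categorical(probs=transition_probs),
        observation_distribution=tfd.Poisson(log_rate=log_rates),
        num_steps=len(observed_counts))
```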
Model fitting using Gradient Descent
We compute a MAP estimate of the Poisson rates $\lambda$ using batch gradient descent, with the Adam optimizer applied to the log likelihood (from the HMM) plus the log prior for $\lambda$.
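A sketch of the objective and training loop, using optax for Adam; the LogNormal prior and its hyperparameters, the learning rate, and the step count are assumptions:

```python
import optax

# Assumed prior over the rates, centered at the overall mean count.
rate_prior = tfd.LogNormal(loc=jnp.log(jnp.mean(observed_counts)), scale=2.0)

def neg_log_joint(log_rates):
    # Negative of: log p(x | lambda) + log p(lambda), where the HMM's
    # log_prob marginalizes over the hidden state sequence.
    hmm = make_hmm(log_rates)
    return -(jnp.sum(rate_prior.log_prob(jnp.exp(log_rates)))
             + hmm.log_prob(observed_counts))

optimizer = optax.adam(learning_rate=0.1)
params = trainable_log_rates
opt_state = optimizer.init(params)

@jax.jit
def train_step(params, opt_state):
    loss, grads = jax.value_and_grad(neg_log_joint)(params)
    updates, opt_state = optimizer.update(grads, opt_state)
    return optax.apply_updates(params, updates), opt_state, loss

for _ in range(2000):
    params, opt_state, loss = train_step(params, opt_state)

map_rates = jnp.exp(params)  # MAP estimate of the Poisson rates
```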
We see that the method learned a good approximation to the true (generating) parameters, up to a permutation of the states (since the labels are unidentifiable). However, results can vary with different random seeds. We may find that the rates are the same for some states, which means those states are being treated as identical, and are therefore redundant.
Plotting the posterior over states
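The smoothed posterior marginals $p(z_t \mid x_{1:T})$ can be computed with the HMM's built-in forward-backward routine and then plotted per state; a minimal sketch:

```python
import matplotlib.pyplot as plt

hmm = make_hmm(params)

# p(z_t | x_{1:T}) for every time step, via forward-backward smoothing.
posterior_probs = hmm.posterior_marginals(observed_counts).probs_parameter()
most_probable_states = jnp.argmax(posterior_probs, axis=-1)

fig, axs = plt.subplots(num_states, 1, sharex=True)
for k in range(num_states):
    axs[k].plot(posterior_probs[:, k])
    axs[k].set_ylabel(f"p(state {k})")
axs[-1].set_xlabel("time step")
plt.show()
```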
Model with unknown $K$
In general we don't know the true number of states. One way to select the 'best' model is to pick the one with the maximum marginal likelihood. Rather than both summing over the discrete latent states $z_{1:T}$ and integrating over the unknown parameters $\lambda$, we sum over the states exactly but simply maximize over the parameters (an empirical Bayes approximation).
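Concretely, writing $\hat{\lambda}_K$ for the MAP parameters of the model with $K$ states, the approximation is

$$
p(x_{1:T} \mid K) = \int \Big[ \sum_{z_{1:T}} p(x_{1:T}, z_{1:T} \mid \lambda, K) \Big] \, p(\lambda \mid K) \, d\lambda \;\approx\; \sum_{z_{1:T}} p(x_{1:T}, z_{1:T} \mid \hat{\lambda}_K, K),
$$

where the inner sum over state sequences is computed exactly by the forward algorithm.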
We can do this by fitting a bank of separate HMMs in parallel, one for each value of $K$. We need to make them all the same size so we can batch them efficiently. To do this, we pad the transition matrices (and other parameter vectors) so they all have the same shape, and then use masking, as sketched below.
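A sketch of the padding scheme, assuming a hypothetical upper bound max_num_states on $K$: each candidate model lives in the top-left block of a max-sized transition matrix, and the padded states are made self-absorbing and given zero initial mass, so they never affect the likelihood.

```python
import jax.numpy as jnp

max_num_states = 6  # hypothetical upper bound on K
p_change = 0.05     # assumed change probability, as before

def build_latent_state(num_states, max_num_states, p_change):
    active = jnp.arange(max_num_states) < num_states
    # Uniform initial mass on the active states, zero on the padding.
    initial_probs = jnp.where(active, 1.0 / num_states, 0.0)
    # Sticky transitions inside the active block; padded states are
    # self-absorbing so that every row still sums to one.
    off_diag = 0.0 if num_states == 1 else p_change / (num_states - 1)
    trans = jnp.where(active[:, None] & active[None, :], off_diag, 0.0)
    stay = jnp.where(active, 1.0 if num_states == 1 else 1.0 - p_change, 1.0)
    return initial_probs, jnp.where(jnp.eye(max_num_states, dtype=bool),
                                    stay, trans)

# One padded model per candidate K = 1, ..., max_num_states; stacking gives
# batched parameters, so a single (batched) HMM can fit all models at once.
initial_probs, transition_probs = map(jnp.stack, zip(*[
    build_latent_state(k, max_num_states, p_change)
    for k in range(1, max_num_states + 1)]))
```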