Introduction
Let's say there are three bacteria species that characterize the gut, and we hypothesize that their proportions differ slightly from one another, but we don't know by how much (i.e. pretend we haven't seen the data-generating distribution below). Can we infer the proportion parameters and quantify their uncertainty?
Generate Synthetic Data
In the synthetic dataset generated below, we pretend that every patient is one sample, and we record the number of sequencing reads assigned to each of several OTUs (operational taxonomic units, i.e. bacteria). Each row is one sample (patient), and each column is one OTU (bacterium).
Proportions
First, let's generate the ground-truth proportions that we will later try to infer.
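A minimal sketch of this step, assuming NumPy and two cohorts with three OTUs each. The helper `normalize` and the relative weights are hypothetical, chosen only for illustration; they are not the notebook's original values.

```python
import numpy as np

def normalize(weights):
    """Turn non-negative relative weights into proportions that sum to 1."""
    w = np.asarray(weights, dtype=float)
    return w / w.sum()

# Hypothetical relative abundances of three OTUs in each cohort.
# Illustrative assumptions only; the sick cohort is slightly shifted
# relative to the healthy one.
healthy_proportions = normalize([3, 5, 2])
sick_proportions = normalize([2, 6, 2])

print("healthy:", healthy_proportions)
print("sick:   ", sick_proportions)
```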
Data
Now, given the proportions, let's generate the data. Here, we assume that there are 10 patients per cohort (10 sick and 10 healthy), and that each patient's sample contains 50 reads in total.
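As a sketch, each patient's counts can be drawn from a multinomial distribution with 50 trials and the cohort's proportions. The proportion values and the random seed below are illustrative assumptions; the cohort size and read total follow the text.

```python
import numpy as np

# Hypothetical ground-truth proportions (illustrative values only).
healthy_proportions = np.array([0.3, 0.5, 0.2])
sick_proportions = np.array([0.2, 0.6, 0.2])

n_patients = 10   # patients per cohort
total_reads = 50  # total read count per patient (sample)

rng = np.random.default_rng(42)

# Each row is one patient (sample); each column is one OTU.
healthy_counts = rng.multinomial(total_reads, healthy_proportions, size=n_patients)
sick_counts = rng.multinomial(total_reads, sick_proportions, size=n_patients)

print(healthy_counts.shape)        # (10, 3)
print(healthy_counts.sum(axis=1))  # every row sums to 50
```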
Model Construction
Here's an implementation of the model: a Dirichlet prior with a multinomial likelihood.
There are 3 classes of bacteria, so the Dirichlet distribution serves as the prior over the vector of 3 class probabilities used by the multinomial distribution.
The multinomial distribution serves as the likelihood function.
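For a single cohort with $M$ patients, each contributing $N$ reads spread over $K = 3$ OTUs, the model can be written as below. The concentration vector $\boldsymbol{\alpha}$ is an assumed prior hyperparameter (e.g. all ones for a flat prior). Because the Dirichlet is conjugate to the multinomial, the posterior has a closed form:

$$
\mathbf{p} \sim \operatorname{Dirichlet}(\boldsymbol{\alpha}), \qquad
\mathbf{x}_i \mid \mathbf{p} \sim \operatorname{Multinomial}(N, \mathbf{p}), \quad i = 1, \dots, M
$$

$$
p(\mathbf{p} \mid \mathbf{x}_{1:M}) \propto \prod_{k=1}^{K} p_k^{\alpha_k - 1} \prod_{i=1}^{M} \prod_{k=1}^{K} p_k^{x_{ik}}
= \prod_{k=1}^{K} p_k^{\alpha_k + \sum_{i=1}^{M} x_{ik} - 1}
$$

$$
\mathbf{p} \mid \mathbf{x}_{1:M} \sim \operatorname{Dirichlet}\Big(\boldsymbol{\alpha} + \sum_{i=1}^{M} \mathbf{x}_i\Big), \qquad
\mathbb{E}[p_k \mid \mathbf{x}_{1:M}] = \frac{\alpha_k + \sum_{i=1}^{M} x_{ik}}{\sum_{j=1}^{K} \alpha_j + M N}
$$

With a flat prior ($\alpha_k = 1$), 10 patients and 50 reads each, the denominator is $3 + 500 = 503$, so the observed counts dominate the prior.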
Sampling
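The sampling step can be sketched without an MCMC sampler: since the Dirichlet prior is conjugate to the multinomial likelihood, posterior draws can be taken directly from the updated Dirichlet. This is a swapped-in exact alternative, not necessarily how the notebook samples; the flat prior, the proportions, and the seed are assumptions.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical setup mirroring the text: 10 patients, 50 reads each, 3 OTUs.
true_proportions = np.array([0.3, 0.5, 0.2])  # illustrative values only
counts = rng.multinomial(50, true_proportions, size=10)

# Assumed flat prior: Dirichlet(1, 1, 1).
alpha_prior = np.ones(3)

# Conjugacy: the posterior is Dirichlet(prior + total counts per OTU).
alpha_post = alpha_prior + counts.sum(axis=0)

# Exact posterior draws; no MCMC is needed for this conjugate pair.
posterior_samples = rng.dirichlet(alpha_post, size=4000)

print(posterior_samples.shape)  # (4000, 3)
print(posterior_samples.mean(axis=0))
```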
Results
The posterior estimates of the proportions match up with the ground-truth proportions used to generate the synthetic data!
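One way to make this comparison concrete is to check, per cohort, whether each ground-truth proportion falls inside an equal-tailed posterior credible interval. The sketch below assumes the conjugate Dirichlet posterior with a flat prior, illustrative proportions, and a 94% interval width; all of these are choices for illustration, and `summarize` is a hypothetical helper.

```python
import numpy as np

rng = np.random.default_rng(7)

# Illustrative ground truth for the two cohorts (assumed values).
truth = {
    "healthy": np.array([0.3, 0.5, 0.2]),
    "sick": np.array([0.2, 0.6, 0.2]),
}

def summarize(true_p, n_patients=10, total_reads=50, n_draws=4000, prob=0.94):
    """Simulate one cohort, sample its conjugate posterior, and summarize it."""
    counts = rng.multinomial(total_reads, true_p, size=n_patients)
    alpha_post = np.ones(len(true_p)) + counts.sum(axis=0)  # flat Dirichlet prior
    draws = rng.dirichlet(alpha_post, size=n_draws)
    tail = (1 - prob) / 2 * 100  # percent in each tail
    lower, upper = np.percentile(draws, [tail, 100 - tail], axis=0)
    return draws.mean(axis=0), lower, upper

results = {}
for cohort, true_p in truth.items():
    mean, lower, upper = summarize(true_p)
    results[cohort] = (mean, lower, upper)
    covered = (lower <= true_p) & (true_p <= upper)
    print(cohort, "posterior mean:", mean.round(3))
    print(cohort, "94% interval:", list(zip(lower.round(3), upper.round(3))))
    print(cohort, "truth inside interval:", covered)
```

With only 10 patients per cohort, an occasional ground-truth value falling just outside a 94% interval would not by itself indicate a problem with the model.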