Book a Demo!
CoCalc Logo Icon
StoreFeaturesDocsShareSupportNewsAboutPoliciesSign UpSign In
AllenDowney
GitHub Repository: AllenDowney/bayesian-analysis-recipes
Path: blob/master/incubator/microbiome-case-study.ipynb
411 views
Kernel: bayesian
import pandas as pd import pymc3 as pm import numpy as np import matplotlib.pyplot as plt import war %load_ext autoreload %autoreload 2 %matplotlib inline %config InlineBackend.figure_format = 'retina'

Analysis

Microbiome composition can be shifted by diet. The experiments that have backed this conclusion involve randomly assigning mice into control and case groups, feeding them different diets, and then quantifying the composition of the microbiota in the gut. I want to know whether we are able to use a Bayesian Dirichlet-Multinomial model to quantify the uncertainty surrounding the measured proportions.

microbiome = pd.read_csv("../datasets/MicrobiomeWithMetadata.csv") microbiome.head()

The metadata file that's associated with this CSV file has to be re-coded from a CSV file to a YAML file.

from collections import defaultdict md = defaultdict(dict) # "md" stands for "metadata dictionary" sex = ["Male", "Female"] donor = ["HMouseLFPP", "CONVR", "Human", "Fresh", "Frozen", "HMouseWestern", "CONVD"] diet = ["LFPP", "Western", "CARBR", "FATR", "Suckling", "Human"] source = [ "Cecum1", "Cecum2", "Colon1", "Colon2", "Feces", "SI1", "SI13", "SI15", "SI2", "SI5", "SI9", "Stomach", "Cecum", ] collection_met = ["Contents", "Scraping"] for i, s in enumerate(sex): md["sex"][i] = s for i, d in enumerate(donor): md["donor"][i] = d for i, d in enumerate(diet): md["diet"][i] = d for i, s in enumerate(source): md["source"][i] = s for i, c in enumerate(collection_met): md["collection_met"][i] = c
import yaml print(yaml.dump(md))
with open("datasets/MicrobiomeMetadataDictionary.yml", "w+") as f: f.write(yaml.dump(md))
with open("datasets/MicrobiomeMetadataDictionary.yml", "r+") as f: metadata = yaml.load(f) metadata
set(microbiome["Diet"].values)
set(microbiome["Source"].values)
otu_cols = [c for c in microbiome.columns if "OTU" in c]
with pm.Model() as dirichlet_model: mu = pm.HalfNormal("mu", sd=100 ** 2) n_seq_reads = pm.Poisson("n_seq_reads", mu=mu, observed=healthy_reads.sum(axis=1)) proportions = pm.Dirichlet("proportions", a=np.ones(3), shape=(3,)) for i in range(healthy_reads.shape[0]): draws = pm.Multinomial( f"draws_{i}", n=healthy_reads[i].sum(), p=proportions, observed=healthy_reads[i, :], ) dirichlet_trace = pm.sample(draws=2000) pm.traceplot(dirichlet_trace)