Book a Demo!
CoCalc Logo Icon
StoreFeaturesDocsShareSupportNewsAboutPoliciesSign UpSign In
allendowney
GitHub Repository: allendowney/thinkbayes2
Path: blob/master/examples/hospital4.ipynb
1901 views
Kernel: Python 3 (ipykernel)

Grid algorithm for a logitnormal-binomial hierarchical model

Bayesian Inference with PyMC

Copyright 2021 Allen B. Downey

License: Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)

# If we're running on Colab, install libraries import sys IN_COLAB = 'google.colab' in sys.modules if IN_COLAB: !pip install pymc3 !pip install arviz !pip install empiricaldist
# PyMC generates a FutureWarning we don't need to deal with yet import warnings warnings.filterwarnings("ignore", category=FutureWarning)
import matplotlib.pyplot as plt def legend(**options): """Make a legend only if there are labels.""" handles, labels = plt.gca().get_legend_handles_labels() if len(labels): plt.legend(**options)
def decorate(**options): plt.gca().set(**options) legend() plt.tight_layout()
from empiricaldist import Cdf def compare_cdf(pmf, sample): pmf.make_cdf().plot(label='grid') Cdf.from_seq(sample).plot(label='mcmc') print(pmf.mean(), sample.mean()) decorate()
from empiricaldist import Pmf def make_pmf(ps, qs, name): pmf = Pmf(ps, qs) pmf.normalize() pmf.index.name = name return pmf

Heart Attack Data

This example is based on Chapter 10 of Probability and Bayesian Modeling; it uses data on death rates due to heart attack for patients treated at various hospitals in New York City.

We can use Pandas to read the data into a DataFrame.

import os filename = 'DeathHeartAttackManhattan.csv' if not os.path.exists(filename): !wget https://github.com/AllenDowney/BayesianInferencePyMC/raw/main/DeathHeartAttackManhattan.csv
import pandas as pd df = pd.read_csv(filename) df

The columns we need are Cases, which is the number of patients treated at each hospital, and Deaths, which is the number of those patients who died.

data_ns = df['Cases'].values data_ks = df['Deaths'].values

Hospital Data with PyMC

Here's a hierarchical model that estimates the death rate for each hospital, and simultaneously estimates the distribution of rates across hospitals.

import pymc3 as pm import theano.tensor as tt def make_model(): with pm.Model() as model: mu = pm.Normal('mu', 0, 2) sigma = pm.HalfNormal('sigma', sigma=1) xs = pm.LogitNormal('xs', mu=mu, sigma=sigma, shape=len(data_ns)) ks = pm.Binomial('ks', n=data_ns, p=xs, observed=data_ks) return model
%time model = make_model() pm.model_to_graphviz(model)
CPU times: user 881 ms, sys: 71.1 ms, total: 952 ms Wall time: 2.24 s
Image in a Jupyter notebook
with model: pred = pm.sample_prior_predictive(1000) %time trace = pm.sample(500, target_accept=0.97)
Auto-assigning NUTS sampler... Initializing NUTS using jitter+adapt_diag... Multiprocess sampling (4 chains in 4 jobs) NUTS: [xs, sigma, mu]
Sampling 4 chains for 1_000 tune and 500 draw iterations (4_000 + 2_000 draws total) took 5 seconds. There was 1 divergence after tuning. Increase `target_accept` or reparameterize. There was 1 divergence after tuning. Increase `target_accept` or reparameterize. There were 2 divergences after tuning. Increase `target_accept` or reparameterize. The acceptance probability does not match the target. It is 0.9280282858677881, but should be close to 0.97. Try to increase the number of tuning steps. The rhat statistic is larger than 1.05 for some parameters. This indicates slight problems during sampling. The estimated number of effective samples is smaller than 200 for some parameters.
CPU times: user 5.2 s, sys: 185 ms, total: 5.38 s Wall time: 9.73 s

Here are the posterior distributions of the hyperparameters

import arviz as az with model: az.plot_posterior(trace, var_names=['mu', 'sigma'])
Image in a Jupyter notebook

And we can extract the posterior distributions of the xs.

trace_xs = trace['xs'].transpose() trace_xs.shape
(13, 2000)

As an example, here's the posterior distribution of x for the first hospital.

with model: az.plot_posterior(trace_xs[0])
Image in a Jupyter notebook

Just one update

i = 3 data_n = data_ns[i] data_k = data_ks[i]
sample = pm.HalfNormal.dist().random(size=1000) Cdf.from_seq(sample).plot()
<AxesSubplot:>
Image in a Jupyter notebook
def make_model1(): with pm.Model() as model1: mu = pm.Normal('mu', 0, 2) sigma = pm.HalfNormal('sigma', sigma=1) x = pm.LogitNormal('x', mu=mu, sigma=sigma) k = pm.Binomial('k', n=data_n, p=x, observed=data_k) return model1
model1 = make_model1() pm.model_to_graphviz(model1)
Image in a Jupyter notebook
with model1: pred1 = pm.sample_prior_predictive(1000) trace1 = pm.sample(500, target_accept=0.97)
Auto-assigning NUTS sampler... Initializing NUTS using jitter+adapt_diag... Multiprocess sampling (4 chains in 4 jobs) NUTS: [x, sigma, mu]
Sampling 4 chains for 1_000 tune and 500 draw iterations (4_000 + 2_000 draws total) took 3 seconds. There were 9 divergences after tuning. Increase `target_accept` or reparameterize. There were 7 divergences after tuning. Increase `target_accept` or reparameterize. There were 153 divergences after tuning. Increase `target_accept` or reparameterize. The acceptance probability does not match the target. It is 0.5361146921219951, but should be close to 0.97. Try to increase the number of tuning steps. There were 19 divergences after tuning. Increase `target_accept` or reparameterize. The acceptance probability does not match the target. It is 0.9360391707579205, but should be close to 0.97. Try to increase the number of tuning steps. The rhat statistic is larger than 1.05 for some parameters. This indicates slight problems during sampling. The estimated number of effective samples is smaller than 200 for some parameters.
Cdf.from_seq(pred1['mu']).plot(label='prior', color='gray') Cdf.from_seq(trace1['mu']).plot(label='posterior') decorate(title='Distribution of mu')
Image in a Jupyter notebook
Cdf.from_seq(pred1['sigma']).plot(label='prior', color='gray') Cdf.from_seq(trace1['sigma']).plot(label='posterior') decorate(title='Distribution of sigma')
Image in a Jupyter notebook
Cdf.from_seq(pred1['x']).plot(label='prior', color='gray') Cdf.from_seq(trace1['x']).plot(label='posterior') decorate(title='Distribution of x')
Image in a Jupyter notebook

The grid priors

import numpy as np from scipy.stats import norm mus = np.linspace(-6, 6, 201) ps = norm.pdf(mus, 0, 2) prior_mu = make_pmf(ps, mus, 'mu') prior_mu.plot() decorate(title='Prior distribution of mu')
Image in a Jupyter notebook
from scipy.stats import logistic sigmas = np.linspace(0.03, 3.6, 90) ps = norm.pdf(sigmas, 0, 1) prior_sigma = make_pmf(ps, sigmas, 'sigma') prior_sigma.plot() decorate(title='Prior distribution of sigma')
Image in a Jupyter notebook
compare_cdf(prior_mu, pred1['mu']) decorate(title='Prior distribution of mu')
-1.1102230246251565e-16 0.058920615604417595
Image in a Jupyter notebook
compare_cdf(prior_sigma, pred1['sigma']) decorate(title='Prior distribution of sigma')
0.8033718951689776 0.7863836350602794
Image in a Jupyter notebook

The joint distribution of hyperparameters

# TODO: Change these variable names def make_hyper(prior_alpha, prior_beta): PA, PB = np.meshgrid(prior_alpha.ps, prior_beta.ps, indexing='ij') hyper = PA * PB return hyper
prior_hyper = make_hyper(prior_mu, prior_sigma) prior_hyper.shape
(201, 90)
import pandas as pd from utils import plot_contour plot_contour(pd.DataFrame(prior_hyper, index=mus, columns=sigmas)) decorate(title="Joint prior of mu and sigma")
Image in a Jupyter notebook

Joint prior of hyperparameters, and x

xs = np.linspace(0.01, 0.99, 99) M, S, X = np.meshgrid(mus, sigmas, xs, indexing='ij')
from scipy.special import logit LO = logit(X) LO.sum()
2.1365664792938333e-10
from scipy.stats import beta as betadist %time normpdf = norm.pdf(LO, M, S) normpdf.sum()
CPU times: user 53.2 ms, sys: 8.55 ms, total: 61.8 ms Wall time: 60.2 ms
143418.58755534427

We can speed this up by computing skipping the terms that don't depend on x

# TODO
totals = normpdf.sum(axis=2) totals.sum()
143418.58755534427
shape = totals.shape + (1,) totals = totals.reshape(shape) out = np.zeros_like(normpdf) normpdf = np.divide(normpdf, totals, out=out, where=(totals!=0)) normpdf.sum()
18079.999999999996
def make_prior(hyper): # reshape hyper so we can multiply along axis 0 shape = hyper.shape + (1,) prior = normpdf * hyper.reshape(shape) return prior
%time prior = make_prior(prior_hyper) prior.sum()
CPU times: user 4.25 ms, sys: 123 µs, total: 4.37 ms Wall time: 3.05 ms
0.9999482624648524

The following function computes the marginal distributions.

def marginal(joint, axis): axes = [i for i in range(3) if i != axis] return joint.sum(axis=tuple(axes))

And let's confirm that the marginal distributions are what they are supposed to be.

prior_mu.plot() marginal_mu = Pmf(marginal(prior, 0), mus) marginal_mu.plot() decorate(title='Checking the marginal distribution of mu')
Image in a Jupyter notebook
prior_sigma.plot() marginal_sigma = Pmf(marginal(prior, 1), sigmas) marginal_sigma.plot() decorate(title='Checking the marginal distribution of sigma')
Image in a Jupyter notebook
prior_x = Pmf(marginal(prior, 2), xs) prior_x.plot() decorate(title='Prior distribution of x', ylim=[0, prior_x.max()*1.05])
Image in a Jupyter notebook
compare_cdf(prior_x, pred1['x']) decorate(title='Checking the marginal distribution of x')
0.49997413123242573 0.5101368784087403
Image in a Jupyter notebook
def get_hyper(joint): return joint.sum(axis=2)
hyper = get_hyper(prior)
plot_contour(pd.DataFrame(hyper, index=mus, columns=sigmas)) decorate(title="Joint prior of mu and sigma")
Image in a Jupyter notebook

The Update

from scipy.stats import binom like_x = binom.pmf(data_k, data_n, xs) like_x.shape
(99,)
plt.plot(xs, like_x) decorate(title='Likelihood of the data')
Image in a Jupyter notebook
def update(prior, data): n, k = data like_x = binom.pmf(k, n, xs) posterior = prior * like_x posterior /= posterior.sum() return posterior
data = data_n, data_k %time posterior = update(prior, data)
CPU times: user 6.17 ms, sys: 1.26 ms, total: 7.42 ms Wall time: 5.62 ms
marginal_mu = Pmf(marginal(posterior, 0), mus) compare_cdf(marginal_mu, trace1['mu'])
-2.2839085822689538 -2.088863822436312
Image in a Jupyter notebook
marginal_sigma = Pmf(marginal(posterior, 1), sigmas) compare_cdf(marginal_sigma, trace1['sigma'])
0.7427800571017901 0.7335328690237511
Image in a Jupyter notebook
marginal_x = Pmf(marginal(posterior, 2), xs) compare_cdf(marginal_x, trace1['x']) marginal_x.mean(), trace1['x'].mean()
0.09196221979589297 0.08762936641306768
(0.09196221979589297, 0.08762936641306768)
Image in a Jupyter notebook
posterior_hyper = get_hyper(posterior)
plot_contour(pd.DataFrame(posterior_hyper, index=mus, columns=sigmas)) decorate(title="Joint posterior of mu and sigma")
Image in a Jupyter notebook
like_hyper = posterior_hyper / prior_hyper
plot_contour(pd.DataFrame(like_hyper, index=mus, columns=sigmas)) decorate(title="Likelihood of mu and sigma")
Image in a Jupyter notebook

Multiple updates

prior = make_prior(prior_hyper) prior.shape
(201, 90, 99)
def multiple_updates(prior, ns, ks, xs): for data in zip(ns, ks): print(data) posterior = update(prior, data) hyper = get_hyper(posterior) prior = make_prior(hyper) return posterior
%time posterior = multiple_updates(prior, data_ns, data_ks, xs)
(129, 4) (35, 1) (228, 18) (84, 7) (291, 24) (270, 16) (46, 6) (293, 19) (241, 15) (105, 13) (353, 25) (250, 11) (41, 4) CPU times: user 140 ms, sys: 0 ns, total: 140 ms Wall time: 136 ms
marginal_mu = Pmf(marginal(posterior, 0), mus) compare_cdf(marginal_mu, trace['mu'])
-2.6467518480359096 -2.6015721172366684
Image in a Jupyter notebook
marginal_sigma = Pmf(marginal(posterior, 1), sigmas) compare_cdf(marginal_sigma, trace['sigma'])
0.1915112553021888 0.18254878851591824
Image in a Jupyter notebook
marginal_x = Pmf(marginal(posterior, 2), prior_x.qs) compare_cdf(marginal_x, trace_xs[-1])
0.07337763229857364 0.07294535138274955
Image in a Jupyter notebook
posterior_hyper = get_hyper(posterior)
plot_contour(pd.DataFrame(posterior_hyper, index=mus, columns=sigmas)) decorate(title="Joint posterior of mu and sigma")
Image in a Jupyter notebook
like_hyper = posterior_hyper / prior_hyper
plot_contour(pd.DataFrame(like_hyper, index=mus, columns=sigmas)) decorate(title="Likelihood of mu and sigma")
Image in a Jupyter notebook

One at a time

def compute_likes_hyper(ns, ks): shape = ns.shape + mus.shape + sigmas.shape likes_hyper = np.empty(shape) for i, data in enumerate(zip(ns, ks)): print(data) n, k = data like_x = binom.pmf(k, n, xs) posterior = normpdf * like_x likes_hyper[i] = posterior.sum(axis=2) print(likes_hyper[i].sum()) return likes_hyper
%time likes_hyper = compute_likes_hyper(data_ns, data_ks)
(129, 4) 274.6636874928713 (35, 1) 947.3342168409214 (228, 18) 94.69814128862927 (84, 7) 248.83063630089848 (291, 24) 72.82847873198654 (270, 16) 91.66869577666596 (46, 6) 375.4942185566961 (293, 19) 80.87855004218346 (241, 15) 100.20080616305229 (105, 13) 169.5205275044766 (353, 25) 64.42584485648294 (250, 11) 115.30877435625138 (41, 4) 470.80922741036693 CPU times: user 62.5 ms, sys: 2.83 ms, total: 65.4 ms Wall time: 60.8 ms
likes_hyper.sum()
3106.661805321483
like_hyper_all = likes_hyper.prod(axis=0) like_hyper_all.sum()
3.386435404797562e-14
plot_contour(pd.DataFrame(like_hyper_all, index=mus, columns=sigmas)) decorate(title="Likelihood of mu and sigma")
Image in a Jupyter notebook
posterior_hyper_all = prior_hyper * like_hyper_all posterior_hyper_all /= posterior_hyper_all.sum() np.allclose(posterior_hyper_all, posterior_hyper)
True
marginal_mu2 = Pmf(posterior_hyper_all.sum(axis=1), mus) marginal_mu2.make_cdf().plot() marginal_mu.make_cdf().plot() np.allclose(marginal_mu, marginal_mu2)
True
Image in a Jupyter notebook
marginal_sigma2 = Pmf(posterior_hyper_all.sum(axis=0), sigmas) marginal_sigma2.make_cdf().plot() marginal_sigma.make_cdf().plot() np.allclose(marginal_sigma, marginal_sigma2)
True
Image in a Jupyter notebook
plot_contour(pd.DataFrame(posterior_hyper_all, index=mus, columns=sigmas)) decorate(title="Joint posterior of mu and sigma")
Image in a Jupyter notebook
i = 3 data = data_ns[i], data_ks[i] data
(84, 7)
out = np.zeros_like(prior_hyper) hyper_i = np.divide(prior_hyper * like_hyper_all, likes_hyper[i], out=out, where=(like_hyper_all!=0)) hyper_i.sum()
4.3498768007097106e-17
prior_i = make_prior(hyper_i)
posterior_i = update(prior_i, data)
Pmf(marginal(posterior_i, 0), mus).make_cdf().plot() marginal_mu.make_cdf().plot()
<AxesSubplot:>
Image in a Jupyter notebook
Pmf(marginal(posterior_i, 1), sigmas).make_cdf().plot() marginal_sigma.make_cdf().plot()
<AxesSubplot:>
Image in a Jupyter notebook
marginal_mu = Pmf(marginal(posterior_i, 0), mus) marginal_sigma = Pmf(marginal(posterior_i, 1), sigmas) marginal_x = Pmf(marginal(posterior_i, 2), xs)
compare_cdf(marginal_mu, trace['mu'])
-2.6467518480359087 -2.6015721172366684
Image in a Jupyter notebook
compare_cdf(marginal_sigma, trace['sigma'])
0.19151125530218874 0.18254878851591824
Image in a Jupyter notebook
compare_cdf(marginal_x, trace_xs[i])
0.07252787840957128 0.07204467292400059
Image in a Jupyter notebook
def compute_all_marginals(ns, ks): prior = prior_hyper * like_hyper_all for i, data in enumerate(zip(ns, ks)): n, k = data out = np.zeros_like(prior) hyper_i = np.divide(prior, likes_hyper[i], out=out, where=(prior!=0)) prior_i = make_prior(hyper_i) posterior_i = update(prior_i, data) marginal_x = Pmf(marginal(posterior_i, 2), xs) marginal_x.make_cdf().plot() print(i, n, k/n, marginal_x.mean())
for hyper_i in likes_hyper: print(i, (hyper_i==0).sum())
3 10 3 10 3 86 3 10 3 122 3 121 3 10 3 131 3 97 3 10 3 154 3 114 3 10
%time compute_all_marginals(data_ns, data_ks)
0 129 0.031007751937984496 0.06137958653648906 1 35 0.02857142857142857 0.06664479974857614 2 228 0.07894736842105263 0.07274237049843928 3 84 0.08333333333333333 0.07252787840957128 4 291 0.08247422680412371 0.07436258897925768 5 270 0.05925925925925926 0.06616799860643609 6 46 0.13043478260869565 0.07778708222589173 7 293 0.06484641638225255 0.06797709854830382 8 241 0.06224066390041494 0.06733148226794133 9 105 0.12380952380952381 0.08185537453928823 10 353 0.0708215297450425 0.07011480915344975 11 250 0.044 0.06149989077759829 12 41 0.0975609756097561 0.07337763229857365 CPU times: user 246 ms, sys: 168 ms, total: 413 ms Wall time: 212 ms
Image in a Jupyter notebook