GitHub Repository: allendowney/thinkbayes2
Path: blob/master/soln/combinatorics.ipynb
¹⁹⁰¹ views

Kernel: Python 3

Think Bayes

Second Edition

License: Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)

In [1]:

# If we're running on Colab, install empiricaldist
# https://pypi.org/project/empiricaldist/

import sys
IN_COLAB = 'google.colab' in sys.modules

if IN_COLAB:
    !pip install empiricaldist

In [2]:

# Get utils.py and create directories

import os

if not os.path.exists('utils.py'):
    !wget https://github.com/AllenDowney/ThinkBayes2/raw/master/code/soln/utils.py
        
if not os.path.exists('figs'):
    !mkdir figs

In [3]:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from empiricaldist import Pmf, Cdf
from utils import decorate, savefig

Introduction

Three dimensions!

The Grizzly Bear Problem

In 1996 and 1997 Mowat and Strobeck deployed bear traps in locations in British Columbia and Alberta, in an effort to estimate the population of grizzly bears. They describe the experiment in "Estimating Population Size of Grizzly Bears Using Hair Capture, DNA Profiling, and Mark-Recapture Analysis"

The "trap" consists of a lure and several strands of barbed wire intended to capture samples of hair from bears that visit the lure. Using the hair samples, the researchers use DNA analysis to identify individual bears.

During the first session, on June 29, 1996, the researchers deployed traps at 76 sites. Returning 10 days later, they obtained 1043 hair samples and identified 23 different bears. During a second 10-day session they obtained 1191 samples from 19 different bears, where 4 of the 19 were from bears they had identified in the first batch.

To estimate the population of bears from this data, we need a model for the probability that each bear will be observed during each session. As a starting place, we'll make the simplest assumption, that every bear in the population has the same (unknown) probability of being sampled during each round.

With these assumptions we can compute the probability of the data for a range of possible populations.

As an example, let's suppose that the actual population of bears is 200.

After the first session, 23 of the 200 bears have been identified. During the second session, if we choose 19 bears at random, what is the probability that 4 of them were previously identified?

I'll define

N: actual (unknown) population size, 200.
K: number of bears identified in the first session, 23.
n: number of bears observed in the second session, 19 in the example.
k: the number of bears in the second session that had previously been identified, 4.

For given values of N, K, and n, the distribution of k is described by the hypergeometric distribution:

$PMF(k) = {K \choose k}{N-K \choose n-k}/{N \choose n}$

To understand why, consider:

The denominator, ${ N \choose n}$ , is the number of subsets of $n$ we could choose from a population of $N$ bears.
The numerator is the number of subsets that contain $k$ bears from the previously identified $K$ and $n-k$ from the previously unseen $N-K$ .

SciPy provides hypergeom, which we can use to compute this PMF.

In [138]:

from scipy.stats import hypergeom

N = 100
K = 23
n = 19

ks = np.arange(12)
ps = hypergeom(N, K, n).pmf(ks)

plt.bar(ks, ps)

decorate(xlabel='Number of bears observed twice',
         ylabel='PMF',
         title='Hypergeometric distribution of k (known population 200)')

Out[138]:

So that's the distribution of k given N, K, and n. Now let's go the other way: given K, n, and k, how can we estimate the total population, N?

As a starting place, let's suppose that, prior to this study, an expert in this domain would have estimated that the population is between 50 and 500, and equally likely to be any value in that range.

In [5]:

Ns = np.arange(50, 501)
prior_N = Pmf(1, Ns)
prior_N.index.name = 'N'

So that's our prior.

To compute the likelihood of the data, we can use hypergeom with constants K and n, and a range of values of N.

In [6]:

K = 23
n = 19
k = 4

likelihood = hypergeom(Ns, K, n).pmf(k)

We can compute the posterior in the usual way.

In [7]:

posterior_N = prior_N * likelihood
posterior_N.normalize()

Out[7]:

34.97606148975133

And here's what it looks like.

In [8]:

posterior_N.plot()

decorate(xlabel='Population of bears (N)',
         ylabel='PDF',
         title='Posterior distribution of N')

Out[8]:

The most likely value is 109.

In [9]:

posterior_N.max_prob()

Out[9]:

109

But the distribution is skewed to the right, so the posterior mean is substantially higher.

In [10]:

posterior_N.mean()

Out[10]:

173.79880627085643

In [11]:

posterior_N.credible_interval(0.9)

Out[11]:

array([ 77., 363.])

Two parameter model

In [12]:

ps = np.linspace(0, 1, 101)
prior_p = Pmf(1, ps)
prior_p.index.name = 'p'

In [13]:

from utils import make_joint

joint_prior = make_joint(prior_N, prior_p)

In [14]:

N_mesh, p_mesh = np.meshgrid(Ns, ps)
N_mesh.shape

Out[14]:

(101, 451)

In [15]:

from scipy.stats import binom

like1 = binom.pmf(K, N_mesh, p_mesh)
like1.sum()

Out[15]:

229.56130903199954

In [16]:

like2 = binom.pmf(k, K, p_mesh) * binom.pmf(n-k, N_mesh-K, p_mesh)
like2.sum()

Out[16]:

24.771767481921685

In [17]:

from utils import normalize

joint_posterior = joint_prior * like1 * like2
normalize(joint_posterior)

In [18]:

def plot_contour(joint, **options):
    """Plot a joint distribution.
    
    joint: DataFrame representing a joint PMF
    """
    cs = plt.contour(joint.columns, joint.index, joint, **options)
    decorate(xlabel=joint.columns.name, 
             ylabel=joint.index.name)
    return cs

In [19]:

plot_contour(joint_posterior)

decorate(title='Joint posterior distribution of N and p')

Out[19]:

In [20]:

from utils import marginal

marginal_N = marginal(joint_posterior, 0)

posterior_N.plot(color='gray')
marginal_N.plot()

decorate(xlabel='Population of bears (N)',
         ylabel='PDF',
         title='Posterior marginal distribution of N')

Out[20]:

In [21]:

marginal_N.mean(), marginal_N.credible_interval(0.9)

Out[21]:

(138.7505213647261, array([ 68., 277.]))

In [22]:

marginal_p = marginal(joint_posterior, 1)
marginal_p.plot()

decorate(xlabel='Probability of observing a bear',
         ylabel='PDF',
         title='Posterior marginal distribution of p')

Out[22]:

In [23]:

from seaborn import JointGrid

def joint_plot(joint, **options):
    x = joint.columns.name
    x = 'x' if x is None else x

    y = joint.index.name
    y = 'y' if y is None else y

    # make a JointGrid with minimal data
    data = pd.DataFrame({x:[0], y:[0]})
    g = JointGrid(x, y, data, **options)

    # replace the contour plot
    g.ax_joint.contour(joint.columns, 
                   joint.index, 
                   joint, 
                   cmap='viridis')
    
    # replace the marginals
    marginal_x = marginal(joint, 0)
    g.ax_marg_x.plot(marginal_x.qs, marginal_x.ps)
    
    marginal_y = marginal(joint, 1)
    g.ax_marg_y.plot(marginal_y.ps, marginal_y.qs)

In [24]:

joint_plot(joint_posterior)

Out[24]:

Two parameters better than one?

In [25]:

mean = (23 + 19) / 2
N1 = 138
p = mean/N1
p

Out[25]:

0.15217391304347827

In [26]:

from scipy.stats import binom

binom(N1, p).std()

Out[26]:

4.219519857292647

In [27]:

binom(N1, p).pmf([23, 19]).prod()

Out[27]:

0.007113233315460428

In [28]:

N2 = 173
p = mean/N2
p

Out[28]:

0.12138728323699421

In [29]:

binom(N2, p).std()

Out[29]:

4.2954472470306415

In [30]:

binom(N2, p).pmf([23, 19]).prod()

Out[30]:

0.006918408214247037

The Lincoln index problem

A few years ago my occasional correspondent John D. Cook wrote an excellent blog post about the Lincoln index, which is a way to estimate the number of errors in a document (or program) by comparing results from two independent testers.

http://www.johndcook.com/blog/2010/07/13/lincoln-index/

Here's his presentation of the problem:

"Suppose you have a tester who finds 20 bugs in your program. You want to estimate how many bugs are really in the program. You know there are at least 20 bugs, and if you have supreme confidence in your tester, you may suppose there are around 20 bugs. But maybe your tester isn't very good. Maybe there are hundreds of bugs. How can you have any idea how many bugs there are? There's no way to know with one tester. But if you have two testers, you can get a good idea, even if you don't know how skilled the testers are."

Suppose the first tester finds 20 bugs, the second finds 15, and they find 3 in common; how can we estimate the number of bugs?

In [31]:

n0 = 20
n1 = 15
k11 = 3

In [32]:

k10 = n0 - k11
k01 = n1 - k11

k10, k01

Out[32]:

(17, 12)

In [33]:

Ns = np.arange(32, 350)
prior_N = Pmf(1, Ns)
prior_N.index.name = 'N'

In [34]:

p0, p1 = 0.2, 0.15

like0 = binom.pmf(n0, Ns, p0)
like0.sum()

Out[34]:

4.9999998675182855

In [35]:

like1 = binom.pmf(k01, Ns-n0, p1) * binom.pmf(k11, n0, p1)
like1.sum()

Out[35]:

1.6188593076346907

In [36]:

likelihood = like0 * like1
likelihood.shape

Out[36]:

(318,)

In [37]:

prior_N.shape

Out[37]:

(318,)

In [38]:

posterior_N = prior_N * likelihood
posterior_N.normalize()

Out[38]:

0.10960645032182464

In [39]:

posterior_N.plot()

decorate(xlabel='n',
         ylabel='PMF',
         title='Posterior marginal distribution of n with known p1, p2')

Out[39]:

In [40]:

posterior_N.mean()

Out[40]:

102.12500000000003

Unknown probabilities

In [41]:

p0 = np.linspace(0, 1, 61)
prior_p0 = Pmf(1, p0)
prior_p0.index.name = 'p0'

In [42]:

p1 = np.linspace(0, 1, 51)
prior_p1 = Pmf(1, p1)
prior_p1.index.name = 'p1'

In [43]:

from utils import make_joint

joint = make_joint(prior_p0, prior_p1)
joint.shape

Out[43]:

(51, 61)

In [44]:

joint_pmf = Pmf(joint.transpose().stack())
joint_pmf.head()

Out[44]:

p0   p1  
0  0.00    1
02    1
04    1
06    1
08    1
dtype: int64

In [45]:

joint_prior = make_joint(prior_N, joint_pmf)
joint_prior.shape

Out[45]:

(3111, 318)

In [46]:

joint_prior.head()

Out[46]:

In [47]:

likelihood = joint_prior.copy()
Ns = joint_prior.columns

for (p0, p1) in joint_prior.index:
    like0 = binom.pmf(n0, Ns, p0)
    like1 = binom.pmf(k01, Ns-n0, p1) * binom.pmf(k11, n0, p1)
    likelihood.loc[p0, p1] = like0 * like1
    
likelihood.to_numpy().sum()

Out[47]:

8.930285561728951

In [48]:

from utils import normalize

joint_posterior = joint_prior * likelihood
normalize(joint_posterior)
joint_posterior.shape

Out[48]:

(3111, 318)

In [49]:

from utils import marginal

posterior_N = marginal(joint_posterior, 0)
posterior_N.plot()

Out[49]:

In [50]:

posterior_pmf = marginal(joint_posterior, 1)
posterior_pmf.shape

Out[50]:

(3111,)

In [51]:

posterior_joint_ps = posterior_pmf.unstack().transpose()
posterior_joint_ps.head()

Out[51]:

In [52]:

def plot_contour(joint, **options):
    """Plot a joint distribution.
    
    joint: DataFrame representing a joint PMF
    """
    cs = plt.contour(joint.columns, joint.index, joint, **options)
    decorate(xlabel=joint.columns.name, 
             ylabel=joint.index.name)
    return cs

In [53]:

plot_contour(posterior_joint_ps)
decorate(title='Posterior joint distribution for p1 and p2')

Out[53]:

In [54]:

posterior_p1 = marginal(posterior_joint_ps, 0)
posterior_p2 = marginal(posterior_joint_ps, 1)

posterior_p1.plot(label='p1')
posterior_p2.plot(label='p2')
decorate(xlabel='Probability of finding a bug',
         ylabel='PDF',
         title='Posterior marginal distributions of p1 and p2')

Out[54]:

In [55]:

posterior_p1.mean(), posterior_p1.credible_interval(0.9)

Out[55]:

(0.22970406720316164, array([0.1, 0.4]))

In [56]:

posterior_p2.mean(), posterior_p2.credible_interval(0.9)

Out[56]:

(0.17501024717938662, array([0.06, 0.32]))

In [57]:

joint_plot(posterior_joint_ps)

Out[57]:

Chao et al

In [58]:

data = [-1000, 63, 55, 18, 69, 17, 21, 28]

In [59]:

index = pd.MultiIndex.from_product([[0, 1]]*3)

In [60]:

Kijk = pd.Series(data, index)
Kijk

Out[60]:

0  0   -1000
    63
0      55
    18
0  0      69
    17
0      21
    28
dtype: int64

In [61]:

Kijk.xs(1, level=0).sum()

Out[61]:

135

In [62]:

Kijk.xs(1, level=1).sum()

Out[62]:

122

In [63]:

Kijk.xs(1, level=2).sum()

Out[63]:

126

In [64]:

n = pd.Series(1, range(3))

for level in n.index:
    n[level] = Kijk.xs(1, level=level).sum()
    
n

Out[64]:

  135
  122
  126
dtype: int64

In [65]:

Kixx = Kijk.sum(level=0)
Kixx

Out[65]:

0   -864
1    135
dtype: int64

In [66]:

Kijx = Kijk.sum(level=[0,1])
Kijx

Out[66]:

0   -937
   73
0     86
   49
dtype: int64

In [67]:

def num_observed(K):
    return np.asarray(K)[1:].sum()

In [68]:

s0 = num_observed(Kixx)
s0

Out[68]:

135

In [69]:

s1 = num_observed(Kijx)
s1

Out[69]:

208

In [70]:

s2 = num_observed(Kijk)
s2

Out[70]:

271

In [71]:

s = [s0, s1, s2]
s

Out[71]:

[135, 208, 271]

In [72]:

k = pd.concat([Kixx, Kijx, Kijk])
k

Out[72]:

0            -864
1             135
(0, 0)       -937
(0, 1)         73
(1, 0)         86
(1, 1)         49
(0, 0, 0)   -1000
(0, 0, 1)      63
(0, 1, 0)      55
(0, 1, 1)      18
(1, 0, 0)      69
(1, 0, 1)      17
(1, 1, 0)      21
(1, 1, 1)      28
dtype: int64

In [73]:

k[1]

Out[73]:

135

In [74]:

k[0,1]

Out[74]:

73

In [75]:

k[0,0,1]

Out[75]:

63

In [76]:

k[0]

Out[76]:

-864

In [77]:

N = 300
k0 = N - s[0]
k0

Out[77]:

165

In [78]:

k00 = N - s[1]
k00

Out[78]:

92

In [79]:

k000 = N - s[2]
k000

Out[79]:

29

In [80]:

Ns = np.arange(s2, 500, 5)
prior_N = Pmf(1, Ns)
prior_N.index.name = 'N'

In [81]:

ps = np.linspace(0, 1, 101)
prior_p = Pmf(1, ps)
prior_p.index.name = 'p'

In [82]:

from utils import make_joint

joint_prior = make_joint(prior_N, prior_p)

In [83]:

def likelihood_round1(ps, n, k, s, joint_prior):

    like = joint_prior.copy()

    for N in joint_prior:
        like[N] = binom.pmf(n[0], N, ps)
    return like

In [84]:

like1 = likelihood_round1(ps, n, k, s, joint_prior)

In [85]:

def likelihood_round2(ps, n, k, s, joint_prior):
    like = joint_prior.copy()

    for N in joint_prior:
        k0 = N - s[0]
        like[N] = (binom.pmf(k[0,1], k0, ps) * 
                   binom.pmf(k[1,1], k[1], ps))
    return like

In [86]:

like2 = likelihood_round2(ps, n, k, s, joint_prior)
like2.to_numpy().sum()

Out[86]:

0.40773670523278144

In [87]:

joint_posterior2 = joint_prior * like1 * like2
normalize(joint_posterior2)

In [88]:

plot_contour(joint_posterior2)
decorate(title='Joint posterior of N and p')

Out[88]:

In [89]:

marginal_N = marginal(joint_posterior2, 0)
marginal_N.plot()

decorate(xlabel='N',
         ylabel='PDF',
         title='Posterior marginal distribution of N')

Out[89]:

In [90]:

marginal_N.mean(), marginal_N.credible_interval(0.9)

Out[90]:

(342.19942566476345, array([296., 396.]))

In [91]:

marginal_p = marginal(joint_posterior2, 1)
marginal_p.plot()

decorate(xlabel='p',
         ylabel='PDF',
         title='Posterior marginal distribution of p')

Out[91]:

In [92]:

def likelihood_round3(ps, n, k, s, joint_prior):
    like = joint_prior.copy()

    for N in joint_prior:
        k00 = N - s[1]
        like[N] = (binom.pmf(k[0,0,1], k00, ps) * 
                   binom.pmf(k[0,1,1], k[0,1], ps) * 
                   binom.pmf(k[1,0,1], k[1,0], ps) * 
                   binom.pmf(k[1,1,1], k[1,1], ps))
    return like

In [93]:

like3 = likelihood_round3(ps, n, k, s, joint_prior)
like3.to_numpy().sum()

Out[93]:

1.7620042777092345e-07

In [94]:

joint_posterior3 = joint_posterior2 * like3
normalize(joint_posterior3)

In [95]:

plot_contour(joint_posterior3)
decorate(title='Joint posterior of N and p')

Out[95]:

In [96]:

marginal_N = marginal(joint_posterior3, 0)
marginal_N.plot()

decorate(xlabel='N',
         ylabel='PDF',
         title='Posterior marginal distribution of N')

Out[96]:

In [97]:

marginal_N.mean(), marginal_N.credible_interval(0.9)

Out[97]:

(391.0058839098183, array([356., 431.]))

In [98]:

marginal_p = marginal(joint_posterior3, 1)
marginal_p.plot()

decorate(xlabel='p',
         ylabel='PDF',
         title='Posterior marginal distribution of p')

Out[98]:

In [99]:

joint_plot(joint_posterior3)

Out[99]:

Spina bifida

In [100]:

data_sb = [-100, 60, 49, 4, 247, 112, 142, 12]
num_observed(data_sb)

Out[100]:

626

In [101]:

def make_stats(data, num_rounds):
    index = pd.MultiIndex.from_product([[0, 1]]*num_rounds)

    K = pd.Series(data, index)

    n = pd.Series(0, range(num_rounds))
    for level in n.index:
        n[level] = K.xs(1, level=level).sum()
    
    t = [K.sum(level=list(range(i+1)))
         for i in range(num_rounds)]
    
    s = [num_observed(Kx) for Kx in t]
        
    k = pd.concat(t)
    return n, k, s

In [102]:

n, k, s = make_stats(data_sb, 3)

In [103]:

Out[103]:

  513
  207
  188
dtype: int64

In [104]:

Out[104]:

0             13
1            513
(0, 0)       -40
(0, 1)        53
(1, 0)       359
(1, 1)       154
(0, 0, 0)   -100
(0, 0, 1)     60
(0, 1, 0)     49
(0, 1, 1)      4
(1, 0, 0)    247
(1, 0, 1)    112
(1, 1, 0)    142
(1, 1, 1)     12
dtype: int64

In [105]:

Out[105]:

[513, 566, 626]

In [106]:

Ns = np.arange(s[2], 1000, 5)
prior_N = Pmf(1, Ns)
prior_N.index.name = 'N'
prior_N.shape

Out[106]:

(75,)

In [107]:

probs0 = np.linspace(0.5, 1.0, 51)
prior_p0 = Pmf(1, probs0)
prior_p0.index.name = 'p0'
prior_p0.head()

Out[107]:

p0
50    1
51    1
52    1
53    1
54    1
dtype: int64

In [108]:

probs1 = np.linspace(0.1, 0.5, 41)
prior_p1 = Pmf(1, probs1)
prior_p1.index.name = 'p1'
prior_p1.head()

Out[108]:

p1
10    1
11    1
12    1
13    1
14    1
dtype: int64

In [109]:

probs2 = np.linspace(0.1, 0.4, 31)
prior_p2 = Pmf(1, probs2)
prior_p2.index.name = 'p2'
prior_p2.head()

Out[109]:

p2
10    1
11    1
12    1
13    1
14    1
dtype: int64

In [110]:

def make_joint3(prior0, prior1, prior2):
    joint2 = make_joint(prior0, prior1)
    joint2_pmf = Pmf(joint2.transpose().stack())
    joint3 = make_joint(prior2, joint2_pmf)
    return joint3

In [111]:

joint_prior = make_joint3(prior_p0, prior_p1, prior_N)
joint_prior.head()

Out[111]:

In [112]:

likelihood = joint_prior.copy()
Ns = joint_prior.columns

for (p0, p1) in joint_prior.index:
    like0 = binom.pmf(k[1], Ns, p0)
    k0 = Ns - s[0]
    like1 = binom.pmf(k[0,1], k0, p1) * binom.pmf(k[1,1], k[1], p1)
    likelihood.loc[p0, p1] = like0 * like1
    
likelihood.to_numpy().sum()

Out[112]:

0.018731276129625645

In [113]:

from utils import normalize

joint_posterior = joint_prior * likelihood
normalize(joint_posterior)
joint_posterior.shape

Out[113]:

(2091, 75)

In [114]:

from utils import marginal

posterior_N = marginal(joint_posterior, 0)
posterior_N.plot()
decorate(xlabel='N',
         ylabel='PDF',
         title='Posterior for N after two rounds')
posterior_N.mean(), posterior_N.credible_interval(0.9)

Out[114]:

(692.2431131826501, array([656., 736.]))

In [115]:

posterior_pmf = marginal(joint_posterior, 1)
posterior_pmf.head()

Out[115]:

p0   p1  
5  0.10    2.947207e-37
11    1.861223e-32
12    1.953523e-28
13    4.500420e-25
14    2.829563e-22
dtype: float64

In [116]:

posterior_joint_ps = posterior_pmf.unstack().transpose()
posterior_joint_ps.head()

Out[116]:

In [117]:

plot_contour(posterior_joint_ps)
decorate(title='Posterior joint distribution for p1 and p2')

Out[117]:

In [118]:

posterior_p1 = marginal(posterior_joint_ps, 0)
posterior_p2 = marginal(posterior_joint_ps, 1)

posterior_p1.plot(label='p1')
posterior_p2.plot(label='p2')
decorate(xlabel='Probability of finding a bug',
         ylabel='PDF',
         title='Posterior marginal distributions of p1 and p2')

Out[118]:

In [119]:

joint3 = make_joint(prior_p2, posterior_pmf)
joint3.head()

Out[119]:

In [120]:

joint3_pmf = Pmf(joint3.stack())
joint3_pmf.head()

Out[120]:

p0   p1   p2  
0.5  0.1  0.10    2.947207e-37
          0.11    2.947207e-37
          0.12    2.947207e-37
          0.13    2.947207e-37
          0.14    2.947207e-37
dtype: float64

In [121]:

prior4 = make_joint(posterior_N, joint3_pmf)
prior4.head()

Out[121]:

In [122]:

prior4.shape

Out[122]:

(64821, 75)

In [ ]:

In [123]:

joint2 = make_joint(posterior_N, prior_p2)
joint2.head()

Out[123]:

In [124]:

like2 = joint2.copy()
Ns = joint2.columns

for p2 in joint2.index:
    k00 = Ns - s[1]
    like = (binom.pmf(k[0,0,1], k00, p2) *
            binom.pmf(k[0,1,1], k[0,1], p2) *
            binom.pmf(k[1,0,1], k[1,0], p2) *
            binom.pmf(k[1,1,1], k[1,1], p2))
    
    like2.loc[p2] = like

like2.to_numpy().sum()

Out[124]:

2.874211433485911e-13

In [125]:

like2.head()

Out[125]:

In [126]:

like4 = prior4.copy()

for (p0, p1, p2) in prior4.index:
    like4.loc[p0, p1, p2] = like2.loc[p2]
    
like4.to_numpy().sum()

Out[126]:

6.009976107419042e-10

In [127]:

posterior4 = prior4 * like4
normalize(posterior4)

In [128]:

marginal_N = marginal(posterior4, 0)
marginal_N.plot()

decorate(xlabel='N',
         ylabel='PDF',
         title='Posterior for N after three rounds')

marginal_N.mean(), marginal_N.credible_interval(0.9)

Out[128]:

(764.7788307540033, array([731., 801.]))

In [129]:

posterior_p012 = marginal(posterior4, 1)
posterior_p012.unstack().head()

Out[129]:

In [130]:

posterior_p2 = marginal(posterior_p012.unstack(), 0)
posterior_p2.plot()

Out[130]:

In [131]:

posterior_p01 = marginal(posterior_p012.unstack(), 1)
joint_plot(posterior_p01.unstack().transpose())

Out[131]:

Diabetes

In [132]:

data4 = [-10000, 10, 182, 8, 74, 7, 20, 14, 709, 12, 650, 46, 104, 18, 157, 58]
num_observed(data4)

Out[132]:

2069

In [133]:

n, k, s = make_stats(data4, 4)

In [134]:

Out[134]:

  1754
   452
  1135
   173
dtype: int64

In [135]:

Out[135]:

0               -9685
1                1754
(0, 0)          -9800
(0, 1)            115
(1, 0)           1417
(1, 1)            337
(0, 0, 0)       -9990
(0, 0, 1)         190
(0, 1, 0)          81
(0, 1, 1)          34
(1, 0, 0)         721
(1, 0, 1)         696
(1, 1, 0)         122
(1, 1, 1)         215
(0, 0, 0, 0)   -10000
(0, 0, 0, 1)       10
(0, 0, 1, 0)      182
(0, 0, 1, 1)        8
(0, 1, 0, 0)       74
(0, 1, 0, 1)        7
(0, 1, 1, 0)       20
(0, 1, 1, 1)       14
(1, 0, 0, 0)      709
(1, 0, 0, 1)       12
(1, 0, 1, 0)      650
(1, 0, 1, 1)       46
(1, 1, 0, 0)      104
(1, 1, 0, 1)       18
(1, 1, 1, 0)      157
(1, 1, 1, 1)       58
dtype: int64

In [136]:

Out[136]:

[1754, 1869, 2059, 2069]