Book a Demo!
CoCalc Logo Icon
StoreFeaturesDocsShareSupportNewsAboutPoliciesSign UpSign In
allendowney
GitHub Repository: allendowney/thinkbayes2
Path: blob/master/examples/dungeons_soln.ipynb
1901 views
Kernel: Python 3

Think Bayes

This notebook presents example code and exercise solutions for Think Bayes.

Copyright 2018 Allen B. Downey

MIT License: https://opensource.org/licenses/MIT

# Configure Jupyter so figures appear in the notebook %matplotlib inline # Configure Jupyter to display the assigned value after an assignment %config InteractiveShell.ast_node_interactivity='last_expr_or_assign' # import classes from thinkbayes2 from thinkbayes2 import Pmf, Suite import thinkbayes2 import thinkplot

The Dungeons and Dragons club

Suppose there are 10 people in my Dungeons and Dragons club; on any game day, each of them has a 70% chance of showing up.

Each player has one character and each character has 6 attributes, each of which is generated by rolling and adding up 3 6-sided dice.

At the beginning of the game, I ask whose character has the lowest attribute. The wizard says, "My constitution is 5; does anyone have a lower attribute?", and no one does.

The warrior says "My strength is 16; does anyone have a higher attribute?", and no one does.

How many characters are in the party?

The prior

There are three ways to compute the prior distribution:

  • Simulation

  • Convolution

  • Analytic distribution

First, simulation. Here's a function that flips a coin with probability p:

from random import random def flip(p): return random() < p

We can use it to flip a coin for each member of the club.

flips = [flip(0.7) for i in range(10)]
[False, True, False, True, False, False, True, False, False, True]

And count the number that show up on game day.

sum(flips)
4

Let's encapsulate that in a function that simulates a game day.

def game_day(n, p): flips = [flip(p) for i in range(n)] return sum(flips)
game_day(10, 0.7)
8

If we run that function many times, we get a sample from the distribution of the number of players.

sample = [game_day(10, 0.7) for i in range(1000)] pmf_sample = Pmf(sample) thinkplot.Hist(pmf_sample)
Image in a Jupyter notebook

The second method is convolution. Instead of flipping a coin, we can create a Pmf object that represents the distribution of outcomes from a single flip.

def coin(p): return Pmf({1:p, 0:1-p})

Here's what it looks like.

player = coin(0.7) player.Print()
0 0.30000000000000004 1 0.7

If we have two players, there are three possible outcomes:

(player + player).Print()
0 0.09000000000000002 1 0.42000000000000004 2 0.48999999999999994

If we have 10 players, we can get the prior distribution like this:

prior = sum([player]*10) prior.Print()
0 5.9049000000000085e-06 1 0.00013778100000000018 2 0.0014467005000000017 3 0.009001692000000009 4 0.036756909000000025 5 0.10291934520000004 6 0.20012094900000005 7 0.26682793200000005 8 0.2334744405 9 0.12106082099999994 10 0.028247524899999984

Now we can compare the results of simulation and convolution:

thinkplot.Hist(pmf_sample, color='C0') thinkplot.Pmf(prior, color='C1') thinkplot.decorate(xlabel='Number of players', ylabel='PMF')
Image in a Jupyter notebook

Finally, we can use an analytic distribution. The distribution we just computed is the binomial distribution, which has the following PMF:

PMF(k;n,p)=P(k ∣ n,p)=(nk) pk(1−p)n−k PMF(k; n, p) = P(k ~|~ n, p) = {n \choose k}\,p^{k}(1-p)^{n-k}

We could evalate the right hand side in Python, or use MakeBinomialPmf:

help(thinkbayes2.MakeBinomialPmf)
Help on function MakeBinomialPmf in module thinkbayes2.thinkbayes2: MakeBinomialPmf(n, p) Evaluates the binomial PMF. n: number of trials p: probability of success on each trial Returns: Pmf of number of successes

And we can confirm that the analytic result matches what we computed by convolution.

binomial = thinkbayes2.MakeBinomialPmf(10, 0.7) thinkplot.Pmf(prior, color='C1') thinkplot.Pmf(binomial, color='C2', linestyle='dotted') thinkplot.decorate(xlabel='Number of players', ylabel='PMF')
Image in a Jupyter notebook

Since two players spoke, we can eliminate the possibility of 0 or 1 players:

thinkplot.Pmf(prior, color='gray') del prior[0] del prior[1] prior.Normalize() thinkplot.Pmf(prior, color='C1') thinkplot.decorate(xlabel='Number of players', ylabel='PMF')
Image in a Jupyter notebook

Likelihood

There are three components of the likelihood function:

  • The probability that the highest attribute is 16.

  • The probability that the lowest attribute is 5.

  • The probability that the lowest and highest attributes are held by different players.

To compute the first component, we have to compute the distribution of the maximum of 6n6n attributes, where nn is the number of players.

Here is the distribution for a single die.

d6 = Pmf([1,2,3,4,5,6]) d6.Print()
1 0.16666666666666666 2 0.16666666666666666 3 0.16666666666666666 4 0.16666666666666666 5 0.16666666666666666 6 0.16666666666666666

And here's the distribution for the sum of three dice.

thrice = sum([d6] * 3) thinkplot.Pdf(thrice) thinkplot.decorate(xlabel='Attribute', ylabel='PMF')
Image in a Jupyter notebook

Here's the CDF for the sum of three dice.

cdf_thrice = thrice.MakeCdf() thinkplot.Cdf(cdf_thrice) thinkplot.decorate(xlabel='Attribute', ylabel='CDF')
Image in a Jupyter notebook

The Max method raises the CDF to a power. So here's the CDF for the maximum of six attributes.

cdf_max_6 = cdf_thrice.Max(6) thinkplot.Cdf(cdf_max_6) thinkplot.decorate(xlabel='Attribute', ylabel='CDF', title='Maximum of 6 attributes')
Image in a Jupyter notebook

If there are n players, there are 6*n attributes. Here are the distributions for the maximum attribute of n players, for a few values of n.

for n in range(2, 10, 2): cdf_max = cdf_thrice.Max(n*6) thinkplot.Cdf(cdf_max, label='n=%s'%n) thinkplot.decorate(xlabel='Attribute', ylabel='CDF', title='Maximum of 6*n attributes')
Image in a Jupyter notebook

To check that, I'll compute the CDF for 7 players, and estimate it by simulation.

n = 7 cdf = cdf_thrice.Max(n*6) thinkplot.Cdf(cdf, label='n=%s'%n) sample_max = [max(cdf_thrice.Sample(42)) for i in range(1000)] thinkplot.Cdf(thinkbayes2.Cdf(sample_max), label='sample') thinkplot.decorate(xlabel='Attribute', ylabel='CDF', title='Maximum of 6*n attributes')
Image in a Jupyter notebook

Looks good.

Now, to compute the minimum, I have to write my own function, because Cdf doesn't provide a Min function.

def compute_cdf_min(cdf, k): """CDF of the min of k samples from cdf. cdf: Cdf object k: number of samples returns: new Cdf object """ cdf_min = cdf.Copy() cdf_min.ps = 1 - (1 - cdf_min.ps)**k return cdf_min

Now we can compute the CDF of the minimum attribute for n players, for several values of n.

for n in range(2, 10, 2): cdf_min = compute_cdf_min(cdf_thrice, n*6) thinkplot.Cdf(cdf_min, label='n=%s'%n) thinkplot.decorate(xlabel='Attribute', ylabel='CDF', title='Minimum of 6*n attributes')
Image in a Jupyter notebook

And again we can check it by comparing to simulation results.

n = 7 cdf = compute_cdf_min(cdf_thrice, n*6) thinkplot.Cdf(cdf, label='n=%s'%n) sample_min = [min(cdf_thrice.Sample(42)) for i in range(1000)] thinkplot.Cdf(thinkbayes2.Cdf(sample_min), label='sample') thinkplot.decorate(xlabel='Attribute', ylabel='CDF', title='Minimum of 6*n attributes')
Image in a Jupyter notebook

For efficiency and conciseness, it is helpful to precompute the distributions for the relevant values of n, and store them in dictionaries.

like_min = {} like_max = {} for n in range(2, 11): cdf_min = compute_cdf_min(cdf_thrice, n*6) like_min[n] = cdf_min.MakePmf() cdf_max = cdf_thrice.Max(n*6) like_max[n] = cdf_max.MakePmf() print(like_min[n][5], like_max[n][16])
0.23288163935889017 0.23288163935889017 0.28826338107405935 0.2882633810740594 0.31794402625472684 0.3179440262547268 0.32955796250238156 0.32955796250238156 0.32871475364520075 0.3287147536452008 0.3195146933518256 0.3195146933518255 0.3049352170780888 0.30493521707808885 0.2871203018328896 0.28712030183288956 0.2675970126720095 0.2675970126720096

The output shows that the particular data we saw is symmetric: the chance that 16 is the maximum is the same as the chance that 5 is the minimum.

Finally, we need the probability that the minimum and maximum are held by the same person. If there are n players, there are 6*n attributes.

Let's call the player with the highest attribute Max. What is the chance that Max also has the lowest attribute? Well Max has 5 more attributes, out of a total of 6*n-1 remaining attributes.

So here's prob_same as a function of n.

def prob_same(n): return 5 / (6*n-1) for n in range(2, 11): print(n, prob_same(n))
2 0.45454545454545453 3 0.29411764705882354 4 0.21739130434782608 5 0.1724137931034483 6 0.14285714285714285 7 0.12195121951219512 8 0.10638297872340426 9 0.09433962264150944 10 0.0847457627118644

The update

Here's a class that implements this likelihood function.

class Dungeons(Suite): def Likelihood(self, data, hypo): """Probability of the data given the hypothesis. data: lowest attribute, highest attribute, boolean (whether the same person has both) hypo: number of players returns: probability """ lowest, highest, same = data n = hypo p = prob_same(n) like = p if same else 1-p like *= like_min[n][lowest] like *= like_max[n][highest] return like

Here's the prior we computed above.

suite = Dungeons(prior) thinkplot.Hist(suite) thinkplot.decorate(xlabel='Number of players', ylabel='PMF') suite.Mean()
7.000868145040201
Image in a Jupyter notebook

And here's the update based on the data in the problem statement.

suite.Update((5, 16, False))
0.08548474490284354

Here's the posterior.

thinkplot.Hist(suite) thinkplot.decorate(xlabel='Number of players', ylabel='PMF') suite.Mean()
6.940862784521086
Image in a Jupyter notebook
suite.Print()
2 0.0005007044860801902 3 0.006177449626400222 4 0.03402191676923563 5 0.10823000113979393 6 0.21684925990462955 7 0.279837163496623 8 0.22697589141459018 9 0.10574761295351938 10 0.02166000020912791

Based on the data, I am 94% sure there are between 5 and 9 players.

suite.CredibleInterval()
(5, 9)
sum(suite[n] for n in [5,6,7,8,9])
0.9376399289091562