Book a Demo!
CoCalc Logo Icon
StoreFeaturesDocsShareSupportNewsAboutPoliciesSign UpSign In
allendowney
GitHub Repository: allendowney/thinkbayes2
Path: blob/master/examples/grizzly_soln.ipynb
1901 views
Kernel: Python 3

Think Bayes

This notebook presents example code and exercise solutions for Think Bayes.

Copyright 2018 Allen B. Downey

MIT License: https://opensource.org/licenses/MIT

# Configure Jupyter so figures appear in the notebook %matplotlib inline # Configure Jupyter to display the assigned value after an assignment %config InteractiveShell.ast_node_interactivity='last_expr_or_assign' # import classes from thinkbayes2 from thinkbayes2 import Hist, Pmf, Suite import thinkplot

The Grizzly Bear Problem

In 1996 and 1997 Mowat and Strobeck deployed bear traps in locations in British Columbia and Alberta, in an effort to estimate the population of grizzly bears. They describe the experiment in "Estimating Population Size of Grizzly Bears Using Hair Capture, DNA Profiling, and Mark-Recapture Analysis"

The "trap" consists of a lure and several strands of barbed wire intended to capture samples of hair from bears that visit the lure. Using the hair samples, the researchers use DNA analysis to identify individual bears.

During the first session, on June 29, 1996, the researchers deployed traps at 76 sites. Returning 10 days later, they obtained 1043 hair samples and identified 23 different bears. During a second 10-day session they obtained 1191 samples from 19 different bears, where 4 of the 19 were from bears they had identified in the first batch.

To estimate the population of bears from this data, we need a model for the probability that each bear will be observed during each session. As a starting place, we'll make the simplest assumption, that every bear in the population has the same (unknown) probability of being sampled during each round.

We also need a prior distribution for the population. As a starting place, let's suppose that, prior to this study, an expert in this domain would have estimated that the population is between 100 and 500, and equally likely to be any value in that range.

Solution:

Define:

  • N: population size

  • K: number of bears that have ever been identified

  • n: number of bears observed in the second second

  • k: the number of bears in the second session that had previously been identified

For given values of N, K, and n, the distribution of k is the hypergeometric distribution:

PMF(k)=(Kk)(N−Kn−k)/(Nn)PMF(k) = {K \choose k}{N-K \choose n-k}/{N \choose n}

# Solution from scipy.special import binom class Grizzly(Suite): """Represents hypotheses about how many bears there are.""" def Likelihood(self, data, hypo): """Computes the likelihood of the data under the hypothesis. hypo: total population (N) data: # tagged (K), # caught (n), # of caught who were tagged (k) """ N = hypo K, n, k = data if hypo < K + (n - k): return 0 like = binom(N-K, n-k) / binom(N, n) return like
# Solution hypos = range(100, 501) suite = Grizzly(hypos) data = 23, 19, 4 suite.Update(data)
8.05801258299152e-06
# Solution thinkplot.Pdf(suite) thinkplot.Config(xlabel='Number of bears', ylabel='PMF', legend=False)
Image in a Jupyter notebook
# Solution print('Posterior mean', suite.Mean()) print('Maximum a posteriori estimate', suite.MaximumLikelihood()) print('90% credible interval', suite.CredibleInterval(90))
Posterior mean 193.93845891363907 Maximum a posteriori estimate 109 90% credible interval (105, 379)
# Solution # Alternatively, we can take advantage of the `hypergeom` # object in scipy.stats. from scipy import stats class Grizzly2(Suite): """Represents hypotheses about how many bears there are.""" def Likelihood(self, data, hypo): """Computes the likelihood of the data under the hypothesis. hypo: total population (N) data: # tagged (K), # caught (n), # of caught who were tagged (k) """ N = hypo K, n, k = data if hypo < K + (n - k): return 0 like = stats.hypergeom.pmf(k, N, K, n) return like
# Solution hypos = range(100, 501) suite = Grizzly2(hypos) data = 23, 19, 4 suite.Update(data)
0.07135370142238903
# Solution print('Posterior mean', suite.Mean()) print('Maximum a posteriori estimate', suite.MaximumLikelihood()) print('90% credible interval', suite.CredibleInterval(90))
Posterior mean 193.9384589136376 Maximum a posteriori estimate 109 90% credible interval (105, 379)