Real-time collaboration for Jupyter Notebooks, Linux Terminals, LaTeX, VS Code, R IDE, and more,
all in one place. Commercial Alternative to JupyterHub.
Lab 10: Bayesian Statistics
In order to understand how Bayesian statistics works, we're going to start with a simple scenario. Imagine you are studying coastal birds of southern California. Two models describing the abundance of seagulls and cormorants have been proposed. Model 1 predicts that you will observe 75% cormorants and 25% seagulls, while model 2 predicts that each species will make up 50% of your observations. Heading out to Ballona Creek to observe birds, you want to know the probability that model 1 is true.
Let's call "model 1 being true" event A and "observing a particular bird (either a seagull or a cormorant)" event B. We can easily find $p(B \mid A)$, the probability of that observation if model 1 is true, but using this information to determine $p(A \mid B)$, the probability that model 1 is correct given the observation, requires an equation known as Bayes' theorem.
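For reference, Bayes' theorem written for the events A and B defined above takes the standard form:

$$p(A \mid B) = \frac{p(B \mid A)\, p(A)}{p(B)}$$

Here $p(A)$ is the probability assigned to model 1 before the observation, and $p(B)$ is the overall probability of the observation across both models.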
In this lab, you will first compute and then update this probability based on new evidence (observing more birds).
Import NumPy and Seaborn.
Set up an array for each model with the probabilities of observing cormorants and seagulls.
If you know nothing that would favor one model over the other, what is the probability that model 1 is best? Set up another array that has the probabilities of each model.
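One possible way to set up these arrays is sketched below. The variable names (`model1`, `model2`, `p_models`) and the ordering convention (cormorant first, seagull second) are assumptions made for illustration, not names prescribed by the lab.

```python
import numpy as np

# Each model array holds [p(cormorant), p(seagull)] as predicted by that model.
model1 = np.array([0.75, 0.25])
model2 = np.array([0.50, 0.50])

# With no reason to favor either model, each starts with probability 0.5.
# Assumed order: [p(model 1), p(model 2)]
p_models = np.array([0.5, 0.5])
```

Keeping the same species order in both model arrays means the same index can be used later to look up the probability of a given bird under each model.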
We need one more piece of information: $p(B)$. This is the overall probability of observing a cormorant, regardless of model. It is the sum of the probabilities of observing a cormorant predicted by each model, weighted by the relative probabilities of the models. The general formula is $p(B) = p(B \mid A)\, p(A) + p(B \mid \neg A)\, p(\neg A)$, where $\neg A$ means "not A" -- in this case, "model 2 being true".
Find p(B).
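A minimal sketch of this weighted sum, assuming the arrays are ordered as [cormorant, seagull] and [model 1, model 2] (the variable names are illustrative):

```python
import numpy as np

model1 = np.array([0.75, 0.25])   # [p(cormorant), p(seagull)] under model 1
model2 = np.array([0.50, 0.50])   # same order under model 2
p_models = np.array([0.5, 0.5])   # [p(model 1), p(model 2)]

# Probability of a cormorant under each model: p(B | A) and p(B | not A).
p_corm_given_model = np.array([model1[0], model2[0]])

# Weight each model's prediction by the model's probability and add them up.
p_B = np.sum(p_corm_given_model * p_models)
print(p_B)  # 0.75 * 0.5 + 0.5 * 0.5 = 0.625
```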
Now, use Bayes' theorem to find the probability that model 1 is true given that you observed a cormorant.
Update the array holding the probability of each model to reflect your new knowledge. Make sure to change both numbers!
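The two steps above could look roughly like the following. This is a sketch under the same assumed array layout as before, with hypothetical variable names; the second entry is filled in as one minus the first because only two models are being compared.

```python
import numpy as np

model1 = np.array([0.75, 0.25])   # [p(cormorant), p(seagull)] under model 1
model2 = np.array([0.50, 0.50])
p_models = np.array([0.5, 0.5])   # [p(model 1), p(model 2)] before any birds

# Overall probability of seeing a cormorant.
p_B = model1[0] * p_models[0] + model2[0] * p_models[1]

# Bayes' theorem: p(A | B) = p(B | A) * p(A) / p(B)
p_model1_given_corm = model1[0] * p_models[0] / p_B   # about 0.6

# Update both entries so the array still sums to 1.
p_models = np.array([p_model1_given_corm, 1 - p_model1_given_corm])
print(p_models)
```

Observing a cormorant, which model 1 predicts more strongly, shifts the probability of model 1 from 0.5 up to about 0.6.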
The next bird you observe is also a cormorant. Now, what is the probability of model 1 being true? Do this calculation and update the probability array using arrays and indexing rather than typing in numbers. (This is a step toward automating the process.)
You now observe a seagull. Update the probabilities again, also without hard-coding numbers.
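One way to do both of these updates with indexing instead of typed-in numbers is to store the model predictions in a single 2D array and pick out the column for the observed bird. The layout, the names `likelihoods`, `CORMORANT`, and `SEAGULL`, and the starting value of `p_models` (the result after the first cormorant) are assumptions for this sketch.

```python
import numpy as np

# Rows are models, columns are birds (column 0 = cormorant, column 1 = seagull).
likelihoods = np.array([[0.75, 0.25],
                        [0.50, 0.50]])
CORMORANT, SEAGULL = 0, 1

# Probabilities of the models after the first cormorant.
p_models = np.array([0.6, 0.4])

# Second cormorant: multiply each model's prediction by its current probability,
# then divide by the total, which is p(B) for this observation.
unnormalized = likelihoods[:, CORMORANT] * p_models
p_models = unnormalized / unnormalized.sum()
print(p_models)  # model 1 rises to about 0.69

# Seagull: same calculation, different column.
unnormalized = likelihoods[:, SEAGULL] * p_models
p_models = unnormalized / unnormalized.sum()
print(p_models)  # model 1 drops back to about 0.53
```

Dividing by the sum of the unnormalized values updates both model probabilities at once, so neither has to be computed separately.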
Write a script or function that will perform the updating automatically for a sequence of birds. Test it on a sequence of 3 (you can use a list of Cs and Ss).
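A function along these lines could automate the updating. The name `update_models`, the `"C"`/`"S"` encoding, and the default prior are illustrative choices, not part of the lab's specification.

```python
import numpy as np

def update_models(birds, prior=(0.5, 0.5)):
    """Return [p(model 1), p(model 2)] after observing a sequence of birds.

    birds is a sequence of "C" (cormorant) and "S" (seagull).
    """
    # Rows are models, columns are birds (column 0 = cormorant, 1 = seagull).
    likelihoods = np.array([[0.75, 0.25],
                            [0.50, 0.50]])
    column = {"C": 0, "S": 1}

    p_models = np.array(prior, dtype=float)
    for bird in birds:
        # Bayes' theorem applied to both models at once.
        unnormalized = likelihoods[:, column[bird]] * p_models
        p_models = unnormalized / unnormalized.sum()
    return p_models

result = update_models(["C", "C", "S"])
print(result)  # model 1 ends at about 0.53
```

Each pass through the loop uses the previous result as the new prior, which is the same chain of updates done by hand in the earlier steps.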
Starting with a prior of 0.5 for each model, use the list given below to update your array of probabilities for the models. Make a list of the probability of model 1 after each update.
Plot your list using the `sns.lineplot` command. The basic syntax is `sns.lineplot(x=x, y=y)`. You can use `range` to generate a list of x-values and use your list of values as y-values.
Describe what you see in the graph. How does the probability of model 1 change as we collect new evidence?