Probability
In this notebook, we illustrate some basic concepts from probability theory using Python code.
Software libraries
There are several software libraries that implement standard probability distributions, and functions for manipulating them (e.g., sampling, fitting). We list some below.
scipy.stats. We illustrate how to use this below.
TensorFlow Probability (TFP). Similar API to scipy.stats.
Distrax. JAX version of TFP.
PyTorch distributions library. Similar API to TFP.
NumPyro distributions library. Similar interface to PyTorch distributions, but uses JAX as the backend.
In this notebook, we mostly focus on scipy.stats.
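To give a flavour of the scipy.stats API, here is a minimal sketch (the specific parameter values are our own): we create a "frozen" distribution object, sample from it, evaluate its pdf and cdf, and fit its parameters back to the samples.

```python
from scipy import stats

# Create a "frozen" standard normal distribution object
dist = stats.norm(loc=0.0, scale=1.0)

samples = dist.rvs(size=1000, random_state=42)  # draw 1000 samples
print(dist.pdf(0.0))            # density at a point: ~0.3989
print(dist.cdf(1.96))           # cumulative probability: ~0.975
print(stats.norm.fit(samples))  # MLE fit of (loc, scale) to the samples
```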
Basics of probability theory
What is probability?
We will not go into mathematical detail, but focus on intuition.
There are two main "schools of thought":
Bayesian probability = degree of belief
$\Pr(\text{heads}) = 0.5$ means you think the event that a particular coin will land heads is 50% likely.
Frequentist probability = long run frequencies
$\Pr(\text{heads}) = 0.5$ means that the empirical fraction of times this event will occur across infinitely repeated trials is 50%.
In practice, the philosophy does not matter much, since both interpretations must satisfy the same basic axioms of probability.
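To make the frequentist interpretation concrete, here is a quick simulation sketch (our own example, using NumPy): the running fraction of heads in repeated fair-coin flips settles around 0.5.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
flips = rng.integers(0, 2, size=10_000)  # 1 = heads; fair coin
# Running empirical frequency of heads after each flip
running_freq = np.cumsum(flips) / np.arange(1, len(flips) + 1)

plt.plot(running_freq)
plt.axhline(0.5, linestyle="--", color="k")
plt.xlabel("number of flips")
plt.ylabel("empirical fraction of heads")
plt.show()
```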
Random variables and their distributions
Let $X$ be a (discrete) random variable (RV) with $K$ possible values $\mathcal{X} = \{1, 2, \ldots, K\}$.
Let $E = (X = x)$ be the event that $X$ has value $x$, for some state $x \in \mathcal{X}$.
We require $0 \le \Pr(E) \le 1$.
We require $\sum_{x \in \mathcal{X}} \Pr(X = x) = 1$, i.e., the probabilities of all the possible states must sum to one.
Let $p(x) = \Pr(X = x)$ be the distribution or probability mass function (pmf) for RV $X$.
We can generalize this to continuous random variables, which have an infinite number of possible states, using a probability density function (pdf) $p(x)$, which satisfies $p(x) \ge 0$ and $\int_{-\infty}^{\infty} p(x)\,dx = 1$.
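We can check these normalization conditions numerically with scipy.stats; a small sketch (the particular distributions and parameters are our own choice):

```python
import numpy as np
from scipy import stats
from scipy.integrate import quad

# Discrete RV: the binomial pmf sums to 1 over its support {0, ..., 10}
X = stats.binom(n=10, p=0.3)
print(X.pmf(np.arange(11)).sum())        # 1.0

# Continuous RV: the Gaussian pdf integrates to 1
Z = stats.norm(loc=0, scale=1)
print(quad(Z.pdf, -np.inf, np.inf)[0])   # ~1.0 (numerical quadrature)
```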
Conjunction and disjunction of events
The probability of events $E_1$ AND $E_2$ is denoted $\Pr(E_1 \wedge E_2)$ or just $\Pr(E_1, E_2)$.
If two RVs are independent, then $p(X=x, Y=y) = p(X=x)\,p(Y=y)$ for all $x$ and $y$.
The probability of event $E_1$ OR $E_2$ is $\Pr(E_1 \vee E_2) = \Pr(E_1) + \Pr(E_2) - \Pr(E_1 \wedge E_2)$.
For disjoint events (that cannot co-occur), this becomes $\Pr(E_1 \vee E_2) = \Pr(E_1) + \Pr(E_2)$.
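Both identities are easy to check by simulation. Here is a small sketch (our own example, using two independent fair dice):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
d1 = rng.integers(1, 7, size=n)  # first die
d2 = rng.integers(1, 7, size=n)  # second die

A = d1 == 6  # event A: first die shows 6
B = d2 == 6  # event B: second die shows 6

# Independence: p(A and B) should be close to p(A) * p(B) = 1/36
print((A & B).mean(), A.mean() * B.mean())
# Disjunction: p(A or B) = p(A) + p(B) - p(A and B)
print((A | B).mean(), A.mean() + B.mean() - (A & B).mean())
```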
Conditional probability, sum rule, product rule, Bayes rule
The conditional probability of $Y=y$ given $X=x$ is defined to be $p(Y=y | X=x) = \frac{p(X=x, Y=y)}{p(X=x)}$.
Hence we derive the product rule: $p(x, y) = p(y|x)\,p(x) = p(x|y)\,p(y)$.
If $X$ and $Y$ are independent, then $p(y|x) = p(y)$ and $p(x|y) = p(x)$, so $p(x, y) = p(x)\,p(y)$.
The marginal probability of $Y=y$ is given by the sum rule: $p(y) = \sum_x p(x, y) = \sum_x p(y|x)\,p(x)$.
Hence we derive Bayes' rule: $p(x|y) = \frac{p(y|x)\,p(x)}{p(y)} = \frac{p(y|x)\,p(x)}{\sum_{x'} p(y|x')\,p(x')}$.
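These rules can be verified numerically on a small joint distribution; the numbers below are made up purely for illustration:

```python
import numpy as np

# A small joint distribution p(x, y) over X in {0,1} (rows), Y in {0,1} (cols)
p_xy = np.array([[0.3, 0.1],
                 [0.2, 0.4]])

p_x = p_xy.sum(axis=1)             # sum rule: p(x) = sum_y p(x, y)
p_y = p_xy.sum(axis=0)             # sum rule: p(y) = sum_x p(x, y)
p_y_given_x = p_xy / p_x[:, None]  # conditional: p(y|x) = p(x, y) / p(x)

# Bayes' rule: p(x|y) = p(y|x) p(x) / p(y)
p_x_given_y = (p_y_given_x * p_x[:, None]) / p_y[None, :]
# This should match computing the conditional directly from the joint
print(np.allclose(p_x_given_y, p_xy / p_y[None, :]))  # True
```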
Bayesian inference
Bayes' rule is often used to compute a distribution over possible values of a hidden variable or hypothesis $H$ after observing some evidence $Y=y$. We can write this as follows: $p(H=h | Y=y) = \frac{p(Y=y | H=h)\,p(H=h)}{p(Y=y)}$.
The prior $p(h)$ encodes what we believe about the hidden state before we see any data.
The likelihood $p(y|h)$ is the probability of observing the data given each possible hidden state.
The posterior $p(h|y)$ is our new belief state, after seeing the data.
The marginal likelihood $p(y) = \sum_h p(y|h)\,p(h)$ is a normalization constant, independent of the hidden state, so it can usually be ignored.
Applying Bayes' rule to infer a hidden quantity from one or more observations is called Bayesian inference or posterior inference. (It used to be called inverse probability, since it reasons backwards from effects to causes.)
Example: Bayes' rule for COVID diagnosis
Consider estimating whether someone has COVID on the basis of a PCR test. The test can either return a positive result ($Y=1$) or a negative result ($Y=0$). Let $H=1$ if the person has COVID and $H=0$ otherwise. The reliability of the test is given by the following observation model: the sensitivity (true positive rate) $p(Y=1|H=1)$ and the specificity (true negative rate) $p(Y=0|H=0)$.
Using data from https://www.nytimes.com/2020/08/04/science/coronavirus-bayes-statistics-math.html, we set the sensitivity $p(Y=1|H=1)$ to 87.5% and the specificity $p(Y=0|H=0)$ to 97.5%.
We also need to specify the prior probability $p(H=1)$; this is known as the prevalence. This varies over time and place, but let's pick $p(H=1) = 0.1$ as a reasonable estimate.
If you test positive: $p(H=1|Y=1) = \frac{p(Y=1|H=1)\,p(H=1)}{p(Y=1|H=1)\,p(H=1) + p(Y=1|H=0)\,p(H=0)} = \frac{0.875 \times 0.1}{0.875 \times 0.1 + 0.025 \times 0.9} \approx 0.795$.
If you test negative: $p(H=1|Y=0) = \frac{p(Y=0|H=1)\,p(H=1)}{p(Y=0|H=1)\,p(H=1) + p(Y=0|H=0)\,p(H=0)} = \frac{0.125 \times 0.1}{0.125 \times 0.1 + 0.975 \times 0.9} \approx 0.014$.
Code to reproduce the above.
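A minimal sketch using NumPy (the helper name covid_posterior is ours):

```python
import numpy as np

def covid_posterior(prevalence, sensitivity=0.875, specificity=0.975):
    """Posterior p(H=1 | Y=y) for a positive or negative test, via Bayes' rule."""
    prior = np.array([1 - prevalence, prevalence])  # p(H=0), p(H=1)
    # Likelihood p(Y=y | H=h): rows index y in {0, 1}, columns index h in {0, 1}
    likelihood = np.array([
        [specificity, 1 - sensitivity],   # p(Y=0 | H=0), p(Y=0 | H=1)
        [1 - specificity, sensitivity],   # p(Y=1 | H=0), p(Y=1 | H=1)
    ])
    print(f"prevalence = {prevalence}")
    for y, label in [(1, "positive"), (0, "negative")]:
        joint = likelihood[y] * prior  # p(Y=y, H=h) by the product rule
        post = joint / joint.sum()     # normalize: p(H=h | Y=y)
        print(f"  test {label}: p(covid) = {post[1]:.3f}")

covid_posterior(prevalence=0.1)   # -> 0.795 (positive), 0.014 (negative)
covid_posterior(prevalence=0.01)  # -> 0.261 (positive), 0.001 (negative)
```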
For a prevalence of 10%, we get $p(H=1|Y=1) \approx 0.795$ and $p(H=1|Y=0) \approx 0.014$, as above.
For a prevalence of 1%, these drop to $p(H=1|Y=1) \approx 0.261$ and $p(H=1|Y=0) \approx 0.001$: when the disease is rarer, even a positive test leaves the posterior probability of COVID fairly low.
Univariate distributions
Zipf's law
In this section, we study the empirical word frequencies derived from H. G. Wells' book The Time Machine. Our code is based on https://github.com/d2l-ai/d2l-en/blob/master/chapter_recurrent-neural-networks/lang-model.md
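A minimal sketch of the analysis (assuming a local plain-text copy of the book, here called timemachine.txt): tokenize the text, count word frequencies, and plot frequency against rank on log-log axes, where Zipf's law (frequency roughly proportional to 1/rank) predicts a straight line.

```python
import collections
import re
import matplotlib.pyplot as plt

# Read the book (assumes a local plain-text copy, e.g. from Project Gutenberg)
with open("timemachine.txt") as f:
    lines = f.readlines()

# Simple tokenization: lowercase, keep alphabetic words only
tokens = [tok for line in lines
          for tok in re.sub("[^A-Za-z]+", " ", line).lower().split()]

counts = collections.Counter(tokens)
freqs = sorted(counts.values(), reverse=True)

# Zipf's law: frequency ~ 1 / rank, a straight line on a log-log plot
plt.loglog(range(1, len(freqs) + 1), freqs)
plt.xlabel("rank")
plt.ylabel("word frequency")
plt.show()
```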
Next we illustrate the correlation coefficient. The code is based on Bayesian Analysis with Python, ch. 3.
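A sketch of the idea (our own variant, not the book's exact code): sample from bivariate Gaussians with different correlation coefficients $\rho$ and compare the empirical Pearson correlation of each sample to the true value.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

# Sample from bivariate Gaussians with different correlation coefficients
np.random.seed(0)
fig, axes = plt.subplots(1, 3, figsize=(12, 4), sharex=True, sharey=True)
for ax, rho in zip(axes, [-0.8, 0.0, 0.8]):
    cov = np.array([[1.0, rho], [rho, 1.0]])
    x, y = np.random.multivariate_normal([0, 0], cov, size=500).T
    # Empirical Pearson correlation should be close to rho
    r, _ = stats.pearsonr(x, y)
    ax.scatter(x, y, alpha=0.3)
    ax.set_title(f"rho = {rho}, sample r = {r:.2f}")
plt.show()
```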