Bayesian Statistics
In this notebook, we illustrate some basic concepts in (computaitonal) Bayesian statistics. We borrow some code examples from chapter 2 of Bayesian Analysis with Python (2nd end) by Osvaldo Martin.
The primary goal is to infer the posterior of the parameters of some model given observations or data :
If we assume the observations are independently sampled and identically distributed (iid), then we can write the likelihood as
If we have features, we can fit a conditional or discriminative model of the form , where the model generates the output given the input . In this case, the likelihood becomes
Collecting pymc3==3.8
Downloading https://files.pythonhosted.org/packages/32/19/6c94cbadb287745ac38ff1197b9fadd66500b6b9c468e79099b110c6a2e9/pymc3-3.8-py3-none-any.whl (908kB)
|████████████████████████████████| 911kB 6.7MB/s
Requirement already satisfied: numpy>=1.13.0 in /usr/local/lib/python3.6/dist-packages (from pymc3==3.8) (1.19.4)
Requirement already satisfied: scipy>=0.18.1 in /usr/local/lib/python3.6/dist-packages (from pymc3==3.8) (1.4.1)
Requirement already satisfied: h5py>=2.7.0 in /usr/local/lib/python3.6/dist-packages (from pymc3==3.8) (2.10.0)
Requirement already satisfied: tqdm>=4.8.4 in /usr/local/lib/python3.6/dist-packages (from pymc3==3.8) (4.41.1)
Collecting arviz>=0.4.1
Downloading https://files.pythonhosted.org/packages/a9/05/54183e9e57b0793eceb67361dbf4a7c4ed797ae36a04a3287791a564568c/arviz-0.10.0-py3-none-any.whl (1.5MB)
|████████████████████████████████| 1.5MB 21.2MB/s
Requirement already satisfied: theano>=1.0.4 in /usr/local/lib/python3.6/dist-packages (from pymc3==3.8) (1.0.5)
Requirement already satisfied: patsy>=0.4.0 in /usr/local/lib/python3.6/dist-packages (from pymc3==3.8) (0.5.1)
Requirement already satisfied: pandas>=0.18.0 in /usr/local/lib/python3.6/dist-packages (from pymc3==3.8) (1.1.5)
Requirement already satisfied: six in /usr/local/lib/python3.6/dist-packages (from h5py>=2.7.0->pymc3==3.8) (1.15.0)
Requirement already satisfied: setuptools>=38.4 in /usr/local/lib/python3.6/dist-packages (from arviz>=0.4.1->pymc3==3.8) (51.1.1)
Collecting xarray>=0.16.1
Downloading https://files.pythonhosted.org/packages/10/6f/9aa15b1f9001593d51a0e417a8ad2127ef384d08129a0720b3599133c1ed/xarray-0.16.2-py3-none-any.whl (736kB)
|████████████████████████████████| 737kB 41.7MB/s
Requirement already satisfied: matplotlib>=3.0 in /usr/local/lib/python3.6/dist-packages (from arviz>=0.4.1->pymc3==3.8) (3.2.2)
Requirement already satisfied: packaging in /usr/local/lib/python3.6/dist-packages (from arviz>=0.4.1->pymc3==3.8) (20.8)
Collecting netcdf4
Downloading https://files.pythonhosted.org/packages/1a/71/a321abeee439caf94850b4f68ecef68d2ad584a5a9566816c051654cff94/netCDF4-1.5.5.1-cp36-cp36m-manylinux2014_x86_64.whl (4.7MB)
|████████████████████████████████| 4.7MB 45.2MB/s
Requirement already satisfied: pytz>=2017.2 in /usr/local/lib/python3.6/dist-packages (from pandas>=0.18.0->pymc3==3.8) (2018.9)
Requirement already satisfied: python-dateutil>=2.7.3 in /usr/local/lib/python3.6/dist-packages (from pandas>=0.18.0->pymc3==3.8) (2.8.1)
Requirement already satisfied: pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.1 in /usr/local/lib/python3.6/dist-packages (from matplotlib>=3.0->arviz>=0.4.1->pymc3==3.8) (2.4.7)
Requirement already satisfied: cycler>=0.10 in /usr/local/lib/python3.6/dist-packages (from matplotlib>=3.0->arviz>=0.4.1->pymc3==3.8) (0.10.0)
Requirement already satisfied: kiwisolver>=1.0.1 in /usr/local/lib/python3.6/dist-packages (from matplotlib>=3.0->arviz>=0.4.1->pymc3==3.8) (1.3.1)
Collecting cftime
Downloading https://files.pythonhosted.org/packages/66/60/bad8525d2c046eb2964911bc412a85ba240b31c7b43f0c19377233992c6c/cftime-1.3.0-cp36-cp36m-manylinux1_x86_64.whl (295kB)
|████████████████████████████████| 296kB 47.4MB/s
Installing collected packages: xarray, cftime, netcdf4, arviz, pymc3
Found existing installation: xarray 0.15.1
Uninstalling xarray-0.15.1:
Successfully uninstalled xarray-0.15.1
Found existing installation: pymc3 3.7
Uninstalling pymc3-3.7:
Successfully uninstalled pymc3-3.7
Successfully installed arviz-0.10.0 cftime-1.3.0 netcdf4-1.5.5.1 pymc3-3.8 xarray-0.16.2
Requirement already satisfied: arviz in /usr/local/lib/python3.6/dist-packages (0.10.0)
Requirement already satisfied: pandas>=0.23 in /usr/local/lib/python3.6/dist-packages (from arviz) (1.1.5)
Requirement already satisfied: xarray>=0.16.1 in /usr/local/lib/python3.6/dist-packages (from arviz) (0.16.2)
Requirement already satisfied: scipy>=0.19 in /usr/local/lib/python3.6/dist-packages (from arviz) (1.4.1)
Requirement already satisfied: packaging in /usr/local/lib/python3.6/dist-packages (from arviz) (20.8)
Requirement already satisfied: setuptools>=38.4 in /usr/local/lib/python3.6/dist-packages (from arviz) (51.1.1)
Requirement already satisfied: netcdf4 in /usr/local/lib/python3.6/dist-packages (from arviz) (1.5.5.1)
Requirement already satisfied: matplotlib>=3.0 in /usr/local/lib/python3.6/dist-packages (from arviz) (3.2.2)
Requirement already satisfied: numpy>=1.12 in /usr/local/lib/python3.6/dist-packages (from arviz) (1.19.4)
Requirement already satisfied: pytz>=2017.2 in /usr/local/lib/python3.6/dist-packages (from pandas>=0.23->arviz) (2018.9)
Requirement already satisfied: python-dateutil>=2.7.3 in /usr/local/lib/python3.6/dist-packages (from pandas>=0.23->arviz) (2.8.1)
Requirement already satisfied: pyparsing>=2.0.2 in /usr/local/lib/python3.6/dist-packages (from packaging->arviz) (2.4.7)
Requirement already satisfied: cftime in /usr/local/lib/python3.6/dist-packages (from netcdf4->arviz) (1.3.0)
Requirement already satisfied: cycler>=0.10 in /usr/local/lib/python3.6/dist-packages (from matplotlib>=3.0->arviz) (0.10.0)
Requirement already satisfied: kiwisolver>=1.0.1 in /usr/local/lib/python3.6/dist-packages (from matplotlib>=3.0->arviz) (1.3.1)
Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.6/dist-packages (from python-dateutil>=2.7.3->pandas>=0.23->arviz) (1.15.0)
Beta-Binomial model
Exact inference
Point estimates
We minimize the posterior expected loss, using L2 loss (estimator is posterior mean) or L1 loss (estimator is posterior median).
MCMC inference
We will use pymc3 to approximate the posterior of this simple model. Code is based on this notebook.
Variational inference
We use automatic differentiation VI. Details can be found at https://docs.pymc.io/notebooks/variational_api_quickstart.html
1d Gaussian
Exact inference
For simplicity, we assume the variance is known, and we just want to infer the mean.
MCMC inference
Initially we assume the variance is known, so we can compare results to exact infernece.
Now we consider the case where the mean and variance are both unknown. We also switch to a "real world" dataset, of "chemical shifts", that has a couple of "outliers".
Posterior predictive checks
Robust likelihood (1d Student distribution)
Comparing means of different datasets
We often want to know if one dataset, , has a "statistically signficant" difference in one of its parameters, such as its mean , compared to some other dataset, . We can answer this in a Bayesian way by computing , where
.
To do this, we draw samples from and .
Since the magnitude of can be hard to interpret, it is common to divide it by the pooled standard deviation, to get a metric known as Cohen's d (see this website for details):
We can compute using posterior samples of .