probml
GitHub Repository: probml/pyprobml
Path: blob/master/notebooks/misc/CmdStanPy_Example_Notebook.ipynb
Kernel: Python 3


Running the Stan MCMC library via CmdStanPy in Colab: Example

Taken from

https://mc-stan.org/users/documentation/case-studies/jupyter_colab_notebooks_2020.html

This notebook demonstrates how to install the CmdStanPy toolchain on a Google Colab instance and verify the installation by running the Stan NUTS-HMC sampler on the example model and data that are included with CmdStan. Each code block in this notebook updates the Python environment; therefore, you must step through this notebook cell by cell.

```python
# Load packages used in this notebook
import os
import json
import shutil
import urllib.request
import pandas as pd
```

Step 1: Install CmdStanPy.

```python
# Install package CmdStanPy
!pip install --upgrade cmdstanpy
```
```
Requirement already satisfied: cmdstanpy in /usr/local/lib/python3.7/dist-packages (0.9.5)
Collecting cmdstanpy
  Downloading cmdstanpy-1.0.0-py3-none-any.whl (64 kB)
     |████████████████████████████████| 64 kB 2.4 MB/s
Requirement already satisfied: pandas in /usr/local/lib/python3.7/dist-packages (from cmdstanpy) (1.1.5)
Requirement already satisfied: tqdm in /usr/local/lib/python3.7/dist-packages (from cmdstanpy) (4.62.3)
Requirement already satisfied: numpy!=1.19.4,>=1.15 in /usr/local/lib/python3.7/dist-packages (from cmdstanpy) (1.19.5)
Collecting ujson
  Downloading ujson-5.1.0-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (43 kB)
     |████████████████████████████████| 43 kB 2.9 MB/s
Requirement already satisfied: python-dateutil>=2.7.3 in /usr/local/lib/python3.7/dist-packages (from pandas->cmdstanpy) (2.8.2)
Requirement already satisfied: pytz>=2017.2 in /usr/local/lib/python3.7/dist-packages (from pandas->cmdstanpy) (2018.9)
Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.7/dist-packages (from python-dateutil>=2.7.3->pandas->cmdstanpy) (1.15.0)
Installing collected packages: ujson, cmdstanpy
  Attempting uninstall: cmdstanpy
    Found existing installation: cmdstanpy 0.9.5
    Uninstalling cmdstanpy-0.9.5:
      Successfully uninstalled cmdstanpy-0.9.5
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
fbprophet 0.7.1 requires cmdstanpy==0.9.5, but you have cmdstanpy 1.0.0 which is incompatible.
Successfully installed cmdstanpy-1.0.0 ujson-5.1.0
```

Step 2: Download and untar the CmdStan binary for Google Colab instances.

```python
# Install pre-built CmdStan binary
# (faster than compiling from source via install_cmdstan() function)
tgz_file = "colab-cmdstan-2.23.0.tar.gz"
tgz_url = "https://github.com/stan-dev/cmdstan/releases/download/v2.23.0/colab-cmdstan-2.23.0.tar.gz"
if not os.path.exists(tgz_file):
    urllib.request.urlretrieve(tgz_url, tgz_file)
    shutil.unpack_archive(tgz_file)
```
```python
!ls
```
cmdstan-2.23.0 colab-cmdstan-2.23.0.tar.gz sample_data
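The download cell keys everything on whether the archive file already exists. A slightly more defensive variant checks the archive and the unpacked directory separately, so that a rerun restores a missing `cmdstan-2.23.0` directory without downloading again. This is a minimal sketch using only the standard library; the helper name `fetch_and_unpack` and the offline demo (a tiny stand-in archive "downloaded" from a `file://` URL) are illustrative assumptions, not part of CmdStanPy.

```python
import os
import pathlib
import shutil
import tarfile
import tempfile
import urllib.request


def fetch_and_unpack(url: str, archive: str, target_dir: str, extract_dir: str = ".") -> str:
    """Hypothetical helper: download `archive` from `url` only if the file is missing,
    and unpack it into `extract_dir` only if `target_dir` (the directory the archive
    is expected to create) is missing. Returns the absolute path of `target_dir`."""
    if not os.path.exists(archive):
        urllib.request.urlretrieve(url, archive)
    if not os.path.isdir(target_dir):
        shutil.unpack_archive(archive, extract_dir)
    return os.path.abspath(target_dir)


# Offline demo: build a small stand-in archive and fetch it through a file:// URL,
# so the helper can be exercised without network access.
with tempfile.TemporaryDirectory() as tmp:
    tmp_path = pathlib.Path(tmp)
    src = tmp_path / "src" / "demo-cmdstan"
    src.mkdir(parents=True)
    (src / "README").write_text("placeholder\n")
    remote = tmp_path / "demo-cmdstan.tar.gz"
    with tarfile.open(remote, "w:gz") as tar:
        tar.add(src, arcname="demo-cmdstan")

    work = tmp_path / "work"
    work.mkdir()
    archive = str(work / "demo-cmdstan.tar.gz")
    target = str(work / "demo-cmdstan")

    path = fetch_and_unpack(remote.as_uri(), archive, target, extract_dir=str(work))
    demo_ok = os.path.isfile(os.path.join(path, "README"))
    # A second call finds both the archive and the directory, so it does nothing.
    fetch_and_unpack(remote.as_uri(), archive, target, extract_dir=str(work))

print("unpacked:", demo_ok)
```

In this notebook the corresponding call would be `fetch_and_unpack(tgz_url, tgz_file, "cmdstan-2.23.0")`, because the archive unpacks into the `cmdstan-2.23.0` directory shown by `!ls`.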

Step 3: Register the CmdStan install location.

```python
# Specify CmdStan location via environment variable
os.environ["CMDSTAN"] = "./cmdstan-2.23.0"

# Check CmdStan path
from cmdstanpy import CmdStanModel, cmdstan_path

cmdstan_path()
```
'cmdstan-2.23.0'

The CmdStan installation includes a simple example program, bernoulli.stan, and test data, bernoulli.data.json. Both are located in the examples/bernoulli subdirectory of the CmdStan installation directory.

The program bernoulli.stan takes an array y of N binary outcomes and uses a Bernoulli likelihood, with a uniform beta(1,1) prior, to estimate theta, the probability of success.

```python
bernoulli_stan = os.path.join(cmdstan_path(), "examples", "bernoulli", "bernoulli.stan")
with open(bernoulli_stan, "r") as fd:
    print("\n".join(fd.read().splitlines()))
```
```stan
data {
  int<lower=0> N;
  int<lower=0,upper=1> y[N];
}
parameters {
  real<lower=0,upper=1> theta;
}
model {
  theta ~ beta(1,1);  // uniform prior on interval 0,1
  y ~ bernoulli(theta);
}
```
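Written out mathematically, the model block above is a beta-Bernoulli model. Because the beta prior is conjugate to the Bernoulli likelihood, the posterior has a closed form, which gives an exact reference against which to check the sampler's output:

```latex
\begin{aligned}
\theta &\sim \mathrm{Beta}(1, 1), \\
y_n \mid \theta &\sim \mathrm{Bernoulli}(\theta), \qquad n = 1, \dots, N, \\
p(\theta \mid y) &\propto 1 \cdot \prod_{n=1}^{N} \theta^{y_n} (1 - \theta)^{1 - y_n}
  = \theta^{k} (1 - \theta)^{N - k}, \qquad k = \sum_{n=1}^{N} y_n, \\
\theta \mid y &\sim \mathrm{Beta}(1 + k,\; 1 + N - k).
\end{aligned}
```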

The data file bernoulli.data.json contains 10 observations, split between 2 successes (1) and 8 failures (0).

```python
bernoulli_data = os.path.join(cmdstan_path(), "examples", "bernoulli", "bernoulli.data.json")
with open(bernoulli_data, "r") as fd:
    print("\n".join(fd.read().splitlines()))
```
{ "N" : 10, "y" : [0,1,0,0,0,0,0,0,0,1] }
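Because a beta(1,1) prior combined with k successes in N Bernoulli trials gives a Beta(1 + k, 1 + N - k) posterior, the data above determine the exact posterior for theta. The posterior mean reported by the MCMC summary should approximately reproduce this value. The following is a small standard-library sketch; the variable names are illustrative.

```python
import json
import math

# Contents of bernoulli.data.json, as printed above
data = json.loads('{ "N" : 10, "y" : [0,1,0,0,0,0,0,0,0,1] }')

a_prior, b_prior = 1.0, 1.0   # beta(1,1) prior from bernoulli.stan
k = sum(data["y"])            # number of successes
n = data["N"]                 # number of trials

# Conjugate update: Beta(a, b) prior -> Beta(a + k, b + n - k) posterior
a_post = a_prior + k
b_post = b_prior + n - k

post_mean = a_post / (a_post + b_post)
post_sd = math.sqrt(a_post * b_post / ((a_post + b_post) ** 2 * (a_post + b_post + 1)))
post_mode = (a_post - 1) / (a_post + b_post - 2)

print(f"posterior: Beta({a_post:g}, {b_post:g})")
# → posterior: Beta(3, 9)
print(f"mean = {post_mean:.4f}, sd = {post_sd:.4f}, mode = {post_mode:.4f}")
# → mean = 0.2500, sd = 0.1201, mode = 0.2000
```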

The following code tests that the CmdStanPy toolchain is properly installed by compiling the example model, fitting it to the data, and obtaining a summary of the estimates of the posterior distribution of all parameters and quantities of interest.

```python
# Run CmdStanPy Hello, World! example
from cmdstanpy import cmdstan_path, CmdStanModel

# Compile example model bernoulli.stan
bernoulli_model = CmdStanModel(stan_file=bernoulli_stan)

# Condition on example data bernoulli.data.json
bern_fit = bernoulli_model.sample(data=bernoulli_data, seed=123)

# Print a summary of the posterior sample
bern_fit.summary()
```
```
INFO:cmdstanpy:found newer exe file, not recompiling
INFO:cmdstanpy:CmdStan start procesing
chain 1 | | 00:00 Status
chain 2 | | 00:00 Status
chain 3 | | 00:00 Status
chain 4 | | 00:00 Status
INFO:cmdstanpy:CmdStan done processing.
```
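As an extra sanity check that does not depend on the CmdStan toolchain, the following sketch runs a plain random-walk Metropolis sampler in pure Python on the same model and data. This is deliberately a simpler algorithm than Stan's NUTS-HMC, and the step size, iteration count, burn-in length, and seed are arbitrary assumptions. The only point is that its estimate of theta should land near 0.25, the mean of the exact Beta(3, 9) posterior for 2 successes in 10 trials under the beta(1,1) prior. The posterior mean reported by `bern_fit.summary()` should land near the same value.

```python
import math
import random

# Data from bernoulli.data.json
y = [0, 1, 0, 0, 0, 0, 0, 0, 0, 1]
N, k = len(y), sum(y)


def log_posterior(theta: float) -> float:
    """Unnormalized log posterior: constant beta(1,1) prior times Bernoulli likelihood."""
    if not 0.0 < theta < 1.0:
        return -math.inf  # zero density outside the support of theta
    return k * math.log(theta) + (N - k) * math.log1p(-theta)


def random_walk_metropolis(n_iter: int, step: float, theta0: float, seed: int) -> list:
    """Hypothetical helper: random-walk Metropolis with a symmetric Gaussian proposal."""
    rng = random.Random(seed)
    theta, lp = theta0, log_posterior(theta0)
    draws = []
    for _ in range(n_iter):
        proposal = theta + rng.gauss(0.0, step)
        lp_prop = log_posterior(proposal)
        # Accept with probability min(1, exp(lp_prop - lp)); 1 - random() lies in (0, 1]
        if math.log(1.0 - rng.random()) < lp_prop - lp:
            theta, lp = proposal, lp_prop
        draws.append(theta)
    return draws


# Discard the first 2,000 iterations as burn-in
draws = random_walk_metropolis(n_iter=20_000, step=0.2, theta0=0.5, seed=123)[2_000:]
mean = sum(draws) / len(draws)
sd = math.sqrt(sum((d - mean) ** 2 for d in draws) / (len(draws) - 1))
print(f"Metropolis estimate: mean = {mean:.3f}, sd = {sd:.3f}")
```

Stan's NUTS-HMC reaches comparable accuracy with far fewer iterations on harder models. For this one-parameter example, however, even the simple sampler suffices to confirm the expected answer.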