GitHub Repository: AllenDowney/bayesian-analysis-recipes
Path: blob/master/incubator/logistic-regression-amputation.ipynb
Kernel: bayesian
import pymc3 as pm
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import theano.tensor as tt
import theano

%load_ext autoreload
%autoreload 2
%matplotlib inline
%config InlineBackend.figure_format = 'retina'
data = pd.read_csv("../datasets/antiseptic-amputation.csv", header=None)
data.columns = ["subject", "year", "antiseptic", "limb", "outcome"]
data.set_index("subject", inplace=True)

# Shift year so that the earliest year in the data is 0
data["year"] = data["year"] - data["year"].min()
data = pm.floatX(data)
data.head()
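pm.floatX casts the values to Theano's configured float dtype (theano.config.floatX), so the data matches what the sampler expects. A quick illustration, using the imports above (the array here is made up):

arr = np.arange(3)               # int64 by default
print(theano.config.floatX)      # e.g. 'float64'
print(pm.floatX(arr).dtype)      # cast to match floatX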

The logistic function is defined as

$$p = \frac{1}{1 + e^{-k}}$$

Here, the $k$ term is a linear combination of the predictors:

$$k = \beta_{1}x_{1} + \beta_{2}x_{2} + \dots + \beta_{n}x_{n}$$
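To make this concrete, here is a small numerical sketch (the values are illustrative, not from the data) showing that the logistic function squashes any real-valued $k$ into the interval $(0, 1)$:

k = np.array([-4.0, 0.0, 4.0])
p = 1 / (1 + np.exp(-k))    # the logistic transform
print(p)                    # ~[0.018, 0.5, 0.982]: large negative k -> near 0, large positive -> near 1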

Therefore, we write the model as follows:

import theano.tensor as tt

# data = pm.floatX(data)

def logistic(x):
    # The logistic (inverse-logit) function: maps log-odds to probabilities
    return np.exp(x) / (1 + np.exp(x))

with pm.Model() as model:
    pm.glm.linear.GLM(
        x=data[["year", "antiseptic", "limb"]],
        y=data["outcome"],
        family=pm.glm.families.Binomial(),  # binary outcome -> logistic regression
    )
with model:
    trace = pm.sample(draws=2000)
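Before trusting the posterior, a quick convergence check is worthwhile; this diagnostic call is a standard PyMC3 addition, not part of the original analysis:

pm.summary(trace)  # posterior means, credible intervals, effective sample size, Rhat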
pm.traceplot(trace)

Posterior predictive check.

ppc = pm.sample_ppc(trace, model=model, samples=500)
ppc["y"].mean(axis=0)
preds = np.rint(ppc["y"].mean(axis=0)).astype("int")
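ppc["y"] holds one simulated outcome per posterior sample and subject, so the mean over samples is each subject's posterior predictive probability of survival; rounding it classifies at a 0.5 cutoff. An equivalent explicit version (variable names are illustrative):

p_survival = ppc["y"].mean(axis=0)               # posterior predictive probability per subject
preds_explicit = (p_survival > 0.5).astype(int)  # same as np.rint above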
from scikitplot.plotters import plot_confusion_matrix

plot_confusion_matrix(data["outcome"], preds)
from sklearn.metrics import accuracy_score

accuracy_score(data["outcome"], preds)
pm.forestplot(trace)

There is a baseline probability of limb survival (the intercept), plus a small error term governing whether a given limb survives. Beyond that, antiseptic use has the largest effect on limb survival.
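Because the coefficients are on the log-odds scale, exponentiating them gives odds ratios, which are easier to interpret. A minimal sketch, assuming the GLM named each coefficient after its predictor column:

# exp(beta) > 1 means the predictor increases the odds of limb survival
for name in ["year", "antiseptic", "limb"]:
    print(name, np.exp(trace[name]).mean())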