GitHub Repository: AllenDowney/bayesian-analysis-recipes
Path: blob/master/incubator/logistic-regression-amputation.ipynb
Kernel: bayesian
import pymc3 as pm
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import theano.tensor as tt
import theano

%load_ext autoreload
%autoreload 2
%matplotlib inline
%config InlineBackend.figure_format = 'retina'
data = pd.read_csv("../datasets/antiseptic-amputation.csv", header=None)
data.columns = ["subject", "year", "antiseptic", "limb", "outcome"]
data.set_index("subject", inplace=True)

# Shift year so that the earliest year in the data is 0
data["year"] = data["year"] - data["year"].min()
data = pm.floatX(data)
data.head()
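pm.floatX casts the values to Theano's configured float dtype (theano.config.floatX), so the data matches what the sampler expects. A quick illustration, using the imports above (the array here is made up):

arr = np.arange(3)               # int64 by default
print(theano.config.floatX)      # e.g. 'float64'
print(pm.floatX(arr).dtype)      # cast to match floatX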

The logistic function is defined as

$$p = \frac{1}{1 + e^{-k}}$$

Here, the $k$ term is a linear combination of the predictors:

$$k = \beta_{1}x_{1} + \beta_{2}x_{2} + \dots + \beta_{n}x_{n}$$
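To make this concrete, here is a small numerical sketch (the values are illustrative, not from the data) showing that the logistic function squashes any real-valued $k$ into the interval $(0, 1)$:

k = np.array([-4.0, 0.0, 4.0])
p = 1 / (1 + np.exp(-k))    # the logistic transform
print(p)                    # ~[0.018, 0.5, 0.982]: large negative k -> near 0, large positive -> near 1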

Therefore, we write the model as follows:

import theano.tensor as tt

# data = pm.floatX(data)

def logistic(x):
    # The logistic (inverse-logit) function: maps log-odds to probabilities
    return np.exp(x) / (1 + np.exp(x))

with pm.Model() as model:
    pm.glm.linear.GLM(
        x=data[["year", "antiseptic", "limb"]],
        y=data["outcome"],
        family=pm.glm.families.Binomial(),  # binary outcome -> logistic regression
    )
with model:
    trace = pm.sample(draws=2000)
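Before trusting the posterior, a quick convergence check is worthwhile; this diagnostic call is a standard PyMC3 addition, not part of the original analysis:

pm.summary(trace)  # posterior means, credible intervals, effective sample size, Rhat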
pm.traceplot(trace)

Posterior predictive check.

ppc = pm.sample_ppc(trace, model=model, samples=500)
ppc["y"].mean(axis=0)
preds = np.rint(ppc["y"].mean(axis=0)).astype("int")
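ppc["y"] holds one simulated outcome per posterior sample and subject, so the mean over samples is each subject's posterior predictive probability of survival; rounding it classifies at a 0.5 cutoff. An equivalent explicit version (variable names are illustrative):

p_survival = ppc["y"].mean(axis=0)               # posterior predictive probability per subject
preds_explicit = (p_survival > 0.5).astype(int)  # same as np.rint above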
from scikitplot.plotters import plot_confusion_matrix

plot_confusion_matrix(data["outcome"], preds)
from sklearn.metrics import accuracy_score

accuracy_score(data["outcome"], preds)
pm.forestplot(trace)

There is a baseline probability of limb survival (the intercept), plus a small error term governing whether a given limb survives. Beyond that, antiseptic use has the largest effect on limb survival.
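Because the coefficients are on the log-odds scale, exponentiating them gives odds ratios, which are easier to interpret. A minimal sketch, assuming the GLM named each coefficient after its predictor column:

# exp(beta) > 1 means the predictor increases the odds of limb survival
for name in ["year", "antiseptic", "limb"]:
    print(name, np.exp(trace[name]).mean())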