Introduction
In this notebook, I want to reproduce (and more importantly, annotate) what Austin Rochford did in his Bayesian Survival Analysis notebook on his own blog.
Setup
In this notebook, we will analyze a dataset of breast cancer patients who have undergone mastectomy.
First off, we load in the dataset.
Inspecting the data, we should see the following:
Each row is one patient.
For each patient, we record:
Time elapsed since mastectomy (time column)
Whether they have died (event column)
Whether a metastasis has occurred (metastasized column)
In order to get the data into shape for computing, we will need to do some data preprocessing.
Convert event to 1/0, where 1 maps to True.
Convert metastasized to 1/0, where 1 maps to yes.
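As a minimal sketch of these two conversions, here is how they might look in pandas. The rows below are made up for illustration and are not taken from the real dataset; only the column names come from the description above.

```python
import pandas as pd

# Hypothetical rows, invented for illustration only (not the real data).
df = pd.DataFrame({
    "time": [23, 47, 69, 70, 100],
    "event": [True, True, True, False, False],
    "metastasized": ["no", "no", "yes", "no", "yes"],
})

# True -> 1, False -> 0
df["event"] = df["event"].astype(int)

# "yes" -> 1, any other value -> 0
df["metastasized"] = (df["metastasized"] == "yes").astype(int)
```

After this step both columns are plain integers, which is the form the matrices built later on need.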
From Austin's blog post:
A suitable prior on $\lambda_0(t)$ is less obvious. We choose a semiparametric prior, where $\lambda_0(t)$ is a piecewise constant function. This prior requires us to partition the time range in question into intervals with endpoints $0 \leq s_1 < s_2 < \cdots < s_N$. With this partition, $\lambda_0(t) = \lambda_j$ if $s_j \leq t < s_{j+1}$. With $\lambda_0(t)$ constrained to have this form, all we need to do is choose priors for the $N - 1$ values $\lambda_j$. We use independent vague priors $\lambda_j \sim \textrm{Gamma}(10^{-2}, 10^{-2})$. For our mastectomy example, we make each interval three months long.
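To make the three-month partition from the quote concrete, here is a small NumPy sketch. The `times` array is hypothetical (follow-up times in months), and the names `interval_length`, `interval_bounds` and `n_intervals` are my own choices for illustration.

```python
import numpy as np

# Hypothetical follow-up times in months, invented for illustration.
times = np.array([23.0, 47.0, 69.0, 70.0, 100.0])

interval_length = 3  # each interval is three months long

# Interval endpoints 0, 3, 6, ..., extended far enough to cover the longest time.
interval_bounds = np.arange(0, times.max() + interval_length + 1, interval_length)

# One piecewise-constant hazard value per interval between consecutive endpoints.
n_intervals = interval_bounds.size - 1
```

Each of the `n_intervals` intervals then gets its own hazard value, and the prior is placed on those values independently.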
Aha! This matrix tells us which patients have died and when. Each row is one patient, each column is a time period, and a 1 (white) indicates that a patient died in that time period.
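Here is a sketch of how a death matrix like this could be built, assuming three-month intervals. The `times` and `events` arrays are hypothetical toy data, and the variable names are illustrative rather than a definitive version of the original code.

```python
import numpy as np

# Hypothetical toy data: follow-up time in months, and 1 = died, 0 = censored.
times = np.array([2.0, 5.0, 7.0, 9.0])
events = np.array([1, 0, 1, 1])

interval_length = 3
interval_bounds = np.arange(0, times.max() + interval_length + 1, interval_length)
n_intervals = interval_bounds.size - 1
n_patients = times.size

# Index of the interval containing each patient's last observed time.
# The small offset assigns a time exactly on a boundary to the earlier interval.
last_period = np.floor((times - 0.01) / interval_length).astype(int)

# Rows are patients, columns are intervals; a 1 marks a death in that interval.
death = np.zeros((n_patients, n_intervals), dtype=int)
death[np.arange(n_patients), last_period] = events
```

Censored patients end up with a row of all zeros, so the total number of ones equals the number of observed deaths.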
This heatmap shows each patient's time of exposure in every time period. A patient is exposed during a period if they were at risk of dying in it. That holds from the start of follow-up until the time they die (the non-censored patients) or the last time they were known to be alive (the censored patients).
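One way to compute an exposure matrix like this is to clip each patient's time since the start of an interval to the range from zero to the interval length. This is a sketch under my own assumptions (hypothetical toy times, three-month intervals) and not necessarily how the original post computes it.

```python
import numpy as np

# Hypothetical follow-up times in months, invented for illustration.
times = np.array([2.0, 5.0, 7.0, 9.0])

interval_length = 3
interval_bounds = np.arange(0, times.max() + interval_length + 1, interval_length)
interval_starts = interval_bounds[:-1]

# Time elapsed since the start of each interval, for every patient.
elapsed = times[:, None] - interval_starts[None, :]

# Exposure is 0 before an interval is reached, the full interval length once
# the patient has passed through it, and a partial amount in their last interval.
exposure = np.clip(elapsed, 0, interval_length)
```

A quick sanity check is that each row sums to that patient's total follow-up time.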
NUTS sampling of both of these models has been really slow. I'm not quite sure how to debug this.