Path: blob/master/incubator/covariance-imputation.ipynb
Introduction
In this notebook, I would like to investigate the use of pairwise covariance matrices to impute data.
Simulated Data
First off, let's simulate data drawn from a multivariate normal distribution. There are three columns of data, A, B, and C, for which we know the ground-truth covariance matrix.
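The simulation step might look like the following minimal sketch. The mean vector, covariance matrix, sample size, and seed are all assumptions chosen for illustration; the values used in the original notebook are not shown here.

```python
import numpy as np

# Hypothetical ground truth (not the notebook's actual values).
rng = np.random.default_rng(42)
true_mean = np.array([0.0, 1.0, -1.0])
true_cov = np.array([
    [1.0, 0.6, 0.3],
    [0.6, 2.0, 0.5],
    [0.3, 0.5, 1.5],
])

# Draw 10,000 samples; columns 0, 1, 2 play the roles of A, B, C.
data = rng.multivariate_normal(true_mean, true_cov, size=10_000)

# With this many samples, the sample covariance should sit close
# to the ground-truth covariance matrix.
sample_cov = np.cov(data, rowvar=False)
print(np.round(sample_cov, 2))
```

Passing `rowvar=False` tells `np.cov` that each column, not each row, is a variable.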
Now, let's simulate the case where a dropout mask hides 99% of the data.
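One way to sketch such a mask is to hide each entry independently with probability 0.99 and replace it with `NaN`. This is only one reading of "99% dropout"; the original mask is not shown, and the data-generating parameters below are again hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical three-column data set (illustrative parameters only).
data = rng.multivariate_normal(
    mean=[0.0, 0.0, 0.0],
    cov=[[1.0, 0.6, 0.3], [0.6, 2.0, 0.5], [0.3, 0.5, 1.5]],
    size=100_000,
)

# Assumption: each entry is dropped independently with probability 0.99.
drop_prob = 0.99
mask = rng.random(data.shape) < drop_prob  # True means "hidden"
masked = np.where(mask, np.nan, data)

# Roughly 1% of entries should survive the mask.
observed_fraction = np.mean(~np.isnan(masked))

# Rows where columns 0 and 1 are both observed are what a pairwise
# covariance estimate for that pair would have to rely on.
both_01 = np.sum(~np.isnan(masked[:, 0]) & ~np.isnan(masked[:, 1]))
print(observed_fraction, both_01)
```

Under independent dropout at this rate, only about one row in ten thousand has any given pair of columns observed together, so pairwise covariance estimates need a large number of rows to be stable.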
Now, let's say I have a new sample for which I only have data from columns 0 and 1 (A and B). Can we combine this information in a mathematically principled fashion to recover the measurement in column 2 (C), along with its uncertainty?
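By the fundamental rule of multivariate normals, if we have a bivariate Normal distribution:
$$\begin{pmatrix} X_1 \\ X_2 \end{pmatrix} \sim N\left( \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix} \right)$$
Then if we know the value of $X_2 = x_2$, then $X_1$ follows a distribution:
$$X_1 \mid X_2 = x_2 \sim N\left(\mu_{1|2}, \Sigma_{1|2}\right)$$
where
$$\mu_{1|2} = \mu_1 + \Sigma_{12} \Sigma_{22}^{-1} (x_2 - \mu_2)$$
and
$$\Sigma_{1|2} = \Sigma_{11} - \Sigma_{12} \Sigma_{22}^{-1} \Sigma_{21}$$
Thanks to the magic of Python, we can encode this in a function. Given two columns of data, we can estimate $\mu_1$ and $\mu_2$ and the covariance matrix $\Sigma$.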
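A minimal sketch of such a function follows. It uses the standard bivariate conditioning result, with conditional mean $\mu_1 + \Sigma_{12}\Sigma_{22}^{-1}(x_2 - \mu_2)$ and conditional variance $\Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}$. The function name `conditional_normal`, its signature, and the test parameters are hypothetical and are not taken from the original notebook.

```python
import numpy as np


def conditional_normal(x1_data, x2_data, x2_value):
    """Mean and variance of X1 given X2 = x2_value.

    Means and the 2x2 covariance matrix are estimated from paired
    observations of the two columns (hypothetical helper).
    """
    mu1, mu2 = np.mean(x1_data), np.mean(x2_data)
    cov = np.cov(x1_data, x2_data)  # 2x2 sample covariance matrix
    s11, s12, s22 = cov[0, 0], cov[0, 1], cov[1, 1]
    cond_mean = mu1 + s12 / s22 * (x2_value - mu2)
    cond_var = s11 - s12 ** 2 / s22
    return cond_mean, cond_var


# Check against a bivariate normal with known (illustrative) parameters.
rng = np.random.default_rng(1)
true_cov = [[1.5, 0.5], [0.5, 2.0]]
samples = rng.multivariate_normal([-1.0, 1.0], true_cov, size=50_000)

m, v = conditional_normal(samples[:, 0], samples[:, 1], x2_value=2.0)
print(m, v)
```

For these parameters the exact conditional mean is $-1 + (0.5 / 2)(2 - 1) = -0.75$ and the exact conditional variance is $1.5 - 0.5^2 / 2 = 1.375$, so the estimates should land near those values.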
It works! We can use fully probabilistic methods that are mathematically principled to obtain estimates of unknown data, given that we know the joint distribution.