Path: blob/master/incubator/gp-numpy.ipynb
A Fundamental Property of Gaussians
A multivariate Gaussian is nothing more than a generalization of the univariate Gaussian.
We parameterize univariate Gaussians with a mean $\mu$ and a variance $\sigma^2$, where $\mu$ and $\sigma^2$ are scalars.
A bivariate Gaussian is two univariate Gaussians that may also share a relationship to one another. We can jointly model both Gaussians by modelling not just how they vary independently, but also how they vary with one another.
One of the fundamental properties of Gaussians is that if we have a multivariate Gaussian (i.e. a joint distribution over two or more Gaussian random variables) and we condition on any subset of those variables, the conditional distribution of the remaining variables is also Gaussian and can be found analytically. There's a formula, and it's expressed in code below.
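Here is a minimal numpy sketch of that formula (the function and variable names, such as `condition_gaussian`, are illustrative rather than taken from this notebook): condition a joint Gaussian on a subset of its entries and recover the Gaussian over the rest.

```python
import numpy as np

def condition_gaussian(mu, cov, obs_idx, obs_values):
    """Condition N(mu, cov) on the entries in obs_idx taking obs_values.

    Returns the mean and covariance of the remaining entries.
    """
    mu, cov = np.asarray(mu, float), np.asarray(cov, float)
    obs_idx = np.asarray(obs_idx)
    rest_idx = np.setdiff1d(np.arange(len(mu)), obs_idx)

    # Partition the mean vector and covariance matrix into blocks.
    mu_r, mu_o = mu[rest_idx], mu[obs_idx]
    cov_rr = cov[np.ix_(rest_idx, rest_idx)]
    cov_ro = cov[np.ix_(rest_idx, obs_idx)]
    cov_oo = cov[np.ix_(obs_idx, obs_idx)]

    # mu_cond = mu_r + S_ro S_oo^{-1} (x_o - mu_o)
    # S_cond  = S_rr - S_ro S_oo^{-1} S_or
    mu_cond = mu_r + cov_ro @ np.linalg.solve(cov_oo, obs_values - mu_o)
    cov_cond = cov_rr - cov_ro @ np.linalg.solve(cov_oo, cov_ro.T)
    return mu_cond, cov_cond

# Example: a bivariate Gaussian with correlation 0.8, conditioned on the second variable = 1.0.
mu_c, cov_c = condition_gaussian(
    mu=[0.0, 0.0], cov=[[1.0, 0.8], [0.8, 1.0]], obs_idx=[1], obs_values=[1.0]
)
# mu_c  -> [0.8]     (the conditional mean shifts towards the observed value)
# cov_c -> [[0.36]]  (the conditional variance shrinks below 1.0)
```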
Go ahead and play with the slider below.
2D Gaussians
Take this into higher dimensions. Instead of comparing two scalar Gaussian random variables, let's now consider two Gaussian random variables that are each a 2-dimensional vector.
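As a hedged sketch of that higher-dimensional case (the covariance and observed values below are made up for illustration), we can stack two 2-dimensional Gaussian vectors into a 4-dimensional joint Gaussian and condition on one of them:

```python
import numpy as np

rng = np.random.default_rng(42)

# Joint distribution over [y1, y2], where y1 and y2 are each 2-dimensional.
mu = np.zeros(4)
A = rng.normal(size=(4, 4))
cov = A @ A.T + 1e-6 * np.eye(4)  # a random symmetric positive-definite covariance

# Condition on y2 = [0.5, -0.3] using the same block formula as above.
rest_idx, obs_idx = np.array([0, 1]), np.array([2, 3])
obs = np.array([0.5, -0.3])
cov_ro = cov[np.ix_(rest_idx, obs_idx)]
cov_oo = cov[np.ix_(obs_idx, obs_idx)]
mu_cond = mu[rest_idx] + cov_ro @ np.linalg.solve(cov_oo, obs - mu[obs_idx])
cov_cond = cov[np.ix_(rest_idx, rest_idx)] - cov_ro @ np.linalg.solve(cov_oo, cov_ro.T)
# mu_cond and cov_cond describe the 2-dimensional Gaussian over y1, given y2.
```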
Implementing a GP Prior
When we use a GP, we're essentially modelling the outputs as being described by a joint Gaussian distribution.
We would like to be able to specify the covariance matrix as a function of the distances between the inputs - regardless of whether the inputs are 1-D, 2-D, or more. That is the key to generalizing from 1D examples to the 2D examples commonly shown.
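As one concrete (assumed) choice, here is a squared-exponential (RBF) kernel written purely in terms of pairwise distances, so the same function works for 1-D, 2-D, or higher-dimensional inputs:

```python
import numpy as np

def rbf_kernel(X1, X2, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel built from pairwise Euclidean distances.

    X1: (n, d) array, X2: (m, d) array -- d can be 1, 2, or more.
    """
    X1, X2 = np.atleast_2d(X1), np.atleast_2d(X2)
    # Pairwise squared distances between every row of X1 and every row of X2.
    sqdist = (
        np.sum(X1**2, axis=1)[:, None]
        + np.sum(X2**2, axis=1)[None, :]
        - 2 * X1 @ X2.T
    )
    return variance * np.exp(-0.5 * sqdist / lengthscale**2)
```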
Draw from prior.
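A minimal sketch of drawing from the prior on a 1-D grid, assuming the `rbf_kernel` defined above is in scope (the grid range and jitter value are arbitrary choices):

```python
import numpy as np

# 1-D inputs, reshaped to (n, 1) so the same code generalizes to higher dimensions.
X = np.linspace(-5, 5, 100).reshape(-1, 1)
K = rbf_kernel(X, X) + 1e-8 * np.eye(len(X))  # jitter for numerical stability

# Each draw from the zero-mean multivariate normal is one function sampled from the prior.
rng = np.random.default_rng(0)
prior_samples = rng.multivariate_normal(np.zeros(len(X)), K, size=5)  # shape (5, 100)
```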
We can extend this code to apply in two dimensions. Let's say that our data live on a grid, rather than along a single dimension. As ground truth, a periodic function is evaluated on a 2D grid.
Prior
Let's sample from the prior over a 2D plane.
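Here is a hedged sketch of what that could look like: build a 2-D grid, evaluate a periodic ground-truth function on it (the specific function `sin(x)·cos(y)` is an assumption), and draw one prior sample over the flattened grid using the `rbf_kernel` above.

```python
import numpy as np

# A 2-D grid of inputs; each row of X_grid is one 2-dimensional input point.
xs = np.linspace(0, 2 * np.pi, 25)
xx, yy = np.meshgrid(xs, xs)
X_grid = np.column_stack([xx.ravel(), yy.ravel()])  # shape (625, 2)

# An illustrative periodic ground-truth function on the grid.
z_true = np.sin(xx) * np.cos(yy)

# One sample from the GP prior over all grid points at once.
K = rbf_kernel(X_grid, X_grid) + 1e-8 * np.eye(len(X_grid))
rng = np.random.default_rng(0)
prior_sample = rng.multivariate_normal(np.zeros(len(X_grid)), K).reshape(xx.shape)
```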
We'll simulate sampling 5 starting points.
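The sketch below mirrors what the plots summarize, reusing `X_grid`, `z_true`, and `rbf_kernel` from above (kernel, lengthscale, and seed are arbitrary assumptions): condition the GP on 5 randomly chosen grid points and compute the posterior mean and 95% interval everywhere on the grid.

```python
import numpy as np

# Simulate sampling 5 points on the grid defined above.
rng = np.random.default_rng(1)
obs_idx = rng.choice(len(X_grid), size=5, replace=False)
X_obs, z_obs = X_grid[obs_idx], z_true.ravel()[obs_idx]

# GP posterior over the whole grid, conditioned on the 5 observations.
K_oo = rbf_kernel(X_obs, X_obs) + 1e-8 * np.eye(len(X_obs))
K_go = rbf_kernel(X_grid, X_obs)
post_mean = K_go @ np.linalg.solve(K_oo, z_obs)

# Posterior variance at each grid point: k(x, x) - k(x, X_obs) K_oo^{-1} k(X_obs, x).
prior_var = np.diag(rbf_kernel(X_grid, X_grid))
post_var = prior_var - np.einsum("ij,ji->i", K_go, np.linalg.solve(K_oo, K_go.T))
interval_width = 2 * 1.96 * np.sqrt(np.clip(post_var, 0.0, None))  # 95% interval size
```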
In the plots above, red dots mark where we have sampled points on the 2D grid.
The left plot shows the size of the 95% prediction interval at each point on the grid. We can see that we have the smallest uncertainty where we have sampled.
The middle plot shows ground truth, and the values where we have sampled data.
The right plot shows the posterior mean given the sampled points. It is evident that where we do not have data, the function values default to the mean function.
Parting Thoughts
The key ingredient of a GP is a kernel that can model "distance" of some kind between every pair of inputs. Thus, it isn't the number of input dimensions that is limiting; it is the number of data points that have been sampled that is limiting! (Inversion of the matrix only depends on the data that we are conditioning on, and that is of order $\mathcal{O}(N^3)$ in the number of conditioning points $N$.)