Jupyter notebook sheet01.ipynb

⁴³⁷ views

Kernel: Python 2 (SageMath)

Programming Sheet 1: Bayes Decision Theory (40 P)

In this exercise sheet, we will apply Bayes decision theory in the context of small two-dimensional problems. For this, we will make use of 3D plotting. We introduce below the basics for constructing these plots in Python/Matplotlib.

The function `numpy.meshgrid`

To plot two-dimensional functions, we first need to discretize the two-dimensional input space. One basic function for this purpose is numpy.meshgrid. The following code creates a discrete grid of the rectangular surface $[0,4] \times [0,3]$ . The function numpy.meshgrid takes the discretized intervals as input, and returns two arrays of size corresponding to the discretized surface (i.e. the grid) and containing the X and Y-coordinates respectively.

In [1]:

import numpy
X,Y = numpy.meshgrid([0,1,2,3,4],[0,1,2,3])
print(X)
print(Y)

Out[1]:

[[0 1 2 3 4]
 [0 1 2 3 4]
 [0 1 2 3 4]
 [0 1 2 3 4]]
[[0 0 0 0 0]
 [1 1 1 1 1]
 [2 2 2 2 2]
 [3 3 3 3 3]]

Note that we can iterate over the elements of the grid by zipping the two arrays X and Y containing each coordinate. The function numpy.flatten converts the 2D arrays to one-dimensional arrays, that can then be iterated element-wise.

In [2]:

print(list(zip(X.flatten(),Y.flatten())))

Out[2]:

[(0, 0), (1, 0), (2, 0), (3, 0), (4, 0), (0, 1), (1, 1), (2, 1), (3, 1), (4, 1), (0, 2), (1, 2), (2, 2), (3, 2), (4, 2), (0, 3), (1, 3), (2, 3), (3, 3), (4, 3)]

3D-Plotting

To enable 3D-plotting, we first need to load some modules in addition to matplotlib:

In [3]:

import matplotlib
%matplotlib inline
from matplotlib import pyplot as plt
from mpl_toolkits.mplot3d import Axes3D

As an example, we would like to plot the L2-norm function $f(x,y) = \sqrt{x^2 + y^2}$ on the subspace $x,y \in [-4,4]$ . First, we create a meshgrid with appropriate size:

In [4]:

R = numpy.arange(-4,4+1e-9,0.1)
X,Y = numpy.meshgrid(R,R)
print(X.shape,Y.shape)

Out[4]:

((81, 81), (81, 81))

Here, we have used a discretization with small increments of 0.1 in order to produce a plot with better resolution. The resulting meshgrid has size (81x81), that is, approximately 6400 points. The function $f$ needs to be evaluated at each of these points. This is achieved by applying element-wise operations on the arrays of the meshgrid. The norm at each point of the grid is therefore computed as:

In [5]:

F = (X**2+Y**2)**.5
print(F.shape)

Out[5]:

(81, 81)

The resulting function values are of same size as the meshgrid. Taking X,Y,F jointly results in a list of approximately 6400 triplets representing the x-, y-, and z-coordinates in the three-dimensional space where the function should be plotted. The 3d-plot can now be constructed easily by means of the function scatter of matplotlib.pyplot.

In [6]:

fig = plt.figure(figsize=(10,6))
ax = plt.axes(projection='3d')
ax.scatter(X,Y,F,s=1,alpha=0.5)

Out[6]:

<mpl_toolkits.mplot3d.art3d.Path3DCollection at 0x7fd460cf5ed0>

The parameter s and alpha control the size and the transparency of each data point. Other 3d plotting variants exist (e.g. surface plots), however, the scatter plot is the simplest approach at least conceptually. Having introduced how to easily plot 3D functions in Python, we can now analyze two-dimensional probability distributions with this same tool.

Exercise 1: Gaussian distributions (5+5+5 P)

Using the technique introduced above, we would like to plot a normal Gaussian probability distribution with mean vector $\mu = (0,0)$ , and covariance matrix $\Sigma = I$ also known as standard normal distribution. We consider the same discretization as above (i.e. a grid from -4 to 4 using step size 0.1). For two-dimensional input spaces, the standard normal distribution is given by: $p(x,y) = \frac{1}{2\pi}e^{-0.5 (x^2+y^2)}.$ This distribution sums to $1$ when integrated over $\mathbb{R}^2$ . However, it does not sum to $1$ when summing over the discretized space (i.e. the grid). Instead, we can work with a discretized Gaussian-like distribution: $P(x,y) = \frac1Z e^{-0.5 (x^2+y^2)} \qquad \text{with} \quad Z = \sum_{x,y} e^{-0.5 (x^2+y^2)}$ where the sum runs over the whole discretized space.

Compute the distribution $P(x,y)$ , and plot it.
Compute the conditional distribution $Q(x,y) = P((x,y) | \sqrt{x^2+y^2} \geq 1)$ , and plot it.
Marginalize the conditioned distribution $Q(x,y)$ over $y$ , and plot the resulting distribution $Q(x)$ .

In [7]:

import numpy as np
Z = np.exp(-0.5*(X**2+Y**2)).sum()

P = 1 / Z * np.exp(-0.5*(X**2+Y**2))

fig = plt.figure(figsize=(10,6))
ax = plt.axes(projection='3d')
ax.scatter(X,Y,P,s=1,alpha=0.5)

Out[7]:

<mpl_toolkits.mplot3d.art3d.Path3DCollection at 0x7fd463faa890>

In [8]:

conditionalMatrix = ((X**2 + Y**2) >= 1.0) + 0.0   # why the "+ 0" ?

In [9]:

(P * conditionalMatrix).sum()   #auskommentiert da für abgabe irrelevant es scheinbar noch ein übrig gebliebener test von dir war

Out[9]:

0.6115672959398345

In [10]:

Q = P * conditionalMatrix / (P * conditionalMatrix).sum()

In [11]:

fig = plt.figure(figsize=(10,6))
ax = plt.axes(projection='3d')
ax.scatter(X,Y,Q,s=1,alpha=0.5)

Out[11]:

<mpl_toolkits.mplot3d.art3d.Path3DCollection at 0x7fd463592990>

In [12]:

Q_marg = Q.sum(axis = 1)

fig = plt.figure(figsize=(10,6))
ax = plt.axes()
ax.scatter(X[0], Q_marg,s=5,alpha=0.5)

Out[12]:

<matplotlib.collections.PathCollection at 0x7fd46045a510>

Exercise 2: Bayesian Classification (5+5+5 P)

Let the two coordinates x and y be now representated as a two-dimensional vector $\boldsymbol{x}$ . We consider two classes $\omega_1$ and $\omega_2$ with data-generating Gaussian distributions $p(\boldsymbol{x}|\omega_1)$ and $p(\boldsymbol{x}|\omega_2)$ of mean vectors $\boldsymbol{\mu}_1 = (-0.5,-0.5) \quad \text{and} \quad \boldsymbol{\mu}_2 = (0.5,0.5)$ respectively, and same covariance matrix $\Sigma = \begin{pmatrix}1.0&0\\0&0.5\end{pmatrix}.$ Classes occur with probability $P(\omega_1) = 0.9$ and $P(\omega_2) = 0.1$ . Analysis tells us that in such scenario, the optimal decision boundary between the two classes should be linear. We would like to verify this computationally by applying Bayes decision theory on grid-like discretized distributions.

** Using the same grid as in Exercise 1, discretize the two data-generating distributions $p(\boldsymbol{x}|\omega_1)$ and $p(\boldsymbol{x}|\omega_2)$ (i.e. create discrete distributions $P(\boldsymbol{x}|\omega_1)$ and $P(\boldsymbol{x}|\omega_2)$ on the grid), and plot them with different colors.**
From these distributions, compute the total probability distribution $P(\boldsymbol{x}) = \sum_{c \in \{1,2\}} P(\boldsymbol{x} | \omega_c) \cdot P(\omega_c)$ , and plot it.
Compute and plot the class posterior probabilities $P(\omega_1|\boldsymbol{x})$ and $P(\omega_2|\boldsymbol{x})$ , and print the Bayes error rate for the discretized case.

In [13]:

from scipy.stats import multivariate_normal
cov = [1.0, 0.5]
distr1 = multivariate_normal(mean=[-0.5, -0.5], cov=cov)
distr2 = multivariate_normal(mean=[0.5, 0.5], cov=cov)

pos = np.dstack((X, Y));

P_xw1 = distr1.pdf(pos) / 100.0
P_xw2 = distr2.pdf(pos) / 100.0

#NegMean= multivariate_normal.pdf(pos,mean=[-0.5, -0.5], cov=cov)/100.0
#PosMean= multivariate_normal.pdf(pos,mean=[0.5, 0.5], cov=cov)/100.0

fig = plt.figure(figsize=(10,6))
ax = plt.axes(projection='3d')

ax.scatter(X,Y,P_xw1,s=1,alpha=0.5, color='blue')
ax.scatter(X,Y,P_xw2,s=1,alpha=0.5, color='red')
#ax.scatter(X,Y,NegMean,s=1,alpha=0.5, color='green')

Out[13]:

<mpl_toolkits.mplot3d.art3d.Path3DCollection at 0x7fd446fd7990>

In [14]:

P_x = P_xw1 * 0.9 + P_xw2 * 0.1

fig = plt.figure(figsize=(10,6))
ax = plt.axes(projection='3d')
ax.scatter(X,Y,P_x,s=1,alpha=0.5)

Out[14]:

<mpl_toolkits.mplot3d.art3d.Path3DCollection at 0x7fd446fa3790>

In [15]:

P_w1x = P_xw1 * 0.9 / P_x
P_w2x = P_xw2 * 0.1 / P_x

fig = plt.figure(figsize=(10,6))
ax = plt.axes(projection='3d')

ax.scatter(X,Y,P_w1x,s=1,alpha=0.5, color='blue')
ax.scatter(X,Y,P_w2x,s=1,alpha=0.5, color='red')

#ax.azim = -20
#ax.elev = 50

Out[15]:

<mpl_toolkits.mplot3d.art3d.Path3DCollection at 0x7fd446e7bdd0>

In [16]:

bayes_error = np.sum(np.minimum(P_w1x, P_w2x)*P_x)

print("Bayes error rate: %.3f"%bayes_error)

Out[16]:

Bayes error rate: 0.080

Exercise 3: Reducing the Variance (5+5 P)

Suppose that the data generating distribution for the second class changes to produce samples much closer to the mean. This variance reduction for the second class is implemented by keeping the first covariance the same (i.e. $\Sigma_1 = \Sigma$ ) and dividing the second covariance matrix by 4 (i.e. $\Sigma_2 = \Sigma/4$ ). For this new set of parameters, we can perform the same analysis as in Exercise 2.

Plot the new class posterior probabilities $P(\omega_1|\boldsymbol{x})$ and $P(\omega_2|\boldsymbol{x})$ associated to the new covariance matrices, and print the new Bayes error rate.

In [17]:

distr1 = multivariate_normal(mean=[-0.5, -0.5], cov=[1.0, 0.5])
distr2 = multivariate_normal(mean=[0.5, 0.5], cov=[0.25, 0.125])

pos = np.dstack((X, Y));

P_xw1 = distr1.pdf(pos) / 100.0
P_xw2 = distr2.pdf(pos) / 100.0

P_x = P_xw1 * 0.9 + P_xw2 * 0.1

P_w1x = P_xw1 * 0.9 / P_x
P_w2x = P_xw2 * 0.1 / P_x


fig = plt.figure(figsize=(10,6))
ax = plt.axes(projection='3d')

ax.scatter(X,Y,P_w1x,s=1,alpha=0.5, color='blue')
ax.scatter(X,Y,P_w2x,s=1,alpha=0.5, color='red')

Out[17]:

<mpl_toolkits.mplot3d.art3d.Path3DCollection at 0x7fd446b7f1d0>

In [18]:

bayes_error = np.sum(np.minimum(P_w1x, P_w2x)*P_x)
print("Bayes error rate: %.3f"%bayes_error)

Out[18]:

Bayes error rate: 0.073

Intuition tells us that by variance reduction and resulting concentration of generated data for class 2 in a smaller region of the input space, it should be easier to predict class 2 with certainty at this location. Paradoxally, in this new "dense" setting, we observe that class 2 does not reach full certainty anywhere in the input space, whereas it did in the previous exercise.

Explain this paradox.

[YOUR EXPLANATION HERE]

Programming Sheet 1: Bayes Decision Theory (40 P)

The function `numpy.meshgrid`

3D-Plotting

Exercise 1: Gaussian distributions (5+5+5 P)

Exercise 2: Bayesian Classification (5+5+5 P)

Exercise 3: Reducing the Variance (5+5 P)

Product

Resources

Company

Programming Sheet 1: Bayes Decision Theory (40 P)

The function numpy.meshgrid

3D-Plotting

Exercise 1: Gaussian distributions (5+5+5 P)

Exercise 2: Bayesian Classification (5+5+5 P)

Exercise 3: Reducing the Variance (5+5 P)

The function `numpy.meshgrid`