
Bonus. Composing Kernels

In lecture and tutorial, we introduced a few commonly used kernels:

  • Linear kernel: $K(u, v) = u \cdot v$.

  • Polynomial kernel (with degree $d$): $K(u, v) = (u \cdot v)^d$.

  • Gaussian kernel: $K(u, v) = e^{-\frac{\lVert u - v \rVert^2}{2 \sigma^2}}$.
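For concreteness, here is how these three kernels might be computed for a single pair of vectors (a minimal sketch; the degree d and bandwidth sigma are free parameters you would choose yourself):

import numpy as np

def linear_kernel(u, v):
    return np.dot(u, v)

def polynomial_kernel(u, v, d=2):
    return np.dot(u, v) ** d

def gaussian_kernel(u, v, sigma=1.0):
    return np.exp(-np.sum((u - v) ** 2) / (2 * sigma ** 2))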

These kernels work well but... isn't it cool to invent your own kernel?

Recall that kernels implicitly apply feature transformations. Any function that satisfies $K(u, v) = \phi(u) \cdot \phi(v)$, where $\phi$ maps the original features to the transformed features, is a valid kernel. In this exercise, let's try to compose the existing kernels to create new kernels.

Your task: There are 3 tasks (+ a demo task) that walk you through how to create new kernels. Go through them one by one.

Submission: After working on Tasks 1 - 3, send me a screenshot of your implementation and diagram (for Task 3) before/during the tutorial (for bonus EXP)!

P.S. If no one solves all three tasks, I will still give out bonus EXP to those who solved at least 2. I believe Task 2 would be the hardest one.

from sklearn.svm import SVC
from sklearn import datasets
import numpy as np
import matplotlib.pyplot as plt
import math
# Load dataset
iris = datasets.load_iris()
X = iris.data[:, :2]
y = iris.target

# Adapted from https://www.kaggle.com/code/residentmario/kernels-and-support-vector-machine-regularization
x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
h = (x_max - x_min) / 100  # mesh step size
xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))
X_test = np.c_[xx.ravel(), yy.ravel()]

plt.figure(figsize=(5, 5))
plt.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.Paired, linewidth=1, edgecolor='black')
plt.xlim((x_min, x_max))
plt.ylim((y_min, y_max))
plt.title('Dataset')
plt.show()
[Image: scatter plot titled "Dataset"]

Task 0: Finding Transformed Features (Demo)

Since the kernel trick is equivalent to performing a feature transformation, we can train an equivalent SVM by computing the transformed features explicitly.

In lecture, we have shown that the kernel $K(u, v) = (u \cdot v)^2$ is equivalent to applying the transformed features $\phi(u) = \begin{bmatrix} u_1^2 & \sqrt{2} u_1 u_2 & u_2^2 \end{bmatrix}^{\top}$, since $$\phi(u) \cdot \phi(v) = u_1^2 v_1^2 + 2 u_1 u_2 v_1 v_2 + u_2^2 v_2^2 = (u_1 v_1 + u_2 v_2)^2 = (u \cdot v)^2.$$

Let's try to verify this with code!

There are two code snippets below. The first trains an SVM by generating the transformed features $u_1^2$, $\sqrt{2} u_1 u_2$ and $u_2^2$ for each input sample $u$. The second trains an SVM by applying the kernel function directly. Run them and verify that you obtain the same decision boundaries. Studying the implementation will also be useful for Tasks 1 - 3.

def transform_quadratic(X):
    return np.stack([
        X[:, 0] * X[:, 0],                 # (u_1)^2
        math.sqrt(2) * X[:, 0] * X[:, 1],  # sqrt(2) * u_1 * u_2
        X[:, 1] * X[:, 1],                 # (u_2)^2
    ], axis=-1)

plt.subplots(1, 1, figsize=(5, 5))

X_transformed = transform_quadratic(X)
svc = SVC(kernel="linear").fit(X_transformed, y)
X_test_transformed = transform_quadratic(X_test)

plt.subplot(1, 1, 1)
Z_transformed = svc.predict(X_test_transformed)
Z_transformed = Z_transformed.reshape(xx.shape)
plt.contourf(xx, yy, Z_transformed, cmap=plt.cm.Paired, alpha=0.5)
plt.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.Paired, linewidth=1, edgecolor='black')
plt.title("Using Transformed Features")
plt.show()
[Image: decision boundary plot titled "Using Transformed Features"]
def quadratic_kernel(U, V):
    return np.dot(U, V.T) ** 2

plt.subplots(1, 1, figsize=(5, 5))

svc = SVC(kernel=quadratic_kernel).fit(X, y)

plt.subplot(1, 1, 1)
Z_kernel_func = svc.predict(X_test)
Z_kernel_func = Z_kernel_func.reshape(xx.shape)
plt.contourf(xx, yy, Z_kernel_func, cmap=plt.cm.Paired, alpha=0.5)
plt.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.Paired, linewidth=1, edgecolor='black')
plt.title("Using the Kernel Function")

assert np.all(Z_kernel_func == Z_transformed), "SVM gives a different classification"

plt.show()
[Image: decision boundary plot titled "Using the Kernel Function"]
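Beyond comparing decision boundaries, you can also compare the Gram matrices directly: the kernel values should match the inner products of the explicitly transformed features (a quick optional check, reusing the names defined in the two cells above):

# Optional check: the kernel's Gram matrix equals the inner products of the
# explicitly transformed features.
assert np.allclose(quadratic_kernel(X, X), np.dot(X_transformed, X_transformed.T))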

Task 1: Adding Kernel Functions

One attempt is to add two kernel functions together, e.g. given that $K_1(u, v) = (u \cdot v)^1$ and $K_2(u, v) = (u \cdot v)^2$ are both valid kernel functions, we can create a new function $K(u, v) = (u \cdot v)^1 + (u \cdot v)^2$.

We can prove that the resulting function is always a valid kernel function.

Try to prove this by working out the list of transformed features created by this composition. To demonstrate your understanding, implement your idea in transform_add, which should generate the list of transformed features.

If you implement the function correctly, the resulting decision boundaries should be the same as applying the kernel function directly.
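Hint: for a single pair of vectors, you can convince yourself that concatenating the two transformed feature vectors reproduces the summed kernel. A minimal sketch (the helpers phi_1 and phi_2 below are just illustrations for $K_1(u, v) = (u \cdot v)^1$ and $K_2(u, v) = (u \cdot v)^2$; they are not part of the task's code):

# Sanity check on one pair of vectors: concatenating feature maps adds the kernels,
# since [phi_1(u); phi_2(u)] . [phi_1(v); phi_2(v)]
#     = phi_1(u) . phi_1(v) + phi_2(u) . phi_2(v).
u = np.array([1.0, 2.0])
v = np.array([0.5, -1.0])

def phi_1(w):  # feature map for K1(u, v) = (u . v)^1
    return w

def phi_2(w):  # feature map for K2(u, v) = (u . v)^2
    return np.array([w[0] ** 2, math.sqrt(2) * w[0] * w[1], w[1] ** 2])

lhs = np.dot(u, v) ** 1 + np.dot(u, v) ** 2    # K1(u, v) + K2(u, v)
rhs = np.dot(np.concatenate([phi_1(u), phi_2(u)]),
             np.concatenate([phi_1(v), phi_2(v)]))
assert np.isclose(lhs, rhs)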

def transform_add(X1, X2):
    """
    Computes the transformed features equivalent to adding the two kernel
    functions K1 and K2 together. Note that K1 and K2 can have a different
    number of transformed features.

    Args:
        X1: A two-dimensional ndarray of shape (n, m). Contains all input
            samples, with their features transformed equivalent to the kernel K1.
        X2: A two-dimensional ndarray of shape (n, k). Contains all input
            samples, with their features transformed equivalent to the kernel K2.

    Same as the problem sets, your solution must not involve any iteration.
    """
    return X1
# Test case. Adds the kernels (u ⋅ v)^1 and (u ⋅ v)^2 together.
def my_add_kernel(U, V):
    return np.dot(U, V.T) ** 1 + np.dot(U, V.T) ** 2

def transform_K1(X):
    return X.copy()

def transform_K2(X):
    return np.stack([
        X[:, 0] * X[:, 0],
        math.sqrt(2) * X[:, 0] * X[:, 1],
        X[:, 1] * X[:, 1],
    ], axis=-1)

plt.subplots(1, 2, figsize=(10, 5))

svc = SVC(kernel=my_add_kernel).fit(X, y)

plt.subplot(1, 2, 1)
Z_correct = svc.predict(X_test)
Z_correct = Z_correct.reshape(xx.shape)
plt.contourf(xx, yy, Z_correct, cmap=plt.cm.Paired, alpha=0.5)
plt.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.Paired, linewidth=1, edgecolor='black')
plt.title("Expected: Composing Kernels")

X_transformed = transform_add(transform_K1(X), transform_K2(X))
svc = SVC(kernel="linear").fit(X_transformed, y)
X_test_transformed = transform_add(transform_K1(X_test), transform_K2(X_test))

plt.subplot(1, 2, 2)
Z_test = svc.predict(X_test_transformed)
Z_test = Z_test.reshape(xx.shape)
plt.contourf(xx, yy, Z_test, cmap=plt.cm.Paired, alpha=0.5)
plt.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.Paired, linewidth=1, edgecolor='black')
plt.title("Actual: Transformed Features")

assert np.all(Z_correct == Z_test), "Your SVM gives a different classification"

plt.show()
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
Cell In[8], line 36
     33 plt.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.Paired, linewidth=1, edgecolor='black')
     34 plt.title("Actual: Transformed Features")
---> 36 assert np.all(Z_correct == Z_test), "Your SVM gives a different classification"
     38 plt.show()

AssertionError: Your SVM gives a different classification
[Image: side-by-side plots titled "Expected: Composing Kernels" and "Actual: Transformed Features"]

Task 2: Multiplying Kernel Functions

Another attempt is to multiply two kernel functions together, e.g. given that $K_1(u, v) = (u \cdot v)^3$ and $K_2(u, v) = e^{-\frac{\lVert u - v \rVert^2}{2 \sigma^2}}$ are both valid kernel functions, we can create a new function $K(u, v) = e^{-\frac{\lVert u - v \rVert^2}{2 \sigma^2}} (u \cdot v)^3$.

We can also prove that the resulting function is always a valid kernel function.

Try to prove this by working out the list of transformed features created by this composition. To demonstrate your understanding, implement your idea in transform_multiply, which should generate the list of transformed features.

If you implement the function correctly, the resulting decision boundaries should be the same as applying the kernel function directly.
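Hint: a product of two dot products is itself a dot product, taken over all pairwise products of coordinates: $$\left( \phi_1(u) \cdot \phi_1(v) \right) \left( \phi_2(u) \cdot \phi_2(v) \right) = \sum_i \sum_j \phi_1(u)_i \, \phi_2(u)_j \, \phi_1(v)_i \, \phi_2(v)_j.$$ A minimal sketch on a single pair of vectors (again, phi_1 and phi_2 are just illustrations, here for $K_1(u, v) = (u \cdot v + 1)^1$ and $K_2(u, v) = (u \cdot v)^2$):

# Sanity check on one pair of vectors: the flattened outer product of feature
# maps multiplies the kernels.
u = np.array([1.0, 2.0])
v = np.array([0.5, -1.0])

def phi_1(w):  # feature map for K1(u, v) = (u . v + 1)^1
    return np.append(w, 1.0)

def phi_2(w):  # feature map for K2(u, v) = (u . v)^2
    return np.array([w[0] ** 2, math.sqrt(2) * w[0] * w[1], w[1] ** 2])

lhs = (np.dot(u, v) + 1) * np.dot(u, v) ** 2   # K1(u, v) * K2(u, v)
rhs = np.dot(np.outer(phi_1(u), phi_2(u)).ravel(),
             np.outer(phi_1(v), phi_2(v)).ravel())
assert np.isclose(lhs, rhs)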

def transform_multiply(X1, X2):
    """
    Computes the transformed features equivalent to multiplying the two kernel
    functions K1 and K2 together. Note that K1 and K2 can have a different
    number of transformed features.

    Args:
        X1: A two-dimensional ndarray of shape (n, m). Contains all input
            samples, with their features transformed equivalent to the kernel K1.
        X2: A two-dimensional ndarray of shape (n, k). Contains all input
            samples, with their features transformed equivalent to the kernel K2.

    Same as the problem sets, your solution must not involve any iteration.
    """
    return X1
# Test case. Multiplies the kernels (u ⋅ v + 1)^1 and (u ⋅ v)^2 together.
def my_multiply_kernel(U, V):
    return ((np.dot(U, V.T) + 1) ** 1) * (np.dot(U, V.T) ** 2)

def transform_K1(X):
    return np.hstack([X, np.ones((X.shape[0], 1))])

def transform_K2(X):
    return np.stack([
        X[:, 0] * X[:, 0],
        math.sqrt(2) * X[:, 0] * X[:, 1],
        X[:, 1] * X[:, 1],
    ], axis=-1)

plt.subplots(1, 2, figsize=(10, 5))

svc = SVC(kernel=my_multiply_kernel).fit(X, y)

plt.subplot(1, 2, 1)
Z_correct = svc.predict(X_test)
Z_correct = Z_correct.reshape(xx.shape)
plt.contourf(xx, yy, Z_correct, cmap=plt.cm.Paired, alpha=0.5)
plt.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.Paired, linewidth=1, edgecolor='black')
plt.title("Expected: Composing Kernels")

X_transformed = transform_multiply(transform_K1(X), transform_K2(X))
svc = SVC(kernel="linear").fit(X_transformed, y)
X_test_transformed = transform_multiply(transform_K1(X_test), transform_K2(X_test))

plt.subplot(1, 2, 2)
Z_test = svc.predict(X_test_transformed)
Z_test = Z_test.reshape(xx.shape)
plt.contourf(xx, yy, Z_test, cmap=plt.cm.Paired, alpha=0.5)
plt.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.Paired, linewidth=1, edgecolor='black')
plt.title("Actual: Transformed Features")

assert np.all(Z_correct == Z_test), "Your SVM gives a different classification"

plt.show()
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
Cell In[10], line 39
     36 plt.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.Paired, linewidth=1, edgecolor='black')
     37 plt.title("Actual: Transformed Features")
---> 39 assert np.all(Z_correct == Z_test), "Your SVM gives a different classification"
     41 plt.show()

AssertionError: Your SVM gives a different classification
[Image: side-by-side plots titled "Expected: Composing Kernels" and "Actual: Transformed Features"]

Task 3: Create your own kernel!

Based on these two transformations, create your own kernel function my_creative_kernel. Be as creative as possible!

Note: You are not required to work out the set of equivalent transformed features; for kernels with infinite-dimensional feature maps (such as the Gaussian kernel), writing them out explicitly is impossible.
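For inspiration, here is one possibility (a sketch only; the function name and the bandwidth sigma=1.0 are arbitrary choices): a Gaussian kernel added to a cubic polynomial kernel. Both are valid kernels, so by Task 1 their sum is also valid.

# One possible composition: a Gaussian kernel plus a cubic polynomial kernel.
def example_composed_kernel(U, V, sigma=1.0):
    # Pairwise squared distances ||U_i - V_j||^2, computed without loops.
    sq_dists = (np.sum(U ** 2, axis=1)[:, None]
                + np.sum(V ** 2, axis=1)[None, :]
                - 2 * np.dot(U, V.T))
    return np.exp(-sq_dists / (2 * sigma ** 2)) + np.dot(U, V.T) ** 3

You can pass any such function to SVC(kernel=...), just like in the test cell below.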

def my_creative_kernel(U, V):
    """
    Computes the kernel function K(u, v) over all pairs of data points.

    Args:
        U: A two-dimensional ndarray of shape (n_samples_1, n_features).
        V: A two-dimensional ndarray of shape (n_samples_2, n_features).

    Return:
        A two-dimensional ndarray of shape (n_samples_1, n_samples_2), where the
        (i, j)-th element should contain K(U_i, V_j).

    The linear kernel has been implemented as an example. You should replace it
    with your more creative kernel function.
    """
    return np.dot(U, V.T)
# Test your kernel!
plt.subplots(1, 1, figsize=(5, 5))

svc = SVC(kernel=my_creative_kernel).fit(X, y)

plt.subplot(1, 1, 1)
Z_correct = svc.predict(X_test)
Z_correct = Z_correct.reshape(xx.shape)
plt.contourf(xx, yy, Z_correct, cmap=plt.cm.Paired, alpha=0.5)
plt.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.Paired, linewidth=1, edgecolor='black')
plt.title("My Creative Kernel")
plt.show()
[Image: decision boundary plot titled "My Creative Kernel"]