Path: blob/master/Week 8/Programming Assignment - 7/ex7/ex7_pca.m
%% Machine Learning Online Class
%  Exercise 7 | Principal Component Analysis and K-Means Clustering
%
%  Instructions
%  ------------
%
%  This file contains code that helps you get started on the
%  exercise. You will need to complete the following functions:
%
%     pca.m
%     projectData.m
%     recoverData.m
%     computeCentroids.m
%     findClosestCentroids.m
%     kMeansInitCentroids.m
%
%  For this exercise, you will not need to change any code in this file,
%  or any other files other than those mentioned above.
%

%% Initialization
clear ; close all; clc

%% ================== Part 1: Load Example Dataset ===================
%  We start this exercise by using a small dataset that is easy to
%  visualize.
%
fprintf('Visualizing example dataset for PCA.\n\n');

%  The following command loads the dataset. You should now have the
%  variable X in your environment
load ('ex7data1.mat');

%  Visualize the example dataset
plot(X(:, 1), X(:, 2), 'bo');
axis([0.5 6.5 2 8]); axis square;

fprintf('Program paused. Press enter to continue.\n');
pause;


%% =============== Part 2: Principal Component Analysis ===============
%  You should now implement PCA, a dimension reduction technique. You
%  should complete the code in pca.m
%
fprintf('\nRunning PCA on example dataset.\n\n');

%  Before running PCA, it is important to first normalize X
[X_norm, mu, sigma] = featureNormalize(X);

%  Run PCA
[U, S] = pca(X_norm);

%  Draw the eigenvectors centered at the mean of the data (mu, the mean of
%  each feature, is returned by featureNormalize). These lines show the
%  directions of maximum variation in the dataset.
hold on;
drawLine(mu, mu + 1.5 * S(1,1) * U(:,1)', '-k', 'LineWidth', 2);
drawLine(mu, mu + 1.5 * S(2,2) * U(:,2)', '-k', 'LineWidth', 2);
hold off;

fprintf('Top eigenvector: \n');
fprintf(' U(:,1) = %f %f \n', U(1,1), U(2,1));
fprintf('\n(you should expect to see -0.707107 -0.707107)\n');

fprintf('Program paused. Press enter to continue.\n');
pause;


%% =================== Part 3: Dimension Reduction ===================
%  You should now implement the projection step to map the data onto the
%  first k eigenvectors. The code will then plot the data in this reduced
%  dimensional space. This will show you what the data looks like when
%  using only the corresponding eigenvectors to reconstruct it.
%
%  You should complete the code in projectData.m
%
fprintf('\nDimension reduction on example dataset.\n\n');

%  Plot the normalized dataset (returned from featureNormalize)
plot(X_norm(:, 1), X_norm(:, 2), 'bo');
axis([-4 3 -4 3]); axis square

%  Project the data onto K = 1 dimension
K = 1;
Z = projectData(X_norm, U, K);
fprintf('Projection of the first example: %f\n', Z(1));
fprintf('\n(this value should be about 1.481274)\n\n');

X_rec = recoverData(Z, U, K);
fprintf('Approximation of the first example: %f %f\n', X_rec(1, 1), X_rec(1, 2));
fprintf('\n(this value should be about -1.047419 -1.047419)\n\n');

%  Draw lines connecting the projected points to the original points
hold on;
plot(X_rec(:, 1), X_rec(:, 2), 'ro');
for i = 1:size(X_norm, 1)
    drawLine(X_norm(i,:), X_rec(i,:), '--k', 'LineWidth', 1);
end
hold off

fprintf('Program paused. Press enter to continue.\n');
pause;
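% ----------------------------------------------------------------------
% NOTE (editorial, not part of the original assignment file): the three
% functions used above are the ones you implement yourself. As a rough,
% ungraded sketch of what they typically compute (variable names here are
% illustrative assumptions, not the graded solutions):
%
%   In pca.m, for an m x n matrix X that has already been normalized:
%       [m, ~] = size(X);
%       Sigma = (1 / m) * (X' * X);   % covariance matrix of the data
%       [U, S, V] = svd(Sigma);       % columns of U = principal components
%
%   In projectData.m, keeping only the top K components:
%       U_reduce = U(:, 1:K);
%       Z = X * U_reduce;             % m x K projected data
%
%   In recoverData.m, mapping the projection back to the original space:
%       X_rec = Z * U(:, 1:K)';       % m x n approximate reconstruction
% ----------------------------------------------------------------------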
%% =============== Part 4: Loading and Visualizing Face Data =============
%  We start the exercise by first loading and visualizing the dataset.
%  The following code will load the dataset into your environment
%
fprintf('\nLoading face dataset.\n\n');

%  Load Face dataset
load ('ex7faces.mat')

%  Display the first 100 faces in the dataset
displayData(X(1:100, :));

fprintf('Program paused. Press enter to continue.\n');
pause;

%% =========== Part 5: PCA on Face Data: Eigenfaces ===================
%  Run PCA and visualize the eigenvectors, which are in this case eigenfaces.
%  We display the first 36 eigenfaces.
%
fprintf(['\nRunning PCA on face dataset.\n' ...
         '(this might take a minute or two ...)\n\n']);

%  Before running PCA, it is important to first normalize X by subtracting
%  the mean value from each feature
[X_norm, mu, sigma] = featureNormalize(X);

%  Run PCA
[U, S] = pca(X_norm);

%  Visualize the top 36 eigenvectors found
displayData(U(:, 1:36)');

fprintf('Program paused. Press enter to continue.\n');
pause;


%% ============= Part 6: Dimension Reduction for Faces =================
%  Project images to the eigen space using the top K eigenvectors.
%  If you are applying a machine learning algorithm, you can use the
%  projected data instead of the original, high-dimensional input.
fprintf('\nDimension reduction for face dataset.\n\n');

K = 100;
Z = projectData(X_norm, U, K);

fprintf('The projected data Z has a size of: ')
fprintf('%d ', size(Z));

fprintf('\n\nProgram paused. Press enter to continue.\n');
pause;

%% ==== Part 7: Visualization of Faces after PCA Dimension Reduction ====
%  Project images to the eigen space using the top K eigenvectors and
%  visualize only using those K dimensions
%  Compare to the original input, which is also displayed

fprintf('\nVisualizing the projected (reduced dimension) faces.\n\n');

K = 100;
X_rec = recoverData(Z, U, K);

% Display normalized data
subplot(1, 2, 1);
displayData(X_norm(1:100,:));
title('Original faces');
axis square;

% Display reconstructed data from only K eigenfaces
subplot(1, 2, 2);
displayData(X_rec(1:100,:));
title('Recovered faces');
axis square;

fprintf('Program paused. Press enter to continue.\n');
pause;


%% === Part 8(a): Optional (ungraded) Exercise: PCA for Visualization ===
%  One useful application of PCA is to use it to visualize high-dimensional
%  data. In the last K-Means exercise you ran K-Means on 3-dimensional
%  pixel colors of an image. We first visualize this output in 3D, and then
%  apply PCA to obtain a visualization in 2D.

close all; clc

%  Reload the image from the previous exercise and run K-Means on it
%  For this to work, you need to complete the K-Means assignment first
A = double(imread('bird_small.png'));

%  If imread does not work for you, you can try instead
%    load ('bird_small.mat');

A = A / 255;
img_size = size(A);
X = reshape(A, img_size(1) * img_size(2), 3);
K = 16;
max_iters = 10;
initial_centroids = kMeansInitCentroids(X, K);
[centroids, idx] = runkMeans(X, initial_centroids, max_iters);
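% ----------------------------------------------------------------------
% NOTE (editorial, not part of the original assignment file): runkMeans
% relies on the K-Means functions from the first half of this exercise.
% A minimal, ungraded sketch of what they usually look like (illustrative
% only, not the graded code):
%
%   In kMeansInitCentroids.m, pick K distinct examples at random:
%       randidx = randperm(size(X, 1));
%       centroids = X(randidx(1:K), :);
%
%   In findClosestCentroids.m, assign each example to its nearest centroid:
%       for i = 1:size(X, 1)
%           dists = sum(bsxfun(@minus, centroids, X(i, :)).^2, 2);
%           [~, idx(i)] = min(dists);
%       end
%
%   In computeCentroids.m, move each centroid to the mean of its points:
%       for k = 1:K
%           centroids(k, :) = mean(X(idx == k, :), 1);
%       end
% ----------------------------------------------------------------------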
%  Sample 1000 random indexes (since working with all the data is too
%  expensive). If you have a fast computer, you may increase this.
sel = floor(rand(1000, 1) * size(X, 1)) + 1;

%  Setup Color Palette
palette = hsv(K);
colors = palette(idx(sel), :);

%  Visualize the data and centroid memberships in 3D
figure;
scatter3(X(sel, 1), X(sel, 2), X(sel, 3), 10, colors);
title('Pixel dataset plotted in 3D. Color shows centroid memberships');
fprintf('Program paused. Press enter to continue.\n');
pause;

%% === Part 8(b): Optional (ungraded) Exercise: PCA for Visualization ===
% Use PCA to project this cloud to 2D for visualization

% Subtract the mean to use PCA
[X_norm, mu, sigma] = featureNormalize(X);

% PCA and project the data to 2D
[U, S] = pca(X_norm);
Z = projectData(X_norm, U, 2);

% Plot in 2D
figure;
plotDataPoints(Z(sel, :), idx(sel), K);
title('Pixel dataset plotted in 2D, using PCA for dimensionality reduction');
fprintf('Program paused. Press enter to continue.\n');
pause;
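% ----------------------------------------------------------------------
% NOTE (editorial, not part of the original assignment file):
% featureNormalize.m, used throughout this script, is provided with the
% exercise rather than written by you. For reference, it is typically
% implemented along these lines (a sketch under that assumption, not the
% course file itself):
%       mu = mean(X);
%       X_norm = bsxfun(@minus, X, mu);
%       sigma = std(X_norm);
%       X_norm = bsxfun(@rdivide, X_norm, sigma);
% ----------------------------------------------------------------------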