Machine Learning A-Z™: Hands-On Python & R In Data Science
Materials
My Notes [Jupyter Notebook]
NOTE: If you are going to download the zip or fork this repository, please be aware that its size is 520.3 MB.
Part 1 - Data Preprocessing
Steps involved: Importing the dataset -> Taking care of missing data -> Encoding categorical data -> Splitting the dataset into Training set and Test set -> Feature Scaling
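A minimal sketch of these steps with scikit-learn; the file name `Data.csv` and the column indices are placeholders for whatever dataset is being preprocessed.

```python
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.model_selection import train_test_split

dataset = pd.read_csv('Data.csv')
X = dataset.iloc[:, :-1].values   # every column except the last is a feature
y = dataset.iloc[:, -1].values    # the last column is the target

# Taking care of missing data: replace NaNs in the numeric columns with the mean
imputer = SimpleImputer(strategy='mean')
X[:, 1:3] = imputer.fit_transform(X[:, 1:3])

# Encoding categorical data (here the first feature column and the target)
X[:, 0] = LabelEncoder().fit_transform(X[:, 0])
y = LabelEncoder().fit_transform(y)

# Splitting the dataset into Training set and Test set
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# Feature Scaling: fit on the Training set only to avoid information leakage
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
```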
Part 2 - Regression
Simple Linear Regression
Steps involved: Data preprocessing -> Fitting Simple Linear Regression to the Training Set -> Predicting the Test set result -> Visualising the Training set results -> Visualising the Test set results
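A minimal sketch of these steps, assuming `X_train`, `X_test`, `y_train`, `y_test` come from a preprocessing step like the one above and there is a single feature so the results can be plotted.

```python
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

regressor = LinearRegression()
regressor.fit(X_train, y_train)         # fitting to the Training set
y_pred = regressor.predict(X_test)      # predicting the Test set results

# Visualising the Training set results (the Test set plot is analogous)
plt.scatter(X_train, y_train, color='red')
plt.plot(X_train, regressor.predict(X_train), color='blue')
plt.title('Training set')
plt.show()
```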
Multiple Linear Regression
Steps involved: Data preprocessing [Encoding categorical data (if any) -> Avoiding the dummy variable trap (handled by Python and R libraries)] -> Fitting Multiple Linear Regression to the Training set -> Predicting the Test set results
For the steps involved in Backward Elimination, please refer to the PDF; a sketch is given below.
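The PDF has the authoritative procedure; the sketch below is one common way to express it with `statsmodels`, assuming `X` and `y` are NumPy arrays and `SL` is the chosen significance level.

```python
import numpy as np
import statsmodels.api as sm

def backward_elimination(X, y, SL=0.05):
    """Repeatedly drop the predictor with the highest p-value until all
    remaining p-values are below the significance level SL."""
    X = np.append(np.ones((X.shape[0], 1)), X, axis=1)  # add the intercept column
    cols = list(range(X.shape[1]))
    while cols:
        regressor_OLS = sm.OLS(y, X[:, cols]).fit()
        p_values = np.asarray(regressor_OLS.pvalues)
        worst = int(p_values.argmax())
        if p_values[worst] <= SL:
            return regressor_OLS, cols                  # every predictor is significant
        del cols[worst]
```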
Polynomial Regression
Steps involved: Data preprocessing -> Fitting Polynomial Regression to the dataset -> Visualising the Polynomial Regression results -> Adjust degree -> Get a more continuous curve -> Predicting a new result with Polynomial Regression
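A minimal sketch, assuming a single-feature `X` and target `y`; `degree=4` and the query point `[[6.5]]` are example values.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

poly = PolynomialFeatures(degree=4)          # adjust the degree as needed
X_poly = poly.fit_transform(X)
regressor = LinearRegression().fit(X_poly, y)

# Predicting on a denser grid gives a more continuous curve when visualising
X_grid = np.arange(X.min(), X.max(), 0.1).reshape(-1, 1)
plt.scatter(X, y, color='red')
plt.plot(X_grid, regressor.predict(poly.transform(X_grid)), color='blue')
plt.show()

regressor.predict(poly.transform([[6.5]]))   # predicting a new result
```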
Support Vector Regression
Steps involved: Data preprocessing (in Python, Feature Scaling is required) -> Fitting the SVR model to the dataset -> Predicting a new result -> Visualising the SVR results
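A minimal sketch, assuming NumPy arrays `X` and `y`; the query point `[[6.5]]` is a placeholder.

```python
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# SVR does not scale internally, so both X and y are scaled by hand
sc_X, sc_y = StandardScaler(), StandardScaler()
X_scaled = sc_X.fit_transform(X)
y_scaled = sc_y.fit_transform(y.reshape(-1, 1)).ravel()

regressor = SVR(kernel='rbf')                # fitting the SVR model
regressor.fit(X_scaled, y_scaled)

# Predicting a new result: scale the input, then invert the scaling on the output
y_pred = sc_y.inverse_transform(
    regressor.predict(sc_X.transform([[6.5]])).reshape(-1, 1))
```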
Decision Tree Regression
Steps involved: Data preprocessing -> Fitting the Decision Tree Regression Model to the dataset -> Predicting a new result -> Visualising the Decision Tree Regression results
Random Forest Regression
Steps involved: Data preprocessing -> Fitting the Random Forest Regression Model to the dataset (tweak the number of trees: ntree in R, n_estimators in scikit-learn) -> Predicting a new result -> Visualising the Random Forest Regression results
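One sketch covers both tree-based regressors above, since only the estimator line changes; `X`, `y` and the query point `[[6.5]]` are placeholders.

```python
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor

regressor = DecisionTreeRegressor(random_state=0)
# regressor = RandomForestRegressor(n_estimators=10, random_state=0)  # n_estimators ~ ntree in R

regressor.fit(X, y)           # fitting the model to the dataset
regressor.predict([[6.5]])    # predicting a new result
```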
Part 3 - Classification
Logistic Regression
Steps involved: Data preprocessing -> Fitting Logistic Regression to the Training Set -> Predicting the Test set result -> Making and analysing the Confusion Matrix -> Visualising the Training set results -> Visualising the Test set results
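All classifiers in this part share the same scikit-learn template; only the estimator changes (the drop-in lines are collected in a sketch at the end of this part). A minimal version with Logistic Regression, assuming preprocessed `X_train`/`X_test`/`y_train`/`y_test`:

```python
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, accuracy_score

classifier = LogisticRegression(random_state=0)
classifier.fit(X_train, y_train)             # fitting to the Training set
y_pred = classifier.predict(X_test)          # predicting the Test set result

cm = confusion_matrix(y_test, y_pred)        # making the Confusion Matrix
print(cm)
print(accuracy_score(y_test, y_pred))
```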
K-Nearest Neighbors
Steps involved: Data preprocessing -> Fitting K-Nearest Neighbor Classifier to the Training Set -> Predicting the Test set result -> Making and analysing the Confusion Matrix -> Visualising the Training set results -> Visualising the Test set results
Support Vector Machine
Steps involved: Data preprocessing -> Fitting Support Vector Machine Classifier to the Training Set -> Predicting the Test set result -> Making and analysing the Confusion Matrix -> Visualising the Training set results -> Visualising the Test set results
Kernel SVM
Steps involved: Data preprocessing -> Fitting Kernel SVM to the Training Set -> Predicting the Test set result -> Making and analysing the Confusion Matrix -> Visualising the Training set results -> Visualising the Test set results
Naive Bayes
Steps involved: Data preprocessing [in R, encoding the target feature as a factor is compulsory] -> Fitting Naive Bayes to the Training Set -> Predicting the Test set result -> Making and analysing the Confusion Matrix -> Visualising the Training set results -> Visualising the Test set results
Decision Tree
Steps involved: Data preprocessing [Feature Scaling is not actually needed here, as decision tree classification does not depend on Euclidean distance] -> Fitting Decision Tree to the Training Set -> Predicting the Test set result -> Making and analysing the Confusion Matrix -> Visualising the Training set results -> Visualising the Test set results [-> Visualising the Decision Tree (in R)]
Random Forest
Steps involved: Data preprocessing [Feature Scaling is not actually needed here, as random forest classification does not depend on Euclidean distance] -> Fitting Random Forest to the Training Set -> Predicting the Test set result -> Making and analysing the Confusion Matrix -> Visualising the Training set results -> Visualising the Test set results
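For reference, the estimator lines that slot into the template from the Logistic Regression sketch above. Pick one (each assignment overwrites the previous); the hyperparameter values are common defaults, not requirements.

```python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

classifier = KNeighborsClassifier(n_neighbors=5, metric='minkowski', p=2)  # K-Nearest Neighbors
classifier = SVC(kernel='linear', random_state=0)                          # Support Vector Machine
classifier = SVC(kernel='rbf', random_state=0)                             # Kernel SVM
classifier = GaussianNB()                                                  # Naive Bayes
classifier = DecisionTreeClassifier(criterion='entropy', random_state=0)   # Decision Tree
classifier = RandomForestClassifier(n_estimators=10, criterion='entropy',
                                    random_state=0)                        # Random Forest
```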
Part 4 - Clustering
K-Means Clustering
Steps involved: Data preprocessing -> Using the elbow method to find the optimal number of clusters -> Applying K-Means to the dataset -> Visualizing the clusters -> Analyse
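A minimal sketch, assuming a feature matrix `X`; the final cluster count of 5 is an example value read off the elbow plot.

```python
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

# The elbow method: plot WCSS (inertia) against the number of clusters and
# look for the "elbow" where the curve flattens
wcss = []
for i in range(1, 11):
    kmeans = KMeans(n_clusters=i, init='k-means++', random_state=42)
    kmeans.fit(X)
    wcss.append(kmeans.inertia_)
plt.plot(range(1, 11), wcss)
plt.xlabel('Number of clusters')
plt.ylabel('WCSS')
plt.show()

# Applying K-Means to the dataset with the chosen cluster count
y_kmeans = KMeans(n_clusters=5, init='k-means++', random_state=42).fit_predict(X)
```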
Hierarchical Clustering
Steps involved: Data preprocessing -> Using a dendrogram to find the optimal number of clusters -> Applying Agglomerative Hierarchical Clustering to the dataset -> Visualizing the clusters -> Analyse
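A minimal sketch, assuming a feature matrix `X`; `n_clusters=5` is an example value read off the dendrogram.

```python
import matplotlib.pyplot as plt
import scipy.cluster.hierarchy as sch
from sklearn.cluster import AgglomerativeClustering

# The dendrogram suggests the number of clusters: count the vertical lines cut
# by a horizontal line through the largest vertical gap
dendrogram = sch.dendrogram(sch.linkage(X, method='ward'))
plt.show()

# Applying Agglomerative Hierarchical Clustering with the chosen count
hc = AgglomerativeClustering(n_clusters=5, linkage='ward')
y_hc = hc.fit_predict(X)
```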
Part 5 - Association Rule Learning
Apriori
Steps involved: Data preprocessing -> Training Apriori on the dataset -> Visualization of the result
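A hedged sketch using the third-party apyori package (`pip install apyori`), one common choice in Python; `transactions` and the thresholds are placeholders.

```python
from apyori import apriori

# transactions: a list of lists of item names, one inner list per basket
rules = apriori(transactions,
                min_support=0.003,     # example thresholds; tune per dataset
                min_confidence=0.2,
                min_lift=3)
results = list(rules)                  # each record holds items, support and ordered statistics
```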
Eclat
Steps involved: Data preprocessing -> Training Eclat on the dataset -> Visualization of the result
Part 6 - Reinforcement Learning
Upper Confidence Bound
Steps involved: Data preprocessing -> Implementing the Upper Confidence Bound -> Visualization of the result
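A minimal sketch of the UCB algorithm itself; `rewards` is assumed to be a NumPy array where `rewards[n, i]` holds the (simulated) reward of arm `i` at round `n`.

```python
import math

N, d = rewards.shape                    # rounds and number of arms (e.g. ads)
numbers_of_selections = [0] * d
sums_of_rewards = [0] * d
ads_selected = []

for n in range(N):
    # select the arm with the highest upper confidence bound
    best_arm, max_ucb = 0, -1.0
    for i in range(d):
        if numbers_of_selections[i] == 0:
            ucb = float('inf')          # force every arm to be tried once
        else:
            mean = sums_of_rewards[i] / numbers_of_selections[i]
            delta = math.sqrt(3 / 2 * math.log(n + 1) / numbers_of_selections[i])
            ucb = mean + delta
        if ucb > max_ucb:
            best_arm, max_ucb = i, ucb
    ads_selected.append(best_arm)
    numbers_of_selections[best_arm] += 1
    sums_of_rewards[best_arm] += rewards[n, best_arm]
```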
Part 7 - Natural Language Processing
Steps involved: Data preprocessing -> Cleaning the text -> Creating the Bag of Words model -> Splitting the dataset into the Training set and Test set -> Fitting a classification model to the Training set -> Predicting the Test set results -> Making the Confusion Matrix -> Analyse
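A minimal sketch of the Bag of Words step onward; `corpus` is assumed to be a list of already-cleaned text strings (lower-cased, stemmed, stopwords removed) with matching labels `y`, and Naive Bayes stands in for "some classification model".

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import confusion_matrix

# Creating the Bag of Words model from the cleaned corpus
cv = CountVectorizer(max_features=1500)      # keep the 1500 most frequent words
X = cv.fit_transform(corpus).toarray()

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)
classifier = GaussianNB().fit(X_train, y_train)
cm = confusion_matrix(y_test, classifier.predict(X_test))
```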
Part 8 - Deep Learning
Artificial Neural Network
Steps involved: Data preprocessing -> [In Python: Initialising the ANN -> Adding the input layer and the first hidden layer -> Adding more hidden layer(s) in between (optional) -> Adding the output layer -> Compiling the ANN] -> Fitting the ANN to the Training set [Keras is used for Python and h2o for R] -> Predicting the Test set results -> Making the Confusion Matrix -> Calculating Accuracy -> Analyse and improve if possible
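A minimal Keras sketch of the Python steps; the layer sizes and the 11 input features are example values for a binary-classification problem.

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

ann = Sequential()                                               # initialising the ANN
ann.add(Dense(units=6, activation='relu', input_shape=(11,)))    # input layer + first hidden layer
ann.add(Dense(units=6, activation='relu'))                       # extra hidden layer (optional)
ann.add(Dense(units=1, activation='sigmoid'))                    # output layer (binary target)
ann.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

ann.fit(X_train, y_train, batch_size=32, epochs=100)             # fitting to the Training set
y_pred = (ann.predict(X_test) > 0.5)                             # threshold the probabilities
```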
Convolutional Neural Network
Steps involved: Data preprocessing [It is done manually, please refer to notebook for more information] -> Importing the Keras libraries and packages -> Initialising the CNN -> Convolution -> Pooling -> Adding a second convolutional layer followed by pooling(to improve accuracy) -> Flattening -> Full connection -> Compiling the CNN -> Fitting the CNN to the images
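A minimal Keras sketch matching the steps above; 64x64 RGB inputs and a binary class are example assumptions.

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

cnn = Sequential()                                                        # initialising the CNN
cnn.add(Conv2D(32, (3, 3), activation='relu', input_shape=(64, 64, 3)))   # convolution
cnn.add(MaxPooling2D(pool_size=(2, 2)))                                   # pooling
cnn.add(Conv2D(32, (3, 3), activation='relu'))                            # second convolutional layer
cnn.add(MaxPooling2D(pool_size=(2, 2)))                                   # ...followed by pooling
cnn.add(Flatten())                                                        # flattening
cnn.add(Dense(units=128, activation='relu'))                              # full connection
cnn.add(Dense(units=1, activation='sigmoid'))                             # binary output
cnn.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
```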
Part 9 - Dimensionality Reduction
Principal Component Analysis
Steps involved: Data preprocessing -> Applying PCA -> Fitting classifier to the Training Set -> Predicting the Test set result -> Making and analysing the Confusion Matrix -> Visualising the Training set results -> Visualising the Test set results
Linear Discriminant Analysis
Steps involved: Data preprocessing -> Applying LDA -> Fitting classifier to the Training Set -> Predicting the Test set result -> Making and analysing the Confusion Matrix -> Visualising the Training set results -> Visualising the Test set results
Kernel PCA
Steps involved: Data preprocessing -> Applying Kernel PCA -> Fitting classifier to the Training Set -> Predicting the Test set result -> Making and analysing the Confusion Matrix -> Visualising the Training set results -> Visualising the Test set results
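The three techniques above share the same template; the transformer line is the only difference. A minimal sketch with placeholder `n_components` values:

```python
from sklearn.decomposition import PCA, KernelPCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA

reducer = PCA(n_components=2)
# reducer = LDA(n_components=2)                       # supervised: fit with y_train as well
# reducer = KernelPCA(n_components=2, kernel='rbf')

X_train_red = reducer.fit_transform(X_train)          # for LDA: reducer.fit_transform(X_train, y_train)
X_test_red = reducer.transform(X_test)
# ...then fit any classifier on the reduced features as in Part 3
```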
Part 10 - Model Selection And Boosting
k-Fold Cross Validation
Steps involved: Data preprocessing -> Fitting Kernel SVM to the Training Set [any other model can be used] -> Predicting the Test set result -> Applying k-Fold Cross Validation -> Making and analysing the Confusion Matrix -> Visualising the Training set results -> Visualising the Test set results
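A minimal sketch of the cross-validation step, assuming `classifier` is the already-configured estimator from the previous steps:

```python
from sklearn.model_selection import cross_val_score

# Evaluate the estimator over 10 folds of the Training set
accuracies = cross_val_score(estimator=classifier, X=X_train, y=y_train, cv=10)
print('Accuracy: {:.2f} %'.format(accuracies.mean() * 100))
print('Std Dev:  {:.2f} %'.format(accuracies.std() * 100))
```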
Grid Search
Steps involved: Data preprocessing -> Applying Grid Search to find the best model and the best parameters -> Fitting Kernel SVM to the Training Set with the best parameters [any other model can be used] -> Predicting the Test set result -> Applying k-Fold Cross Validation -> Making and analysing the Confusion Matrix -> Visualising the Training set results -> Visualising the Test set results
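A minimal sketch of the Grid Search step; the parameter grid below is an example for a kernel SVM and would change with the chosen model.

```python
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

parameters = [{'C': [0.25, 0.5, 0.75, 1], 'kernel': ['linear']},
              {'C': [0.25, 0.5, 0.75, 1], 'kernel': ['rbf'],
               'gamma': [0.1, 0.5, 0.9]}]
grid_search = GridSearchCV(estimator=SVC(), param_grid=parameters,
                           scoring='accuracy', cv=10)
grid_search.fit(X_train, y_train)
print(grid_search.best_score_)     # best cross-validated accuracy
print(grid_search.best_params_)    # best parameter combination
```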
XGBoost
Steps involved: Data preprocessing -> Fitting XGBoost to the training set -> Predicting the Test set result -> Making and analysing the Confusion Matrix -> Applying k-Fold Cross Validation [get Accuracy and Standard Deviation] -> Applying Grid Search to find the best model and the best parameters (Optional)
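A minimal sketch (`pip install xgboost`); XGBoost's scikit-learn wrapper plugs into the same workflow as the classifiers above.

```python
from xgboost import XGBClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import cross_val_score

classifier = XGBClassifier()
classifier.fit(X_train, y_train)                           # fitting to the Training set
cm = confusion_matrix(y_test, classifier.predict(X_test))  # Confusion Matrix

accuracies = cross_val_score(classifier, X_train, y_train, cv=10)
print(accuracies.mean(), accuracies.std())                 # Accuracy and Standard Deviation
```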