Machine Learning A-Z™: Hands-On Python & R In Data Science
Materials
My Notes [Jupyter Notebook]
NOTE: If you are going to download the zip or fork this repository, please be aware that its size is 520.3 MB.
Part 1 - Data Preprocessing
Steps involved: Importing the dataset -> Taking care of missing data -> Encoding categorical data -> Splitting the dataset into Training set and Test set -> Feature Scaling
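A minimal sketch of these steps with scikit-learn; the file name `Data.csv` and the column indices are placeholders for whatever dataset is being preprocessed.

```python
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.model_selection import train_test_split

dataset = pd.read_csv('Data.csv')
X = dataset.iloc[:, :-1].values   # every column except the last is a feature
y = dataset.iloc[:, -1].values    # the last column is the target

# Taking care of missing data: replace NaNs in the numeric columns with the mean
imputer = SimpleImputer(strategy='mean')
X[:, 1:3] = imputer.fit_transform(X[:, 1:3])

# Encoding categorical data (here the first feature column and the target)
X[:, 0] = LabelEncoder().fit_transform(X[:, 0])
y = LabelEncoder().fit_transform(y)

# Splitting the dataset into Training set and Test set
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# Feature Scaling: fit on the Training set only to avoid information leakage
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
```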
Part 2 - Regression
Simple Linear Regression
Steps involved: Data preprocessing -> Fitting Simple Linear Regression to the Training Set -> Predicting the Test set result -> Visualising the Training set results -> Visualising the Test set results
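A minimal sketch of these steps, assuming `X_train`, `X_test`, `y_train`, `y_test` come from a preprocessing step like the one above and there is a single feature so the results can be plotted.

```python
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

regressor = LinearRegression()
regressor.fit(X_train, y_train)         # fitting to the Training set
y_pred = regressor.predict(X_test)      # predicting the Test set results

# Visualising the Training set results (the Test set plot is analogous)
plt.scatter(X_train, y_train, color='red')
plt.plot(X_train, regressor.predict(X_train), color='blue')
plt.title('Training set')
plt.show()
```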
Multiple Linear Regression
Steps involved: Data preprocessing [Encoding categorical data (if any) -> Avoiding the dummy variable trap (handled by Python and R libraries)] -> Fitting Multiple Linear Regression to the Training set -> Predicting the Test set results
For the steps involved in Backward Elimination, please refer to the PDF; a sketch is given below.
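The PDF has the authoritative procedure; the sketch below is one common way to express it with `statsmodels`, assuming `X` and `y` are NumPy arrays and `SL` is the chosen significance level.

```python
import numpy as np
import statsmodels.api as sm

def backward_elimination(X, y, SL=0.05):
    """Repeatedly drop the predictor with the highest p-value until all
    remaining p-values are below the significance level SL."""
    X = np.append(np.ones((X.shape[0], 1)), X, axis=1)  # add the intercept column
    cols = list(range(X.shape[1]))
    while cols:
        regressor_OLS = sm.OLS(y, X[:, cols]).fit()
        p_values = np.asarray(regressor_OLS.pvalues)
        worst = int(p_values.argmax())
        if p_values[worst] <= SL:
            return regressor_OLS, cols                  # every predictor is significant
        del cols[worst]
```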
Polynomial Regression
Steps involved: Data preprocessing -> Fitting Polynomial Regression to the dataset -> Visualising the Polynomial Regression results -> Adjust degree -> Get a more continuous curve -> Predicting a new result with Polynomial Regression
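A minimal sketch, assuming a single-feature `X` and target `y`; `degree=4` and the query point `[[6.5]]` are example values.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

poly = PolynomialFeatures(degree=4)          # adjust the degree as needed
X_poly = poly.fit_transform(X)
regressor = LinearRegression().fit(X_poly, y)

# Predicting on a denser grid gives a more continuous curve when visualising
X_grid = np.arange(X.min(), X.max(), 0.1).reshape(-1, 1)
plt.scatter(X, y, color='red')
plt.plot(X_grid, regressor.predict(poly.transform(X_grid)), color='blue')
plt.show()

regressor.predict(poly.transform([[6.5]]))   # predicting a new result
```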
Support Vector Regression
Steps involved: Data preprocessing (in Python, Feature Scaling is required) -> Fitting the SVR model to the dataset -> Predicting a new result -> Visualising the SVR results
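A minimal sketch, assuming NumPy arrays `X` and `y`; the query point `[[6.5]]` is a placeholder.

```python
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# SVR does not scale internally, so both X and y are scaled by hand
sc_X, sc_y = StandardScaler(), StandardScaler()
X_scaled = sc_X.fit_transform(X)
y_scaled = sc_y.fit_transform(y.reshape(-1, 1)).ravel()

regressor = SVR(kernel='rbf')                # fitting the SVR model
regressor.fit(X_scaled, y_scaled)

# Predicting a new result: scale the input, then invert the scaling on the output
y_pred = sc_y.inverse_transform(
    regressor.predict(sc_X.transform([[6.5]])).reshape(-1, 1))
```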
Decision Tree Regression
Steps involved: Data preprocessing -> Fitting the Decision Tree Regression Model to the dataset -> Predicting a new result -> Visualising the Decision Tree Regression results
Random Forest Regression
Steps involved: Data preprocessing -> Fitting the Random Forest Regression Model to the dataset (tweak the number of trees: ntree in R, n_estimators in scikit-learn) -> Predicting a new result -> Visualising the Random Forest Regression results
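One sketch covers both tree-based regressors above, since only the estimator line changes; `X`, `y` and the query point `[[6.5]]` are placeholders.

```python
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor

regressor = DecisionTreeRegressor(random_state=0)
# regressor = RandomForestRegressor(n_estimators=10, random_state=0)  # n_estimators ~ ntree in R

regressor.fit(X, y)           # fitting the model to the dataset
regressor.predict([[6.5]])    # predicting a new result
```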
Part 3 - Classification
Logistic Regression
Steps involved: Data preprocessing -> Fitting Logistic Regression to the Training Set -> Predicting the Test set result -> Making and analysing the Confusion Matrix -> Visualising the Training set results -> Visualising the Test set results
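All classifiers in this part share the same scikit-learn template; only the estimator changes (the drop-in lines are collected in a sketch at the end of this part). A minimal version with Logistic Regression, assuming preprocessed `X_train`/`X_test`/`y_train`/`y_test`:

```python
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, accuracy_score

classifier = LogisticRegression(random_state=0)
classifier.fit(X_train, y_train)             # fitting to the Training set
y_pred = classifier.predict(X_test)          # predicting the Test set result

cm = confusion_matrix(y_test, y_pred)        # making the Confusion Matrix
print(cm)
print(accuracy_score(y_test, y_pred))
```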
K-Nearest Neighbors
Steps involved: Data preprocessing -> Fitting K-Nearest Neighbor Classifier to the Training Set -> Predicting the Test set result -> Making and analysing the Confusion Matrix -> Visualising the Training set results -> Visualising the Test set results
Support Vector Machine
Steps involved: Data preprocessing -> Fitting Support Vector Machine Classifier to the Training Set -> Predicting the Test set result -> Making and analysing the Confusion Matrix -> Visualising the Training set results -> Visualising the Test set results
Kernel SVM
Steps involved: Data preprocessing -> Fitting Kernel SVM to the Training Set -> Predicting the Test set result -> Making and analysing the Confusion Matrix -> Visualising the Training set results -> Visualising the Test set results
Naive Bayes
Steps involved: Data preprocessing [in R, encoding the target feature as a factor is compulsory] -> Fitting Naive Bayes to the Training Set -> Predicting the Test set result -> Making and analysing the Confusion Matrix -> Visualising the Training set results -> Visualising the Test set results
Decision Tree
Steps involved: Data preprocessing [Feature Scaling is not actually needed here, as decision tree classification does not depend on Euclidean distance] -> Fitting Decision Tree to the Training Set -> Predicting the Test set result -> Making and analysing the Confusion Matrix -> Visualising the Training set results -> Visualising the Test set results [-> Visualising the Decision Tree (in R)]
Random Forest
Steps involved: Data preprocessing [Feature Scaling is not actually needed here, as random forest classification does not depend on Euclidean distance] -> Fitting Random Forest to the Training Set -> Predicting the Test set result -> Making and analysing the Confusion Matrix -> Visualising the Training set results -> Visualising the Test set results
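For reference, the estimator lines that slot into the template from the Logistic Regression sketch above. Pick one (each assignment overwrites the previous); the hyperparameter values are common defaults, not requirements.

```python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

classifier = KNeighborsClassifier(n_neighbors=5, metric='minkowski', p=2)  # K-Nearest Neighbors
classifier = SVC(kernel='linear', random_state=0)                          # Support Vector Machine
classifier = SVC(kernel='rbf', random_state=0)                             # Kernel SVM
classifier = GaussianNB()                                                  # Naive Bayes
classifier = DecisionTreeClassifier(criterion='entropy', random_state=0)   # Decision Tree
classifier = RandomForestClassifier(n_estimators=10, criterion='entropy',
                                    random_state=0)                        # Random Forest
```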
Part 4 - Clustering
K-Means Clustering
Steps involved: Data preprocessing -> Using the elbow method to find the optimal number of clusters -> Applying K-Means to the dataset -> Visualizing the clusters -> Analyse
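A minimal sketch, assuming a feature matrix `X`; the final cluster count of 5 is an example value read off the elbow plot.

```python
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

# The elbow method: plot WCSS (inertia) against the number of clusters and
# look for the "elbow" where the curve flattens
wcss = []
for i in range(1, 11):
    kmeans = KMeans(n_clusters=i, init='k-means++', random_state=42)
    kmeans.fit(X)
    wcss.append(kmeans.inertia_)
plt.plot(range(1, 11), wcss)
plt.xlabel('Number of clusters')
plt.ylabel('WCSS')
plt.show()

# Applying K-Means to the dataset with the chosen cluster count
y_kmeans = KMeans(n_clusters=5, init='k-means++', random_state=42).fit_predict(X)
```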
Hierarchical Clustering
Steps involved: Data preprocessing -> Using a dendrogram to find the optimal number of clusters -> Applying Agglomerative Hierarchical Clustering to the dataset -> Visualizing the clusters -> Analyse
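A minimal sketch, assuming a feature matrix `X`; `n_clusters=5` is an example value read off the dendrogram.

```python
import matplotlib.pyplot as plt
import scipy.cluster.hierarchy as sch
from sklearn.cluster import AgglomerativeClustering

# The dendrogram suggests the number of clusters: count the vertical lines cut
# by a horizontal line through the largest vertical gap
dendrogram = sch.dendrogram(sch.linkage(X, method='ward'))
plt.show()

# Applying Agglomerative Hierarchical Clustering with the chosen count
hc = AgglomerativeClustering(n_clusters=5, linkage='ward')
y_hc = hc.fit_predict(X)
```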
Part 5 - Association Rule Learning
Apriori
Steps involved: Data preprocessing -> Training Apriori on the dataset -> Visualization of the result
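A hedged sketch using the third-party apyori package (`pip install apyori`), one common choice in Python; `transactions` and the thresholds are placeholders.

```python
from apyori import apriori

# transactions: a list of lists of item names, one inner list per basket
rules = apriori(transactions,
                min_support=0.003,     # example thresholds; tune per dataset
                min_confidence=0.2,
                min_lift=3)
results = list(rules)                  # each record holds items, support and ordered statistics
```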
Eclat
Steps involved: Data preprocessing -> Training Eclat on the dataset -> Visualization of the result
Part 6 - Reinforcement Learning
Upper Confidence Bound
Steps involved: Data preprocessing -> Implementing the Upper Confidence Bound -> Visualization of the result
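A minimal sketch of the UCB algorithm itself; `rewards` is assumed to be a NumPy array where `rewards[n, i]` holds the (simulated) reward of arm `i` at round `n`.

```python
import math

N, d = rewards.shape                    # rounds and number of arms (e.g. ads)
numbers_of_selections = [0] * d
sums_of_rewards = [0] * d
ads_selected = []

for n in range(N):
    # select the arm with the highest upper confidence bound
    best_arm, max_ucb = 0, -1.0
    for i in range(d):
        if numbers_of_selections[i] == 0:
            ucb = float('inf')          # force every arm to be tried once
        else:
            mean = sums_of_rewards[i] / numbers_of_selections[i]
            delta = math.sqrt(3 / 2 * math.log(n + 1) / numbers_of_selections[i])
            ucb = mean + delta
        if ucb > max_ucb:
            best_arm, max_ucb = i, ucb
    ads_selected.append(best_arm)
    numbers_of_selections[best_arm] += 1
    sums_of_rewards[best_arm] += rewards[n, best_arm]
```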
Part 7 - Natural Language Processing
Steps involved: Data preprocessing -> Cleaning the text -> Creating the Bag of Words model -> Splitting the dataset into the Training set and Test set -> Fitting a classification model to the Training set -> Predicting the Test set results -> Making the Confusion Matrix -> Analyse
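A minimal sketch of the Bag of Words step onward; `corpus` is assumed to be a list of already-cleaned text strings (lower-cased, stemmed, stopwords removed) with matching labels `y`, and Naive Bayes stands in for "some classification model".

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import confusion_matrix

# Creating the Bag of Words model from the cleaned corpus
cv = CountVectorizer(max_features=1500)      # keep the 1500 most frequent words
X = cv.fit_transform(corpus).toarray()

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)
classifier = GaussianNB().fit(X_train, y_train)
cm = confusion_matrix(y_test, classifier.predict(X_test))
```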
Part 8 - Deep Learning
Artificial Neural Network
Steps involved: Data preprocessing -> [In Python: Initialising the ANN -> Adding the input layer and the first hidden layer -> Adding more hidden layer(s) in between (optional) -> Adding the output layer -> Compiling the ANN] -> Fitting the ANN to the Training set [Keras is used for Python and h2o for R] -> Predicting the Test set results -> Making the Confusion Matrix -> Calculating Accuracy -> Analyse and improve if possible
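A minimal Keras sketch of the Python steps; the layer sizes and the 11 input features are example values for a binary-classification problem.

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

ann = Sequential()                                               # initialising the ANN
ann.add(Dense(units=6, activation='relu', input_shape=(11,)))    # input layer + first hidden layer
ann.add(Dense(units=6, activation='relu'))                       # extra hidden layer (optional)
ann.add(Dense(units=1, activation='sigmoid'))                    # output layer (binary target)
ann.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

ann.fit(X_train, y_train, batch_size=32, epochs=100)             # fitting to the Training set
y_pred = (ann.predict(X_test) > 0.5)                             # threshold the probabilities
```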
Convolutional Neural Network
Steps involved: Data preprocessing [It is done manually, please refer to notebook for more information] -> Importing the Keras libraries and packages -> Initialising the CNN -> Convolution -> Pooling -> Adding a second convolutional layer followed by pooling(to improve accuracy) -> Flattening -> Full connection -> Compiling the CNN -> Fitting the CNN to the images
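A minimal Keras sketch matching the steps above; 64x64 RGB inputs and a binary class are example assumptions.

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

cnn = Sequential()                                                        # initialising the CNN
cnn.add(Conv2D(32, (3, 3), activation='relu', input_shape=(64, 64, 3)))   # convolution
cnn.add(MaxPooling2D(pool_size=(2, 2)))                                   # pooling
cnn.add(Conv2D(32, (3, 3), activation='relu'))                            # second convolutional layer
cnn.add(MaxPooling2D(pool_size=(2, 2)))                                   # ...followed by pooling
cnn.add(Flatten())                                                        # flattening
cnn.add(Dense(units=128, activation='relu'))                              # full connection
cnn.add(Dense(units=1, activation='sigmoid'))                             # binary output
cnn.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
```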
Part 9 - Dimensionality Reduction
Principal Component Analysis
Steps involved: Data preprocessing -> Applying PCA -> Fitting classifier to the Training Set -> Predicting the Test set result -> Making and analysing the Confusion Matrix -> Visualising the Training set results -> Visualising the Test set results
Linear Discriminant Analysis
Steps involved: Data preprocessing -> Applying LDA -> Fitting classifier to the Training Set -> Predicting the Test set result -> Making and analysing the Confusion Matrix -> Visualising the Training set results -> Visualising the Test set results
Kernel PCA
Steps involved: Data preprocessing -> Applying Kernel PCA -> Fitting classifier to the Training Set -> Predicting the Test set result -> Making and analysing the Confusion Matrix -> Visualising the Training set results -> Visualising the Test set results
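The three techniques above share the same template; the transformer line is the only difference. A minimal sketch with placeholder `n_components` values:

```python
from sklearn.decomposition import PCA, KernelPCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA

reducer = PCA(n_components=2)
# reducer = LDA(n_components=2)                       # supervised: fit with y_train as well
# reducer = KernelPCA(n_components=2, kernel='rbf')

X_train_red = reducer.fit_transform(X_train)          # for LDA: reducer.fit_transform(X_train, y_train)
X_test_red = reducer.transform(X_test)
# ...then fit any classifier on the reduced features as in Part 3
```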
Part 10 - Model Selection And Boosting
k-Fold Cross Validation
Steps involved: Data preprocessing -> Fitting Kernel SVM to the Training Set [any other model can be used] -> Predicting the Test set result -> Applying k-Fold Cross Validation -> Making and analysing the Confusion Matrix -> Visualising the Training set results -> Visualising the Test set results
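A minimal sketch of the cross-validation step, assuming `classifier` is the already-configured estimator from the previous steps:

```python
from sklearn.model_selection import cross_val_score

# Evaluate the estimator over 10 folds of the Training set
accuracies = cross_val_score(estimator=classifier, X=X_train, y=y_train, cv=10)
print('Accuracy: {:.2f} %'.format(accuracies.mean() * 100))
print('Std Dev:  {:.2f} %'.format(accuracies.std() * 100))
```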
Grid Search
Steps involved: Data preprocessing -> Applying Grid Search to find the best model and the best parameters -> Fitting Kernel SVM to the Training Set with the best parameters [any other model can be used] -> Predicting the Test set result -> Applying k-Fold Cross Validation -> Making and analysing the Confusion Matrix -> Visualising the Training set results -> Visualising the Test set results
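A minimal sketch of the Grid Search step; the parameter grid below is an example for a kernel SVM and would change with the chosen model.

```python
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

parameters = [{'C': [0.25, 0.5, 0.75, 1], 'kernel': ['linear']},
              {'C': [0.25, 0.5, 0.75, 1], 'kernel': ['rbf'],
               'gamma': [0.1, 0.5, 0.9]}]
grid_search = GridSearchCV(estimator=SVC(), param_grid=parameters,
                           scoring='accuracy', cv=10)
grid_search.fit(X_train, y_train)
print(grid_search.best_score_)     # best cross-validated accuracy
print(grid_search.best_params_)    # best parameter combination
```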
XGBoost
Steps involved: Data preprocessing -> Fitting XGBoost to the training set -> Predicting the Test set result -> Making and analysing the Confusion Matrix -> Applying k-Fold Cross Validation [get Accuracy and Standard Deviation] -> Applying Grid Search to find the best model and the best parameters (Optional)
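A minimal sketch (`pip install xgboost`); XGBoost's scikit-learn wrapper plugs into the same workflow as the classifiers above.

```python
from xgboost import XGBClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import cross_val_score

classifier = XGBClassifier()
classifier.fit(X_train, y_train)                           # fitting to the Training set
cm = confusion_matrix(y_test, classifier.predict(X_test))  # Confusion Matrix

accuracies = cross_val_score(classifier, X_train, y_train, cv=10)
print(accuracies.mean(), accuracies.std())                 # Accuracy and Standard Deviation
```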