
Ensembling Methods


An ensemble is a group that produces a single combined effect.

  • For example: if I want to purchase a house, I can refer to different websites (Quicker, OLX, Housing, etc.), consult agents, or ask a friend.

Ensembling refers to combining the decisions from multiple models, after comparing their individual predictions, to arrive at the best decision and increase overall performance.

Basic Ensembling Techniques

  • Max Voting

The max voting method is generally used for classification problems. In this technique, multiple models are used to make predictions for each data point. The prediction from each model is counted as a ‘vote’, and the prediction made by the majority of the models is used as the final prediction.


Sample Algo

from sklearn import tree
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression

# hard voting: each model casts one vote and the majority class wins
model1 = LogisticRegression(random_state=1)
model2 = tree.DecisionTreeClassifier(random_state=1)
model = VotingClassifier(estimators=[('lr', model1), ('dt', model2)], voting='hard')
model.fit(x_train, y_train)
model.score(x_test, y_test)
  • Averaging

In this method, we take an average of predictions from all the models and use it to make the final prediction. Averaging can be used for making predictions in regression problems or while calculating probabilities for classification problems.

Sample Algo

from sklearn import tree
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier

model1 = tree.DecisionTreeClassifier()
model2 = KNeighborsClassifier()
model3 = LogisticRegression()

model1.fit(x_train, y_train)
model2.fit(x_train, y_train)
model3.fit(x_train, y_train)

pred1 = model1.predict_proba(x_test)
pred2 = model2.predict_proba(x_test)
pred3 = model3.predict_proba(x_test)

# simple average of the predicted class probabilities
finalpred = (pred1 + pred2 + pred3) / 3
  • Weighted Average

This is an extension of the averaging method. All models are assigned different weights defining the importance of each model for prediction.


Sample Algo

from sklearn import tree
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier

model1 = tree.DecisionTreeClassifier()
model2 = KNeighborsClassifier()
model3 = LogisticRegression()

model1.fit(x_train, y_train)
model2.fit(x_train, y_train)
model3.fit(x_train, y_train)

pred1 = model1.predict_proba(x_test)
pred2 = model2.predict_proba(x_test)
pred3 = model3.predict_proba(x_test)

# weighted average: the weights reflect each model's importance and sum to 1
finalpred = (pred1 * 0.3 + pred2 * 0.3 + pred3 * 0.4)

Advanced Ensembling Methods

Bagging

  • In this method we combine the results of multiple models to get a generalized result.

  • Bagging is short for bootstrap aggregation, so named because it takes a number of samples from the dataset, with each sample set being regarded as a bootstrap sample.

  • Operates via equal weighting of models; the results of these bootstrap samples are then aggregated (a short code sketch follows the key features below).

Key Features

- Settles on a result using majority voting
- Employs multiple instances of the same classifier for one dataset
- Builds models of smaller datasets by sampling with replacement
- Works best when the classifier is unstable (decision trees, for example), as this instability creates models of differing accuracy and results to draw a majority from
- Bagging can hurt a stable model by introducing artificial variability from which to draw inaccurate conclusions
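
As a minimal sketch (the dataset is synthetic and the hyper-parameters are illustrative, not taken from this notebook), bagging can be run with scikit-learn's BaggingClassifier, which fits several decision trees on bootstrap samples and combines their votes:

# minimal bagging sketch: several decision trees, each fit on a bootstrap sample
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, random_state=1)
x_train, x_test, y_train, y_test = train_test_split(X, y, random_state=1)

# the first argument is the base learner; its keyword name differs across
# scikit-learn versions ('estimator' vs 'base_estimator'), so it is passed positionally here
bag = BaggingClassifier(DecisionTreeClassifier(random_state=1), n_estimators=50, random_state=1)
bag.fit(x_train, y_train)
print(bag.score(x_test, y_test))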

OOB Error Estimation

Out-of-bag (OOB) error, also called the out-of-bag estimate, is a method of measuring the prediction error of random forests, boosted decision trees, and other machine learning models that use bootstrap aggregating (bagging) to sub-sample the data used for training.
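
A minimal illustration (again on synthetic data): setting oob_score=True tells scikit-learn's BaggingClassifier to score each training sample only with the estimators whose bootstrap sample did not contain it, giving a built-in estimate of generalization error:

# minimal OOB sketch: each sample is scored only by estimators that never saw it in training
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, random_state=1)

bag = BaggingClassifier(DecisionTreeClassifier(random_state=1), n_estimators=100,
                        oob_score=True, random_state=1)
bag.fit(X, y)
print(bag.oob_score_)        # out-of-bag accuracy
print(1 - bag.oob_score_)    # out-of-bag error estimate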

Bagging algorithms:

- Bagging meta-estimator
- Random forest

Boosting

  • This method is quite similar to bagging; the key difference is that boosting assigns varying weights to classifiers and derives its ultimate result from weighted voting.

  • Operates via weighted voting

Key Features

- Algorithm proceeds iteratively; new models are influenced by previous ones
- New models become experts for instances classified incorrectly by earlier models
- Can be used without weights by using resampling, with probability determined by weights
- Works well if classifiers and weak learners are not too complex
- AdaBoost (Adaptive Boosting) is a popular boosting algorithm (a short sketch follows this list)
- LogitBoost (derived from AdaBoost) is another, which uses additive logistic regression and handles multi-class problems
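
A minimal AdaBoost sketch (synthetic data; in scikit-learn the default weak learner is a depth-1 decision tree, i.e. a decision stump): each new weak learner concentrates on the training points that the previous ones misclassified:

# minimal AdaBoost sketch: weak learners are added iteratively, each focusing on
# the samples the previous ones got wrong
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, random_state=1)
x_train, x_test, y_train, y_test = train_test_split(X, y, random_state=1)

model = AdaBoostClassifier(n_estimators=100, random_state=1)
model.fit(x_train, y_train)
print(model.score(x_test, y_test))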

Boosting algorithms:

- AdaBoost
- GBM (Gradient Boosting) (see the sketch below)
- XGBM (XGBoost)
- LightGBM
- CatBoost
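
A minimal gradient-boosting sketch with scikit-learn's GradientBoostingClassifier (synthetic data, illustrative hyper-parameters): each new tree is fit to correct the errors of the ensemble built so far:

# minimal gradient boosting sketch: trees are added sequentially, each one
# correcting the residual errors of the current ensemble
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, random_state=1)
x_train, x_test, y_train, y_test = train_test_split(X, y, random_state=1)

model = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1, max_depth=3, random_state=1)
model.fit(x_train, y_train)
print(model.score(x_test, y_test))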

Stacking

  • This method trains multiple different classifiers, as opposed to various incarnations of the same learner. While bagging and boosting use numerous models built with instances of the same classification algorithm (e.g. decision trees), stacking builds its models using different classification algorithms (perhaps decision trees, logistic regression, or some other combination).

  • A combiner algorithm is then trained to make ultimate predictions using the predictions of other algorithms. This combiner can be any ensemble technique, but logistic regression is often found to be an adequate and simple algorithm to perform this combining.

  • Along with classification, stacking can also be employed in unsupervised learning tasks such as density estimation.

Key Features

- Trains multiple different types of learners (as opposed to bagging/boosting, which train many instances of a single learner type)
- Each learner uses a subset of data
- A "combiner" is trained on a validation segment
- Stacking uses a meta learner (as opposed to bagging/boosting, which use voting schemes)
- Difficult to analyze theoretically ("black magic")
- Level-1: meta learner
- Level-0: base classifiers
- Can also be used for numeric prediction (regression); a short code sketch follows this list
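
A minimal stacking sketch with scikit-learn's StackingClassifier (synthetic data; the choice of base learners and the logistic-regression combiner are illustrative): the level-0 classifiers use different algorithms, and the level-1 meta learner is trained on their cross-validated predictions:

# minimal stacking sketch: heterogeneous level-0 base classifiers plus a
# logistic-regression level-1 meta learner (the "combiner")
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, random_state=1)
x_train, x_test, y_train, y_test = train_test_split(X, y, random_state=1)

base_learners = [('dt', DecisionTreeClassifier(random_state=1)),
                 ('knn', KNeighborsClassifier())]
model = StackingClassifier(estimators=base_learners,
                           final_estimator=LogisticRegression(), cv=5)
model.fit(x_train, y_train)
print(model.score(x_test, y_test))
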
# importing packages
%matplotlib inline
import scipy.stats as stats
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
plt.style.use('ggplot')