Ensemble Learning and Random Forest
Ensemble Learning combines multiple models to improve predictive performance. Random Forest is an ensemble method that builds multiple decision trees and aggregates their predictions. 
1. Multiple Base Models (e.g., Decision Trees, SVMs) are trained independently.
2. Each model makes its own prediction.
3. An Aggregator combines these predictions (via voting, averaging, etc.).
4. The result is a Final Prediction, which is generally more accurate and robust than any single model.
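This flow maps directly onto scikit-learn's VotingClassifier. The sketch below is illustrative rather than from the original notebook: the dataset is synthetic, and the three base models (a decision tree, an SVM, and logistic regression) are arbitrary choices.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic data, just for illustration.
X, y = make_classification(n_samples=500, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Three base models trained independently; the aggregator combines
# their predictions by majority ("hard") vote.
ensemble = VotingClassifier(
    estimators=[
        ("tree", DecisionTreeClassifier(random_state=42)),
        ("svm", SVC(random_state=42)),
        ("logreg", LogisticRegression(max_iter=1000)),
    ],
    voting="hard",
)
ensemble.fit(X_train, y_train)
print("Ensemble accuracy:", ensemble.score(X_test, y_test))
```

With voting="soft", the aggregator averages predicted class probabilities instead of counting votes, which often works better when the base models produce well-calibrated probabilities.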
Ensemble learning methods are broadly categorized into three main types, each with its own approach to combining multiple models:
Bagging (Bootstrap Aggregating)
Concept: Train multiple models (usually decision trees) on different random subsets of the training data, sampled with replacement.
Goal: Reduce variance and prevent overfitting.
Example Algorithms:
Random Forest (most popular bagging method)
How it works:
Each model is trained on its own bootstrap sample; at prediction time every model votes, and the majority vote becomes the final prediction (for regression, the predictions are averaged).
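A minimal bagging sketch, assuming scikit-learn's BaggingClassifier and reusing the train/test split from the voting example above:

```python
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

# 100 trees, each fit on a bootstrap sample (rows drawn with replacement);
# predictions are combined by majority vote.
bagging = BaggingClassifier(
    DecisionTreeClassifier(),
    n_estimators=100,
    max_samples=1.0,   # each bootstrap sample is as large as the training set
    bootstrap=True,    # sample with replacement
    random_state=42,
)
bagging.fit(X_train, y_train)
print("Bagging accuracy:", bagging.score(X_test, y_test))
```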
Boosting
Concept: Train models sequentially, where each new model focuses on correcting the errors of the previous ones.
Goal: Reduce bias and improve accuracy.
Example Algorithms:
AdaBoost, Gradient Boosting, XGBoost
How it works:
Every sample starts with equal weight; after each round, the weights of misclassified samples are increased so that the next model concentrates on them.
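A minimal boosting sketch with scikit-learn's AdaBoostClassifier, again reusing the split from the first example. The weak learner here is a depth-1 tree (a "decision stump"), and the hyperparameter values are illustrative:

```python
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

# Stumps are fit sequentially; after each round, AdaBoost up-weights the
# samples the current ensemble misclassifies.
boosting = AdaBoostClassifier(
    DecisionTreeClassifier(max_depth=1),
    n_estimators=100,
    learning_rate=0.5,  # shrinks each stump's contribution
    random_state=42,
)
boosting.fit(X_train, y_train)
print("AdaBoost accuracy:", boosting.score(X_test, y_test))
```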
Stacking
Concept: Combine predictions from multiple models using a meta-model (often a simple linear model or logistic regression).
Goal: Leverage strengths of different algorithms.
Example:
Base models: Decision Tree, SVM, Neural Network
Meta-model: Logistic Regression
How it works:
Predictions from base models become inputs for the meta-model.
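A stacking sketch mirroring the example above (tree, SVM, and a small neural network as base models; logistic regression as the meta-model), assuming scikit-learn's StackingClassifier and the earlier data split:

```python
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

stacking = StackingClassifier(
    estimators=[
        ("tree", DecisionTreeClassifier(random_state=42)),
        ("svm", SVC(random_state=42)),
        ("mlp", MLPClassifier(max_iter=1000, random_state=42)),
    ],
    final_estimator=LogisticRegression(max_iter=1000),  # the meta-model
    cv=5,  # meta-model is trained on out-of-fold base-model predictions
)
stacking.fit(X_train, y_train)
print("Stacking accuracy:", stacking.score(X_test, y_test))
```

The cv argument matters: training the meta-model on out-of-fold predictions, rather than on predictions over the same data the base models saw, prevents it from simply learning which base model overfits the hardest.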
Random Forest
Random Forest builds many decision trees, each trained on a bootstrapped sample of the data and restricted to a random subset of features at each split; this decorrelates the trees, so aggregating their votes reduces variance.
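A minimal sketch showing both sources of randomness, assuming scikit-learn's RandomForestClassifier and the data split from the first example:

```python
from sklearn.ensemble import RandomForestClassifier

forest = RandomForestClassifier(
    n_estimators=200,
    bootstrap=True,        # each tree sees a bootstrap sample of the rows
    max_features="sqrt",   # each split considers a random subset of features
    criterion="gini",      # split quality measured by the Gini index (below)
    random_state=42,
)
forest.fit(X_train, y_train)
print("Random forest accuracy:", forest.score(X_test, y_test))
```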
Key Formulas:
Gini Index: $\text{Gini} = 1 - \sum_{i=1}^{n} p_i^2$
Entropy: $\text{Entropy} = -\sum_{i=1}^{n} p_i \log_2(p_i)$
Where $p_i$ is the probability of class $i$ at a given node.
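As a worked example (the class proportions here are made up): for a node holding 8 samples of class A and 2 of class B, $p = (0.8, 0.2)$, giving $\text{Gini} = 1 - (0.8^2 + 0.2^2) = 0.32$ and $\text{Entropy} = -(0.8 \log_2 0.8 + 0.2 \log_2 0.2) \approx 0.722$. The same computation in NumPy:

```python
import numpy as np

p = np.array([0.8, 0.2])            # class proportions at a node
gini = 1 - np.sum(p ** 2)           # 1 - (0.64 + 0.04) = 0.32
entropy = -np.sum(p * np.log2(p))   # ≈ 0.722 bits
print(f"Gini: {gini:.3f}, Entropy: {entropy:.3f}")
```

Both measures are zero for a pure node and largest when the classes are evenly mixed; a split is chosen to reduce the weighted impurity of the resulting child nodes.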