Path: blob/main/lab7/lab7-trees_and_ensembles.ipynb
Lab 7: Ensemble Methods
This week we will look at how collections of decision trees (or other machine learning models) can be trained on the same dataset and combined to improve predictive performance. Specifically, we will look at bagging, random forests and boosting, which are all related examples of ensemble methods.
1) Bagging
A combination of models can often perform much better than the average individual model, and sometimes better than the best individual model. Ensemble methods are ways of combining multiple models together. For good performance, the models should be diverse, in order to minimise the expected error of the ensemble.
Bagging ('bootstrap aggregation') is a simple ensemble method that induces diversity by training $M$ models on different samples of the training set (drawn with replacement) and combining their predictions by taking the mean or a majority vote. In outline, the bagging algorithm is:

For $m = 1, \ldots, M$ models:
- Randomly sample $N$ data points with replacement from the training set
- Learn a decision tree (CART algorithm) on this bootstrap sample

The final prediction is found by a majority vote over the $M$ trees.
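A minimal sketch of this training procedure is shown below, assuming the MNIST training data is held in NumPy arrays X_train and y_train (the notebook's actual variable names may differ):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def train_bagging(X_train, y_train, num_models=10, sample_size=1000, seed=0):
    """Train num_models decision trees, each on its own bootstrap sample."""
    rng = np.random.default_rng(seed)
    models = []
    for _ in range(num_models):
        # Draw sample_size indices with replacement (a bootstrap sample)
        idx = rng.choice(len(X_train), size=sample_size, replace=True)
        tree = DecisionTreeClassifier()
        tree.fit(X_train[idx], y_train[idx])
        models.append(tree)
    return models
```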
1.1) Train a bagging ensemble
Complete the code below to train a bagging ensemble by randomly sampling from the MNIST training dataset to train multiple decision trees.
1.2) Implement bagging prediction
Complete the bagging_predict function and then run the next cell to create predictions from the bagging ensemble based on majority voting of the individual models.
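A possible sketch of the majority vote, assuming models is the list of fitted trees from Section 1.1 and that the class labels can be cast to integers, is:

```python
import numpy as np

def bagging_predict(models, X_test):
    """Combine the ensemble by majority vote over the individual tree predictions."""
    # One row of predictions per model: shape (num_models, num_test_points)
    all_preds = np.stack([m.predict(X_test) for m in models]).astype(int)
    # For each test point, count the votes per class and return the most common label
    return np.apply_along_axis(lambda votes: np.bincount(votes).argmax(),
                               axis=0, arr=all_preds)
```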
How does the accuracy compare to a single decision tree?
Investigate the effect of changing the sample_size and num_models hyperparameters.
2) Random Forests
With bagging, the base models (individual decision trees) tend to make similar splits on the same features, so their errors are correlated; this reduces the diversity of the ensemble and limits performance.
Random forests improve the diversity of the base models by limiting the number of features considered for each split in the decision tree. We can obtain a random forest by modifying the bagging algorithm above so that each split in each tree considers only a random subset of the features.
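In scikit-learn, this restriction is available through the max_features argument of DecisionTreeClassifier, so only the line that constructs each tree in the bagging loop needs to change; for example:

```python
from sklearn.tree import DecisionTreeClassifier

# Inside the bagging loop, restrict each split to a random subset of features,
# e.g. sqrt(total number of features), turning the bagged trees into a random forest.
tree = DecisionTreeClassifier(max_features="sqrt")
```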
2.1) Implement Random Forest Training
Copy in your code for the bagging procedure and modify it to implement a random forest. The outline code below shows you where to make the modifications.
2.2) Random Forest Prediction
Use the bagging_predict function from Section 1.2 to generate predictions for the random forest and calculate the accuracy.
How does the performance of the random forest compare to bagging and the single model? Can you improve the performance by changing the hyperparameters?
3) Boosting
We can use a decision tree classifier as the base model for the ensemble method known as boosting. Boosting trains base models in sequence so that each new base model addresses the weaknesses of the ensemble built so far. Instead of training each new base model on a random sample, we weight the data points in the training set according to the performance of the previous base models.
AdaBoost (adaptive boosting) is a popular boosting method, where training examples that are misclassified by one of the base classifiers are given greater weight when used to train the next classifier in the sequence. Once all the classifiers have been trained, their predictions are then combined through a weighted majority voting scheme.
The AdaBoost algorithm which you will implement is given below:
Initialise the data weighting coefficients by setting $w_n^{(1)} = \frac{1}{N}$ for $n = 1, \ldots, N$, where $N$ is the number of training examples.

For $m = 1, \ldots, M$ models:
- Fit a classifier $y_m(\mathbf{x})$ to a subset of the training data by minimising the weighted error function (hint: specify the sample_weight when fitting the model using scikit-learn).
- Calculate the weighted error, $\epsilon_m = 1 - \text{weighted accuracy} = \frac{\sum_{n=1}^{N} w_n^{(m)} I(y_m(\mathbf{x}_n) \neq t_n)}{\sum_{n=1}^{N} w_n^{(m)}}$, where $I(\cdot)$ is the indicator function that equals $1$ when the condition is true and $0$ otherwise (hint: the computation of the weighted accuracy is done for you if sample_weight is specified when calling the score function).
- Calculate the model weighting coefficient, $\alpha_m = \ln\left(\frac{1 - \epsilon_m}{\epsilon_m}\right)$.
- Update the data weighting coefficients, $w_n^{(m+1)} = w_n^{(m)} \exp\left(\alpha_m I(y_m(\mathbf{x}_n) \neq t_n)\right)$, where, again, $I(\cdot)$ is the indicator function.

The final prediction is a weighted combination of the trained base classifiers, each weighted by its coefficient $\alpha_m$.
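A minimal sketch of this training loop is given below. It assumes NumPy arrays X_train and y_train (names not fixed by the notebook) and, for simplicity, fits each tree on the full training set via sample_weight rather than on a subset of size sample_size as the notebook suggests:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def train_adaboost(X_train, y_train, num_models=10, max_depth=2):
    """AdaBoost with decision-tree base models; returns the models and their alpha weights."""
    N = len(X_train)
    weights = np.ones(N) / N                           # w_n^(1) = 1/N
    models, alphas = [], []
    for m in range(num_models):
        tree = DecisionTreeClassifier(max_depth=max_depth)
        tree.fit(X_train, y_train, sample_weight=weights)
        # Weighted error = 1 - weighted accuracy (score handles the weighting for us)
        epsilon = 1.0 - tree.score(X_train, y_train, sample_weight=weights)
        epsilon = np.clip(epsilon, 1e-10, 1 - 1e-10)   # guard against log(0) / division by zero
        alpha = np.log((1.0 - epsilon) / epsilon)
        # Increase the weight of the points this model misclassified
        misclassified = tree.predict(X_train) != y_train
        weights = weights * np.exp(alpha * misclassified)
        weights = weights / weights.sum()              # renormalise so the weights sum to 1
        models.append(tree)
        alphas.append(alpha)
    return models, alphas
```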
For more information on boosting, see Bishop section 14.3.
3.1) Train an ensemble model using the AdaBoost algorithm
3.2) AdaBoost prediction
Complete the boosting_predict function to produce predictions from the trained models. In addition to the test data and the trained models, the function also takes the list of $\alpha_m$ as an input, which determines the weighting of each individual model on the overall output.
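One possible implementation of the weighted vote, assuming the models and alphas lists returned by the training sketch above and ten integer class labels (as for MNIST), is:

```python
import numpy as np

def boosting_predict(models, alphas, X_test, num_classes=10):
    """Weighted majority vote: each model adds alpha_m to the score of the class it predicts."""
    scores = np.zeros((len(X_test), num_classes))
    for model, alpha in zip(models, alphas):
        preds = model.predict(X_test).astype(int)
        # Add this model's weight to its predicted class for every test point
        scores[np.arange(len(X_test)), preds] += alpha
    # The class with the largest total weighted vote wins
    return scores.argmax(axis=1)
```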
How does performance compare with the other approaches?
Try out different values of sample_size, num_models and the max_depth of the decision tree.
How does training time vary for each approach as you change these ensemble parameters?
Wrap up
In this lab we implemented bagging and then extended it to the random forest and boosting methods. This should give some idea of how these three key ensemble methods are related to one another: a random forest adds random sampling over features, while boosting re-weights the dataset at each iteration to focus on misclassified data points.
References
COMS30035 Machine Learning lecture notes.
Bishop, C. M., Pattern Recognition and Machine Learning (Springer, 2006), Chapter 14.