Path: blob/master/Machine Learning Ensemble Methods/4.1 AdaBoost Weights and Comparison.ipynb
AdaBoost
Visualization: How AdaBoost increases emphasis (weights) on hard samples across iterations.
Explanation: A concise, trainer-friendly walkthrough of the AdaBoost algorithm.
Comparison Table: AdaBoost vs Gradient Boosting on a simple synthetic dataset.
1) AdaBoost — Concept Recap (Trainer Notes)
Idea: Build many weak learners sequentially (typically decision stumps = depth‑1 trees). Each step upweights misclassified samples so the next learner focuses on what previous ones got wrong. Final prediction is a weighted vote of all weak learners.
Key quantities per iteration t:
Weighted error: $\varepsilon_t = \dfrac{\sum_i w_i^{(t)} \, \mathbf{1}[h_t(x_i) \neq y_i]}{\sum_i w_i^{(t)}}$
Learner weight: $\alpha_t = \frac{1}{2} \ln \dfrac{1 - \varepsilon_t}{\varepsilon_t}$
Sample weights: $w_i^{(t+1)} = w_i^{(t)} \cdot e^{\alpha_t \cdot \mathbf{1}[h_t(x_i) \neq y_i]}$ (renormalized)
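The three quantities above can be computed by hand for a single iteration. The labels and stump predictions below are toy values chosen for illustration:

```python
import numpy as np

# One AdaBoost iteration by hand (toy labels/predictions for illustration).
y     = np.array([ 1,  1, -1, -1,  1])   # true labels y_i
preds = np.array([ 1, -1, -1, -1,  1])   # stump predictions h_t(x_i)
w     = np.full(5, 0.2)                  # uniform sample weights w_i^{(t)}

miss  = (preds != y).astype(float)       # indicator [h_t(x_i) != y_i]
eps   = np.sum(w * miss) / np.sum(w)     # weighted error eps_t
alpha = 0.5 * np.log((1 - eps) / eps)    # learner weight alpha_t

w_new = w * np.exp(alpha * miss)         # upweight misclassified samples
w_new /= w_new.sum()                     # renormalize to sum to 1
```

With one of five samples misclassified, `eps` is 0.2, `alpha` is positive, and only the misclassified sample's weight grows after renormalization.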
Practical defaults:
- `estimator=DecisionTreeClassifier(max_depth=1)` (a stump)
- `n_estimators=50` to `200` for quick demos
- `learning_rate` in `[0.1, 1.0]` to scale each learner's contribution
2) Setup & Data
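A sketch of a setup cell, assuming a simple synthetic 2-D dataset (`make_moons` is one reasonable choice; any toy set works):

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split

# Synthetic 2-D binary classification data (parameters are illustrative).
X, y = make_moons(n_samples=500, noise=0.25, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y)
```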
3) Train AdaBoost (Decision Stumps) & Capture Iteration Stats
We’ll use scikit-learn’s estimator_errors_ and estimator_weights_ to visualize how emphasis evolves. We’ll also track the distribution of sample weights over iterations.
3a) Visualize Estimator Errors & Weights per Iteration
Estimator error should generally be < 0.5 for a learner to get a positive weight.
Estimator weight (alpha) grows as error drops.
3b) Visualize Sample Weight Distribution Over Iterations
We’ll reconstruct how sample weights spread as training proceeds. (For speed, we snapshot every 5 iterations.)
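Since `AdaBoostClassifier` does not expose per-iteration sample weights, one way to reconstruct them is a short manual boosting loop applying the update rule from Section 1. A sketch, assuming the same synthetic dataset:

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.tree import DecisionTreeClassifier

# Manual AdaBoost loop so we can snapshot sample weights every 5 iterations.
X, y = make_moons(n_samples=500, noise=0.25, random_state=42)
y_pm = np.where(y == 1, 1, -1)            # relabel into {-1, +1}

n = len(y)
w = np.full(n, 1.0 / n)                   # start uniform
snapshots = {}
for t in range(1, 51):
    stump = DecisionTreeClassifier(max_depth=1, random_state=0)
    stump.fit(X, y_pm, sample_weight=w)
    miss = (stump.predict(X) != y_pm).astype(float)
    eps = np.clip(np.sum(w * miss), 1e-10, 1 - 1e-10)  # weighted error
    alpha = 0.5 * np.log((1 - eps) / eps)              # learner weight
    w *= np.exp(alpha * miss)             # upweight misclassified samples
    w /= w.sum()                          # renormalize
    if t % 5 == 0:
        snapshots[t] = w.copy()           # snapshot for later histograms
```

Each `snapshots[t]` array can then be plotted as a histogram to show the weight distribution stretching away from uniform as training proceeds.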
4) Decision Boundary (Optional Visual)
5) AdaBoost vs Gradient Boosting — Comparison Table
We’ll compare Accuracy, F1, ROC‑AUC, and Fit Time (s). Both are trained with quick, reasonable defaults.
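A sketch of the comparison cell, assuming the same synthetic split and quick defaults for both models (`pandas` is used only to tabulate the results):

```python
import time
import pandas as pd
from sklearn.datasets import make_moons
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_moons(n_samples=500, noise=0.25, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y)

rows = []
for name, model in [
    ("AdaBoost", AdaBoostClassifier(n_estimators=100, random_state=42)),
    ("GradientBoosting",
     GradientBoostingClassifier(n_estimators=100, random_state=42)),
]:
    t0 = time.perf_counter()
    model.fit(X_tr, y_tr)                 # time the fit only
    fit_time = time.perf_counter() - t0
    pred = model.predict(X_te)
    proba = model.predict_proba(X_te)[:, 1]
    rows.append({"Model": name,
                 "Accuracy": accuracy_score(y_te, pred),
                 "F1": f1_score(y_te, pred),
                 "ROC-AUC": roc_auc_score(y_te, proba),
                 "Fit Time (s)": round(fit_time, 3)})

results = pd.DataFrame(rows).set_index("Model")
print(results)
```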
5a) Visual Comparison (Accuracy)
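A bar-chart sketch for the accuracy comparison, recomputing the scores so the cell is self-contained (same assumed dataset and split):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a live notebook
import matplotlib.pyplot as plt
from sklearn.datasets import make_moons
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_moons(n_samples=500, noise=0.25, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y)

models = {
    "AdaBoost": AdaBoostClassifier(n_estimators=100, random_state=42),
    "GradientBoosting": GradientBoostingClassifier(n_estimators=100,
                                                   random_state=42),
}
accs = {name: m.fit(X_tr, y_tr).score(X_te, y_te)
        for name, m in models.items()}

plt.bar(list(accs), list(accs.values()), color=["tab:blue", "tab:orange"])
plt.ylim(0, 1)
plt.ylabel("Accuracy")
plt.title("Test accuracy: AdaBoost vs Gradient Boosting")
plt.savefig("accuracy_comparison.png")
```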
6) Takeaways
AdaBoost increases the influence of hard samples via growing sample weights; learners with lower error get higher alpha weights.
Gradient Boosting reduces loss by fitting residuals stage‑wise; no explicit sample reweighting is surfaced, but additive trees minimize a differentiable loss.
On simple tabular data, both perform competitively; runtime and stability can vary with hyperparameters and data complexity.
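The "fitting residuals stage-wise" point from the takeaways can be sketched in a few lines for squared loss, where the negative gradient is exactly the residual (toy regression data, illustrative settings):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Minimal gradient-boosting sketch for squared loss: each stage fits the
# residuals (negative gradient) of the current additive model.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.1, 200)

pred = np.zeros_like(y)
lr = 0.1                                  # shrinkage / learning rate
for _ in range(100):
    resid = y - pred                      # residuals = negative gradient
    tree = DecisionTreeRegressor(max_depth=2).fit(X, resid)
    pred += lr * tree.predict(X)          # stage-wise additive update

mse = np.mean((y - pred) ** 2)            # training error shrinks stage by stage
```

No sample weights appear anywhere in the loop, which is the contrast with AdaBoost's reweighting drawn above.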