Path: blob/master/notebooks/book1/18/feature_importance_trees_tutorial.ipynb
Authors: Kevin P. Murphy ([email protected]) and Mahmoud Soliman ([email protected])
In this notebook we will explore how to use XGBoost and sklearn's random forests to evaluate feature importance.
XGBoost
XGBoost supports the following features:
Vanilla gradient boosting (also known as GBDT, gradient-boosted decision trees, or GBM, gradient boosting machine), with support for parameter tuning, parallelization, and GPUs.
Stochastic gradient boosting with uniform and gradient-based sampling, as well as sub-sampling at the row, column, and column-per-split levels.
Regularized gradient boosting with both L1 and L2 regularization (via the alpha and lambda parameters, respectively).
Dropout-style behaviour via the DART booster.
Note that we are using the sklearn-like API of XGBoost for simplicity; a minimal configuration sketch is shown below.
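To make the parameter names above concrete, here is a minimal sketch of XGBoost's sklearn-like classifier. The dataset (breast cancer from sklearn) and all hyper-parameter values are illustrative assumptions, not taken from this notebook.

```python
# Minimal sketch of the sklearn-like XGBoost API; dataset and
# hyper-parameter values below are illustrative placeholders.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = XGBClassifier(
    n_estimators=100,       # number of boosting rounds
    learning_rate=0.1,      # shrinkage applied to each tree
    subsample=0.8,          # stochastic GB: row sub-sampling
    colsample_bytree=0.8,   # column sub-sampling per tree
    colsample_bylevel=1.0,  # column sub-sampling per split level
    reg_alpha=0.0,          # L1 regularization (alpha)
    reg_lambda=1.0,         # L2 regularization (lambda)
    booster="gbtree",       # use "dart" for dropout-style boosting
)
model.fit(X_train, y_train)

# Per-feature importance scores; the scoring rule can be changed
# via the importance_type constructor parameter.
print(model.feature_importances_)
```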
SKLearn
Sklearn supports several ensemble-learning methods, one of which is random forests. A random forest bags decision tree classifiers (weak learners) fit on various sub-samples of the dataset and averages their predictions to improve accuracy and control over-fitting; a minimal sketch is shown below.
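The following sketch shows a random forest and its impurity-based feature importances; the dataset and hyper-parameters are illustrative assumptions only.

```python
# Minimal sketch of a random forest with impurity-based feature
# importances; dataset and hyper-parameters are placeholders.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

forest = RandomForestClassifier(
    n_estimators=100,     # number of bagged trees
    max_features="sqrt",  # features considered at each split
    random_state=0,
)
forest.fit(X_train, y_train)

# Mean decrease in impurity, averaged over all trees in the forest.
print(forest.feature_importances_)
```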
# Setup
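The install cell itself is not preserved in this export; a minimal sketch, assuming the packages match the build output below, would be:

```python
# Install the extra packages whose build output appears below; xgboost and
# scikit-learn are assumed to be pre-installed in the environment (e.g. Colab).
!pip install lime shap
```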
(pip install output: wheels built for lime and shap)