Decision Trees and Pruning
Decision Trees are powerful models used for classification and regression. They split data into subsets based on feature values, creating a tree-like structure. However, fully grown trees can overfit the training data. Pruning helps reduce complexity and improve generalization.
How Decision Trees Work
Decision Trees split nodes based on impurity measures such as Gini Index or Entropy.
Formulas:
Gini Index: $$Gini = 1 - \sum_{i=1}^{C} p_i^2$$
Entropy: $$Entropy = -\sum_{i=1}^{C} p_i \log_2(p_i)$$
where $p_i$ is the proportion of class $i$ in the node and $C$ is the number of classes.
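To make the formulas concrete, here is a minimal NumPy sketch that computes both impurity measures for the labels in a single node. The ten-sample node array is made up purely for illustration.

```python
import numpy as np

def gini(labels):
    """Gini index: 1 minus the sum of squared class proportions."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def entropy(labels):
    """Entropy: -sum of p_i * log2(p_i) over the classes present in the node."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

# A toy node holding 6 samples of class 0 and 4 samples of class 1
node = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1])
print(gini(node))     # 1 - (0.6^2 + 0.4^2) = 0.48
print(entropy(node))  # -(0.6*log2(0.6) + 0.4*log2(0.4)) ≈ 0.971
```

A pure node (all samples in one class) scores 0 under both measures; a 50/50 split gives the maximum (0.5 for Gini, 1.0 for entropy with two classes), which is why splits are chosen to drive these values down.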
Pruning in Decision Trees
Pruning reduces the size of a decision tree by removing branches that have little predictive power.
Types of Pruning:
- Pre-Pruning (Early Stopping): stop tree growth early using constraints such as `max_depth` and `min_samples_split` (see the sketch after this list).
- Post-Pruning (Cost Complexity Pruning): grow the full tree, then prune branches back.
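As a quick illustration of pre-pruning, the sketch below constrains a scikit-learn `DecisionTreeClassifier` with `max_depth` and `min_samples_split`. The Iris dataset and the specific constraint values are arbitrary choices for demonstration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Pre-pruning: constrain growth up front instead of pruning afterwards
pre_pruned = DecisionTreeClassifier(
    max_depth=3,            # cap tree depth
    min_samples_split=10,   # require at least 10 samples to split a node
    random_state=42,
)
pre_pruned.fit(X_train, y_train)

print("Test accuracy:", pre_pruned.score(X_test, y_test))
print("Number of leaves:", pre_pruned.get_n_leaves())
```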
Cost Complexity Formula:
$$R_\alpha(T) = R(T) + \alpha |T|$$
Where:
- $R(T)$: misclassification error of tree $T$
- $|T|$: number of leaves (terminal nodes)
- $\alpha$: complexity parameter

Post-pruning selects the subtree that minimizes $R_\alpha(T)$: a larger $\alpha$ penalizes leaves more heavily and therefore yields a smaller tree.
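scikit-learn implements this idea as minimal cost-complexity pruning through the `ccp_alpha` parameter. The sketch below (again using Iris as a stand-in dataset) computes the effective $\alpha$ values along the pruning path of a fully grown tree and refits once per value, showing how larger $\alpha$ produces fewer leaves.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Grow the full tree and compute the effective alphas along the pruning path
full_tree = DecisionTreeClassifier(random_state=42)
path = full_tree.cost_complexity_pruning_path(X_train, y_train)

# Refit once per alpha; larger alpha => heavier penalty => smaller tree
for alpha in path.ccp_alphas:
    pruned = DecisionTreeClassifier(ccp_alpha=alpha, random_state=42)
    pruned.fit(X_train, y_train)
    print(f"alpha={alpha:.4f}  leaves={pruned.get_n_leaves()}  "
          f"test acc={pruned.score(X_test, y_test):.3f}")
```

In practice the `ccp_alpha` value would be chosen by cross-validation, picking the smallest tree whose validation accuracy has not yet dropped off.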
Conclusion
Pruning helps prevent overfitting by simplifying the tree structure. It improves generalization and makes the model more interpretable.