Decision Tree Induction
A Decision Tree is a flowchart-like structure used for classification.
Structure
- Root Node: The top question (e.g., "Is it raining?").
- Branch: The answer (e.g., "Yes" or "No").
- Leaf Node: The final decision/class (e.g., "Play Football" or "Stay Inside").
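This structure maps naturally onto a small recursive data type. Here is a minimal Python sketch (the class and attribute names are illustrative, not from any particular library):

```python
class Node:
    """One node of a decision tree: either an internal question or a leaf."""
    def __init__(self, question=None, branches=None, label=None):
        self.question = question        # e.g., "raining" (internal nodes only)
        self.branches = branches or {}  # answer -> child Node, e.g., {"Yes": ..., "No": ...}
        self.label = label              # final class at a leaf, e.g., "Play Football"

    def classify(self, example):
        """Follow the branches matching the example's answers until a leaf is reached."""
        if self.label is not None:       # leaf node: return the decision
            return self.label
        answer = example[self.question]  # ask the node's question
        return self.branches[answer].classify(example)

# Usage: a tiny hand-built tree matching the example above.
tree = Node(question="raining",
            branches={"Yes": Node(label="Stay Inside"),
                      "No":  Node(label="Play Football")})
print(tree.classify({"raining": "No"}))  # -> "Play Football"
```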
How to Build a Tree?
At each node, we need to decide which attribute to split on. We use Attribute Selection Measures (a Python sketch of all three follows this list):
1. Information Gain (Used in ID3 Algorithm)
- Measures how much "uncertainty" (Entropy) is reduced by splitting on an attribute: Gain(S, A) = Entropy(S) - weighted average Entropy of the subsets produced by A.
- We choose the attribute with the Highest Information Gain.
- Entropy: A measure of randomness, Entropy(S) = -Σ p_i log2(p_i), where p_i is the fraction of class i.
- High Entropy = Messy/Mixed data (50% Yes, 50% No gives Entropy = 1).
- Low Entropy = Pure data (100% Yes gives Entropy = 0).
2. Gain Ratio (Used in C4.5)
- An improvement over Information Gain, which is biased toward attributes with many values (like "Date"). Gain Ratio corrects this by normalizing: Gain Ratio = Information Gain / Split Info, where Split Info penalizes attributes that split the data into many small partitions.
3. Gini Index (Used in CART)
- Measures "Impurity". We want to minimize the Gini Index.
Tree Pruning
Trees can become too complex and memorize the training data (Overfitting).
- Pruning: Cutting off weak branches to make the tree simpler and better at generalizing.
- Pre-pruning: Stop building early (e.g., cap the tree depth or require a minimum number of samples per leaf).
- Post-pruning: Build the full tree, then cut branches that do not pay for their complexity. Both styles are sketched below.
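A short sketch of both pruning styles using scikit-learn (assuming scikit-learn is installed; the Iris dataset and the specific parameter values are just convenient stand-ins):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Pre-pruning: stop early by capping depth and requiring enough samples per leaf.
pre = DecisionTreeClassifier(max_depth=3, min_samples_leaf=5, random_state=0)
pre.fit(X_train, y_train)

# Post-pruning: scikit-learn grows the full tree, then applies minimal
# cost-complexity pruning (larger ccp_alpha prunes more aggressively).
post = DecisionTreeClassifier(ccp_alpha=0.02, random_state=0)
post.fit(X_train, y_train)

print("pre-pruned accuracy: ", pre.score(X_test, y_test))
print("post-pruned accuracy:", post.score(X_test, y_test))
```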