
Decision Tree Induction

A Decision Tree is a flowchart-like structure used for classification.

Structure

  • Root Node: The top question (e.g., "Is it raining?").
  • Branch: The answer (e.g., "Yes" or "No").
  • Leaf Node: The final decision/class (e.g., "Play Football" or "Stay Inside").
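To make the structure concrete, here is a minimal sketch (a hypothetical example written for these notes) of the tree above as plain if/else logic in Python:

```python
# The root node asks the question, the branches are the answers,
# and each return statement is a leaf node holding the final class.
def decide(is_raining: bool) -> str:
    if is_raining:               # Branch: "Yes"
        return "Stay Inside"     # Leaf node
    return "Play Football"       # Leaf node (Branch: "No")

print(decide(is_raining=True))   # -> Stay Inside
print(decide(is_raining=False))  # -> Play Football
```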

How to Build a Tree?

At every node we must decide which attribute to split on (starting at the root). We use Attribute Selection Measures:

1. Information Gain (Used in ID3 Algorithm)

  • Measures how much "uncertainty" (Entropy) is reduced by splitting on an attribute.
  • We choose the attribute with the Highest Information Gain.
  • Entropy: A measure of randomness.
    • High Entropy = Messy/Mixed data (50% Yes, 50% No).
    • Low Entropy = Pure data (100% Yes).
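As a concrete illustration, here is a minimal Python sketch of Entropy and Information Gain (the helper names and the toy data are my own, not part of ID3 itself):

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy(S) = -sum(p_i * log2(p_i)) over the class proportions p_i."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def information_gain(rows, labels, attr):
    """Gain(S, A) = Entropy(S) - weighted average entropy of the subsets."""
    total = len(labels)
    subsets = {}
    for row, label in zip(rows, labels):
        subsets.setdefault(row[attr], []).append(label)
    children = sum((len(s) / total) * entropy(s) for s in subsets.values())
    return entropy(labels) - children

labels = ["Yes", "Yes", "No", "No"]
print(entropy(labels))  # 1.0 -> maximally mixed (50% Yes, 50% No)

# An attribute that separates the classes perfectly drops the entropy
# from 1.0 to 0.0, so its Information Gain is the full 1.0.
rows = [("Sunny",), ("Sunny",), ("Rain",), ("Rain",)]
print(information_gain(rows, labels, attr=0))  # 1.0
```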

2. Gain Ratio (Used in C4.5)

  • An improvement over Information Gain, which is biased toward attributes with many distinct values (like "Date", which can split the data into one tiny subset per value). Gain Ratio corrects this by dividing the gain by the attribute's Split Information.
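Sketched in the same style (reusing the entropy and information_gain helpers from the previous example):

```python
def gain_ratio(rows, labels, attr):
    """Gain Ratio = Information Gain / Split Information.
    Split Information is the entropy of the split proportions themselves;
    it grows with the number of branches, penalizing attributes like
    "Date" that fragment the data into many tiny subsets."""
    values = [row[attr] for row in rows]
    split_info = entropy(values)
    if split_info == 0.0:   # attribute has only one value: no real split
        return 0.0
    return information_gain(rows, labels, attr) / split_info
```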

3. Gini Index (Used in CART)

  • Measures "Impurity": the probability that a randomly chosen sample would be misclassified under the node's class distribution. We choose the split with the lowest (weighted) Gini Index.
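A matching sketch for the Gini Index (again using Counter from the earlier example):

```python
def gini(labels):
    """Gini(S) = 1 - sum(p_i^2): the chance a randomly drawn sample is
    mislabeled when labels are assigned by the class distribution."""
    total = len(labels)
    return 1.0 - sum((c / total) ** 2 for c in Counter(labels).values())

print(gini(["Yes", "No", "Yes", "No"]))  # 0.5 -> maximally impure (2 classes)
print(gini(["Yes", "Yes", "Yes"]))       # 0.0 -> pure
```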

Tree Pruning

Trees can become too complex and memorize the training data (Overfitting).

  • Pruning: Cutting off weak branches to make the tree simpler and better at generalizing.
  • Pre-pruning: Stop building early (e.g., limit the depth or require a minimum number of samples per node).
  • Post-pruning: Build the full tree, then cut back branches that do not help on unseen data.
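Both styles are easy to try with scikit-learn's DecisionTreeClassifier (a sketch assuming scikit-learn is installed; the bundled iris dataset and the specific parameter values are only for demonstration):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Pre-pruning: stop growth early via limits such as max_depth
# and min_samples_leaf.
pre = DecisionTreeClassifier(max_depth=3, min_samples_leaf=5, random_state=0)
pre.fit(X_train, y_train)

# Post-pruning: grow the full tree, then cut it back with
# cost-complexity pruning (larger ccp_alpha prunes more aggressively).
post = DecisionTreeClassifier(ccp_alpha=0.02, random_state=0)
post.fit(X_train, y_train)

print("pre-pruned accuracy: ", pre.score(X_test, y_test))
print("post-pruned accuracy:", post.score(X_test, y_test))
```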