
Decision Tree Induction

A Decision Tree is a flowchart-like structure used for classification.

Structure

  • Root Node: The top question (e.g., "Is it raining?").
  • Branch: The answer (e.g., "Yes" or "No").
  • Leaf Node: The final decision/class (e.g., "Play Football" or "Stay Inside").
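To make the structure concrete, here is a minimal sketch (a hypothetical example written for these notes) of the tree above as plain if/else logic in Python:

```python
# The root node asks the question, the branches are the answers,
# and each return statement is a leaf node holding the final class.
def decide(is_raining: bool) -> str:
    if is_raining:               # Branch: "Yes"
        return "Stay Inside"     # Leaf node
    return "Play Football"       # Leaf node (Branch: "No")

print(decide(is_raining=True))   # -> Stay Inside
print(decide(is_raining=False))  # -> Play Football
```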

How to Build a Tree?

At every node we must decide which attribute to split on (starting at the root). We use Attribute Selection Measures:

1. Information Gain (Used in ID3 Algorithm)

  • Measures how much "uncertainty" (Entropy) is reduced by splitting on an attribute.
  • We choose the attribute with the Highest Information Gain.
  • Entropy: A measure of randomness.
    • High Entropy = Messy/Mixed data (50% Yes, 50% No).
    • Low Entropy = Pure data (100% Yes).
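As a concrete illustration, here is a minimal Python sketch of Entropy and Information Gain (the helper names and the toy data are my own, not part of ID3 itself):

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy(S) = -sum(p_i * log2(p_i)) over the class proportions p_i."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def information_gain(rows, labels, attr):
    """Gain(S, A) = Entropy(S) - weighted average entropy of the subsets."""
    total = len(labels)
    subsets = {}
    for row, label in zip(rows, labels):
        subsets.setdefault(row[attr], []).append(label)
    children = sum((len(s) / total) * entropy(s) for s in subsets.values())
    return entropy(labels) - children

labels = ["Yes", "Yes", "No", "No"]
print(entropy(labels))  # 1.0 -> maximally mixed (50% Yes, 50% No)

# An attribute that separates the classes perfectly drops the entropy
# from 1.0 to 0.0, so its Information Gain is the full 1.0.
rows = [("Sunny",), ("Sunny",), ("Rain",), ("Rain",)]
print(information_gain(rows, labels, attr=0))  # 1.0
```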

2. Gain Ratio (Used in C4.5)

  • An improvement over Information Gain, which is biased toward attributes with many distinct values (like "Date", which can split the data into one tiny subset per value). Gain Ratio corrects this by dividing the gain by the attribute's Split Information.
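Sketched in the same style (reusing the entropy and information_gain helpers from the previous example):

```python
def gain_ratio(rows, labels, attr):
    """Gain Ratio = Information Gain / Split Information.
    Split Information is the entropy of the split proportions themselves;
    it grows with the number of branches, penalizing attributes like
    "Date" that fragment the data into many tiny subsets."""
    values = [row[attr] for row in rows]
    split_info = entropy(values)
    if split_info == 0.0:   # attribute has only one value: no real split
        return 0.0
    return information_gain(rows, labels, attr) / split_info
```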

3. Gini Index (Used in CART)

  • Measures "Impurity": the probability that a randomly chosen sample would be misclassified under the node's class distribution. We choose the split with the lowest (weighted) Gini Index.
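A matching sketch for the Gini Index (again using Counter from the earlier example):

```python
def gini(labels):
    """Gini(S) = 1 - sum(p_i^2): the chance a randomly drawn sample is
    mislabeled when labels are assigned by the class distribution."""
    total = len(labels)
    return 1.0 - sum((c / total) ** 2 for c in Counter(labels).values())

print(gini(["Yes", "No", "Yes", "No"]))  # 0.5 -> maximally impure (2 classes)
print(gini(["Yes", "Yes", "Yes"]))       # 0.0 -> pure
```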

Tree Pruning

Trees can become too complex and memorize the training data (Overfitting).

  • Pruning: Cutting off weak branches to make the tree simpler and better at generalizing.
  • Pre-pruning: Stop building early (e.g., limit the depth or require a minimum number of samples per node).
  • Post-pruning: Build the full tree, then cut back branches that do not help on unseen data.
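Both styles are easy to try with scikit-learn's DecisionTreeClassifier (a sketch assuming scikit-learn is installed; the bundled iris dataset and the specific parameter values are only for demonstration):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Pre-pruning: stop growth early via limits such as max_depth
# and min_samples_leaf.
pre = DecisionTreeClassifier(max_depth=3, min_samples_leaf=5, random_state=0)
pre.fit(X_train, y_train)

# Post-pruning: grow the full tree, then cut it back with
# cost-complexity pruning (larger ccp_alpha prunes more aggressively).
post = DecisionTreeClassifier(ccp_alpha=0.02, random_state=0)
post.fit(X_train, y_train)

print("pre-pruned accuracy: ", pre.score(X_test, y_test))
print("post-pruned accuracy:", post.score(X_test, y_test))
```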