addition of unit 1 3 4 5
unit 4/02_Decision_Trees.md
# Decision Tree Induction

A **Decision Tree** is a flowchart-like structure used for classification.

## Structure

- **Root Node**: The top question (e.g., "Is it raining?").
- **Branch**: The answer (e.g., "Yes" or "No").
- **Leaf Node**: The final decision/class (e.g., "Play Football" or "Stay Inside").
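Putting the three pieces above together, here is a minimal sketch in plain Python; the nested-dictionary layout and the `classify` helper are only an illustration, not a standard representation:

```python
# Inner dicts are internal nodes (questions), plain strings are leaf nodes (classes).
tree = {
    "Is it raining?": {          # root node: the top question
        "Yes": "Stay Inside",    # branch "Yes" leads to a leaf
        "No": "Play Football",   # branch "No" leads to a leaf
    }
}

def classify(node, answers):
    """Follow branches from the root until a leaf (a plain string) is reached."""
    while isinstance(node, dict):
        question = next(iter(node))               # the question at this node
        node = node[question][answers[question]]  # take the matching branch
    return node

print(classify(tree, {"Is it raining?": "Yes"}))  # -> Stay Inside
```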
## How to Build a Tree?

We need to decide which attribute to split on first. To do this, we use **Attribute Selection Measures**:

### 1. Information Gain (Used in the ID3 Algorithm)

- Measures how much "uncertainty" (Entropy) is reduced by splitting on an attribute.
- We choose the attribute with the **Highest Information Gain** (computed in the sketch below).
- **Entropy**: A measure of randomness.
  - High Entropy = Messy/Mixed data (50% Yes, 50% No).
  - Low Entropy = Pure data (100% Yes).
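To make Entropy and Information Gain concrete, here is a small hand-rolled sketch; the `entropy` and `information_gain` helpers are our own illustrative code, not a library API, and storing rows as dictionaries of attribute values is just one possible data layout:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return sum(-(count / total) * math.log2(count / total)
               for count in Counter(labels).values())

def information_gain(rows, labels, attribute):
    """Parent entropy minus the weighted entropy of the subsets produced
    by splitting on `attribute` (each row is a dict of attribute values)."""
    subsets = {}
    for row, label in zip(rows, labels):
        subsets.setdefault(row[attribute], []).append(label)
    weighted = sum(len(s) / len(labels) * entropy(s) for s in subsets.values())
    return entropy(labels) - weighted

print(entropy(["Yes", "No", "Yes", "No"]))    # 1.0  (maximally mixed)
print(entropy(["Yes", "Yes", "Yes", "Yes"]))  # 0.0  (pure)
```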
### 2. Gain Ratio (Used in C4.5)

- An improvement over Information Gain: plain Information Gain is biased toward attributes with many distinct values (like "Date"), so C4.5 divides the gain by the attribute's "split information" to correct for this (sketched below).
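A sketch of that normalising term; `split_info` is an illustrative helper, and Gain Ratio is then Information Gain divided by this value:

```python
import math
from collections import Counter

def split_info(values):
    """Entropy of an attribute's own value distribution: the denominator in
    GainRatio = InformationGain / SplitInfo (the C4.5 criterion)."""
    total = len(values)
    return sum(-(count / total) * math.log2(count / total)
               for count in Counter(values).values())

# An attribute that takes a different value on almost every row (like "Date")
# gets a large SplitInfo, which pushes its Gain Ratio down even though its raw
# Information Gain is maximal.
print(split_info(["Mon", "Mon", "Tue", "Tue"]))  # 1.0
print(split_info(["d1", "d2", "d3", "d4"]))      # 2.0
```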
### 3. Gini Index (Used in CART)

- Measures "Impurity" (the chance of mislabelling a randomly chosen element if it is labelled according to the node's class distribution). We pick the split that minimizes the Gini Index (sketched below).
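A minimal sketch of the impurity calculation; the `gini` helper is our own illustrative code:

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels: 1 minus the sum of squared
    class proportions (0.0 for a pure node)."""
    total = len(labels)
    return 1.0 - sum((count / total) ** 2 for count in Counter(labels).values())

print(gini(["Yes", "No", "Yes", "No"]))    # 0.5  (maximally mixed, two classes)
print(gini(["Yes", "Yes", "Yes", "Yes"]))  # 0.0  (pure node)
```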
## Tree Pruning

Trees can become too complex and memorize the training data (**Overfitting**).

- **Pruning**: Cutting off weak branches to make the tree simpler and better at generalizing.
- **Pre-pruning**: Stop building early (e.g., cap the maximum depth or require a minimum number of samples per split).
- **Post-pruning**: Build the full tree, then cut back the branches that contribute least (see the sketch below).
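If scikit-learn is available, both styles can be tried directly; the dataset and parameter values below are arbitrary choices for illustration, not part of these notes:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Pre-pruning: stop growing early by capping depth and leaf size.
pre_pruned = DecisionTreeClassifier(max_depth=3, min_samples_leaf=5).fit(X, y)

# Post-pruning: grow the full tree, then cut it back with cost-complexity
# pruning (a larger ccp_alpha removes more branches).
post_pruned = DecisionTreeClassifier(ccp_alpha=0.02).fit(X, y)

print(pre_pruned.get_depth(), post_pruned.get_depth())
```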