addition of unit 1 3 4 5
unit 4/02_Decision_Trees.md
# Decision Tree Induction

A **Decision Tree** is a flowchart-like structure used for classification.

## Structure

- **Root Node**: The top question (e.g., "Is it raining?").
- **Branch**: The answer (e.g., "Yes" or "No").
- **Leaf Node**: The final decision/class (e.g., "Play Football" or "Stay Inside").
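Putting the three pieces above together, here is a minimal sketch in plain Python; the nested-dictionary layout and the `classify` helper are only an illustration, not a standard representation:

```python
# Inner dicts are internal nodes (questions), plain strings are leaf nodes (classes).
tree = {
    "Is it raining?": {          # root node: the top question
        "Yes": "Stay Inside",    # branch "Yes" leads to a leaf
        "No": "Play Football",   # branch "No" leads to a leaf
    }
}

def classify(node, answers):
    """Follow branches from the root until a leaf (a plain string) is reached."""
    while isinstance(node, dict):
        question = next(iter(node))               # the question at this node
        node = node[question][answers[question]]  # take the matching branch
    return node

print(classify(tree, {"Is it raining?": "Yes"}))  # -> Stay Inside
```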
## How to Build a Tree?

We need to decide which attribute to split on first. To do this, we use **Attribute Selection Measures**:

### 1. Information Gain (Used in the ID3 Algorithm)

- Measures how much "uncertainty" (Entropy) is reduced by splitting on an attribute.
- We choose the attribute with the **Highest Information Gain** (computed in the sketch below).
- **Entropy**: A measure of randomness.
  - High Entropy = Messy/Mixed data (50% Yes, 50% No).
  - Low Entropy = Pure data (100% Yes).
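To make Entropy and Information Gain concrete, here is a small hand-rolled sketch; the `entropy` and `information_gain` helpers are our own illustrative code, not a library API, and storing rows as dictionaries of attribute values is just one possible data layout:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return sum(-(count / total) * math.log2(count / total)
               for count in Counter(labels).values())

def information_gain(rows, labels, attribute):
    """Parent entropy minus the weighted entropy of the subsets produced
    by splitting on `attribute` (each row is a dict of attribute values)."""
    subsets = {}
    for row, label in zip(rows, labels):
        subsets.setdefault(row[attribute], []).append(label)
    weighted = sum(len(s) / len(labels) * entropy(s) for s in subsets.values())
    return entropy(labels) - weighted

print(entropy(["Yes", "No", "Yes", "No"]))    # 1.0  (maximally mixed)
print(entropy(["Yes", "Yes", "Yes", "Yes"]))  # 0.0  (pure)
```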
### 2. Gain Ratio (Used in C4.5)

- An improvement over Information Gain: plain Information Gain is biased toward attributes with many distinct values (like "Date"), so C4.5 divides the gain by the attribute's "split information" to correct for this (sketched below).
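A sketch of that normalising term; `split_info` is an illustrative helper, and Gain Ratio is then Information Gain divided by this value:

```python
import math
from collections import Counter

def split_info(values):
    """Entropy of an attribute's own value distribution: the denominator in
    GainRatio = InformationGain / SplitInfo (the C4.5 criterion)."""
    total = len(values)
    return sum(-(count / total) * math.log2(count / total)
               for count in Counter(values).values())

# An attribute that takes a different value on almost every row (like "Date")
# gets a large SplitInfo, which pushes its Gain Ratio down even though its raw
# Information Gain is maximal.
print(split_info(["Mon", "Mon", "Tue", "Tue"]))  # 1.0
print(split_info(["d1", "d2", "d3", "d4"]))      # 2.0
```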
### 3. Gini Index (Used in CART)

- Measures "Impurity" (the chance of mislabelling a randomly chosen element if it is labelled according to the node's class distribution). We pick the split that minimizes the Gini Index (sketched below).
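A minimal sketch of the impurity calculation; the `gini` helper is our own illustrative code:

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels: 1 minus the sum of squared
    class proportions (0.0 for a pure node)."""
    total = len(labels)
    return 1.0 - sum((count / total) ** 2 for count in Counter(labels).values())

print(gini(["Yes", "No", "Yes", "No"]))    # 0.5  (maximally mixed, two classes)
print(gini(["Yes", "Yes", "Yes", "Yes"]))  # 0.0  (pure node)
```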
## Tree Pruning

Trees can become too complex and memorize the training data (**Overfitting**).

- **Pruning**: Cutting off weak branches to make the tree simpler and better at generalizing.
- **Pre-pruning**: Stop building early (e.g., cap the maximum depth or require a minimum number of samples per split).
- **Post-pruning**: Build the full tree, then cut back the branches that contribute least (see the sketch below).
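If scikit-learn is available, both styles can be tried directly; the dataset and parameter values below are arbitrary choices for illustration, not part of these notes:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Pre-pruning: stop growing early by capping depth and leaf size.
pre_pruned = DecisionTreeClassifier(max_depth=3, min_samples_leaf=5).fit(X, y)

# Post-pruning: grow the full tree, then cut it back with cost-complexity
# pruning (a larger ccp_alpha removes more branches).
post_pruned = DecisionTreeClassifier(ccp_alpha=0.02).fit(X, y)

print(pre_pruned.get_depth(), post_pruned.get_depth())
```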