# Decision Tree Induction

A **Decision Tree** is a flowchart-like structure used for classification.

## Structure

- **Root Node**: The top question (e.g., "Is it raining?").
- **Branch**: An answer to the question (e.g., "Yes" or "No").
- **Leaf Node**: The final decision/class (e.g., "Play Football" or "Stay Inside").

## How to Build a Tree?

At each step we must decide which attribute to split on. We use **Attribute Selection Measures** (worked Python sketches of each measure, a toy tree builder, and pruning follow at the end of this section):

### 1. Information Gain (Used in ID3 Algorithm)

- Measures how much "uncertainty" (Entropy) is reduced by splitting on an attribute.
- We choose the attribute with the **highest Information Gain**.
- **Entropy**: A measure of randomness.
  - High Entropy = Messy/Mixed data (50% Yes, 50% No).
  - Low Entropy = Pure data (100% Yes).

### 2. Gain Ratio (Used in C4.5)

- An improvement over Information Gain, which is biased toward attributes with many distinct values (like "Date"). Gain Ratio corrects this by dividing the gain by the split's own entropy (its Split Information).

### 3. Gini Index (Used in CART)

- Measures "Impurity": the chance that two samples drawn at random from a node belong to different classes. We want the split that **minimizes** the weighted Gini Index of the child nodes.

## Tree Pruning

Trees can become too complex and memorize the training data (**Overfitting**).

- **Pruning**: Cutting off weak branches to make the tree simpler and better at generalizing.
- **Pre-pruning**: Stop building early (e.g., cap the tree's depth or require a minimum number of samples per node).
- **Post-pruning**: Build the full tree, then cut branches back.
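
## Worked Sketches (Python)

To make the measures concrete, here is a minimal sketch of Entropy, Information Gain, and Gain Ratio on a toy "play football?" table. The data and the helper names (`entropy`, `info_gain`, `gain_ratio`) are invented for illustration, not taken from any library:

```python
from collections import Counter
from math import log2

def entropy(labels):
    """H(S) = -sum(p * log2(p)) over the class proportions p."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def partition(rows, labels, attr):
    """Group the labels by the value each row takes on attribute attr."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr], []).append(label)
    return groups

def info_gain(rows, labels, attr):
    """Entropy before the split minus the weighted entropy after it (ID3)."""
    total = len(labels)
    after = sum(len(g) / total * entropy(g)
                for g in partition(rows, labels, attr).values())
    return entropy(labels) - after

def gain_ratio(rows, labels, attr):
    """Information Gain divided by Split Information (C4.5). Split
    Information is the entropy of the subset sizes themselves, so attributes
    that shatter the data into many tiny subsets are penalized."""
    total = len(labels)
    sizes = [len(g) for g in partition(rows, labels, attr).values()]
    split_info = -sum((n / total) * log2(n / total) for n in sizes)
    return info_gain(rows, labels, attr) / split_info if split_info else 0.0

# Toy "play football?" table: each row is (outlook, windy).
rows   = [("sunny", True), ("sunny", False), ("rain", True), ("rain", False)]
labels = ["No",            "Yes",            "No",           "Yes"]

print(info_gain(rows, labels, 0))   # outlook: 0.0 (tells us nothing here)
print(info_gain(rows, labels, 1))   # windy:   1.0 (splits the classes perfectly)
print(gain_ratio(rows, labels, 1))  # windy:   1.0 / 1.0 = 1.0
```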
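
The same idea with the Gini Index, as CART uses it. Again the helper names (`gini`, `gini_of_split`) are made up for this sketch:

```python
from collections import Counter

def gini(labels):
    """Gini impurity = 1 - sum(p_i^2): the chance that two random draws
    from the node disagree on class. 0 is pure; 0.5 is worst for 2 classes."""
    total = len(labels)
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())

def gini_of_split(groups):
    """Weighted Gini impurity of the child nodes a split produces;
    CART picks the split that minimizes this."""
    total = sum(len(g) for g in groups)
    return sum(len(g) / total * gini(g) for g in groups)

print(gini(["Yes", "Yes", "No", "No"]))               # 0.5 (maximally mixed)
print(gini_of_split([["Yes", "Yes"], ["No", "No"]]))  # 0.0 (perfect split)
print(gini_of_split([["Yes", "No"], ["Yes", "No"]]))  # 0.5 (useless split)
```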
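
Putting it together, a bare-bones ID3-style builder: split on the highest-gain attribute, recurse on each subset, and stop at pure nodes. This sketch ignores real-world concerns like missing values and continuous attributes:

```python
from collections import Counter
from math import log2

def entropy(labels):
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def best_attribute(rows, labels, attrs):
    """Pick the attribute whose split yields the highest Information Gain."""
    def gain(attr):
        groups = {}
        for row, label in zip(rows, labels):
            groups.setdefault(row[attr], []).append(label)
        return entropy(labels) - sum(len(g) / len(labels) * entropy(g)
                                     for g in groups.values())
    return max(attrs, key=gain)

def build_tree(rows, labels, attrs):
    """Recursive ID3: stop at a pure node or when no attributes remain,
    otherwise split on the best attribute and recurse on each subset."""
    if len(set(labels)) == 1:                        # pure leaf
        return labels[0]
    if not attrs:                                    # nothing left to split on:
        return Counter(labels).most_common(1)[0][0]  # majority-class leaf
    attr = best_attribute(rows, labels, attrs)
    tree = {}
    for value in {row[attr] for row in rows}:
        sub = [(r, l) for r, l in zip(rows, labels) if r[attr] == value]
        sub_rows, sub_labels = zip(*sub)
        tree[value] = build_tree(list(sub_rows), list(sub_labels),
                                 [a for a in attrs if a != attr])
    return (attr, tree)

rows   = [("sunny", True), ("sunny", False), ("rain", True), ("rain", False)]
labels = ["No", "Yes", "No", "Yes"]
print(build_tree(rows, labels, [0, 1]))
# e.g. (1, {True: 'No', False: 'Yes'}): the root asks about attribute 1
# ("windy?"), and both leaves are pure.
```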
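
For pruning, a minimal scikit-learn sketch (assuming scikit-learn is installed): `max_depth` and `min_samples_leaf` act as pre-pruning constraints, while `ccp_alpha` enables cost-complexity post-pruning. The iris dataset and the parameter values are arbitrary choices for illustration:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Pre-pruning: constrain growth up front.
pre = DecisionTreeClassifier(max_depth=3,          # stop building early
                             min_samples_leaf=5,   # don't carve out tiny leaves
                             random_state=0).fit(X_train, y_train)

# Post-pruning: grow the full tree, then cut branches via cost-complexity
# pruning (a higher ccp_alpha prunes more aggressively).
post = DecisionTreeClassifier(ccp_alpha=0.02,
                              random_state=0).fit(X_train, y_train)

print(pre.score(X_test, y_test), post.score(X_test, y_test))
```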