# Decision Tree Algorithm

A **Decision Tree** is like a flowchart used for making decisions. It splits the data into smaller and smaller groups by asking a sequence of rule-based questions.

## Structure

- **Root Node**: The starting point. It represents the entire dataset.
- **Decision Nodes**: Points where the data is split based on a question (e.g., "Is Petal Length < 2.45?").
- **Leaf Nodes (Terminal Nodes)**: The final output (a class label), where no more splits happen.

## How it Splits Data

The tree tries to make each group as "pure" as possible, i.e., containing samples of only one class.

### Splitting Criteria

1. **Gini Impurity** (the default):
   - Measures how mixed the classes in a node are.
   - **0** = pure (all samples share one class).
   - **0.5** = maximally mixed (for a two-class problem).
   - The tree tries to **minimize** Gini.
2. **Entropy**:
   - Measures disorder or randomness.
   - **0** = pure.
   - **1** = maximally disordered (for a two-class problem, using log base 2).
   - The tree tries to **reduce** Entropy (i.e., maximize Information Gain).
3. **Information Gain**:
   - The parent node's Entropy minus the weighted average Entropy of the child nodes after a split.
   - We choose the split that gives the **highest** Information Gain.

The worked examples at the end of this page compute each of these on a toy split.

## Parameters to Control the Tree

- **max_depth**: How deep the tree can grow (too deep = overfitting; see the depth comparison below).
- **min_samples_split**: Minimum number of samples a node must contain before it can be split.
- **max_features**: Number of features considered when searching for the best split at each node.
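## Worked Examples

The three splitting criteria reduce to two short formulas: Gini = 1 − Σ pᵢ² and Entropy = −Σ pᵢ log₂ pᵢ, where pᵢ is the proportion of class *i* in a node. Here is a minimal NumPy sketch; the helper names (`gini`, `entropy`, `information_gain`) are illustrative, not from any library:

```python
import numpy as np

def gini(labels):
    """Gini impurity: 1 - sum(p_i^2) over the class proportions p_i."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def entropy(labels):
    """Entropy: -sum(p_i * log2(p_i)) over the class proportions p_i."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(parent, left, right):
    """Parent entropy minus the size-weighted entropy of the two children."""
    n = len(parent)
    children = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - children

# Toy split: a perfectly mixed parent separated into two pure children.
parent = np.array([0, 0, 0, 0, 1, 1, 1, 1])
left, right = parent[:4], parent[4:]

print(gini(parent))                           # 0.5 -> maximally mixed (two classes)
print(entropy(parent))                        # 1.0 -> maximally disordered (two classes)
print(information_gain(parent, left, right))  # 1.0 -> the children are pure
```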
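The parameter names in the list above match scikit-learn's `DecisionTreeClassifier`, so here is a short end-to-end sketch with it; the specific values passed are arbitrary illustration choices:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()

# criterion="gini" is the default; pass criterion="entropy" to split on Information Gain.
clf = DecisionTreeClassifier(
    criterion="gini",
    max_depth=3,           # cap how deep the tree can grow
    min_samples_split=2,   # a node needs at least this many samples to be split
    max_features=None,     # consider every feature at each split
    random_state=0,
)
clf.fit(iris.data, iris.target)

# Print the learned flowchart: the root, the decision nodes, and the leaves.
# On Iris the first question is classically "petal length (cm) <= 2.45",
# matching the example in the Structure section above.
print(export_text(clf, feature_names=iris.feature_names))
```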
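The overfitting warning on `max_depth` can be made concrete by comparing an unrestricted tree against a depth-capped one; the 50/50 split and the depths chosen here are arbitrary:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X_train, X_test, y_train, y_test = train_test_split(
    *load_iris(return_X_y=True), test_size=0.5, random_state=0
)

for depth in (None, 2):  # None = grow until every leaf is pure
    clf = DecisionTreeClassifier(max_depth=depth, random_state=0)
    clf.fit(X_train, y_train)
    print(f"max_depth={depth}: "
          f"train={clf.score(X_train, y_train):.2f}, "
          f"test={clf.score(X_test, y_test):.2f}")
```

The unrestricted tree memorizes the training set, driving train accuracy to 1.0; the gap between its train and test scores is the overfitting signal. On a clean dataset like Iris the gap is small, but it widens on noisier data.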