# Decision Tree Algorithm
A **Decision Tree** is a flowchart-like model for making decisions. It splits the data into smaller and smaller groups based on simple feature rules.
## Structure
- **Root Node**: The starting point. It represents the entire dataset.
- **Decision Nodes**: Points where the data is split based on a question (e.g., "Is Petal Length < 2.45?").
- **Leaf Nodes (Terminal Nodes)**: The final output (class label) where no more splits happen.
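To see these parts in practice, here is a minimal sketch (assuming scikit-learn and the Iris dataset, which the petal-length question above comes from) that trains a tree and prints its structure as text:
```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
clf = DecisionTreeClassifier(random_state=0)  # Gini is the default criterion
clf.fit(iris.data, iris.target)

# The first printed split is the root node, indented splits are
# decision nodes, and lines ending in "class: ..." are leaf nodes.
print(export_text(clf, feature_names=iris.feature_names))
```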
## How it Splits Data
At each split, the tree tries to make the resulting groups as "pure" as possible (ideally containing only one class).
### Splitting Criteria
1. **Gini Impurity** (the default in scikit-learn):
- Measures how mixed the classes are: Gini = 1 − Σ pᵢ², where pᵢ is the fraction of samples in class i.
- **0** = Pure (all samples in one class).
- **0.5** = Maximally impure for two classes (a 50/50 mix).
- The tree tries to **minimize** Gini.
2. **Entropy**:
- Measures disorder or randomness: Entropy = −Σ pᵢ log₂(pᵢ).
- **0** = Pure.
- **1** = Maximally disordered for two classes (a 50/50 mix).
- The tree tries to **reduce** Entropy (equivalently, maximize Information Gain).
3. **Information Gain**:
- The Entropy of the parent node minus the size-weighted average Entropy of the child nodes after a split.
- We choose the split that gives the **highest** Information Gain (see the sketch after this list).
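To make the formulas concrete, here is a small hand-rolled sketch (plain Python with NumPy; the 10-sample split below is hypothetical) that computes Gini, Entropy, and the Information Gain of a candidate split:
```python
import numpy as np

def gini(labels):
    """Gini impurity: 1 - sum(p_i^2) over the class proportions."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def entropy(labels):
    """Entropy: -sum(p_i * log2(p_i)) over the class proportions."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(parent, left, right):
    """Parent Entropy minus the size-weighted Entropy of the children."""
    n = len(parent)
    child = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - child

# Hypothetical split of 10 samples (classes 0 and 1):
parent = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])
left = np.array([0, 0, 0, 0, 0])   # pure: Gini 0, Entropy 0
right = np.array([1, 1, 1, 1, 1])  # pure: Gini 0, Entropy 0

print(gini(parent))                           # 0.5 (50/50 mix)
print(entropy(parent))                        # 1.0 (maximum for two classes)
print(information_gain(parent, left, right))  # 1.0 (a perfect split)
```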
## Parameters to Control the Tree
- **max_depth**: The maximum depth the tree can grow to. (Too deep = Overfitting).
- **min_samples_split**: Minimum samples needed to split a node.
- **max_features**: Number of features to consider for each split.
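These are keyword arguments of scikit-learn's `DecisionTreeClassifier`. A short sketch with illustrative (untuned) values:
```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = DecisionTreeClassifier(
    criterion="gini",     # or "entropy"
    max_depth=3,          # cap depth to limit overfitting
    min_samples_split=4,  # need at least 4 samples to split a node
    max_features=None,    # consider all features at each split
    random_state=0,
)
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))  # accuracy on the held-out split
```
In practice these values would be tuned, e.g. with cross-validation, rather than set by hand.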