# Decision Tree Algorithm
A **Decision Tree** is like a flowchart used for making decisions: it splits the data into smaller and smaller groups by applying simple if/then rules to feature values.
## Structure
- **Root Node**: The starting point; it represents the entire dataset.
- **Decision Nodes**: Points where the data is split based on a question (e.g., "Is Petal Length < 2.45?").
- **Leaf Nodes (Terminal Nodes)**: The end points where no more splits happen; each holds the final output, a class label (a small sketch follows this list).
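
As a rough sketch, that flowchart structure is just nested if/then rules. The function below mirrors the iris question mentioned above; the threshold 1.75 is an illustrative value, not taken from these notes:

```python
def classify(petal_length, petal_width):
    # Root node: the first question asked of every sample
    if petal_length < 2.45:
        return "setosa"      # leaf node: no further splits
    # Decision node: a second question for the remaining samples
    if petal_width < 1.75:
        return "versicolor"  # leaf node
    return "virginica"       # leaf node

print(classify(1.4, 0.2))  # setosa
print(classify(5.1, 2.0))  # virginica
```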
## How it Splits Data
The tree chooses each split so that the resulting groups are as "pure" as possible, i.e., each group contains (mostly) a single class.
### Splitting Criteria
1. **Gini Impurity** (Default):
   - Measures how mixed the classes in a node are.
   - **0** = pure (all samples share one class).
   - **0.5** = maximally impure for two classes (an even mix).
   - The tree tries to **minimize** Gini.
2. **Entropy**:
   - Measures disorder or randomness.
   - **0** = pure.
   - **1** = maximally disordered for two classes.
   - The tree tries to **reduce** Entropy (i.e., maximize Information Gain).
3. **Information Gain**:
   - The difference between the Entropy before a split and the weighted Entropy of the groups after it.
   - We choose the split that gives the **highest** Information Gain (see the sketch after this list).
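
A minimal sketch of these three measures in plain Python; the helper names `gini`, `entropy`, and `information_gain` are illustrative, not from a library:

```python
from collections import Counter
import math

def gini(labels):
    """Gini impurity: 1 - sum(p_i^2). 0 = pure; 0.5 = an even two-class mix."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def entropy(labels):
    """Entropy: -sum(p_i * log2(p_i)). 0 = pure; 1 = an even two-class mix."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(parent, left, right):
    """Entropy before the split minus the weighted entropy of the child nodes."""
    n = len(parent)
    after = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - after

parent = ["A", "A", "B", "B"]
print(gini(parent))                                      # 0.5 (maximally impure)
print(information_gain(parent, ["A", "A"], ["B", "B"]))  # 1.0: a perfect split
```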
## Parameters to Control the Tree
- **max_depth**: The maximum depth the tree may grow to (too deep often means overfitting).
- **min_samples_split**: The minimum number of samples a node must contain before it can be split.
- **max_features**: The number of features considered when searching for each split (see the usage sketch below).
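
Assuming scikit-learn (whose `DecisionTreeClassifier` uses exactly these parameter names), a minimal sketch that fits a constrained tree on the iris dataset and prints its learned rules:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, random_state=42
)

# criterion may be "gini" (the default) or "entropy"; the other
# parameters limit growth to reduce overfitting.
tree = DecisionTreeClassifier(
    criterion="gini",
    max_depth=3,           # cap how deep the tree can grow
    min_samples_split=5,   # a node needs >= 5 samples to be split
    max_features=None,     # consider every feature at each split
    random_state=42,
)
tree.fit(X_train, y_train)
print(tree.score(X_test, y_test))  # accuracy on the held-out split

# The printed rules look like "petal length (cm) <= 2.45".
print(export_text(tree, feature_names=iris.feature_names))
```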