# Decision Tree Algorithm

A **Decision Tree** is a flowchart-like model for making decisions: it repeatedly splits the data into smaller groups based on simple rules.

## Structure

- **Root Node**: The starting point. It represents the entire dataset.
- **Decision Nodes**: Points where the data is split based on a question (e.g., "Is Petal Length < 2.45?").
- **Leaf Nodes (Terminal Nodes)**: The final output (class label) where no more splits happen (see the sketch after this list).
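
A minimal sketch of this structure on the Iris data. The petal-length question is from the example above; the second split and its threshold are illustrative, and the exact values depend on the fitted tree:

```
            [Root] Is Petal Length < 2.45?
             /                        \
           yes                         no
            |                           |
     (Leaf) setosa      [Decision] Is Petal Width < 1.75?
                              /                 \
                            yes                  no
                             |                    |
                    (Leaf) versicolor      (Leaf) virginica
```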

## How it Splits Data

At each split, the tree tries to make the resulting groups as "pure" as possible (ideally containing only one class).

### Splitting Criteria

1. **Gini Impurity** (Default):
   - Measures how mixed the classes are.
   - **0** = Pure (all same class).
   - **0.5** = Maximally impure for two classes (a 50/50 mix).
   - The tree tries to **minimize** Gini.

2. **Entropy**:
   - Measures disorder or randomness.
   - **0** = Pure.
   - **1** = Maximally disordered for two classes (a 50/50 mix).
   - The tree tries to **reduce** Entropy (maximize Information Gain).

3. **Information Gain**:
   - The difference in Entropy before and after a split.
   - We choose the split that gives the **highest** Information Gain (see the formulas and the worked sketch after this list).
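
For reference, the standard formulas, where $p_k$ is the proportion of class $k$ in a node, $n$ is the number of samples in the parent, and $n_c$ is the number of samples reaching child $c$:

$$
\text{Gini} = 1 - \sum_k p_k^2, \qquad \text{Entropy} = -\sum_k p_k \log_2 p_k
$$

$$
\text{Information Gain} = \text{Entropy}(\text{parent}) - \sum_{c \,\in\, \text{children}} \frac{n_c}{n}\,\text{Entropy}(c)
$$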
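A minimal sketch of these criteria in plain Python; the toy labels and the perfect split are illustrative, not from the notes:

```python
from collections import Counter
from math import log2

def gini(labels):
    # Gini impurity: 1 - sum(p_k^2) over the class proportions p_k.
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def entropy(labels):
    # Entropy: -sum(p_k * log2(p_k)) over the class proportions p_k.
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(parent, left, right):
    # Entropy before the split minus the size-weighted entropy after it.
    n = len(parent)
    after = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - after

# A 50/50 mix of two classes, separated perfectly by some rule.
parent = ["A"] * 4 + ["B"] * 4
left, right = ["A"] * 4, ["B"] * 4

print(gini(parent))                           # 0.5 -> maximally impure (2 classes)
print(entropy(parent))                        # 1.0 -> maximally disordered
print(information_gain(parent, left, right))  # 1.0 -> a perfect split
```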

## Parameters to Control the Tree

- **max_depth**: The maximum depth the tree can grow to (too deep risks overfitting).
- **min_samples_split**: Minimum samples needed to split a node.
- **max_features**: Number of features to consider for each split (a usage sketch follows this list).
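
A minimal usage sketch with scikit-learn's `DecisionTreeClassifier`, assuming scikit-learn is installed; the Iris data and the parameter values are illustrative, not tuned:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

tree = DecisionTreeClassifier(
    criterion="gini",      # the default; "entropy" is the other common choice
    max_depth=3,           # cap the depth to limit overfitting
    min_samples_split=4,   # a node needs at least 4 samples before it can split
    max_features=None,     # None = consider all features at each split
    random_state=42,
)
tree.fit(X_train, y_train)
print(tree.score(X_test, y_test))  # mean accuracy on the held-out data
```

Lowering `max_depth` or raising `min_samples_split` are the usual first knobs to turn when the tree overfits.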