
Decision Tree Algorithm

A Decision Tree is a flowchart-like model for making decisions. It repeatedly splits the data into smaller groups based on simple feature rules.

Structure

  • Root Node: The starting point. It represents the entire dataset.
  • Decision Nodes: Points where the data is split based on a question (e.g., "Is Petal Length < 2.45?").
  • Leaf Nodes (Terminal Nodes): The final output (class label) where no more splits happen.
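
To make the structure concrete, here is a minimal sketch (assuming scikit-learn is available) that fits a small tree on the Iris dataset and prints its nodes as rules; the example question above matches the classic Iris root split:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()

# Fit a shallow tree so the printed structure stays readable
clf = DecisionTreeClassifier(max_depth=2, random_state=0)
clf.fit(iris.data, iris.target)

# Root node, decision nodes, and leaf nodes appear as indented rules,
# e.g. "|--- petal length (cm) <= 2.45"
print(export_text(clf, feature_names=iris.feature_names))
```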

How it Splits Data

At each split, the tree tries to make the resulting groups as "pure" as possible (ideally containing only one class).

Splitting Criteria

  1. Gini Impurity (the default criterion in scikit-learn):

    • Measures how mixed the classes in a node are.
    • 0 = Pure (all samples belong to the same class).
    • 0.5 = Maximally mixed (for a two-class problem).
    • The tree tries to minimize Gini.
  2. Entropy:

    • Measures disorder or randomness.
    • 0 = Pure.
    • 1 = Maximally disordered (for a two-class problem).
    • The tree tries to reduce Entropy (i.e., maximize Information Gain).
  3. Information Gain:

    • The Entropy before a split minus the size-weighted Entropy of the groups after it.
    • We choose the split that gives the highest Information Gain (see the sketch after this list).
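
A minimal sketch of these three measures in plain NumPy (the helper names gini, entropy, and information_gain are just for illustration):

```python
import numpy as np

def gini(labels):
    """Gini impurity: 1 - sum(p_i^2) over class proportions p_i."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def entropy(labels):
    """Entropy: -sum(p_i * log2(p_i)) over class proportions p_i."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(parent, left, right):
    """Entropy of the parent minus the size-weighted entropy of the children."""
    n = len(parent)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - weighted

parent = np.array([0, 0, 0, 0, 1, 1, 1, 1])  # 50/50 mix: Gini 0.5, Entropy 1.0
left, right = parent[:4], parent[4:]         # a perfect split: gain 1.0
print(gini(parent), entropy(parent), information_gain(parent, left, right))
```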

Parameters to Control the Tree

  • max_depth: The maximum depth the tree may grow to (too deep = overfitting).
  • min_samples_split: The minimum number of samples a node must hold before it may be split.
  • max_features: The number of features considered when searching for each split (see the sketch below).
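
A sketch of setting these parameters with scikit-learn's DecisionTreeClassifier; the specific values here are illustrative, not recommendations:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = DecisionTreeClassifier(
    max_depth=3,          # cap depth to curb overfitting
    min_samples_split=4,  # a node needs >= 4 samples before it may split
    max_features=2,       # consider only 2 randomly chosen features per split
    random_state=0,
)
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))  # accuracy on the held-out split
```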