
Decision Tree Algorithm

A Decision Tree is a flowchart-like model for making decisions. It repeatedly splits the data into smaller groups based on simple feature rules.

Structure

  • Root Node: The starting point. It represents the entire dataset.
  • Decision Nodes: Points where the data is split based on a question (e.g., "Is Petal Length < 2.45?").
  • Leaf Nodes (Terminal Nodes): The final output (class label) where no more splits happen.
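
To make the structure concrete, here is a minimal sketch (assuming scikit-learn is available) that fits a small tree on the Iris dataset and prints its nodes as rules; the example question above matches the classic Iris root split:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()

# Fit a shallow tree so the printed structure stays readable
clf = DecisionTreeClassifier(max_depth=2, random_state=0)
clf.fit(iris.data, iris.target)

# Root node, decision nodes, and leaf nodes appear as indented rules,
# e.g. "|--- petal length (cm) <= 2.45"
print(export_text(clf, feature_names=iris.feature_names))
```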

How it Splits Data

At each split, the tree tries to make the resulting groups as "pure" as possible (ideally containing only one class).

Splitting Criteria

  1. Gini Impurity (the default criterion in scikit-learn):

    • Measures how mixed the classes in a node are.
    • 0 = Pure (all samples belong to the same class).
    • 0.5 = Maximally mixed (for a two-class problem).
    • The tree tries to minimize Gini.
  2. Entropy:

    • Measures disorder or randomness.
    • 0 = Pure.
    • 1 = Maximally disordered (for a two-class problem).
    • The tree tries to reduce Entropy (i.e., maximize Information Gain).
  3. Information Gain:

    • The Entropy before a split minus the size-weighted Entropy of the groups after it.
    • We choose the split that gives the highest Information Gain (see the sketch after this list).
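
A minimal sketch of these three measures in plain NumPy (the helper names gini, entropy, and information_gain are just for illustration):

```python
import numpy as np

def gini(labels):
    """Gini impurity: 1 - sum(p_i^2) over class proportions p_i."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def entropy(labels):
    """Entropy: -sum(p_i * log2(p_i)) over class proportions p_i."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(parent, left, right):
    """Entropy of the parent minus the size-weighted entropy of the children."""
    n = len(parent)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - weighted

parent = np.array([0, 0, 0, 0, 1, 1, 1, 1])  # 50/50 mix: Gini 0.5, Entropy 1.0
left, right = parent[:4], parent[4:]         # a perfect split: gain 1.0
print(gini(parent), entropy(parent), information_gain(parent, left, right))
```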

Parameters to Control the Tree

  • max_depth: The maximum depth the tree may grow to (too deep = overfitting).
  • min_samples_split: The minimum number of samples a node must hold before it may be split.
  • max_features: The number of features considered when searching for each split (see the sketch below).
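
A sketch of setting these parameters with scikit-learn's DecisionTreeClassifier; the specific values here are illustrative, not recommendations:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = DecisionTreeClassifier(
    max_depth=3,          # cap depth to curb overfitting
    min_samples_split=4,  # a node needs >= 4 samples before it may split
    max_features=2,       # consider only 2 randomly chosen features per split
    random_state=0,
)
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))  # accuracy on the held-out split
```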