Files
DMCT-NOTES/unit 3/02_Apriori_Algorithm.md
2025-11-24 16:55:19 +05:30

1.4 KiB

The Apriori Algorithm

Apriori is a classic algorithm to find frequent itemsets.

Key Principle (Apriori Property)

"All non-empty subsets of a frequent itemset must also be frequent."

  • Meaning: If {Beer, Diapers} is frequent, then {Beer} must be frequent and {Diapers} must be frequent.
  • Reverse: If {Beer} is NOT frequent, then {Beer, Diapers} cannot be frequent. (This helps us ignore/prune many combinations).

How it Works

  1. Scan 1: Count all single items. Remove those below minimum support.
  2. Join: Combine remaining items to make pairs (Size 2).
  3. Prune: Remove pairs that contain infrequent items.
  4. Scan 2: Count the pairs. Remove those below support.
  5. Repeat: Make Size 3 itemsets, Size 4, etc., until no more can be found.

Example

Transactions:

  1. {Bread, Milk}
  2. {Bread, Diapers, Beer, Eggs}
  3. {Milk, Diapers, Beer, Cola}
  4. {Bread, Milk, Diapers, Beer}
  5. {Bread, Milk, Diapers, Cola}

Minimum Support = 3

  1. Count Items:

    • Bread: 4 (Keep)
    • Milk: 4 (Keep)
    • Diapers: 4 (Keep)
    • Beer: 3 (Keep)
    • Cola: 2 (Drop)
    • Eggs: 1 (Drop)
  2. Make Pairs:

    • {Bread, Milk}, {Bread, Diapers}, {Bread, Beer}...
  3. Count Pairs:

    • {Bread, Milk}: 3 (Keep)
    • {Bread, Diapers}: 3 (Keep)
    • {Milk, Diapers}: 3 (Keep)
    • {Diapers, Beer}: 3 (Keep)
    • ...others dropped...
  4. Result: We found the frequent groups!