The Apriori Algorithm
Apriori is a classic algorithm for finding frequent itemsets: groups of items that appear together in at least a minimum number of transactions.
Key Principle (Apriori Property)
"All non-empty subsets of a frequent itemset must also be frequent."
- Meaning: If {Beer, Diapers} is frequent, then {Beer} must be frequent and {Diapers} must be frequent.
- Reverse: If {Beer} is NOT frequent, then {Beer, Diapers} cannot be frequent. (This helps us ignore/prune many combinations.)
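As a rough illustration of how the property is used for pruning, the sketch below checks whether a candidate itemset has any subset one item smaller that is missing from the known frequent itemsets. The helper name `has_infrequent_subset` and the sample sets are assumptions made for this example, not part of any library.

```python
from itertools import combinations

def has_infrequent_subset(candidate, frequent_smaller):
    """Return True if any (k-1)-item subset of `candidate` is not frequent.

    `frequent_smaller` is the set of frequent itemsets of size k-1,
    each stored as a frozenset. (Hypothetical helper for illustration.)
    """
    k = len(candidate)
    return any(
        frozenset(subset) not in frequent_smaller
        for subset in combinations(candidate, k - 1)
    )

# Suppose only {Diapers} passed the support threshold, and {Beer} did not.
frequent_1 = {frozenset({"Diapers"})}
candidate = frozenset({"Beer", "Diapers"})

# {Beer} is missing from frequent_1, so the pair can be skipped without counting.
print(has_infrequent_subset(candidate, frequent_1))  # True
```

Checking only the subsets one item smaller is enough here, because those subsets were themselves accepted only after their own subsets passed the same check.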
How it Works
- Scan 1: Count all single items. Remove those below minimum support.
- Join: Combine remaining items to make pairs (Size 2).
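- Prune: Remove candidates that contain an infrequent subset. (Pairs built only from frequent items always pass; this step starts to matter at Size 3.)
- Scan 2: Count the pairs. Remove those below minimum support.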
- Repeat: Make Size 3 itemsets, Size 4, etc., until no more can be found.
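The steps above can be sketched in Python roughly as follows. This is a minimal, unoptimized sketch: the function name `apriori`, its parameters, and the choice to treat `min_support` as an absolute count are assumptions for illustration. The join is a plain pairwise union rather than the prefix-based join that optimized implementations use.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: count} for every itemset with count >= min_support.

    `transactions` is a list of sets; `min_support` is an absolute count.
    """
    transactions = [frozenset(t) for t in transactions]

    def count(candidates):
        # One scan over the data: count the transactions containing each candidate.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        return {c: n for c, n in counts.items() if n >= min_support}

    # Scan 1: single items.
    items = {frozenset([i]) for t in transactions for i in t}
    frequent = count(items)
    result = dict(frequent)

    k = 2
    while frequent:
        prev = set(frequent)
        # Join: union pairs of frequent (k-1)-itemsets that give a k-itemset.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune: drop candidates with any infrequent (k-1)-subset.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        # Scan k: count the survivors and keep those meeting min_support.
        frequent = count(candidates)
        result.update(frequent)
        k += 1
    return result
```

Calling `apriori(list_of_sets, 3)` returns a dict that maps each frequent itemset (as a `frozenset`) to the number of transactions containing it. The loop ends as soon as a size produces no frequent itemsets.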
Example
Transactions:
- {Bread, Milk}
- {Bread, Diapers, Beer, Eggs}
- {Milk, Diapers, Beer, Cola}
- {Bread, Milk, Diapers, Beer}
- {Bread, Milk, Diapers, Cola}
Minimum Support = 3 (an itemset must appear in at least 3 of the 5 transactions)
- Count Items:
  - Bread: 4 (Keep)
  - Milk: 4 (Keep)
  - Diapers: 4 (Keep)
  - Beer: 3 (Keep)
  - Cola: 2 (Drop)
  - Eggs: 1 (Drop)
- Make Pairs:
  - {Bread, Milk}, {Bread, Diapers}, {Bread, Beer}...
- Count Pairs:
  - {Bread, Milk}: 3 (Keep)
  - {Bread, Diapers}: 3 (Keep)
  - {Milk, Diapers}: 3 (Keep)
  - {Diapers, Beer}: 3 (Keep)
  - ...others dropped...
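- Result: The frequent itemsets are the four single items and the four pairs marked Keep. The only Size 3 candidate whose pairs are all frequent is {Bread, Milk, Diapers}, and it appears in just 2 transactions, so the search stops at Size 2.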
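The counts in this example can be reproduced with a few lines of Python. This is only a direct check of the worked example using the standard library; the variable names are chosen for illustration.

```python
from collections import Counter
from itertools import combinations

transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diapers", "Beer", "Eggs"},
    {"Milk", "Diapers", "Beer", "Cola"},
    {"Bread", "Milk", "Diapers", "Beer"},
    {"Bread", "Milk", "Diapers", "Cola"},
]
MIN_SUPPORT = 3

# Scan 1: count single items and keep those meeting the threshold.
item_counts = Counter(item for t in transactions for item in t)
frequent_items = sorted(i for i, n in item_counts.items() if n >= MIN_SUPPORT)
print(frequent_items)  # ['Beer', 'Bread', 'Diapers', 'Milk']

# Join + Scan 2: build pairs from the frequent items only, then count them.
pair_counts = {
    pair: sum(1 for t in transactions if set(pair) <= t)
    for pair in combinations(frequent_items, 2)
}
frequent_pairs = {p: n for p, n in pair_counts.items() if n >= MIN_SUPPORT}
for pair, n in frequent_pairs.items():
    print(pair, n)  # four pairs, each with count 3

# Size 3: keep only triples whose three pairs are all frequent, then count.
triples = [
    t3 for t3 in combinations(frequent_items, 3)
    if all(p in frequent_pairs for p in combinations(t3, 2))
]
triple_counts = {
    t3: sum(1 for t in transactions if set(t3) <= t) for t3 in triples
}
print(triple_counts)  # {('Bread', 'Diapers', 'Milk'): 2}, below the threshold
```

Because `frequent_items` is sorted, every pair and triple comes out of `combinations` in a consistent order, which is what lets the tuples be used directly as dictionary keys.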