Files
DMCT-NOTES/unit 3/02_Apriori_Algorithm.md
2025-11-24 16:55:19 +05:30

46 lines
1.4 KiB
Markdown

# The Apriori Algorithm
**Apriori** is a classic algorithm to find frequent itemsets.
## Key Principle (Apriori Property)
> "All non-empty subsets of a frequent itemset must also be frequent."
- *Meaning*: If `{Beer, Diapers}` is frequent, then `{Beer}` must be frequent and `{Diapers}` must be frequent.
- **Reverse**: If `{Beer}` is NOT frequent, then `{Beer, Diapers}` cannot be frequent. (This helps us ignore/prune many combinations).
## How it Works
1. **Scan 1**: Count all single items. Remove those below minimum support.
2. **Join**: Combine remaining items to make pairs (Size 2).
3. **Prune**: Remove pairs that contain infrequent items.
4. **Scan 2**: Count the pairs. Remove those below support.
5. **Repeat**: Make Size 3 itemsets, Size 4, etc., until no more can be found.
## Example
**Transactions**:
1. {Bread, Milk}
2. {Bread, Diapers, Beer, Eggs}
3. {Milk, Diapers, Beer, Cola}
4. {Bread, Milk, Diapers, Beer}
5. {Bread, Milk, Diapers, Cola}
**Minimum Support = 3**
1. **Count Items**:
- Bread: 4 (Keep)
- Milk: 4 (Keep)
- Diapers: 4 (Keep)
- Beer: 3 (Keep)
- Cola: 2 (Drop)
- Eggs: 1 (Drop)
2. **Make Pairs**:
- {Bread, Milk}, {Bread, Diapers}, {Bread, Beer}...
3. **Count Pairs**:
- {Bread, Milk}: 3 (Keep)
- {Bread, Diapers}: 3 (Keep)
- {Milk, Diapers}: 3 (Keep)
- {Diapers, Beer}: 3 (Keep)
- ...others dropped...
4. **Result**: We found the frequent groups!