46 lines
1.4 KiB
Markdown
46 lines
1.4 KiB
Markdown
# The Apriori Algorithm
|
|
|
|
**Apriori** is a classic algorithm to find frequent itemsets.
|
|
|
|
## Key Principle (Apriori Property)
|
|
> "All non-empty subsets of a frequent itemset must also be frequent."
|
|
- *Meaning*: If `{Beer, Diapers}` is frequent, then `{Beer}` must be frequent and `{Diapers}` must be frequent.
|
|
- **Reverse**: If `{Beer}` is NOT frequent, then `{Beer, Diapers}` cannot be frequent. (This helps us ignore/prune many combinations).
|
|
|
|
## How it Works
|
|
1. **Scan 1**: Count all single items. Remove those below minimum support.
|
|
2. **Join**: Combine remaining items to make pairs (Size 2).
|
|
3. **Prune**: Remove pairs that contain infrequent items.
|
|
4. **Scan 2**: Count the pairs. Remove those below support.
|
|
5. **Repeat**: Make Size 3 itemsets, Size 4, etc., until no more can be found.
|
|
|
|
## Example
|
|
**Transactions**:
|
|
1. {Bread, Milk}
|
|
2. {Bread, Diapers, Beer, Eggs}
|
|
3. {Milk, Diapers, Beer, Cola}
|
|
4. {Bread, Milk, Diapers, Beer}
|
|
5. {Bread, Milk, Diapers, Cola}
|
|
|
|
**Minimum Support = 3**
|
|
|
|
1. **Count Items**:
|
|
- Bread: 4 (Keep)
|
|
- Milk: 4 (Keep)
|
|
- Diapers: 4 (Keep)
|
|
- Beer: 3 (Keep)
|
|
- Cola: 2 (Drop)
|
|
- Eggs: 1 (Drop)
|
|
|
|
2. **Make Pairs**:
|
|
- {Bread, Milk}, {Bread, Diapers}, {Bread, Beer}...
|
|
|
|
3. **Count Pairs**:
|
|
- {Bread, Milk}: 3 (Keep)
|
|
- {Bread, Diapers}: 3 (Keep)
|
|
- {Milk, Diapers}: 3 (Keep)
|
|
- {Diapers, Beer}: 3 (Keep)
|
|
- ...others dropped...
|
|
|
|
4. **Result**: We found the frequent groups!
|