addition of unit 1 3 4 5
This commit is contained in:
45
unit 3/02_Apriori_Algorithm.md
Normal file
45
unit 3/02_Apriori_Algorithm.md
Normal file
@@ -0,0 +1,45 @@
|
||||
# The Apriori Algorithm
|
||||
|
||||
**Apriori** is a classic algorithm to find frequent itemsets.
|
||||
|
||||
## Key Principle (Apriori Property)
|
||||
> "All non-empty subsets of a frequent itemset must also be frequent."
|
||||
- *Meaning*: If `{Beer, Diapers}` is frequent, then `{Beer}` must be frequent and `{Diapers}` must be frequent.
|
||||
- **Reverse**: If `{Beer}` is NOT frequent, then `{Beer, Diapers}` cannot be frequent. (This helps us ignore/prune many combinations).
|
||||
|
||||
## How it Works
|
||||
1. **Scan 1**: Count all single items. Remove those below minimum support.
|
||||
2. **Join**: Combine remaining items to make pairs (Size 2).
|
||||
3. **Prune**: Remove pairs that contain infrequent items.
|
||||
4. **Scan 2**: Count the pairs. Remove those below support.
|
||||
5. **Repeat**: Make Size 3 itemsets, Size 4, etc., until no more can be found.
|
||||
|
||||
## Example
|
||||
**Transactions**:
|
||||
1. {Bread, Milk}
|
||||
2. {Bread, Diapers, Beer, Eggs}
|
||||
3. {Milk, Diapers, Beer, Cola}
|
||||
4. {Bread, Milk, Diapers, Beer}
|
||||
5. {Bread, Milk, Diapers, Cola}
|
||||
|
||||
**Minimum Support = 3**
|
||||
|
||||
1. **Count Items**:
|
||||
- Bread: 4 (Keep)
|
||||
- Milk: 4 (Keep)
|
||||
- Diapers: 4 (Keep)
|
||||
- Beer: 3 (Keep)
|
||||
- Cola: 2 (Drop)
|
||||
- Eggs: 1 (Drop)
|
||||
|
||||
2. **Make Pairs**:
|
||||
- {Bread, Milk}, {Bread, Diapers}, {Bread, Beer}...
|
||||
|
||||
3. **Count Pairs**:
|
||||
- {Bread, Milk}: 3 (Keep)
|
||||
- {Bread, Diapers}: 3 (Keep)
|
||||
- {Milk, Diapers}: 3 (Keep)
|
||||
- {Diapers, Beer}: 3 (Keep)
|
||||
- ...others dropped...
|
||||
|
||||
4. **Result**: We found the frequent groups!
|
||||
Reference in New Issue
Block a user