addition of unit 1 3 4 5
unit 3/04_Advanced_Pattern_Mining.md
@@ -0,0 +1,34 @@
# Advanced Pattern Mining

Beyond enumerating every frequent itemset, there are more compact ways to represent patterns and faster ways to compute their support.

## 1. Closed and Maximal Patterns
Finding *all* frequent itemsets can produce far too many results, because every non-empty subset of a frequent itemset is itself frequent. Closed and maximal itemsets summarize them compactly.

### Closed Frequent Itemset
- A frequent itemset is **Closed** if none of its proper supersets has the **same support**.
- *Example*:
    - {Milk}: Support 4
    - {Milk, Bread}: Support 4
    - Here, {Milk} is NOT closed because adding Bread did not change the support count: every transaction containing Milk also contains Bread, so {Milk, Bread} captures the same information.
- **Benefit**: Lossless compression. The support of every frequent itemset can be recovered from the closed itemsets: it equals the largest support among its closed supersets.
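
The closedness test and the lossless recovery of supports can be sketched in a few lines of Python. This is a brute-force illustration, not an efficient miner: the toy `transactions` list, the `min_support` value of 2, and the helper names `frequent_itemsets`, `closed_itemsets`, and `support_from_closed` are all assumptions made up for this example.

```python
from itertools import combinations

# Hypothetical toy database; Milk and Bread always occur together.
transactions = [
    {"Milk", "Bread"},
    {"Milk", "Bread", "Diapers"},
    {"Milk", "Bread"},
    {"Milk", "Bread", "Eggs"},
    {"Eggs"},
]


def frequent_itemsets(transactions, min_support):
    """Brute force: count the support of every candidate itemset."""
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            candidate = frozenset(combo)
            support = sum(1 for t in transactions if candidate <= t)
            if support >= min_support:
                result[candidate] = support
    return result


def closed_itemsets(frequent):
    """Keep itemsets that have no proper superset with the same support.

    Checking frequent supersets is enough: a superset with equal
    support would itself meet the minimum support.
    """
    return {
        itemset: support
        for itemset, support in frequent.items()
        if not any(
            itemset < other and support == other_support
            for other, other_support in frequent.items()
        )
    }


def support_from_closed(itemset, closed):
    """Recover a support count as the largest support among closed supersets."""
    supports = [s for c, s in closed.items() if itemset <= c]
    return max(supports) if supports else None  # None: not frequent


frequent = frequent_itemsets(transactions, min_support=2)
closed = closed_itemsets(frequent)
for itemset, support in closed.items():
    print(sorted(itemset), support)
print(support_from_closed(frozenset({"Milk"}), closed))
```

With this data and a minimum support of 2, the closed itemsets are {Eggs} (support 2) and {Milk, Bread} (support 4). {Milk} is dropped, yet its support of 4 is still recoverable from {Milk, Bread}.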

### Maximal Frequent Itemset
- A frequent itemset is **Maximal** if none of its proper supersets is **frequent**.
- *Example*:
    - {Milk, Bread} is frequent.
    - {Milk, Bread, Diapers} is NOT frequent.
    - Then {Milk, Bread} is a Maximal Frequent Itemset, provided no other superset of {Milk, Bread} is frequent either.
- **Benefit**: The most compact summary (every maximal itemset is also closed), but it is lossy: we can tell which itemsets are frequent, but not the exact support counts of the subsets.
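
As a rough sketch of the maximality test in Python: the `frequent` dictionary below holds made-up support counts (assuming a minimum support of 2), and the helper names `maximal_itemsets` and `is_frequent` are hypothetical, chosen only for this illustration.

```python
# Hypothetical frequent itemsets with their supports (minimum support 2).
# {Milk, Bread, Diapers} is absent because it is assumed to be infrequent.
frequent = {
    frozenset({"Milk"}): 4,
    frozenset({"Bread"}): 4,
    frozenset({"Diapers"}): 2,
    frozenset({"Milk", "Bread"}): 3,
}


def maximal_itemsets(frequent):
    """Keep itemsets that have no frequent proper superset."""
    return {
        itemset: support
        for itemset, support in frequent.items()
        if not any(itemset < other for other in frequent)
    }


def is_frequent(itemset, maximal):
    """An itemset is frequent exactly when some maximal itemset contains it."""
    return any(itemset <= m for m in maximal)


maximal = maximal_itemsets(frequent)
for itemset, support in maximal.items():
    print(sorted(itemset), support)

# Frequency can be decided from the maximal itemsets alone...
print(is_frequent(frozenset({"Milk"}), maximal))
# ...but the support of {Milk} (4) is not stored anywhere in `maximal`.
```

Here the maximal itemsets are {Diapers} and {Milk, Bread}. They are enough to decide that {Milk} is frequent, but its support of 4 differs from the 3 stored for {Milk, Bread}, which is the information this representation gives up.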

## 2. Vertical Data Format (Eclat Algorithm)
- **Horizontal Format** (Standard): each transaction lists its items.
    - T1: {A, B, C}
    - T2: {A, B}
- **Vertical Format**: each item lists the IDs of the transactions that contain it.
    - Item A: {T1, T2}
    - Item B: {T1, T2}
    - Item C: {T1}
- **How it works**: Instead of scanning the transactions to count occurrences, we intersect the Transaction ID lists; the size of the intersection is the support.
    - Support({A, B}) = |{T1, T2} ∩ {T1, T2}| = |{T1, T2}| = 2.
- **Benefit**: Fast support calculation, since it needs only set intersections and no repeated scans of the database.
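
The conversion to vertical format and the intersection-based search can be sketched as follows. This is a minimal depth-first version under simple assumptions (TID lists as Python sets, no diffset or other optimizations); the function names `to_vertical` and `eclat` and the `min_support` value are illustrative choices, not part of the notes above.

```python
from collections import defaultdict

# Horizontal format from the example above: transaction ID -> items.
horizontal = {"T1": {"A", "B", "C"}, "T2": {"A", "B"}}


def to_vertical(horizontal):
    """Invert the database: item -> set of transaction IDs containing it."""
    vertical = defaultdict(set)
    for tid, items in horizontal.items():
        for item in items:
            vertical[item].add(tid)
    return dict(vertical)


def eclat(vertical, min_support):
    """Depth-first search that extends itemsets by intersecting TID sets."""
    frequent = {}

    def extend(prefix, candidates):
        for i, (item, tids) in enumerate(candidates):
            if len(tids) < min_support:
                continue  # infrequent, so no superset can be frequent
            itemset = prefix | {item}
            frequent[frozenset(itemset)] = len(tids)
            # Intersect with the TID sets of the remaining items to get
            # the TID sets of the one-item-larger candidates.
            narrower = [
                (other, tids & other_tids)
                for other, other_tids in candidates[i + 1:]
            ]
            extend(itemset, narrower)

    extend(set(), sorted(vertical.items()))
    return frequent


vertical = to_vertical(horizontal)
print(len(vertical["A"] & vertical["B"]))  # support of {A, B}
for itemset, support in eclat(vertical, min_support=2).items():
    print(sorted(itemset), support)
```

With a minimum support of 2, this finds {A}, {B}, and {A, B}, each with support 2; {C} appears only in T1 and is pruned along with all of its supersets.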