35 lines
1.3 KiB
Markdown
35 lines
1.3 KiB
Markdown
# Advanced Pattern Mining
|
|
|
|
Beyond basic frequent itemsets, we have more efficient ways to represent patterns.
|
|
|
|
## 1. Closed and Maximal Patterns
|
|
Finding *all* frequent itemsets can produce too many results. We can summarize them.
|
|
|
|
### Closed Frequent Itemset
|
|
- An itemset is **Closed** if none of its supersets have the **same support**.
|
|
- *Example*:
|
|
- {Milk}: Support 4
|
|
- {Milk, Bread}: Support 4
|
|
- Here, {Milk} is NOT closed because adding Bread didn't change the support count. {Milk, Bread} captures the same information.
|
|
- **Benefit**: Lossless compression (we don't lose any support info).
|
|
|
|
### Maximal Frequent Itemset
|
|
- An itemset is **Maximal** if none of its supersets are **frequent**.
|
|
- *Example*:
|
|
- {Milk, Bread} is frequent.
|
|
- {Milk, Bread, Diapers} is NOT frequent.
|
|
- Then {Milk, Bread} is a Maximal Frequent Itemset.
|
|
- **Benefit**: Smallest set of patterns, but we lose the exact support counts of subsets.
|
|
|
|
## 2. Vertical Data Format (Eclat Algorithm)
|
|
- **Horizontal Format** (Standard):
|
|
- T1: {A, B, C}
|
|
- T2: {A, B}
|
|
- **Vertical Format**:
|
|
- Item A: {T1, T2}
|
|
- Item B: {T1, T2}
|
|
- Item C: {T1}
|
|
- **How it works**: Instead of counting, we just intersect the Transaction ID lists.
|
|
- Support(A, B) = Intersection({T1, T2}, {T1, T2}) = Size is 2.
|
|
- **Benefit**: Very fast for calculating support.
|