addition of unit 1 3 4 5

This commit is contained in:
Akshat Mehta
2025-11-24 16:55:19 +05:30
parent 8f8e35ae95
commit f8aea15aaa
24 changed files with 596 additions and 0 deletions

View File

@@ -0,0 +1,34 @@
# Advanced Pattern Mining
Beyond basic frequent itemsets, we have more efficient ways to represent patterns.
## 1. Closed and Maximal Patterns
Finding *all* frequent itemsets can produce too many results. We can summarize them.
### Closed Frequent Itemset
- An itemset is **Closed** if none of its supersets have the **same support**.
- *Example*:
- {Milk}: Support 4
- {Milk, Bread}: Support 4
- Here, {Milk} is NOT closed because adding Bread didn't change the support count. {Milk, Bread} captures the same information.
- **Benefit**: Lossless compression (we don't lose any support info).
### Maximal Frequent Itemset
- An itemset is **Maximal** if none of its supersets are **frequent**.
- *Example*:
- {Milk, Bread} is frequent.
- {Milk, Bread, Diapers} is NOT frequent.
- Then {Milk, Bread} is a Maximal Frequent Itemset.
- **Benefit**: Smallest set of patterns, but we lose the exact support counts of subsets.
## 2. Vertical Data Format (Eclat Algorithm)
- **Horizontal Format** (Standard):
- T1: {A, B, C}
- T2: {A, B}
- **Vertical Format**:
- Item A: {T1, T2}
- Item B: {T1, T2}
- Item C: {T1}
- **How it works**: Instead of counting, we just intersect the Transaction ID lists.
- Support(A, B) = Intersection({T1, T2}, {T1, T2}) = Size is 2.
- **Benefit**: Very fast for calculating support.