
# Advanced Pattern Mining

Beyond basic frequent itemsets, there are more compact ways to represent patterns and faster ways to compute their support.

## 1. Closed and Maximal Patterns
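Finding *all* frequent itemsets can produce too many results: every non-empty subset of a frequent itemset is also frequent, so a single frequent itemset with k items implies 2^k - 1 frequent itemsets. Closed and maximal itemsets summarize them.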

### Closed Frequent Itemset
- A frequent itemset is **Closed** if none of its proper supersets has the **same support**.
- *Example*:
  - {Milk}: Support 4
  - {Milk, Bread}: Support 4
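  - Here, {Milk} is NOT closed because adding Bread didn't change the support count: every transaction that contains Milk also contains Bread, so {Milk, Bread} captures the same information.
- **Benefit**: Lossless compression. The support of any frequent itemset can be recovered as the largest support among the closed itemsets that contain it.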
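
The definition above can be checked with a small brute-force sketch. The transactions, the extra `Eggs` item, the threshold `min_support=2`, and the function names are all made up for illustration; enumerating every candidate like this only works for toy data.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Return {itemset: support count} for every frequent itemset (brute force)."""
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, len(items) + 1):
        for candidate in map(frozenset, combinations(items, size)):
            # Support = number of transactions that contain the candidate.
            support = sum(1 for t in transactions if candidate <= t)
            if support >= min_support:
                result[candidate] = support
    return result

def closed_itemsets(frequent):
    """Keep itemsets that have no proper superset with the same support."""
    return {
        itemset: support
        for itemset, support in frequent.items()
        if not any(itemset < other and support == other_support
                   for other, other_support in frequent.items())
    }

# Hypothetical toy data: Milk and Bread always appear together.
transactions = [
    frozenset({"Milk", "Bread"}),
    frozenset({"Milk", "Bread"}),
    frozenset({"Milk", "Bread", "Eggs"}),
    frozenset({"Milk", "Bread", "Eggs"}),
    frozenset({"Eggs"}),
]
frequent = frequent_itemsets(transactions, min_support=2)
closed = closed_itemsets(frequent)
```

With this data all 7 non-empty itemsets are frequent, but only 3 are closed: {Eggs} (support 3), {Milk, Bread} (support 4), and {Milk, Bread, Eggs} (support 2). {Milk} is dropped because {Milk, Bread} has the same support of 4.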

### Maximal Frequent Itemset
- A frequent itemset is **Maximal** if none of its proper supersets is **frequent**.
- *Example*:
  - {Milk, Bread} is frequent.
  - {Milk, Bread, Diapers} is NOT frequent, and neither is any other superset of {Milk, Bread}.
  - Then {Milk, Bread} is a Maximal Frequent Itemset.
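- **Benefit**: The most compact summary (every maximal itemset is also closed), but it is lossy: we know that every subset of a maximal itemset is frequent, but not its exact support count.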
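
As a rough illustration of the definition, the sketch below filters a brute-force list of frequent itemsets down to the maximal ones. The transactions, the threshold `min_support=2`, and the function names are assumptions chosen so that the example above holds; this is not an efficient mining algorithm.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Return {itemset: support count} for every frequent itemset (brute force)."""
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, len(items) + 1):
        for candidate in map(frozenset, combinations(items, size)):
            support = sum(1 for t in transactions if candidate <= t)
            if support >= min_support:
                result[candidate] = support
    return result

def maximal_itemsets(frequent):
    """Keep frequent itemsets that have no frequent proper superset."""
    return {s for s in frequent if not any(s < other for other in frequent)}

# Hypothetical toy data: {Milk, Bread, Diapers} occurs only once.
transactions = [
    frozenset({"Milk", "Bread"}),
    frozenset({"Milk", "Bread"}),
    frozenset({"Milk", "Bread", "Diapers"}),
    frozenset({"Diapers"}),
]
frequent = frequent_itemsets(transactions, min_support=2)
maximal = maximal_itemsets(frequent)
```

Here the frequent itemsets are {Milk}, {Bread}, {Diapers}, and {Milk, Bread}, and the maximal ones are {Milk, Bread} and {Diapers}. Note that `maximal` is a plain set with no counts: from it we can tell that {Milk} is frequent, but not that its support is 3.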

## 2. Vertical Data Format (Eclat Algorithm)
- **Horizontal Format** (Standard):
  - T1: {A, B, C}
  - T2: {A, B}
- **Vertical Format**:
  - Item A: {T1, T2}
  - Item B: {T1, T2}
  - Item C: {T1}
- **How it works**: Instead of rescanning the transactions to count support, we intersect the Transaction ID (TID) lists of the items. The size of the intersection is the support.
  - Support({A, B}) = |{T1, T2} ∩ {T1, T2}| = |{T1, T2}| = 2.
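- **Benefit**: Support is computed with set intersections, so the database is not rescanned for every candidate. The trade-off is that the TID lists must be kept in memory and can get large for dense data.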
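
The conversion and the intersection step can be sketched as below, using the two transactions from the example. This is a minimal depth-first version of the idea, not a tuned Eclat implementation; the function names and `min_support=2` are assumptions for illustration.

```python
from collections import defaultdict

def to_vertical(transactions):
    """Map each item to the set of transaction IDs that contain it."""
    tidlists = defaultdict(set)
    for tid, items in transactions.items():
        for item in items:
            tidlists[item].add(tid)
    return dict(tidlists)

def eclat(prefix, candidates, min_support, out):
    """Depth-first search: extend `prefix` item by item, intersecting TID sets."""
    for i, (item, tids) in enumerate(candidates):
        if len(tids) < min_support:
            continue  # infrequent, so no extension of it can be frequent
        itemset = prefix | {item}
        out[frozenset(itemset)] = len(tids)  # support = size of the TID set
        # Next level: intersect this TID set with each later item's TID set.
        extensions = [(other, tids & other_tids)
                      for other, other_tids in candidates[i + 1:]]
        eclat(itemset, extensions, min_support, out)
    return out

# Horizontal format from the example above.
transactions = {"T1": {"A", "B", "C"}, "T2": {"A", "B"}}
vertical = to_vertical(transactions)
candidates = sorted(vertical.items(), key=lambda kv: kv[0])
supports = eclat(set(), candidates, min_support=2, out={})
```

With `min_support=2`, `supports` contains {A}, {B}, and {A, B}, each with support 2. Item C appears only in T1, so it is pruned before any intersection involving it is explored further.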