addition of unit 1 3 4 5

2025-11-24 16:55:19 +05:30
parent 8f8e35ae95
commit f8aea15aaa
24 changed files with 596 additions and 0 deletions
--- a/3/04_Advanced_Pattern_Mining.md
+++ b/3/04_Advanced_Pattern_Mining.md
@@ -0,0 +1,34 @@
+# Advanced Pattern Mining
+
+Beyond enumerating every frequent itemset, there are more compact ways to represent patterns and faster ways to count their support.
+
+## 1. Closed and Maximal Patterns
+Finding *all* frequent itemsets can produce too many results, because every subset of a frequent itemset is also frequent. Closed and maximal itemsets summarize them.
+
+### Closed Frequent Itemset
+- A frequent itemset is **Closed** if none of its proper supersets has the **same support**.
+- *Example*:
+  - {Milk}: Support 4
+  - {Milk, Bread}: Support 4
+  - Here, {Milk} is NOT closed because adding Bread didn't change the support count: every transaction that contains Milk also contains Bread. {Milk, Bread} captures the same information.
+- **Benefit**: Lossless compression. The support of any frequent itemset can be recovered as the highest support among its closed supersets.
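
The closedness check can be sketched in a few lines of Python. This is a minimal illustration, not a mining algorithm: it assumes the supports of all frequent itemsets are already known, the function names `closed_itemsets` and `support_from_closed` are hypothetical, and the support of 5 for {Bread} is an assumed value added to complete the example.

```python
def closed_itemsets(supports):
    """Keep only itemsets with no proper superset of equal support.

    `supports` maps frozenset -> support count for every frequent itemset.
    """
    closed = {}
    for itemset, count in supports.items():
        has_equal_superset = any(
            itemset < other and count == other_count  # `<` is proper subset
            for other, other_count in supports.items()
        )
        if not has_equal_superset:
            closed[itemset] = count
    return closed


def support_from_closed(itemset, closed):
    """Recover a frequent itemset's support from the closed itemsets only."""
    return max(count for c, count in closed.items() if itemset <= c)


# Supports from the Milk/Bread example; the value for {Bread} is assumed.
supports = {
    frozenset({"Milk"}): 4,
    frozenset({"Bread"}): 5,
    frozenset({"Milk", "Bread"}): 4,
}

closed = closed_itemsets(supports)
milk_support = support_from_closed(frozenset({"Milk"}), closed)
```

{Milk} is dropped from `closed`, yet its support is still recoverable from {Milk, Bread}, which is what makes the compression lossless.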
+
+### Maximal Frequent Itemset
+- A frequent itemset is **Maximal** if none of its proper supersets is **frequent**.
+- *Example*:
+  - {Milk, Bread} is frequent.
+  - {Milk, Bread, Diapers} is NOT frequent.
+  - Then {Milk, Bread} is a Maximal Frequent Itemset.
+- **Benefit**: The most compact summary (every maximal itemset is also closed), but we lose the exact support counts of subsets.
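
A similar sketch works for maximal itemsets. It assumes the collection of frequent itemsets is already available, and the function name `maximal_itemsets` is hypothetical. Support counts are not needed at all, which also shows why they cannot be recovered afterwards.

```python
def maximal_itemsets(frequent):
    """Keep only frequent itemsets that have no frequent proper superset."""
    frequent = set(frequent)
    return {
        itemset
        for itemset in frequent
        if not any(itemset < other for other in frequent)  # `<` is proper subset
    }


# Frequent itemsets from the example; {Milk, Bread, Diapers} is not frequent,
# so it does not appear in the collection.
frequent = {
    frozenset({"Milk"}),
    frozenset({"Bread"}),
    frozenset({"Milk", "Bread"}),
}

maximal = maximal_itemsets(frequent)
```

Only {Milk, Bread} survives. Its subsets are known to be frequent, but their counts are gone.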
+
+## 2. Vertical Data Format (Eclat Algorithm)
+- **Horizontal Format** (Standard): each transaction lists its items.
+  - T1: {A, B, C}
+  - T2: {A, B}
+- **Vertical Format**: each item lists the transactions that contain it.
+  - Item A: {T1, T2}
+  - Item B: {T1, T2}
+  - Item C: {T1}
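+- **How it works**: Instead of rescanning the transactions to count, we intersect the Transaction ID (TID) lists and take the size of the result.
+  - Support({A, B}) = |{T1, T2} ∩ {T1, T2}| = |{T1, T2}| = 2.
+- **Benefit**: Support is computed from set intersections alone, with no repeated database scans. The trade-off is that long TID lists can use a lot of memory.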
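
The conversion to vertical format and the intersection step can be sketched as follows. This is a simplified depth-first version of the Eclat idea under stated assumptions, not a tuned implementation: the function names `to_vertical` and `eclat` and the `min_support` threshold of 2 are chosen for illustration.

```python
from collections import defaultdict


def to_vertical(transactions):
    """Turn {tid: items} into {item: set of tids}."""
    tidlists = defaultdict(set)
    for tid, items in transactions.items():
        for item in items:
            tidlists[item].add(tid)
    return dict(tidlists)


def eclat(tidlists, min_support):
    """Return {itemset: support} for all frequent itemsets via TID intersections."""
    results = {}

    def extend(prefix, candidates):
        for i, (item, tids) in enumerate(candidates):
            itemset = prefix | {item}
            results[itemset] = len(tids)  # support = size of the TID list
            # Intersect with each later item's TID list to grow the itemset.
            next_candidates = []
            for other, other_tids in candidates[i + 1:]:
                shared = tids & other_tids
                if len(shared) >= min_support:
                    next_candidates.append((other, shared))
            if next_candidates:
                extend(itemset, next_candidates)

    start = sorted(
        ((item, tids) for item, tids in tidlists.items() if len(tids) >= min_support),
        key=lambda pair: pair[0],
    )
    extend(frozenset(), start)
    return results


# The two transactions from the example above.
transactions = {"T1": {"A", "B", "C"}, "T2": {"A", "B"}}

vertical = to_vertical(transactions)
frequent_sets = eclat(vertical, min_support=2)
```

With this input, `vertical` matches the vertical format listed above, and `frequent_sets` contains {A}, {B} and {A, B}, each with support 2. Item C is pruned because its TID list has only one entry.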