addition of unit 1 3 4 5

Akshat Mehta
2025-11-24 16:55:19 +05:30
parent 8f8e35ae95
commit f8aea15aaa
24 changed files with 596 additions and 0 deletions

unit 3/00_Index.md Normal file

@@ -0,0 +1,18 @@
# Unit 3: Association Rule Mining
Welcome to your simplified notes for Unit 3.
## Table of Contents
1. [[01_Association_Rule_Mining|Introduction to Association Rules]]
- What is it? (Market Basket Analysis)
- Key Terms: Support, Confidence, Frequent Itemsets
2. [[02_Apriori_Algorithm|The Apriori Algorithm]]
- How it works (Join & Prune)
- Example Calculation
3. [[03_FP_Growth_Algorithm|FP-Growth Algorithm]]
- FP-Tree Structure
- Why it is faster than Apriori
4. [[04_Advanced_Pattern_Mining|Advanced Pattern Mining]]
- Closed and Maximal Patterns
- Vertical Data Format (Eclat)

unit 3/01_Association_Rule_Mining.md Normal file

@@ -0,0 +1,27 @@
# Introduction to Association Rules
**Association Rule Mining** is a technique to find relationships between items in a large dataset.
- **Classic Example**: "Market Basket Analysis" - finding what products customers buy together.
- *Example*: "If a customer buys **Bread**, they are 80% likely to buy **Butter**."
## Key Concepts
### 1. Itemset
- A collection of one or more items.
- *Example*: `{Milk, Bread, Diapers}`
### 2. Support (Frequency)
- How often an itemset appears in the database.
- **Formula**:
$$ \text{Support}(A) = \frac{\text{Transactions containing } A}{\text{Total Transactions}} $$
- *Example*: If Milk appears in 4 out of 5 transactions, Support = 80%.
### 3. Confidence (Reliability)
- How likely item B is purchased when item A is purchased.
- **Formula**:
$$ \text{Confidence}(A \to B) = \frac{\text{Support}(A \cup B)}{\text{Support}(A)} $$
- *Example*: If Milk and Bread appear together in 3 transactions, and Milk appears in 4:
- Confidence(Milk -> Bread) = 3/4 = 75%.
### 4. Frequent Itemset
- An itemset that meets a minimum **Support Threshold** (e.g., must appear at least 3 times).
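To make these definitions concrete, here is a minimal Python sketch of Support and Confidence. The `transactions` data and function names are illustrative, not from any library:

```python
# Toy database: each transaction is a set of items (illustrative data).
transactions = [
    {"Milk", "Bread"},
    {"Milk", "Diapers"},
    {"Milk", "Bread", "Diapers"},
    {"Milk", "Bread"},
    {"Eggs"},
]

def support(itemset, transactions):
    """Support(A): fraction of transactions containing every item of A."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(a, b, transactions):
    """Confidence(A -> B) = Support(A u B) / Support(A)."""
    return support(a | b, transactions) / support(a, transactions)

print(support({"Milk"}, transactions))                # 4/5 = 0.8
print(confidence({"Milk"}, {"Bread"}, transactions))  # (3/5) / (4/5) = 0.75
```

The printed values match the examples above: Milk appears in 4 of 5 transactions (80% support), and Milk and Bread appear together in 3 of the 4 transactions containing Milk (75% confidence).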

unit 3/02_Apriori_Algorithm.md Normal file

@@ -0,0 +1,45 @@
# The Apriori Algorithm
**Apriori** is a classic algorithm to find frequent itemsets.
## Key Principle (Apriori Property)
> "All non-empty subsets of a frequent itemset must also be frequent."
- *Meaning*: If `{Beer, Diapers}` is frequent, then `{Beer}` must be frequent and `{Diapers}` must be frequent.
- **Reverse**: If `{Beer}` is NOT frequent, then `{Beer, Diapers}` cannot be frequent. (This helps us ignore/prune many combinations).
## How it Works
1. **Scan 1**: Count all single items. Remove those below minimum support.
2. **Join**: Combine remaining items to make pairs (Size 2).
3. **Prune**: Remove pairs that contain infrequent items.
4. **Scan 2**: Count the pairs. Remove those below support.
5. **Repeat**: Make Size 3 itemsets, Size 4, etc., until no more can be found.
## Example
**Transactions**:
1. {Bread, Milk}
2. {Bread, Diapers, Beer, Eggs}
3. {Milk, Diapers, Beer, Cola}
4. {Bread, Milk, Diapers, Beer}
5. {Bread, Milk, Diapers, Cola}
**Minimum Support = 3**
1. **Count Items**:
- Bread: 4 (Keep)
- Milk: 4 (Keep)
- Diapers: 4 (Keep)
- Beer: 3 (Keep)
- Cola: 2 (Drop)
- Eggs: 1 (Drop)
2. **Make Pairs**:
- {Bread, Milk}, {Bread, Diapers}, {Bread, Beer}...
3. **Count Pairs**:
- {Bread, Milk}: 3 (Keep)
- {Bread, Diapers}: 3 (Keep)
- {Milk, Diapers}: 3 (Keep)
- {Diapers, Beer}: 3 (Keep)
- The other pairs are dropped: {Bread, Beer}: 2, {Milk, Beer}: 2.
4. **Result**: The frequent pairs are {Bread, Milk}, {Bread, Diapers}, {Milk, Diapers}, and {Diapers, Beer}. No size-3 candidate reaches support 3 (e.g. {Bread, Milk, Diapers} appears in only 2 transactions), so the algorithm stops here. A runnable sketch of the whole procedure follows.
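Below is a minimal Python sketch of the Join/Prune/Scan loop, run on the five transactions above. Names like `support_count` and `level` are illustrative, not from any library:

```python
from itertools import combinations

# The five transactions from the example above.
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diapers", "Beer", "Eggs"},
    {"Milk", "Diapers", "Beer", "Cola"},
    {"Bread", "Milk", "Diapers", "Beer"},
    {"Bread", "Milk", "Diapers", "Cola"},
]
MIN_SUPPORT = 3  # absolute count, as in the example

def support_count(itemset):
    """Number of transactions containing every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)

# Scan 1: frequent single items (Cola and Eggs fall below support 3).
items = {item for t in transactions for item in t}
level = [frozenset({i}) for i in items if support_count(frozenset({i})) >= MIN_SUPPORT]

k = 2
while level:
    print(f"Frequent {k - 1}-itemsets:", [sorted(s) for s in level])
    # Join: merge frequent (k-1)-itemsets into candidates of size k.
    candidates = {a | b for a in level for b in level if len(a | b) == k}
    # Prune: drop candidates with an infrequent (k-1)-subset (Apriori property).
    candidates = {c for c in candidates
                  if all(frozenset(s) in level for s in combinations(c, k - 1))}
    # Scan: keep only candidates that meet minimum support.
    level = [c for c in candidates if support_count(c) >= MIN_SUPPORT]
    k += 1
```

Running this prints the four frequent items, then the four frequent pairs, and then terminates because no triple survives the support check.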

unit 3/03_FP_Growth_Algorithm.md Normal file

@@ -0,0 +1,27 @@
# FP-Growth Algorithm
**FP-Growth (Frequent Pattern Growth)** is an algorithm that finds the same frequent itemsets as Apriori, but much more efficiently.
## Why is it better?
- **Apriori** scans the database many times (once for size 1, once for size 2, etc.). This is slow.
- **FP-Growth** scans the database **only twice**.
1. First scan: Count frequencies.
2. Second scan: Build the **FP-Tree**.
## The FP-Tree
An **FP-Tree** (Frequent Pattern Tree) is a compressed tree structure that stores all the transaction information.
- More frequent items are near the root (top).
- Less frequent items are leaves (bottom).
## Steps
1. **Count Frequencies**: Find support for all items. Drop infrequent ones.
2. **Sort**: Sort items in each transaction by frequency (Highest to Lowest).
3. **Build Tree**: Insert transactions into the tree. Shared items share the same path.
4. **Mine Tree**:
- Start from the bottom (least frequent item).
- Build a "Conditional Pattern Base" (all prefix paths in the tree that end at that item).
- Construct a "Conditional FP-Tree".
- Recursively find frequent patterns.
## Divide and Conquer
FP-Growth uses a **Divide and Conquer** strategy. It breaks the problem into smaller sub-problems (conditional trees) rather than generating millions of candidate sets like Apriori.
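Here is a minimal Python sketch of steps 1 to 3 (count, sort, build), reusing the transactions from the Apriori note. The `Node` class and all names are illustrative, and the mining step (step 4) is omitted for brevity:

```python
from collections import Counter

transactions = [
    ["Bread", "Milk"],
    ["Bread", "Diapers", "Beer", "Eggs"],
    ["Milk", "Diapers", "Beer", "Cola"],
    ["Bread", "Milk", "Diapers", "Beer"],
    ["Bread", "Milk", "Diapers", "Cola"],
]
MIN_SUPPORT = 3

class Node:
    """One FP-Tree node: an item, its count, and links to parent/children."""
    def __init__(self, item, parent):
        self.item, self.parent = item, parent
        self.count = 0
        self.children = {}  # item -> Node

# Scan 1: count item frequencies and drop infrequent items (Cola, Eggs).
freq = Counter(item for t in transactions for item in t)
freq = {i: n for i, n in freq.items() if n >= MIN_SUPPORT}

# Scan 2: sort each transaction by descending frequency, then insert.
root = Node(None, None)
for t in transactions:
    kept = sorted((i for i in t if i in freq), key=lambda i: (-freq[i], i))
    node = root
    for item in kept:
        # Shared prefixes reuse the same path; only the counts are bumped.
        node = node.children.setdefault(item, Node(item, node))
        node.count += 1

def show(node, depth=0):
    """Print the tree: frequent items sit near the root, as described above."""
    if node.item is not None:
        print("  " * depth + f"{node.item}:{node.count}")
    for child in node.children.values():
        show(child, depth + 1)

show(root)
```

Because the five transactions share prefixes like Bread and Diapers, the tree stores them once with counts instead of repeating them per transaction; that compression is what makes the two-scan approach work.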

unit 3/04_Advanced_Pattern_Mining.md Normal file

@@ -0,0 +1,34 @@
# Advanced Pattern Mining
Beyond basic frequent itemsets, we have more efficient ways to represent patterns.
## 1. Closed and Maximal Patterns
Finding *all* frequent itemsets can produce too many results. We can summarize them.
### Closed Frequent Itemset
- An itemset is **Closed** if none of its supersets have the **same support**.
- *Example*:
- {Milk}: Support 4
- {Milk, Bread}: Support 4
- Here, {Milk} is NOT closed because adding Bread didn't change the support count. {Milk, Bread} captures the same information.
- **Benefit**: Lossless compression (we don't lose any support info).
### Maximal Frequent Itemset
- An itemset is **Maximal** if none of its supersets are **frequent**.
- *Example*:
- {Milk, Bread} is frequent.
- {Milk, Bread, Diapers} is NOT frequent.
- Then {Milk, Bread} is a Maximal Frequent Itemset.
- **Benefit**: Smallest set of patterns, but we lose the exact support counts of subsets.
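A small Python sketch of both checks, given a precomputed support table. The itemsets and counts below are illustrative, chosen to match the examples above:

```python
# Toy support table (absolute counts; illustrative data).
supports = {
    frozenset({"Milk"}): 4,
    frozenset({"Bread"}): 4,
    frozenset({"Milk", "Bread"}): 4,
    frozenset({"Milk", "Bread", "Diapers"}): 2,  # below min support -> not frequent
}
MIN_SUPPORT = 3
frequent = {s: n for s, n in supports.items() if n >= MIN_SUPPORT}

def is_closed(itemset):
    """Closed: no frequent superset has the same support."""
    return not any(itemset < other and frequent[other] == frequent[itemset]
                   for other in frequent)

def is_maximal(itemset):
    """Maximal: no superset at all is frequent."""
    return not any(itemset < other for other in frequent)

for s in frequent:
    print(set(s), "closed:", is_closed(s), "maximal:", is_maximal(s))
```

As in the examples above, {Milk} comes out not closed (its superset {Milk, Bread} has the same support of 4), while {Milk, Bread} is both closed and maximal.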
## 2. Vertical Data Format (Eclat Algorithm)
- **Horizontal Format** (Standard):
- T1: {A, B, C}
- T2: {A, B}
- **Vertical Format**:
- Item A: {T1, T2}
- Item B: {T1, T2}
- Item C: {T1}
- **How it works**: Instead of re-scanning transactions to count, we simply intersect the Transaction ID (TID) lists.
- Support({A, B}) = |{T1, T2} ∩ {T1, T2}| = 2.
- **Benefit**: Very fast for calculating support.
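A minimal Python sketch of the intersection idea, using the toy TID-lists above (the `tidlists` mapping and `support` function are illustrative):

```python
# Vertical format: map each item to the set of transaction IDs containing it.
tidlists = {
    "A": {"T1", "T2"},
    "B": {"T1", "T2"},
    "C": {"T1"},
}

def support(*items):
    """Support of an itemset = size of the intersection of its TID-lists."""
    return len(set.intersection(*(tidlists[i] for i in items)))

print(support("A", "B"))  # |{T1, T2} ∩ {T1, T2}| = 2
print(support("A", "C"))  # |{T1, T2} ∩ {T1}| = 1
```

Note that the TID-list of a larger itemset is itself just an intersection, so Eclat can grow itemsets by intersecting lists repeatedly without ever re-reading the original transactions.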