Addition of units 1, 3, 4, 5
unit 3/00_Index.md
# Unit 3: Association Rule Mining

Welcome to your simplified notes for Unit 3.

## Table of Contents

1. [[01_Association_Rule_Mining|Introduction to Association Rules]]
   - What is it? (Market Basket Analysis)
   - Key Terms: Support, Confidence, Frequent Itemsets
2. [[02_Apriori_Algorithm|The Apriori Algorithm]]
   - How it works (Join & Prune)
   - Example Calculation
3. [[03_FP_Growth_Algorithm|FP-Growth Algorithm]]
   - FP-Tree Structure
   - Why it is faster than Apriori
4. [[04_Advanced_Pattern_Mining|Advanced Pattern Mining]]
   - Closed and Maximal Patterns
   - Vertical Data Format (Eclat)
unit 3/01_Association_Rule_Mining.md
# Introduction to Association Rules

**Association Rule Mining** is a technique to find relationships between items in a large dataset.

- **Classic Example**: "Market Basket Analysis" - finding what products customers buy together.
  - *Example*: "If a customer buys **Bread**, they are 80% likely to buy **Butter**."

## Key Concepts

### 1. Itemset
- A collection of one or more items.
- *Example*: `{Milk, Bread, Diapers}`

### 2. Support (Frequency)
- How often an itemset appears in the database.
- **Formula**:
$$ \text{Support}(A) = \frac{\text{Transactions containing } A}{\text{Total Transactions}} $$
- *Example*: If Milk appears in 4 out of 5 transactions, Support = 80%.

### 3. Confidence (Reliability)
- How likely item B is purchased when item A is purchased.
- **Formula**:
$$ \text{Confidence}(A \to B) = \frac{\text{Support}(A \cup B)}{\text{Support}(A)} $$
- *Example*: If Milk and Bread appear together in 3 transactions, and Milk appears in 4:
  - Confidence(Milk -> Bread) = 3/4 = 75%.

### 4. Frequent Itemset
- An itemset that meets a minimum **Support Threshold** (e.g., must appear at least 3 times).
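The two formulas can be checked with a short Python sketch. The five-transaction toy database below is a hypothetical example chosen so the numbers match the Milk/Bread figures above.

```python
# Toy transaction database (hypothetical data matching the examples above)
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diapers", "Beer", "Eggs"},
    {"Milk", "Diapers", "Beer", "Cola"},
    {"Bread", "Milk", "Diapers", "Beer"},
    {"Bread", "Milk", "Diapers", "Cola"},
]

def support(itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(antecedent, consequent):
    """Support(A union B) / Support(A) for the rule A -> B."""
    return support(set(antecedent) | set(consequent)) / support(antecedent)

print(support({"Milk"}))                  # Milk is in 4 of 5 transactions
print(confidence({"Milk"}, {"Bread"}))    # 3 of Milk's 4 transactions also have Bread
```

Here `support` and `confidence` are illustrative helper names, not a library API.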
unit 3/02_Apriori_Algorithm.md
# The Apriori Algorithm

**Apriori** is a classic algorithm to find frequent itemsets.

## Key Principle (Apriori Property)
> "All non-empty subsets of a frequent itemset must also be frequent."

- *Meaning*: If `{Beer, Diapers}` is frequent, then `{Beer}` must be frequent and `{Diapers}` must be frequent.
- **Reverse (Contrapositive)**: If `{Beer}` is NOT frequent, then `{Beer, Diapers}` cannot be frequent. This lets us prune many combinations without counting them.

## How it Works
1. **Scan 1**: Count all single items. Remove those below minimum support.
2. **Join**: Combine remaining items to make pairs (Size 2).
3. **Prune**: Remove pairs that contain an infrequent item.
4. **Scan 2**: Count the pairs. Remove those below minimum support.
5. **Repeat**: Make Size 3 itemsets, Size 4, etc., until no new frequent itemsets are found.

## Example
**Transactions**:
1. {Bread, Milk}
2. {Bread, Diapers, Beer, Eggs}
3. {Milk, Diapers, Beer, Cola}
4. {Bread, Milk, Diapers, Beer}
5. {Bread, Milk, Diapers, Cola}

**Minimum Support = 3** (an itemset must appear in at least 3 transactions)

1. **Count Items**:
   - Bread: 4 (Keep)
   - Milk: 4 (Keep)
   - Diapers: 4 (Keep)
   - Beer: 3 (Keep)
   - Cola: 2 (Drop)
   - Eggs: 1 (Drop)
2. **Make Pairs**:
   - {Bread, Milk}, {Bread, Diapers}, {Bread, Beer}...
3. **Count Pairs**:
   - {Bread, Milk}: 3 (Keep)
   - {Bread, Diapers}: 3 (Keep)
   - {Milk, Diapers}: 3 (Keep)
   - {Diapers, Beer}: 3 (Keep)
   - ...others dropped...
4. **Result**: The frequent itemsets are the four single items and the four pairs above. The only candidate triple, {Bread, Milk, Diapers}, appears in just 2 transactions, so the algorithm stops.
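The Join/Prune/Scan loop above can be sketched in Python. This is a minimal, unoptimized illustration of the Apriori idea on the same toy database, not a production implementation.

```python
from itertools import combinations

# Toy database from the worked example above
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diapers", "Beer", "Eggs"},
    {"Milk", "Diapers", "Beer", "Cola"},
    {"Bread", "Milk", "Diapers", "Beer"},
    {"Bread", "Milk", "Diapers", "Cola"},
]
min_support = 3  # absolute count, as in the example

def count(itemset):
    """Number of transactions containing every item in `itemset`."""
    return sum(1 for t in transactions if set(itemset) <= t)

# Scan 1: frequent single items
items = sorted({i for t in transactions for i in t})
frequent = [frozenset([i]) for i in items if count([i]) >= min_support]
all_frequent = list(frequent)

k = 2
while frequent:
    # Join: combine frequent (k-1)-itemsets into size-k candidates
    candidates = {a | b for a in frequent for b in frequent if len(a | b) == k}
    # Prune: drop any candidate with an infrequent (k-1)-subset
    prev = set(frequent)
    candidates = [c for c in candidates
                  if all(frozenset(s) in prev for s in combinations(c, k - 1))]
    # Scan k: keep candidates that meet min_support
    frequent = [c for c in candidates if count(c) >= min_support]
    all_frequent.extend(frequent)
    k += 1

for fs in sorted(all_frequent, key=lambda s: (len(s), sorted(s))):
    print(sorted(fs), count(fs))
```

Running it reproduces the example: four frequent items, four frequent pairs, and no frequent triples.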
unit 3/03_FP_Growth_Algorithm.md
# FP-Growth Algorithm

**FP-Growth (Frequent Pattern Growth)** is a frequent-itemset mining algorithm that is usually much faster than Apriori.

## Why is it better?
- **Apriori** scans the database many times (once for size 1, once for size 2, etc.). This is slow.
- **FP-Growth** scans the database **only twice**:
  1. First scan: Count item frequencies.
  2. Second scan: Build the **FP-Tree**.

## The FP-Tree
An **FP-Tree** (Frequent Pattern Tree) is a compressed tree structure that stores all the transaction information.
- More frequent items are near the root (top).
- Less frequent items are leaves (bottom).

## Steps
1. **Count Frequencies**: Find the support of every item. Drop infrequent ones.
2. **Sort**: Sort the items in each transaction by frequency (highest to lowest).
3. **Build Tree**: Insert transactions into the tree. Transactions that share a prefix share the same path.
4. **Mine Tree**:
   - Start from the bottom (least frequent item).
   - Build a "Conditional Pattern Base" (all paths leading to that item).
   - Construct a "Conditional FP-Tree" from those paths.
   - Recursively find frequent patterns.

## Divide and Conquer
FP-Growth uses a **Divide and Conquer** strategy. It breaks the problem into smaller sub-problems (conditional trees) rather than generating millions of candidate sets like Apriori.
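Steps 1-3 (the two scans and the tree build) can be sketched as below, reusing the toy database from the Apriori notes. The `Node` class and alphabetical tie-breaking are illustrative choices; the recursive mining step (step 4) is omitted to keep the sketch short.

```python
from collections import Counter

# Same toy database as in the Apriori example
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diapers", "Beer", "Eggs"},
    {"Milk", "Diapers", "Beer", "Cola"},
    {"Bread", "Milk", "Diapers", "Beer"},
    {"Bread", "Milk", "Diapers", "Cola"},
]
min_support = 3

class Node:
    def __init__(self, item, parent):
        self.item = item
        self.parent = parent
        self.count = 0
        self.children = {}

# Scan 1: count item frequencies and drop infrequent items
freq = Counter(i for t in transactions for i in t)
keep = {i for i, c in freq.items() if c >= min_support}

# Scan 2: insert each transaction with items sorted most-frequent first
root = Node(None, None)
for t in transactions:
    ordered = sorted((i for i in t if i in keep),
                     key=lambda i: (-freq[i], i))  # ties broken alphabetically
    node = root
    for item in ordered:
        node = node.children.setdefault(item, Node(item, node))
        node.count += 1  # shared prefixes reuse the same path
```

With this data, four of the five transactions start with Bread, so they compress into a single path below the root - exactly the sharing that makes the tree small.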
unit 3/04_Advanced_Pattern_Mining.md
# Advanced Pattern Mining

Beyond basic frequent itemsets, we have more efficient ways to represent patterns.

## 1. Closed and Maximal Patterns
Finding *all* frequent itemsets can produce too many results. We can summarize them.

### Closed Frequent Itemset
- An itemset is **Closed** if none of its supersets have the **same support**.
- *Example*:
  - {Milk}: Support 4
  - {Milk, Bread}: Support 4
  - Here, {Milk} is NOT closed because adding Bread didn't change the support count. {Milk, Bread} captures the same information.
- **Benefit**: Lossless compression (we don't lose any support info).

### Maximal Frequent Itemset
- An itemset is **Maximal** if none of its supersets are **frequent**.
- *Example*:
  - {Milk, Bread} is frequent.
  - {Milk, Bread, Diapers} is NOT frequent.
  - Then {Milk, Bread} is a Maximal Frequent Itemset.
- **Benefit**: Smallest set of patterns, but we lose the exact support counts of subsets.
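The two definitions translate directly into code. The small `freq` table below is a hypothetical set of frequent itemsets extending the {Milk}/{Milk, Bread} example (the support of {Bread} is an assumed value for illustration).

```python
# Hypothetical frequent itemsets with their support counts;
# {Milk, Bread} has the same support as {Milk}, so {Milk} is not closed.
freq = {
    frozenset({"Milk"}): 4,
    frozenset({"Bread"}): 4,   # assumed value, not from the notes
    frozenset({"Milk", "Bread"}): 4,
}

def is_closed(itemset):
    """Closed: no proper superset has the same support."""
    return not any(itemset < other and freq[other] == freq[itemset]
                   for other in freq)

def is_maximal(itemset):
    """Maximal: no proper superset is frequent at all."""
    return not any(itemset < other for other in freq)
```

Every maximal itemset is also closed, but not vice versa - here {Milk, Bread} is both, while {Milk} is neither.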
## 2. Vertical Data Format (Eclat Algorithm)
- **Horizontal Format** (Standard): each transaction lists its items.
  - T1: {A, B, C}
  - T2: {A, B}
- **Vertical Format**: each item lists the transactions it appears in.
  - Item A: {T1, T2}
  - Item B: {T1, T2}
  - Item C: {T1}
- **How it works**: Instead of rescanning the database to count, we intersect the Transaction ID (TID) lists.
  - Support({A, B}) = |{T1, T2} ∩ {T1, T2}| = 2.
- **Benefit**: Very fast support counting.
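The conversion and the intersection trick can be sketched in a few lines of Python, using the two-transaction example above.

```python
from collections import defaultdict

# Horizontal transactions from the example above
horizontal = {"T1": {"A", "B", "C"}, "T2": {"A", "B"}}

# Convert to vertical format: item -> set of transaction IDs (TID list)
tidlists = defaultdict(set)
for tid, items in horizontal.items():
    for item in items:
        tidlists[item].add(tid)

# Support of an itemset = size of the intersection of its TID lists
support_AB = len(tidlists["A"] & tidlists["B"])  # {T1, T2} with {T1, T2}
support_BC = len(tidlists["B"] & tidlists["C"])  # {T1, T2} with {T1}
```

Set intersection replaces a full database scan, which is why Eclat computes supports so quickly once the vertical format is built.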