# The Apriori Algorithm **Apriori** is a classic algorithm to find frequent itemsets. ## Key Principle (Apriori Property) > "All non-empty subsets of a frequent itemset must also be frequent." - *Meaning*: If `{Beer, Diapers}` is frequent, then `{Beer}` must be frequent and `{Diapers}` must be frequent. - **Reverse**: If `{Beer}` is NOT frequent, then `{Beer, Diapers}` cannot be frequent. (This helps us ignore/prune many combinations). ## How it Works 1. **Scan 1**: Count all single items. Remove those below minimum support. 2. **Join**: Combine remaining items to make pairs (Size 2). 3. **Prune**: Remove pairs that contain infrequent items. 4. **Scan 2**: Count the pairs. Remove those below support. 5. **Repeat**: Make Size 3 itemsets, Size 4, etc., until no more can be found. ## Example **Transactions**: 1. {Bread, Milk} 2. {Bread, Diapers, Beer, Eggs} 3. {Milk, Diapers, Beer, Cola} 4. {Bread, Milk, Diapers, Beer} 5. {Bread, Milk, Diapers, Cola} **Minimum Support = 3** 1. **Count Items**: - Bread: 4 (Keep) - Milk: 4 (Keep) - Diapers: 4 (Keep) - Beer: 3 (Keep) - Cola: 2 (Drop) - Eggs: 1 (Drop) 2. **Make Pairs**: - {Bread, Milk}, {Bread, Diapers}, {Bread, Beer}... 3. **Count Pairs**: - {Bread, Milk}: 3 (Keep) - {Bread, Diapers}: 3 (Keep) - {Milk, Diapers}: 3 (Keep) - {Diapers, Beer}: 3 (Keep) - ...others dropped... 4. **Result**: We found the frequent groups!