Files
DMCT-NOTES/unit 3/04_Advanced_Pattern_Mining.md
2025-11-24 16:55:19 +05:30

1.3 KiB

Advanced Pattern Mining

Beyond basic frequent itemsets, we have more efficient ways to represent patterns.

1. Closed and Maximal Patterns

Finding all frequent itemsets can produce too many results. We can summarize them.

Closed Frequent Itemset

  • An itemset is Closed if none of its supersets have the same support.
  • Example:
    • {Milk}: Support 4
    • {Milk, Bread}: Support 4
    • Here, {Milk} is NOT closed because adding Bread didn't change the support count. {Milk, Bread} captures the same information.
  • Benefit: Lossless compression (we don't lose any support info).

Maximal Frequent Itemset

  • An itemset is Maximal if none of its supersets are frequent.
  • Example:
    • {Milk, Bread} is frequent.
    • {Milk, Bread, Diapers} is NOT frequent.
    • Then {Milk, Bread} is a Maximal Frequent Itemset.
  • Benefit: Smallest set of patterns, but we lose the exact support counts of subsets.

2. Vertical Data Format (Eclat Algorithm)

  • Horizontal Format (Standard):
    • T1: {A, B, C}
    • T2: {A, B}
  • Vertical Format:
    • Item A: {T1, T2}
    • Item B: {T1, T2}
    • Item C: {T1}
  • How it works: Instead of counting, we just intersect the Transaction ID lists.
    • Support(A, B) = Intersection({T1, T2}, {T1, T2}) = Size is 2.
  • Benefit: Very fast for calculating support.