addition of unit 1 3 4 5
unit 1/00_Index.md
@@ -0,0 +1,24 @@

# Unit 1: Introduction to Data Mining

Welcome to your simplified notes for Unit 1.

## Table of Contents

1. [[01_Introduction_to_Data_Mining|Introduction & DIKW Pyramid]]
   - What is Data Mining?
   - The DIKW Pyramid (Data, Information, Knowledge, Wisdom)
2. [[02_Data_Mining_Process|The Data Mining Process]]
   - Steps from Goal Definition to Deployment
   - Issues in Data Mining (Privacy, Scalability)
3. [[03_Data_Mining_Techniques|Techniques & Functionalities]]
   - Predictive vs Descriptive Mining
   - Classification, Regression, Clustering, Association Rules
4. [[04_Data_Preprocessing|Data Preprocessing]]
   - Why do we need it?
   - Cleaning, Integration, Reduction, Transformation
5. [[05_Data_Processing_Methods|Data Processing Methods]]
   - Manual vs Electronic
   - Batch, Real-time, Online Processing
6. [[06_Data_Discretization|Data Discretization]]
   - Binning, Histograms
   - Concept Hierarchy
unit 1/01_Introduction_to_Data_Mining.md
@@ -0,0 +1,27 @@

# Introduction to Data Mining

## What is Data Mining?

**Data Mining** is the process of digging through large amounts of raw data to find useful patterns, trends, and knowledge.

- **Analogy**: Like mining gold from rocks. The rocks are the "raw data," and the gold is the "knowledge."

### Key Definitions

- **Data**: Raw facts and figures (e.g., sales logs, sensor readings).
- **Mining**: Extracting something valuable.

## The DIKW Pyramid

The **DIKW** model shows how we move from raw data to wisdom.

1. **Data (D)**: Raw, unprocessed facts.
   - *Example*: Numbers like 42, 35, 50.
2. **Information (I)**: Data that is organized and has meaning.
   - *Example*: "These are the ages of employees."
3. **Knowledge (K)**: Understanding gained from analysis.
   - *Example*: "The team has a mix of young and experienced people."
4. **Wisdom (W)**: Applying knowledge to make good decisions.
   - *Example*: "Let's create a mentorship program to share skills."

## Major Issues in Data Mining

1. **Privacy and Security**: Mining can reveal sensitive personal info. We must protect it.
2. **Scalability**: Can the system handle huge amounts of data (Big Data)?
3. **Data Quality**: If data is dirty or missing, the results will be wrong ("Garbage In, Garbage Out").
4. **Ethical Use**: Ensuring data isn't used for discrimination or bias.
unit 1/02_Data_Mining_Process.md
@@ -0,0 +1,24 @@

# The Data Mining Process

How do we actually do data mining? It follows a standard process (often similar to CRISP-DM).

## Steps in the Process

1. **Define the Goal**: What do you want to achieve? (e.g., increase sales, detect fraud).
2. **Gather Data**: Collect data from databases, logs, etc.
3. **Cleanse Data**: Fix errors, remove duplicates, and handle missing values.
4. **Interrogate Data**: Explore the data (charts, graphs) to find initial patterns.
5. **Build a Model**: Use algorithms (like decision trees or regression) to find the solution.
6. **Validate Results**: Check if the model is accurate.
7. **Implement**: Use the insights in the real world.

## Data Mining Functionalities

Tasks are generally divided into two types:

### 1. Descriptive Mining

- Describes what is in the data.
- Finds patterns and relationships.
- *Examples*: Clustering, Association Rules.

### 2. Predictive Mining

- Predicts future or unknown values.
- *Examples*: Classification, Regression, Prediction.
unit 1/03_Data_Mining_Techniques.md
@@ -0,0 +1,28 @@

# Data Mining Techniques

There are several key techniques used to mine data.

## 1. Classification (Predictive)

- **Goal**: Assign items to predefined categories (classes).
- **Supervised Learning**: We know the categories beforehand.
- **Example**: Is this email **Spam** or **Not Spam**?

## 2. Regression (Predictive)

- **Goal**: Predict a continuous **number**.
- **Example**: Predicting the **price** of a house based on its size and location.

## 3. Clustering (Descriptive)

- **Goal**: Group similar items together.
- **Unsupervised Learning**: We don't know the groups beforehand.
- **Example**: Grouping customers into segments (e.g., "High Spenders", "Budget Shoppers").

## 4. Association Rules (Descriptive)

- **Goal**: Find relationships between items.
- **Market Basket Analysis**: "People who buy Bread often also buy Butter."
- **Key Terms**:
  - **Support**: How often items appear together.
  - **Confidence**: How likely item B is purchased if item A is purchased.

## 5. Outlier Detection

- **Goal**: Find unusual data points that don't fit the pattern.
- **Example**: Detecting credit card fraud (a huge transaction in a usually quiet account).
unit 1/04_Data_Preprocessing.md
@@ -0,0 +1,31 @@

# Data Preprocessing

**Data Preprocessing** is the most important step before mining. Real-world data is often dirty, incomplete, and inconsistent.

## Why Preprocess?

- **Accuracy**: Bad data leads to bad results.
- **Completeness**: Missing data can break algorithms.
- **Consistency**: Different formats (e.g., "USA" vs "U.S.A.") confuse the system.

## Major Steps

### 1. Data Cleaning

- **Fill Missing Values**: Use the average (mean) or a specific value.
- **Remove Noisy Data**: Smooth out errors (binning, regression).
- **Remove Outliers**: Delete data that doesn't make sense.

### 2. Data Integration

- Combining data from multiple sources (databases, files).
- **Challenge**: Handling different names for the same thing (e.g., "CustID" vs "CustomerID").

### 3. Data Reduction

- Reducing the size of the data while keeping the important parts.
- **Dimensionality Reduction**: Removing unimportant attributes.
- **Numerosity Reduction**: Replacing raw data with smaller representations (like histograms).

### 4. Data Transformation

- Converting data into a format suitable for mining.
- **Normalization**: Scaling data to a small range (e.g., 0 to 1).
  - *Min-Max Normalization*
  - *Z-Score Normalization*
- **Discretization**: Converting continuous numbers into intervals (e.g., Age 0-10, 11-20).
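The two normalization methods above are easy to express in code. Below is a minimal Python sketch (the `ages` list and the function names are invented for illustration, not part of the notes):

```python
import statistics

def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Scale values linearly into the range [new_min, new_max]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) * (new_max - new_min) + new_min for v in values]

def z_score_normalize(values):
    """Scale values so they have mean 0 and standard deviation 1."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]

ages = [18, 25, 32, 47, 60]          # hypothetical raw attribute values
print(min_max_normalize(ages))       # every result falls between 0 and 1
print(z_score_normalize(ages))       # results are centred around 0
```

Min-Max keeps the original shape of the data but is sensitive to extreme values; Z-Score is often the safer choice when outliers are present.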
unit 1/05_Data_Processing_Methods.md
@@ -0,0 +1,23 @@

# Data Processing Methods

How is data actually processed by computers?

## 1. Batch Processing

- Data is collected over time and processed **all at once** (in a batch).
- **Example**: Payroll systems (calculating salaries at the end of the month).
- **Pros**: Efficient for large volumes.
- **Cons**: Not immediate.

## 2. Real-time Processing

- Data is processed **immediately** as it comes in.
- **Example**: ATM withdrawals. You need to know your balance *right now*.
- **Pros**: Instant results.
- **Cons**: Complex and expensive.

## 3. Online Processing

- Similar to real-time, often used for internet applications.
- **Example**: Barcode scanning at a store checkout. The price is fetched instantly.

## 4. Distributed Processing

- Breaking a task into pieces and running them on **multiple computers** at the same time.
- **Example**: Google Search. Many servers work together to find your result.
unit 1/06_Data_Discretization.md
@@ -0,0 +1,27 @@

# Data Discretization

**Data Discretization** is the process of converting a large number of continuous values into a smaller number of finite intervals (bins).

## Why use it?

- Makes data easier to understand.
- Many algorithms work better with categories than infinite numbers.

## Techniques

### 1. Binning

- Sorting data and dividing it into "bins".
- **Example**: Grouping ages into [0-10], [11-20], etc.
- Helps smooth out noise.
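As a quick illustration of equal-width binning, here is a small Python sketch (the `ages` data and the bin width of 10 are made-up example values):

```python
def equal_width_bins(values, width):
    """Assign each value to an interval [k*width, (k+1)*width - 1]."""
    bins = {}
    for v in sorted(values):
        start = (v // width) * width
        label = f"[{start}-{start + width - 1}]"
        bins.setdefault(label, []).append(v)
    return bins

ages = [3, 7, 12, 15, 18, 24, 31, 33]
print(equal_width_bins(ages, 10))
# {'[0-9]': [3, 7], '[10-19]': [12, 15, 18], '[20-29]': [24], '[30-39]': [31, 33]}
```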
### 2. Histogram Analysis

- Using a bar chart (histogram) to see the distribution and decide where to split the data.

### 3. Cluster Analysis

- Using clustering (like K-Means) to group similar values, then using those groups as the intervals.

## Concept Hierarchy

- Organizing data from **low-level** concepts to **high-level** concepts.
- **Example (Location)**:
  - Street -> City -> State -> Country.
- **Top-down Mapping**: General to Specific.
- **Bottom-up Mapping**: Specific to General.
unit 3/00_Index.md
@@ -0,0 +1,18 @@

# Unit 3: Association Rule Mining

Welcome to your simplified notes for Unit 3.

## Table of Contents

1. [[01_Association_Rule_Mining|Introduction to Association Rules]]
   - What is it? (Market Basket Analysis)
   - Key Terms: Support, Confidence, Frequent Itemsets
2. [[02_Apriori_Algorithm|The Apriori Algorithm]]
   - How it works (Join & Prune)
   - Example Calculation
3. [[03_FP_Growth_Algorithm|FP-Growth Algorithm]]
   - FP-Tree Structure
   - Why it is faster than Apriori
4. [[04_Advanced_Pattern_Mining|Advanced Pattern Mining]]
   - Closed and Maximal Patterns
   - Vertical Data Format (Eclat)
unit 3/01_Association_Rule_Mining.md
@@ -0,0 +1,27 @@

# Introduction to Association Rules

**Association Rule Mining** is a technique to find relationships between items in a large dataset.

- **Classic Example**: "Market Basket Analysis" - finding what products customers buy together.
  - *Example*: "If a customer buys **Bread**, they are 80% likely to buy **Butter**."

## Key Concepts

### 1. Itemset

- A collection of one or more items.
- *Example*: `{Milk, Bread, Diapers}`

### 2. Support (Frequency)

- How often an itemset appears in the database.
- **Formula**:

$$ \text{Support}(A) = \frac{\text{Transactions containing } A}{\text{Total Transactions}} $$

- *Example*: If Milk appears in 4 out of 5 transactions, Support = 80%.

### 3. Confidence (Reliability)

- How likely item B is purchased when item A is purchased.
- **Formula**:

$$ \text{Confidence}(A \to B) = \frac{\text{Support}(A \cup B)}{\text{Support}(A)} $$

- *Example*: If Milk and Bread appear together in 3 transactions, and Milk appears in 4:
  - Confidence(Milk -> Bread) = 3/4 = 75%.

### 4. Frequent Itemset

- An itemset that meets a minimum **Support Threshold** (e.g., must appear at least 3 times).
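The two formulas above are easy to verify in code. Here is a minimal Python sketch (the five transactions are made up to mirror the Milk/Bread numbers used in the examples):

```python
transactions = [
    {"Milk", "Bread"},
    {"Milk", "Diapers"},
    {"Milk", "Bread", "Beer"},
    {"Milk", "Bread", "Cola"},
    {"Eggs"},
]

def support(itemset):
    """Fraction of transactions containing every item in `itemset`."""
    count = sum(1 for t in transactions if itemset <= t)
    return count / len(transactions)

def confidence(antecedent, consequent):
    """Support of (A and B together) divided by support of A."""
    return support(antecedent | consequent) / support(antecedent)

print(support({"Milk"}))                # 4/5 = 0.8
print(confidence({"Milk"}, {"Bread"}))  # 3/4 = 0.75
```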
unit 3/02_Apriori_Algorithm.md
@@ -0,0 +1,45 @@

# The Apriori Algorithm

**Apriori** is a classic algorithm to find frequent itemsets.

## Key Principle (Apriori Property)

> "All non-empty subsets of a frequent itemset must also be frequent."

- *Meaning*: If `{Beer, Diapers}` is frequent, then `{Beer}` must be frequent and `{Diapers}` must be frequent.
- **Reverse**: If `{Beer}` is NOT frequent, then `{Beer, Diapers}` cannot be frequent. (This helps us ignore/prune many combinations).

## How it Works

1. **Scan 1**: Count all single items. Remove those below minimum support.
2. **Join**: Combine remaining items to make pairs (Size 2).
3. **Prune**: Remove pairs that contain infrequent items.
4. **Scan 2**: Count the pairs. Remove those below support.
5. **Repeat**: Make Size 3 itemsets, Size 4, etc., until no more can be found.

## Example

**Transactions**:

1. {Bread, Milk}
2. {Bread, Diapers, Beer, Eggs}
3. {Milk, Diapers, Beer, Cola}
4. {Bread, Milk, Diapers, Beer}
5. {Bread, Milk, Diapers, Cola}

**Minimum Support = 3**

1. **Count Items**:
   - Bread: 4 (Keep)
   - Milk: 4 (Keep)
   - Diapers: 4 (Keep)
   - Beer: 3 (Keep)
   - Cola: 2 (Drop)
   - Eggs: 1 (Drop)
2. **Make Pairs**:
   - {Bread, Milk}, {Bread, Diapers}, {Bread, Beer}...
3. **Count Pairs**:
   - {Bread, Milk}: 3 (Keep)
   - {Bread, Diapers}: 3 (Keep)
   - {Milk, Diapers}: 3 (Keep)
   - {Diapers, Beer}: 3 (Keep)
   - ...others dropped...
4. **Result**: The frequent itemsets are {Bread, Milk}, {Bread, Diapers}, {Milk, Diapers}, and {Diapers, Beer}. No size-3 itemset reaches the threshold, so the algorithm stops.
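The worked example above can be reproduced with a short script. Below is a minimal, illustrative Python version of the count/join/prune loop (a straightforward sketch, not an optimized implementation):

```python
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diapers", "Beer", "Eggs"},
    {"Milk", "Diapers", "Beer", "Cola"},
    {"Bread", "Milk", "Diapers", "Beer"},
    {"Bread", "Milk", "Diapers", "Cola"},
]
MIN_SUPPORT = 3  # absolute count, as in the example

def count_support(itemset):
    return sum(1 for t in transactions if itemset <= t)

# Scan 1: frequent single items
items = {item for t in transactions for item in t}
frequent = [{frozenset([i]) for i in items if count_support({i}) >= MIN_SUPPORT}]

# Repeat: join frequent (k-1)-itemsets into k-itemsets, then count and prune
k = 2
while frequent[-1]:
    candidates = {a | b for a in frequent[-1] for b in frequent[-1] if len(a | b) == k}
    frequent.append({c for c in candidates if count_support(c) >= MIN_SUPPORT})
    k += 1

for level in frequent:
    for itemset in level:
        print(set(itemset), count_support(itemset))
```

Running this prints the same frequent items and pairs as the hand calculation; the size-3 candidates all fall below the threshold, so the loop stops.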
unit 3/03_FP_Growth_Algorithm.md
@@ -0,0 +1,27 @@

# FP-Growth Algorithm

**FP-Growth (Frequent Pattern Growth)** is a faster and more efficient algorithm than Apriori.

## Why is it better?

- **Apriori** scans the database many times (once for size 1, once for size 2, etc.). This is slow.
- **FP-Growth** scans the database **only twice**.
  1. First scan: Count frequencies.
  2. Second scan: Build the **FP-Tree**.

## The FP-Tree

An **FP-Tree** (Frequent Pattern Tree) is a compressed tree structure that stores all the transaction information.

- More frequent items are near the root (top).
- Less frequent items are leaves (bottom).

## Steps

1. **Count Frequencies**: Find support for all items. Drop infrequent ones.
2. **Sort**: Sort items in each transaction by frequency (Highest to Lowest).
3. **Build Tree**: Insert transactions into the tree. Shared items share the same path.
4. **Mine Tree**:
   - Start from the bottom (least frequent item).
   - Build a "Conditional Pattern Base" (all paths leading to that item).
   - Construct a "Conditional FP-Tree".
   - Recursively find frequent patterns.

## Divide and Conquer

FP-Growth uses a **Divide and Conquer** strategy. It breaks the problem into smaller sub-problems (conditional trees) rather than generating millions of candidate sets like Apriori.
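To make steps 1-3 concrete, here is a small, illustrative Python sketch that performs the two scans and builds a bare-bones FP-Tree (the transactions, the `Node` class, and the minimum support are invented for the example; the mining step is omitted):

```python
from collections import Counter

transactions = [
    ["Bread", "Milk"],
    ["Bread", "Diapers", "Beer"],
    ["Milk", "Diapers", "Beer"],
    ["Bread", "Milk", "Diapers", "Beer"],
]
MIN_SUPPORT = 2

class Node:
    def __init__(self, item):
        self.item, self.count, self.children = item, 0, {}

# Scan 1: count item frequencies and drop infrequent items
freq = Counter(item for t in transactions for item in t)
freq = {i: c for i, c in freq.items() if c >= MIN_SUPPORT}

# Scan 2: sort each transaction by frequency and insert it into the tree
root = Node(None)
for t in transactions:
    kept = sorted((i for i in t if i in freq), key=lambda i: (-freq[i], i))
    node = root
    for item in kept:                      # shared prefixes share the same path
        node = node.children.setdefault(item, Node(item))
        node.count += 1

def show(node, depth=0):
    for child in node.children.values():
        print("  " * depth + f"{child.item}:{child.count}")
        show(child, depth + 1)

show(root)   # prints the compressed tree: frequent items sit near the root
```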
unit 3/04_Advanced_Pattern_Mining.md
@@ -0,0 +1,34 @@

# Advanced Pattern Mining

Beyond basic frequent itemsets, we have more efficient ways to represent patterns.

## 1. Closed and Maximal Patterns

Finding *all* frequent itemsets can produce too many results. We can summarize them.

### Closed Frequent Itemset

- An itemset is **Closed** if none of its supersets have the **same support**.
- *Example*:
  - {Milk}: Support 4
  - {Milk, Bread}: Support 4
  - Here, {Milk} is NOT closed because adding Bread didn't change the support count. {Milk, Bread} captures the same information.
- **Benefit**: Lossless compression (we don't lose any support info).

### Maximal Frequent Itemset

- An itemset is **Maximal** if none of its supersets are **frequent**.
- *Example*:
  - {Milk, Bread} is frequent.
  - {Milk, Bread, Diapers} is NOT frequent.
  - Then {Milk, Bread} is a Maximal Frequent Itemset.
- **Benefit**: Smallest set of patterns, but we lose the exact support counts of subsets.

## 2. Vertical Data Format (Eclat Algorithm)

- **Horizontal Format** (Standard):
  - T1: {A, B, C}
  - T2: {A, B}
- **Vertical Format**:
  - Item A: {T1, T2}
  - Item B: {T1, T2}
  - Item C: {T1}
- **How it works**: Instead of counting, we just intersect the Transaction ID lists.
  - Support({A, B}) = |{T1, T2} ∩ {T1, T2}| = 2.
- **Benefit**: Very fast for calculating support.
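A minimal Python sketch of the vertical-format idea (the tiny TID lists mirror the example above; this is illustrative, not a full Eclat implementation):

```python
# Vertical format: each item maps to the set of transaction IDs that contain it
tid_lists = {
    "A": {"T1", "T2"},
    "B": {"T1", "T2"},
    "C": {"T1"},
}

def support(*items):
    """Support = size of the intersection of the items' TID lists."""
    tids = set.intersection(*(tid_lists[i] for i in items))
    return len(tids)

print(support("A", "B"))   # 2 -> {A, B} appears in T1 and T2
print(support("A", "C"))   # 1 -> {A, C} appears only in T1
```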
unit 4/00_Index.md
@@ -0,0 +1,21 @@

# Unit 4: Classification and Prediction

Welcome to your simplified notes for Unit 4.

## Table of Contents

1. [[01_Classification_Basics|Classification Basics]]
   - Classification vs Prediction
   - Training vs Testing
2. [[02_Decision_Trees|Decision Tree Induction]]
   - How Trees work
   - Attribute Selection (Info Gain, Gini Index)
   - Pruning
3. [[03_Bayesian_Classification|Bayesian Classification]]
   - Bayes' Theorem
   - Naive Bayes Classifier
4. [[04_KNN_Algorithm|K-Nearest Neighbors (KNN)]]
   - Lazy Learning
   - Distance Measures
5. [[05_Rule_Based_Classification|Rule-Based Classification]]
   - IF-THEN Rules
unit 4/01_Classification_Basics.md
@@ -0,0 +1,22 @@

# Classification Basics

## What is Classification?

**Classification** is the process of predicting the **class label** of a data item.

- **Goal**: To assign a category to a new item based on past data.
- **Example**:
  - Input: A bank loan application.
  - Output Class: "Safe" or "Risky".

## Classification vs Prediction

- **Classification**: Predicts a **category** (Discrete value).
  - *Example*: Yes/No, Red/Blue/Green.
- **Prediction (Regression)**: Predicts a **number** (Continuous value).
  - *Example*: Predicting the price of a house ($500k, $505k...).

## The Process

1. **Training Phase (Learning)**:
   - The algorithm learns from a "Training Set" where the correct answers (labels) are known.
   - It builds a **Model** (e.g., a Decision Tree).
2. **Testing Phase (Classification)**:
   - The model is tested on new, unseen data ("Test Set").
   - We check the **Accuracy**: Percentage of correct predictions.
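As a tiny illustration of the testing phase, here is a Python sketch that compares a model's predictions with the true labels of a made-up test set and reports accuracy:

```python
# Hypothetical test-set labels and a model's predictions for them
true_labels = ["Safe", "Risky", "Safe", "Safe", "Risky"]
predictions = ["Safe", "Safe",  "Safe", "Safe", "Risky"]

correct = sum(1 for t, p in zip(true_labels, predictions) if t == p)
accuracy = correct / len(true_labels)
print(f"Accuracy: {accuracy:.0%}")   # 4 of 5 correct -> 80%
```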
unit 4/02_Decision_Trees.md
@@ -0,0 +1,30 @@

# Decision Tree Induction

A **Decision Tree** is a flowchart-like structure used for classification.

## Structure

- **Root Node**: The top question (e.g., "Is it raining?").
- **Branch**: The answer (e.g., "Yes" or "No").
- **Leaf Node**: The final decision/class (e.g., "Play Football" or "Stay Inside").

## How to Build a Tree?

We need to decide which attribute to split on first. We use **Attribute Selection Measures**:

### 1. Information Gain (Used in ID3 Algorithm)

- Measures how much "uncertainty" (Entropy) is reduced by splitting on an attribute.
- We choose the attribute with the **Highest Information Gain**.
- **Entropy**: A measure of randomness.
  - High Entropy = Messy/Mixed data (50% Yes, 50% No).
  - Low Entropy = Pure data (100% Yes).
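A short Python sketch of the entropy idea described above (the label lists are made-up examples; the information gain of a split is the parent entropy minus the size-weighted entropy of the child partitions):

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy in bits: 0 for pure data, 1 for a 50/50 two-class mix."""
    total = len(labels)
    probs = [c / total for c in Counter(labels).values()]
    return sum(-p * math.log2(p) for p in probs)

def information_gain(parent, children):
    """Parent entropy minus the weighted entropy of the child partitions."""
    total = len(parent)
    weighted = sum(len(ch) / total * entropy(ch) for ch in children)
    return entropy(parent) - weighted

mixed = ["Yes", "No", "Yes", "No"]      # 50/50 mix -> entropy 1.0
pure  = ["Yes", "Yes", "Yes", "Yes"]    # pure data -> entropy 0.0
print(entropy(mixed), entropy(pure))

# Splitting the mixed data into two pure halves removes all uncertainty
print(information_gain(mixed, [["Yes", "Yes"], ["No", "No"]]))   # 1.0
```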
### 2. Gain Ratio (Used in C4.5)

- An improvement over Information Gain. It handles attributes with many values (like "Date") better.

### 3. Gini Index (Used in CART)

- Measures "Impurity". We want to minimize the Gini Index.

## Tree Pruning

Trees can become too complex and memorize the training data (**Overfitting**).

- **Pruning**: Cutting off weak branches to make the tree simpler and better at generalizing.
- **Pre-pruning**: Stop building early.
- **Post-pruning**: Build the full tree, then cut branches.
unit 4/03_Bayesian_Classification.md
@@ -0,0 +1,20 @@

# Bayesian Classification

**Bayesian Classifiers** are based on probability (Bayes' Theorem). They predict the likelihood that a tuple belongs to a class.

## Bayes' Theorem

$$ P(H|X) = \frac{P(X|H) \cdot P(H)}{P(X)} $$

- **P(H|X)**: Posterior Probability (Probability of Hypothesis H given Evidence X).
- **P(H)**: Prior Probability (Probability of H being true generally).
- **P(X|H)**: Likelihood (Probability of seeing Evidence X if H is true).
- **P(X)**: Evidence (Probability of X occurring).

## Naive Bayes Classifier

- **"Naive"**: It assumes that all attributes are **independent** of each other.
  - *Example*: It assumes "Income" and "Age" don't affect each other, which simplifies the math.
- **Pros**: Very fast and effective for large datasets (like spam filtering).
- **Cons**: The independence assumption is often not true in real life.
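To illustrate the "count and multiply" flavour of Naive Bayes, here is a minimal Python sketch for categorical attributes (the toy training data is invented for the example; no smoothing or other refinements):

```python
from collections import Counter, defaultdict

# Toy training data: (attribute dict, class label)
training = [
    ({"age": "youth",  "income": "high"}, "no"),
    ({"age": "youth",  "income": "low"},  "yes"),
    ({"age": "senior", "income": "low"},  "yes"),
    ({"age": "senior", "income": "high"}, "no"),
    ({"age": "youth",  "income": "low"},  "yes"),
]

class_counts = Counter(label for _, label in training)
# attr_counts[class][attribute][value] = how often that value occurs in that class
attr_counts = defaultdict(lambda: defaultdict(Counter))
for attrs, label in training:
    for attr, value in attrs.items():
        attr_counts[label][attr][value] += 1

def predict(attrs):
    scores = {}
    for label, n in class_counts.items():
        score = n / len(training)                           # prior P(H)
        for attr, value in attrs.items():                   # naive independence assumption
            score *= attr_counts[label][attr][value] / n    # likelihood P(X|H)
        scores[label] = score
    return max(scores, key=scores.get)

print(predict({"age": "youth", "income": "low"}))   # -> "yes"
```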
## Bayesian Belief Networks (BBN)

- Unlike Naive Bayes, BBNs **allow** dependencies between variables.
- They use a graph structure (DAG) to show which variables affect others.
unit 4/04_KNN_Algorithm.md
@@ -0,0 +1,25 @@

# K-Nearest Neighbors (KNN)

**KNN** is a simple, "Lazy" learning algorithm.

## How it Works

1. Store all training data.
2. When a new item arrives, find the **K** closest items (neighbors) to it.
3. Check the class of those neighbors.
4. Assign the most common class to the new item.

## Key Concepts

- **Lazy Learner**: It doesn't build a model during training. It waits until it needs to classify.
- **Distance Measure**: How do we measure "closeness"?
  - **Euclidean Distance**: Straight line distance (most common).
  - **Manhattan Distance**: Grid-like distance.
- **Choosing K**:
  - If K is too small (e.g., K=1), it's sensitive to noise.
  - If K is too large, it might include points from other classes.
  - Usually, K is an odd number (like 3, 5) to avoid ties.

## Example

- New Point: Green Circle.
- K = 3.
- Neighbors: 2 Red Triangles, 1 Blue Square.
- Result: Green Circle is classified as **Red Triangle**.
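A compact Python sketch of the four steps above, using Euclidean distance and a majority vote (the 2-D points and labels are invented to mirror the triangle/square example):

```python
import math
from collections import Counter

# (x, y) coordinates with known class labels -- the stored training data
training = [
    ((1.0, 1.0), "Red Triangle"),
    ((1.5, 2.0), "Red Triangle"),
    ((5.0, 5.0), "Blue Square"),
    ((6.0, 5.5), "Blue Square"),
]

def euclidean(a, b):
    return math.dist(a, b)   # straight-line distance

def knn_classify(point, k=3):
    # Find the k nearest stored points and take the most common class among them
    neighbours = sorted(training, key=lambda item: euclidean(point, item[0]))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

print(knn_classify((2.0, 2.0)))   # 2 triangles vs 1 square -> "Red Triangle"
```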
unit 4/05_Rule_Based_Classification.md
@@ -0,0 +1,18 @@

# Rule-Based Classification

**Rule-Based Classifiers** use a set of **IF-THEN** rules to classify data.

## Structure

- **Rule**: `IF (Condition) THEN (Class)`
- *Example*:
  - `IF (Age = Youth) AND (Student = Yes) THEN (Buys_Computer = Yes)`

## Extracting Rules from Decision Trees

- We can easily turn a decision tree into rules.
- Each path from the **Root** to a **Leaf** becomes one rule.
- The conditions along the path become the `IF` part (joined by AND).
- The leaf node becomes the `THEN` part.

## Advantages

- Easy for humans to understand.
- Can be created directly or from other models (like trees).
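IF-THEN rules map naturally onto simple data structures. A minimal Python sketch (the rule list and the sample record are made up to match the Buys_Computer example):

```python
# Each rule: (conditions that must all hold, class to assign)
rules = [
    ({"Age": "Youth", "Student": "Yes"}, "Buys_Computer = Yes"),
    ({"Age": "Youth", "Student": "No"},  "Buys_Computer = No"),
    ({"Age": "Senior"},                  "Buys_Computer = Yes"),
]

def classify(record, default="Unknown"):
    for conditions, label in rules:
        # IF every condition matches the record THEN return the rule's class
        if all(record.get(attr) == value for attr, value in conditions.items()):
            return label
    return default

print(classify({"Age": "Youth", "Student": "Yes"}))   # Buys_Computer = Yes
```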
unit 5/00_Index.md
@@ -0,0 +1,19 @@

# Unit 5: Advanced Data Mining Techniques

Welcome to your simplified notes for Unit 5.

## Table of Contents

1. [[01_Ubiquitous_Data_Mining|Ubiquitous & Invisible Data Mining]]
   - Mining everywhere (IoT, Mobile)
   - Invisible Mining (Background processes)
2. [[02_Web_Mining|Web Mining]]
   - Content, Structure, and Usage Mining
3. [[03_Spatial_and_Temporal_Mining|Spatial & Temporal Mining]]
   - Mining location data (Maps/GIS)
   - Mining time-based data (Trends)
4. [[04_Other_Mining_Types|Other Mining Types]]
   - Text, Visual, Audio, and Process Mining
5. [[05_Applications_and_Impact|Applications & Social Impact]]
   - Real-world uses (Healthcare, Retail)
   - Privacy and Ethical concerns
unit 5/01_Ubiquitous_Data_Mining.md
@@ -0,0 +1,30 @@

# Ubiquitous and Invisible Data Mining

## Ubiquitous Data Mining (UDM)

**"Ubiquitous"** means existing everywhere.

- **Definition**: Mining data from everyday objects and devices (Smartphones, IoT, Wearables) in real-time.
- **Goal**: To provide insights anytime, anywhere, without you asking for it.
- **Characteristics**:
  - **Mobile**: Uses GPS and sensors.
  - **Context-Aware**: Knows where you are and what time it is.
  - **Real-Time**: Processes data instantly.

### Examples

- **Smartphones**: Google Maps predicting traffic.
- **Wearables**: Smartwatches tracking your heart rate.
- **Smart Homes**: Alexa learning your voice commands.

## Invisible Data Mining

- **Definition**: Mining that happens **silently** in the background. You don't see it happening.
- **Why "Invisible"?**: It is embedded in apps and systems. You only see the result (like a recommendation).
- **Examples**:
  - **Amazon**: "People who bought this also bought..."
  - **Google Search**: Auto-completing your sentence.
  - **Banks**: Detecting fraud without you knowing.

### Difference

| Feature | Ubiquitous Mining | Invisible Mining |
|---|---|---|
| **Focus** | Mining **everywhere** (IoT, Mobile) | Mining **hidden** from user |
| **Awareness** | You might know it's happening (e.g., wearing a watch) | You usually don't know |
| **Key Tech** | Sensors, Mobile Devices | Software Algorithms, Background Processes |
unit 5/02_Web_Mining.md
@@ -0,0 +1,20 @@

# Web Mining

**Web Mining** is using data mining techniques to discover useful information from the World Wide Web.

## Types of Web Mining

### 1. Web Content Mining

- **What**: Mining the **actual content** of web pages.
- **Data**: Text, images, audio, video.
- **Example**: Analyzing reviews on Amazon to see if people like a product (Sentiment Analysis).

### 2. Web Structure Mining

- **What**: Mining the **links** (hyperlinks) between pages.
- **Goal**: To find important pages (Authorities) and pages that link to many others (Hubs).
- **Example**: Google's **PageRank** algorithm uses this to rank search results.

### 3. Web Usage Mining

- **What**: Mining **user activity** logs.
- **Data**: Server logs, browser history, clicks.
- **Example**: Analyzing which pages users visit most often and where they leave the site.
unit 5/03_Spatial_and_Temporal_Mining.md
@@ -0,0 +1,17 @@

# Spatial and Temporal Data Mining

## Spatial Data Mining

- **Spatial Data**: Data related to **location** or geography (Maps, GPS).
- **Goal**: Finding patterns in space.
- **Tools**: GIS (Geographic Information Systems).
- **Examples**:
  - Finding the best location for a new store.
  - Tracking the spread of a disease on a map.

## Temporal Data Mining

- **Temporal Data**: Data related to **time**.
- **Goal**: Finding patterns that change over time (Trends).
- **Tasks**:
  - **Trend Analysis**: Is the stock market going up or down?
  - **Sequence Analysis**: "If event A happens, does event B follow?"
- **Example**: Analyzing weather patterns over 10 years to predict climate change.
unit 5/04_Other_Mining_Types.md
@@ -0,0 +1,17 @@

# Other Types of Data Mining

## 1. Text Mining

- **Data**: Unstructured text (Emails, Tweets, Documents).
- **Technique**: Natural Language Processing (NLP).
- **Goal**: To understand meaning, sentiment, and topics.
- **Example**: Classifying customer feedback as "Angry" or "Happy".

## 2. Visual and Audio Mining

- **Data**: Images, Videos, Sound.
- **Goal**: To find patterns in visual or audio data.
- **Example**: Face recognition in photos, or detecting keywords in a voice recording.

## 3. Process Mining

- **Data**: Event logs from business systems (ERP, CRM).
- **Goal**: To see how a business process *actually* works vs how it *should* work.
- **Example**: Finding out why it takes 5 days to approve a loan instead of 2.
unit 5/05_Applications_and_Impact.md
@@ -0,0 +1,22 @@

# Applications and Social Impact

## Applications of Data Mining

Data mining is used everywhere!

1. **Healthcare**: Predicting diseases, finding side effects of drugs.
2. **Retail (Market Basket Analysis)**: Placing Bread near Butter to increase sales.
3. **Finance**: Detecting credit card fraud, approving loans.
4. **Education**: Tracking student performance to help them improve.
5. **Crime**: Identifying crime hotspots and predicting criminal behavior.

## Social Impact and Issues

### Positive Impact

- **Convenience**: Personalized recommendations save time.
- **Safety**: Fraud detection and medical diagnosis save money and lives.

### Negative Impact (Ethical Issues)

1. **Privacy Invasion**: Companies know too much about you.
2. **Discrimination**: Profiling can lead to unfair treatment (e.g., denying loans based on where you live).
3. **Security**: Large databases can be hacked (Data Breaches).
4. **Manipulation**: Targeted ads can influence your behavior or political views.