addition of unit 1 3 4 5

Akshat Mehta
2025-11-24 16:55:19 +05:30
parent 8f8e35ae95
commit f8aea15aaa
24 changed files with 596 additions and 0 deletions

unit 4/00_Index.md Normal file

@@ -0,0 +1,21 @@
# Unit 4: Classification and Prediction
Welcome to your simplified notes for Unit 4.
## Table of Contents
1. [[01_Classification_Basics|Classification Basics]]
- Classification vs Prediction
- Training vs Testing
2. [[02_Decision_Trees|Decision Tree Induction]]
- How Trees work
- Attribute Selection (Info Gain, Gini Index)
- Pruning
3. [[03_Bayesian_Classification|Bayesian Classification]]
- Bayes' Theorem
- Naive Bayes Classifier
4. [[04_KNN_Algorithm|K-Nearest Neighbors (KNN)]]
- Lazy Learning
- Distance Measures
5. [[05_Rule_Based_Classification|Rule-Based Classification]]
- IF-THEN Rules


@@ -0,0 +1,22 @@
# Classification Basics
## What is Classification?
**Classification** is the process of predicting the **class label** of a data item.
- **Goal**: To assign a category to a new item based on past data.
- **Example**:
- Input: A bank loan application.
- Output Class: "Safe" or "Risky".
## Classification vs Prediction
- **Classification**: Predicts a **category** (Discrete value).
- *Example*: Yes/No, Red/Blue/Green.
- **Prediction (Regression)**: Predicts a **number** (Continuous value).
- *Example*: Predicting the price of a house ($500k, $505k...).
## The Process
1. **Training Phase (Learning)**:
- The algorithm learns from a "Training Set" where the correct answers (labels) are known.
- It builds a **Model** (e.g., a Decision Tree).
2. **Testing Phase (Classification)**:
- The model is tested on new, unseen data ("Test Set").
- We check the **Accuracy**: the percentage of correct predictions on the test set.
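A minimal sketch of this two-phase process, assuming scikit-learn and its bundled Iris dataset (neither is mentioned in the notes; they are used here only for illustration):

```python
# Minimal train/test sketch (assumes scikit-learn is installed).
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)

# Training Phase: learn a model from labelled data.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
model = DecisionTreeClassifier()
model.fit(X_train, y_train)

# Testing Phase: predict on unseen data and measure accuracy.
y_pred = model.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))
```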


@@ -0,0 +1,30 @@
# Decision Tree Induction
A **Decision Tree** is a flowchart-like structure used for classification.
## Structure
- **Root Node**: The top question (e.g., "Is it raining?").
- **Branch**: The answer (e.g., "Yes" or "No").
- **Leaf Node**: The final decision/class (e.g., "Play Football" or "Stay Inside").
## How to Build a Tree?
We need to decide which attribute to split on first. We use **Attribute Selection Measures**:
### 1. Information Gain (Used in ID3 Algorithm)
- Measures how much "uncertainty" (Entropy) is reduced by splitting on an attribute.
- We choose the attribute with the **Highest Information Gain**.
- **Entropy**: A measure of randomness.
- High Entropy = Messy/Mixed data (50% Yes, 50% No).
- Low Entropy = Pure data (100% Yes).
### 2. Gain Ratio (Used in C4.5)
- An improvement over Information Gain. It handles attributes with many values (like "Date") better.
### 3. Gini Index (Used in CART)
- Measures "Impurity". We want to minimize the Gini Index.
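A small, self-contained sketch (not from the notes; the helper names are my own) of how Entropy, the Gini Index, and Information Gain can be computed for a list of class labels:

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Entropy of a list of class labels: 0 for pure data, 1 for a 50/50 binary mix."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())

def gini(labels):
    """Gini impurity: 0 for pure data, higher for mixed data."""
    total = len(labels)
    return 1 - sum((c / total) ** 2 for c in Counter(labels).values())

def information_gain(parent_labels, child_label_groups):
    """Reduction in entropy after splitting the parent into the given child groups."""
    total = len(parent_labels)
    weighted = sum(len(g) / total * entropy(g) for g in child_label_groups)
    return entropy(parent_labels) - weighted

# Example: splitting a perfectly mixed node into two pure children.
parent = ["Yes", "Yes", "No", "No"]
print(entropy(parent))                                            # 1.0 (maximally mixed)
print(gini(parent))                                               # 0.5
print(information_gain(parent, [["Yes", "Yes"], ["No", "No"]]))   # 1.0 (perfect split)
```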
## Tree Pruning
Trees can become too complex and memorize the training data (**Overfitting**).
- **Pruning**: Cutting off weak branches to make the tree simpler and better at generalizing.
- **Pre-pruning**: Stop building early.
- **Post-pruning**: Build the full tree, then cut branches.
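A hedged sketch of both pruning styles, assuming scikit-learn: `max_depth` acts as pre-pruning, while `ccp_alpha` turns on cost-complexity post-pruning.

```python
from sklearn.tree import DecisionTreeClassifier

# Pre-pruning: stop growing early by limiting depth / leaf size.
pre_pruned = DecisionTreeClassifier(max_depth=3, min_samples_leaf=5)

# Post-pruning: grow the full tree, then cut weak branches via
# cost-complexity pruning (larger ccp_alpha => more branches removed).
post_pruned = DecisionTreeClassifier(ccp_alpha=0.01)
```

In practice, how much to prune is usually chosen by comparing accuracy on a held-out validation set.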


@@ -0,0 +1,20 @@
# Bayesian Classification
**Bayesian Classifiers** are based on probability (Bayes' Theorem). They predict the likelihood that a tuple belongs to a class.
## Bayes' Theorem
$$ P(H|X) = \frac{P(X|H) \cdot P(H)}{P(X)} $$
- **P(H|X)**: Posterior Probability (Probability of Hypothesis H given Evidence X).
- **P(H)**: Prior Probability (Probability of H being true generally).
- **P(X|H)**: Likelihood (Probability of seeing Evidence X if H is true).
- **P(X)**: Evidence (Probability of X occurring).
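A tiny worked computation of the theorem; the probabilities below are invented purely for illustration:

```python
# Worked Bayes' Theorem example with made-up numbers:
# H = "loan is Risky", X = "applicant income is Low".
p_h = 0.3          # P(H): prior probability of a risky loan
p_x_given_h = 0.8  # P(X|H): likelihood of low income among risky loans
p_x = 0.4          # P(X): overall probability of low income

p_h_given_x = (p_x_given_h * p_h) / p_x
print(p_h_given_x)  # 0.6 -> posterior probability the loan is Risky given low income
```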
## Naive Bayes Classifier
- **"Naive"**: It assumes that all attributes are **independent** of each other.
- *Example*: It assumes "Income" and "Age" don't affect each other, which simplifies the math.
- **Pros**: Very fast and effective for large datasets (like spam filtering).
- **Cons**: The independence assumption is often not true in real life.
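A compact, illustrative sketch of the counting behind a categorical Naive Bayes classifier; the toy dataset and attribute names are invented, and Laplace smoothing is left out to keep it short:

```python
from collections import Counter, defaultdict

# Invented toy data: (Age, Student) -> Buys_Computer
data = [
    ("Youth", "Yes", "Yes"),
    ("Youth", "No", "No"),
    ("Senior", "No", "No"),
    ("Senior", "Yes", "Yes"),
    ("Youth", "Yes", "Yes"),
]

class_counts = Counter(row[-1] for row in data)
# attr_counts[(attribute_index, value, class)] = count
attr_counts = defaultdict(int)
for age, student, label in data:
    attr_counts[(0, age, label)] += 1
    attr_counts[(1, student, label)] += 1

def score(x, label):
    """P(label) * product of P(attribute_value | label), assuming independence."""
    total = sum(class_counts.values())
    p = class_counts[label] / total
    for i, value in enumerate(x):
        p *= attr_counts[(i, value, label)] / class_counts[label]
    return p

new_item = ("Youth", "Yes")
prediction = max(class_counts, key=lambda label: score(new_item, label))
print(prediction)  # "Yes" for this toy data
```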
## Bayesian Belief Networks (BBN)
- Unlike Naive Bayes, BBNs **allow** dependencies between variables.
- They use a graph structure (DAG) to show which variables affect others.


@@ -0,0 +1,25 @@
# K-Nearest Neighbors (KNN)
**KNN** is a simple, "Lazy" learning algorithm.
## How it Works
1. Store all training data.
2. When a new item arrives, find the **K** closest items (neighbors) to it.
3. Check the class of those neighbors.
4. Assign the most common class to the new item.
## Key Concepts
- **Lazy Learner**: It doesn't build a model during training. It waits until it needs to classify.
- **Distance Measure**: How do we measure "closeness"?
- **Euclidean Distance**: Straight line distance (most common).
- **Manhattan Distance**: Grid-like distance.
- **Choosing K**:
- If K is too small (e.g., K=1), it's sensitive to noise.
- If K is too large, it might include points from other classes.
- Usually, K is an odd number (like 3 or 5) to avoid ties between two classes.
## Example
- New Point: Green Circle.
- K = 3.
- Neighbors: 2 Red Triangles, 1 Blue Square.
- Result: Green Circle is classified as **Red Triangle**.
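A minimal KNN sketch with Euclidean distance and a majority vote; the coordinates are invented so that the toy example above (2 Red Triangles vs 1 Blue Square) works out:

```python
from collections import Counter
from math import dist  # Euclidean distance (Python 3.8+)

# Invented 2-D training points: (x, y, class)
training = [
    (1.0, 1.0, "Red Triangle"),
    (1.5, 1.2, "Red Triangle"),
    (2.5, 2.0, "Blue Square"),
    (4.0, 4.0, "Blue Square"),
    (5.0, 5.0, "Blue Square"),
]

def knn_classify(point, k=3):
    # Steps 1-2: find the K training points closest to the new point.
    neighbors = sorted(training, key=lambda row: dist(point, row[:2]))[:k]
    # Steps 3-4: majority vote among the neighbors' classes.
    votes = Counter(label for _, _, label in neighbors)
    return votes.most_common(1)[0][0]

print(knn_classify((1.2, 1.1)))  # "Red Triangle" (2 triangles vs 1 square among its 3 neighbors)
```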


@@ -0,0 +1,18 @@
# Rule-Based Classification
**Rule-Based Classifiers** use a set of **IF-THEN** rules to classify data.
## Structure
- **Rule**: `IF (Condition) THEN (Class)`
- *Example*:
- `IF (Age = Youth) AND (Student = Yes) THEN (Buys_Computer = Yes)`
## Extracting Rules from Decision Trees
- We can easily turn a decision tree into rules.
- Each path from the **Root** to a **Leaf** becomes one rule.
- The conditions along the path become the `IF` part (joined by AND).
- The leaf node becomes the `THEN` part.
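A short, illustrative sketch of representing and firing IF-THEN rules in code; the second rule and any attribute names beyond the example above are invented:

```python
# Each rule: (IF-condition function, THEN-class)
rules = [
    (lambda x: x["Age"] == "Youth" and x["Student"] == "Yes", "Buys_Computer = Yes"),
    (lambda x: x["Age"] == "Senior" and x["Credit"] == "Fair", "Buys_Computer = No"),
]

def classify(item, default="Unknown"):
    """Fire the first rule whose IF-condition matches; otherwise fall back to a default."""
    for condition, label in rules:
        if condition(item):
            return label
    return default

print(classify({"Age": "Youth", "Student": "Yes", "Credit": "Fair"}))
# -> "Buys_Computer = Yes"
```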
## Advantages
- Easy for humans to understand.
- Can be created directly or from other models (like trees).