unit 2 added

Commit 8f8e35ae95 by Akshat Mehta, 2025-11-24 15:26:41 +05:30
9 changed files with 326 additions and 0 deletions

# K-Nearest Neighbors (KNN)
**KNN** is a simple algorithm used for classification and regression.
## How it Works
1. Store all training data.
2. When a new data point comes in, find the **K** closest points (neighbors) to it.
3. **Vote**: Assign the class that is most common among those K neighbors.
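
A minimal sketch of these three steps in plain NumPy (the tiny dataset and the helper name `knn_predict` are made up purely for illustration):

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=3):
    """Classify x_new by majority vote among its k nearest training points."""
    # Step 1: the "model" is just the stored training data (X_train, y_train).
    # Step 2: Euclidean distance from x_new to every training point.
    distances = np.sqrt(((X_train - x_new) ** 2).sum(axis=1))
    nearest = np.argsort(distances)[:k]          # indices of the k closest points
    # Step 3: vote -- the most common class among the k neighbors wins.
    return Counter(y_train[nearest]).most_common(1)[0][0]

# Toy example: two features, two classes.
X_train = np.array([[1.0, 1.0], [1.2, 0.8], [5.0, 5.0], [5.2, 4.8]])
y_train = np.array([0, 0, 1, 1])
print(knn_predict(X_train, y_train, np.array([1.1, 0.9]), k=3))  # -> 0
```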
## Key Characteristics
- **Instance-based Learning**: Uses training instances directly to predict.
- **Lazy Learning**: It doesn't "learn" a model during training. It waits until a prediction is needed.
- **Non-Parametric**: It makes no assumptions about the underlying data distribution.
## Choosing K
- **Small K**: Can be noisy and overfit (sensitive to outliers).
- **Large K**: Smooths the decision boundary too much, so it can underfit and miss local patterns.
- **Tip**: For binary classification, choose an **odd number** for K so the vote cannot end in a tie.
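
In practice K is usually tuned with cross-validation rather than guessed. A sketch using scikit-learn (assuming it is installed; the iris dataset here just stands in for your own data):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Try several odd values of K and keep the one with the best CV accuracy.
for k in [1, 3, 5, 7, 9, 11]:
    score = cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5).mean()
    print(f"K={k}: accuracy={score:.3f}")
```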
## Distance Measures
How do we measure "closeness"?
### 1. Euclidean Distance
- The straight-line distance between two points.
- Used for numeric data.
- Formula: `sqrt((x2-x1)^2 + (y2-y1)^2)`
### 2. Manhattan Distance
- The distance if you can only move along a grid (like city blocks).
- Formula: `|x2-x1| + |y2-y1|`
### 3. Minkowski Distance
- A generalized form of Euclidean and Manhattan distance.
- Formula: `(|x2-x1|^p + |y2-y1|^p)^(1/p)` — `p = 1` gives Manhattan, `p = 2` gives Euclidean.
### 4. Chebyshev Distance
- The greatest difference along any coordinate dimension.
- Formula: `max(|x2-x1|, |y2-y1|)`
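
The four measures written out for a pair of feature vectors (a plain-NumPy sketch, not tied to any particular library's implementation):

```python
import numpy as np

a = np.array([1.0, 2.0])
b = np.array([4.0, 6.0])

euclidean = np.sqrt(((a - b) ** 2).sum())        # straight-line distance      -> 5.0
manhattan = np.abs(a - b).sum()                  # city-block distance         -> 7.0
minkowski = (np.abs(a - b) ** 3).sum() ** (1/3)  # p=3; p=1 is Manhattan, p=2 is Euclidean
chebyshev = np.abs(a - b).max()                  # largest per-coordinate gap  -> 4.0
```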
## Data Scaling
Since KNN uses distance, it is very sensitive to the scale of data.
- **Example**: If one feature ranges 0-1 and another 0-1000, the second one will dominate the distance.
- **Solution**: **Normalize** or **Standardize** data so all features contribute equally.
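
A sketch of both options using scikit-learn's preprocessing module (assuming scikit-learn is available; the feature matrix is made up to mimic the 0-1 vs 0-1000 example above):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

X_train = np.array([[0.5, 200.0],
                    [0.1, 950.0],
                    [0.9, 10.0]])  # two features on very different scales

# Normalization: rescale each feature to the [0, 1] range.
X_norm = MinMaxScaler().fit_transform(X_train)

# Standardization: rescale each feature to zero mean and unit variance.
X_std = StandardScaler().fit_transform(X_train)

print(X_norm)
print(X_std)
```

In a real pipeline the scaler is fit on the training data only and then applied to the test data, so information from the test set does not leak into the distance calculations.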