unit 2 added

Commit 8f8e35ae95 by Akshat Mehta, 2025-11-24 15:26:41 +05:30
9 changed files with 326 additions and 0 deletions

# K-Nearest Neighbors (KNN)
**KNN** is a simple algorithm used for classification and regression.
## How it Works
1. Store all training data.
2. When a new data point comes in, find the **K** closest points (neighbors) to it.
3. **Vote**: Assign the class that is most common among those K neighbors.
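
A minimal sketch of these three steps in plain NumPy (the tiny dataset and the helper name `knn_predict` are made up purely for illustration):

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=3):
    """Classify x_new by majority vote among its k nearest training points."""
    # Step 1: the "model" is just the stored training data (X_train, y_train).
    # Step 2: Euclidean distance from x_new to every training point.
    distances = np.sqrt(((X_train - x_new) ** 2).sum(axis=1))
    nearest = np.argsort(distances)[:k]          # indices of the k closest points
    # Step 3: vote -- the most common class among the k neighbors wins.
    return Counter(y_train[nearest]).most_common(1)[0][0]

# Toy example: two features, two classes.
X_train = np.array([[1.0, 1.0], [1.2, 0.8], [5.0, 5.0], [5.2, 4.8]])
y_train = np.array([0, 0, 1, 1])
print(knn_predict(X_train, y_train, np.array([1.1, 0.9]), k=3))  # -> 0
```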
## Key Characteristics
- **Instance-based Learning**: Uses training instances directly to predict.
- **Lazy Learning**: It doesn't "learn" a model during training. It waits until a prediction is needed.
- **Non-Parametric**: It makes no assumptions about the underlying data distribution.
## Choosing K
- **Small K**: Can be noisy and overfit (sensitive to outliers).
- **Large K**: Smooths the decision boundary too much, so it can underfit and miss local patterns.
- **Tip**: For binary classification, choose an **odd number** for K so the vote cannot end in a tie.
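
In practice K is usually tuned with cross-validation rather than guessed. A sketch using scikit-learn (assuming it is installed; the iris dataset here just stands in for your own data):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Try several odd values of K and keep the one with the best CV accuracy.
for k in [1, 3, 5, 7, 9, 11]:
    score = cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5).mean()
    print(f"K={k}: accuracy={score:.3f}")
```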
## Distance Measures
How do we measure "closeness"?
### 1. Euclidean Distance
- The straight-line distance between two points.
- Used for numeric data.
- Formula: `sqrt((x2-x1)^2 + (y2-y1)^2)`
### 2. Manhattan Distance
- The distance if you can only move along a grid (like city blocks).
- Formula: `|x2-x1| + |y2-y1|`
### 3. Minkowski Distance
- A generalized form of Euclidean and Manhattan distance.
- Formula: `(|x2-x1|^p + |y2-y1|^p)^(1/p)` — `p = 1` gives Manhattan, `p = 2` gives Euclidean.
### 4. Chebyshev Distance
- The greatest difference along any coordinate dimension.
- Formula: `max(|x2-x1|, |y2-y1|)`
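
The four measures written out for a pair of feature vectors (a plain-NumPy sketch, not tied to any particular library's implementation):

```python
import numpy as np

a = np.array([1.0, 2.0])
b = np.array([4.0, 6.0])

euclidean = np.sqrt(((a - b) ** 2).sum())        # straight-line distance      -> 5.0
manhattan = np.abs(a - b).sum()                  # city-block distance         -> 7.0
minkowski = (np.abs(a - b) ** 3).sum() ** (1/3)  # p=3; p=1 is Manhattan, p=2 is Euclidean
chebyshev = np.abs(a - b).max()                  # largest per-coordinate gap  -> 4.0
```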
## Data Scaling
Since KNN uses distance, it is very sensitive to the scale of data.
- **Example**: If one feature ranges 0-1 and another 0-1000, the second one will dominate the distance.
- **Solution**: **Normalize** or **Standardize** data so all features contribute equally.
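
A sketch of both options using scikit-learn's preprocessing module (assuming scikit-learn is available; the feature matrix is made up to mimic the 0-1 vs 0-1000 example above):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

X_train = np.array([[0.5, 200.0],
                    [0.1, 950.0],
                    [0.9, 10.0]])  # two features on very different scales

# Normalization: rescale each feature to the [0, 1] range.
X_norm = MinMaxScaler().fit_transform(X_train)

# Standardization: rescale each feature to zero mean and unit variance.
X_std = StandardScaler().fit_transform(X_train)

print(X_norm)
print(X_std)
```

In a real pipeline the scaler is fit on the training data only and then applied to the test data, so information from the test set does not leak into the distance calculations.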