# K-Nearest Neighbors (KNN)

**KNN** is a simple algorithm used for classification and regression.
## How it Works

1. Store all training data.
2. When a new data point arrives, find the **K** closest points (neighbors) to it.
3. **Vote**: Assign the class that is most common among those K neighbors (for regression, average the neighbors' values instead).
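The steps above can be sketched in plain Python. This is a minimal illustration, not a production implementation; the `knn_predict` function and the toy data are made up for this example:

```python
from collections import Counter
import math

def knn_predict(train, new_point, k=3):
    """Classify new_point by majority vote among its k nearest training points.

    train: list of (features, label) pairs; features are equal-length tuples.
    """
    # 1. Compute the distance from new_point to every stored training point.
    distances = [
        (math.dist(features, new_point), label)
        for features, label in train
    ]
    # 2. Keep the k closest neighbors.
    neighbors = sorted(distances)[:k]
    # 3. Vote: return the most common label among them.
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Toy data: two clusters labeled "A" and "B"
train = [((1, 1), "A"), ((1, 2), "A"), ((2, 1), "A"),
         ((6, 6), "B"), ((6, 7), "B"), ((7, 6), "B")]
print(knn_predict(train, (2, 2), k=3))  # lands near the "A" cluster
```

Note that all the work happens at prediction time: "training" is just storing the data, which is exactly what lazy learning means.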
## Key Characteristics

- **Instance-based Learning**: Uses the training instances directly to make predictions.
- **Lazy Learning**: It doesn't build a model during training; all computation is deferred until a prediction is needed.
- **Non-Parametric**: It makes no assumptions about the underlying distribution of the data.
## Choosing K

- **Small K**: Can be noisy and overfit (sensitive to outliers).
- **Large K**: Can underfit, smoothing over local patterns in the data.
- **Tip**: Choose an **odd number** for K to avoid ties in binary classification.
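A quick illustration of why an even K can produce a tie in binary classification (the neighbor labels here are hypothetical):

```python
from collections import Counter

# Labels of the 4 nearest neighbors to some query point, nearest first
nearest_labels = ["A", "B", "A", "B"]

votes_k4 = Counter(nearest_labels[:4])  # K = 4: two votes each -- a tie
votes_k3 = Counter(nearest_labels[:3])  # K = 3: "A" wins 2 to 1

print(votes_k4.most_common())   # both classes tied at 2 votes
print(votes_k3.most_common(1))  # a clear winner
```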
## Distance Measures

How do we measure "closeness"?
### 1. Euclidean Distance

- The straight-line distance between two points.
- Used for numeric data.
- Formula: `sqrt((x2-x1)^2 + (y2-y1)^2)`
### 2. Manhattan Distance

- The distance if you can only move along a grid (like city blocks).
- Formula: `|x2-x1| + |y2-y1|`
### 3. Minkowski Distance

- A generalized form of Euclidean and Manhattan with a parameter `p`: `p = 1` gives Manhattan distance, `p = 2` gives Euclidean.
- Formula: `(|x2-x1|^p + |y2-y1|^p)^(1/p)`
### 4. Chebyshev Distance

- The greatest difference along any single coordinate dimension (the limit of Minkowski as `p` grows large).
- Formula: `max(|x2-x1|, |y2-y1|)`
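The four measures above can be sketched as small Python functions (the function and parameter names are chosen for this example):

```python
import math

def euclidean(p, q):
    # Straight-line distance: square root of summed squared differences
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def manhattan(p, q):
    # City-block distance: sum of absolute differences
    return sum(abs(a - b) for a, b in zip(p, q))

def minkowski(p, q, r=2):
    # Generalized form: r=1 gives Manhattan, r=2 gives Euclidean
    return sum(abs(a - b) ** r for a, b in zip(p, q)) ** (1 / r)

def chebyshev(p, q):
    # Greatest difference along any single dimension
    return max(abs(a - b) for a, b in zip(p, q))

a, b = (1, 2), (4, 6)       # differences of 3 and 4 per dimension
print(euclidean(a, b))      # 5.0
print(manhattan(a, b))      # 7
print(minkowski(a, b, r=2)) # 5.0 (matches Euclidean)
print(chebyshev(a, b))      # 4
```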
## Data Scaling

Since KNN uses distance, it is very sensitive to the scale of the data.

- **Example**: If one feature ranges 0-1 and another 0-1000, the second will dominate the distance calculation.
- **Solution**: **Normalize** or **standardize** the data so all features contribute equally.
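As a sketch, min-max normalization rescales each feature into the [0, 1] range so no single feature dominates (the `min_max_normalize` helper and toy values are illustrative):

```python
def min_max_normalize(values):
    # Rescale a list of numbers so the minimum maps to 0 and the maximum to 1
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

# One feature ranges roughly 0-1, the other roughly 0-1000 (toy data)
feature1 = [0.1, 0.5, 0.9]
feature2 = [100, 500, 900]

print(min_max_normalize(feature1))  # both features now span [0, 1]
print(min_max_normalize(feature2))  # [0.0, 0.5, 1.0]
```

After scaling, a unit of difference in either feature moves the distance by the same amount, which is the point of the exercise.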