DMCT-NOTES/unit 2/03_Logistic_Regression.md
Akshat Mehta 8f8e35ae95 unit 2 added
2025-11-24 15:26:41 +05:30

# Logistic Regression
Logistic Regression is used for **classification** problems (predicting categories), even though it has "Regression" in its name.
## Odds vs Probability
### Probability
- The chance of an event happening out of **all** possibilities.
- **Formula**: `Probability = (Events in favour) / (Total observations)`
- Range: 0 to 1.
### Odds
- The ratio of events **happening** to events **not happening**.
- **Formula**: `Odds = (Events in favour) / (Events NOT in favour)`
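The two formulas above are linked by `odds = p / (1 - p)`. A minimal sketch with invented numbers (8 of 10 hypothetical ponds contain fish) shows both side by side:

```python
# Probability vs odds for the same event (toy numbers, for illustration only).
favour = 8    # events in favour
total = 10    # total observations

probability = favour / total       # events in favour / all observations
odds = favour / (total - favour)   # events in favour / events NOT in favour

print(probability)  # 0.8
print(odds)         # 4.0

# Consistency check: odds = p / (1 - p)
assert abs(odds - probability / (1 - probability)) < 1e-9
```

Note the different ranges: probability stays in [0, 1], while odds can grow without bound.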
### Log of Odds (Logit)
- We use the **Log of Odds** because raw odds are asymmetric: odds against an event lie between 0 and 1, while odds in favour run from 1 to infinity.
- Taking the log maps odds onto the whole real line and makes both directions symmetric around 0 (e.g., odds of 4 and 1/4 become +1.39 and -1.39).
- This **logit** function is what links probability to the linear model in Logistic Regression.
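A short sketch of the logit, showing the symmetry around 0 described above (the probabilities 0.9 and 0.1 are arbitrary illustration values):

```python
import math

def logit(p):
    """Log of odds for a probability p in (0, 1)."""
    return math.log(p / (1 - p))

# Odds for p = 0.9 are 9.0; for p = 0.1 they are ~0.111 — very different magnitudes.
# On the log scale the two become symmetric around 0:
print(logit(0.9))  # ≈ +2.197
print(logit(0.1))  # ≈ -2.197
```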
## The Sigmoid Function
Linear regression fits a straight line, which doesn't work well for classification (where we want output between 0 and 1).
Logistic regression uses an **S-shaped curve** called the **Sigmoid Function**.
- **Formula**: `S(z) = 1 / (1 + e^-z)`
- **Output**: Always between **0 and 1**.
- **Usage**:
  - If output > Threshold (e.g., 0.5) -> Classify as **Positive** (e.g., presence of fish).
  - If output <= Threshold -> Classify as **Negative** (e.g., absence of fish).
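The sigmoid formula and the thresholding rule above can be sketched directly (the function and label names here are illustrative, not from a library):

```python
import math

def sigmoid(z):
    """S(z) = 1 / (1 + e^-z): squashes any real z into (0, 1)."""
    return 1 / (1 + math.exp(-z))

def classify(z, threshold=0.5):
    """Apply the decision rule: above the threshold -> Positive, else Negative."""
    return "Positive" if sigmoid(z) > threshold else "Negative"

print(sigmoid(0))      # 0.5 — the midpoint of the S-curve
print(classify(2.0))   # Positive (sigmoid(2) ≈ 0.88)
print(classify(-2.0))  # Negative (sigmoid(-2) ≈ 0.12)
```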
## Assumptions of Logistic Regression
1. **Independence of Errors**: Observations are independent of one another (e.g., no repeated measurements of the same subject).
2. **Linearity in the Logit**: Relationship between independent variables and log-odds is linear.
3. **Absence of Multicollinearity**: Independent variables should not be highly correlated with each other.
4. **No Strong Outliers**: Extreme values should not heavily influence the model.
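Putting the pieces together, a minimal end-to-end sketch using scikit-learn (assumed available); the one-feature dataset below is invented purely for illustration:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical data: one feature (say, water depth) and a binary label
# (1 = fish present, 0 = fish absent). Values chosen to be cleanly separable.
X = np.array([[1.0], [2.0], [3.0], [8.0], [9.0], [10.0]])
y = np.array([0, 0, 0, 1, 1, 1])

model = LogisticRegression()
model.fit(X, y)

# predict_proba returns [P(class 0), P(class 1)]; predict applies the
# default 0.5 threshold on the sigmoid output.
print(model.predict_proba([[2.0]]))  # mostly class 0 (absence)
print(model.predict([[9.0]]))        # class 1 (presence)
```

Under the hood the model fits a line in the log-odds (logit) and passes it through the sigmoid, exactly as in the sections above.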