# Logistic Regression

Logistic Regression is used for **classification** problems (predicting categories), even though it has "Regression" in its name.

## Odds vs Probability

### Probability

- The chance of an event happening out of **all** possibilities.
- **Formula**: `Probability = (Events in favour) / (Total observations)`
- Range: 0 to 1.

### Odds

- The ratio of events **happening** to events **not happening**.
- **Formula**: `Odds = (Events in favour) / (Events NOT in favour)`
- Range: 0 to infinity.

### Log of Odds (Logit)

- We use the **Log of Odds** because raw odds are asymmetric: odds against an event are squeezed into the interval (0, 1), while odds in favour can grow without bound.
- Taking the log makes the scale symmetric around 0, so the magnitude is comparable for both classes.
- **Formula**: `Logit(p) = log(p / (1 - p))`
- This is the quantity that Logistic Regression models as a linear function of the inputs.
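The three quantities above can be sketched in a few lines of Python (the fish-catch counts in the example are made up for illustration):

```python
import math

def probability(in_favour, total):
    """Chance of the event out of all possibilities (range 0 to 1)."""
    return in_favour / total

def odds(in_favour, against):
    """Ratio of the event happening to it not happening."""
    return in_favour / against

def log_odds(p):
    """Logit: log of the odds, symmetric around 0."""
    return math.log(p / (1 - p))

# Hypothetical example: fish present in 5 of 8 observations.
p = probability(5, 8)   # 0.625
o = odds(5, 3)          # 5/3 ~ 1.667 (same event, different denominator)

# Complementary probabilities give logits of equal magnitude, opposite sign,
# which is the symmetry that makes the logit scale convenient:
print(log_odds(p), log_odds(1 - p))
```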

## The Sigmoid Function

Linear regression fits a straight line, which doesn't work well for classification (where we want an output between 0 and 1).

Logistic regression instead uses an **S-shaped curve** called the **Sigmoid Function**.

- **Formula**: `S(z) = 1 / (1 + e^-z)`
- **Output**: Always between **0 and 1**, so it can be read as a probability.
- **Usage**:
  - If output >= Threshold (e.g., 0.5) -> classify as **Positive** (e.g., presence of fish).
  - If output < Threshold -> classify as **Negative** (e.g., absence of fish).
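A minimal sketch of the sigmoid and the thresholding rule described above (the `classify` helper and its default threshold of 0.5 are illustrative, not a library API):

```python
import math

def sigmoid(z):
    """Map any real number z into the open interval (0, 1)."""
    return 1 / (1 + math.exp(-z))

def classify(z, threshold=0.5):
    """Label as Positive when the sigmoid output reaches the threshold."""
    return "Positive" if sigmoid(z) >= threshold else "Negative"

print(sigmoid(0))       # 0.5 -- the midpoint of the S-curve
print(classify(2.0))    # Positive
print(classify(-2.0))   # Negative
```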

## Assumptions of Logistic Regression

1. **Independence of Errors**: Observations are independent of each other (no duplicated samples or repeated measures).
2. **Linearity in the Logit**: The relationship between the independent variables and the log-odds of the outcome is linear.
3. **Absence of Multicollinearity**: Independent variables should not be highly correlated with each other.
4. **No Strong Outliers**: Extreme values should not heavily influence the model.
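The multicollinearity assumption can be checked with pairwise correlations between features, sketched here in plain Python (the feature columns `water_temp`, `temp_f`, and `depth` are hypothetical):

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two feature columns."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical features: temp_f is water_temp restated in Fahrenheit,
# so the two are perfectly correlated and one should be dropped.
water_temp = [10, 12, 14, 16, 18]
temp_f = [50.0, 53.6, 57.2, 60.8, 64.4]
depth = [30, 12, 25, 8, 19]

print(pearson_r(water_temp, temp_f))  # ~1.0 -> multicollinear pair
print(pearson_r(water_temp, depth))   # weaker correlation -> keep both
```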