Logistic Regression
Logistic Regression is used for classification problems (predicting categories), even though it has "Regression" in its name.
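As a quick illustration, here is a minimal sketch (assuming scikit-learn is available; its built-in breast-cancer dataset stands in for any binary classification problem) that fits a Logistic Regression classifier and predicts categories:

```python
# Minimal sketch: fit Logistic Regression on a built-in binary dataset.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LogisticRegression(max_iter=5000)  # higher max_iter helps convergence on unscaled data
model.fit(X_train, y_train)

print("Predicted classes:", model.predict(X_test[:5]))  # category labels (0 or 1)
print("Test accuracy:    ", model.score(X_test, y_test))
```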
Odds vs Probability
Probability
- The chance of an event happening out of all possibilities.
- Formula:
Probability = (Events in favour) / (Total outcomes)
- Range: 0 to 1.
Odds
- The ratio of events happening to events not happening.
- Formula:
Odds = (Events in favour) / (Events NOT in favour)
- Equivalently, Odds = P / (1 - P), where P is the probability.
- Range: 0 to infinity.
Log of Odds (Logit)
- Odds are asymmetric in scale: they fall between 0 and 1 for unlikely events but stretch from 1 to infinity for likely ones.
- Taking the log makes the scale symmetric around 0, so log-odds range from -infinity to +infinity.
- This log-odds (logit) is the quantity Logistic Regression models as a linear function of the inputs: log(P / (1 - P)) = b0 + b1*x1 + ... + bn*xn.
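A small numerical sketch of the three quantities, using made-up counts (5 events in favour out of 20 outcomes):

```python
import math

events_in_favour = 5
total_outcomes = 20

probability = events_in_favour / total_outcomes                 # 5 / 20 = 0.25
odds = events_in_favour / (total_outcomes - events_in_favour)   # 5 / 15 ≈ 0.333
log_odds = math.log(odds)                                       # ≈ -1.10 (the logit)

print(f"Probability: {probability:.3f}")
print(f"Odds:        {odds:.3f}")
print(f"Log-odds:    {log_odds:.3f}")
```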
The Sigmoid Function
Linear regression fits a straight line whose output is unbounded, so it doesn't work well for classification (where we want an output between 0 and 1). Logistic Regression instead passes the linear score z = b0 + b1*x1 + ... + bn*xn through an S-shaped curve called the Sigmoid Function.
- Formula:
S(z) = 1 / (1 + e^-z)
- Output: Always between 0 and 1, interpreted as the probability of the positive class.
- Usage:
- If output >= Threshold (e.g., 0.5) -> Classify as Positive (e.g., Presence of fish).
- Otherwise -> Classify as Negative (e.g., Absence of fish).
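A minimal sketch of the sigmoid and the threshold rule; the scores z and the 0.5 threshold are illustrative assumptions, not fitted values:

```python
import math

def sigmoid(z: float) -> float:
    """Map any real-valued score z to a value in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def classify(z: float, threshold: float = 0.5) -> str:
    """Apply the threshold rule: positive = presence of fish, negative = absence."""
    return "Presence of fish" if sigmoid(z) >= threshold else "Absence of fish"

for z in (-3.0, 0.0, 2.5):
    print(f"z = {z:+.1f} -> S(z) = {sigmoid(z):.3f} -> {classify(z)}")
```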
Assumptions of Logistic Regression
- Independence of Errors: Observations should be independent of one another (e.g., no repeated or duplicate measurements of the same subject).
- Linearity in the Logit: Relationship between independent variables and log-odds is linear.
- Absence of Multicollinearity: Independent variables should not be highly correlated with each other (see the VIF check sketched after this list).
- No Strong Outliers: Extreme values should not heavily influence the model.
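One common way to check the multicollinearity assumption is the variance inflation factor (VIF). The sketch below assumes numpy, pandas, and statsmodels are available; the features are made up, and one of them is deliberately near-collinear with another:

```python
import numpy as np
import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(0)
X = pd.DataFrame({
    "temp": rng.normal(20, 5, 200),    # made-up feature (Celsius)
    "depth": rng.normal(10, 3, 200),   # made-up feature
})
# Nearly collinear with "temp": Fahrenheit conversion plus tiny noise.
X["temp_f"] = X["temp"] * 1.8 + 32 + rng.normal(0, 0.1, 200)

# Add an intercept column, then compute VIF for each feature.
X_const = np.column_stack([np.ones(len(X)), X.values])
for i, col in enumerate(X.columns, start=1):
    vif = variance_inflation_factor(X_const, i)
    print(f"{col}: VIF = {vif:.1f}")  # VIF >> 10 flags strong multicollinearity
```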