Logistic Regression

Logistic Regression is used for classification problems (predicting categories), even though it has "Regression" in its name.

Odds vs Probability

Probability

  • The chance of an event happening out of all possible outcomes.
  • Formula: Probability = (Events in favour) / (Total observations)
  • Range: 0 to 1.

Odds

  • The ratio of events happening to events not happening.
  • Formula: Odds = (Events in favour) / (Events NOT in favour)
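
A small worked example of the two formulas above; the counts (8 observations with a fish present, 2 without) are made-up numbers used only to show the arithmetic:

```python
# Hypothetical counts: 8 observations with a fish present, 2 without.
in_favour = 8
against = 2
total = in_favour + against

probability = in_favour / total   # 8 / 10 = 0.8  -> always between 0 and 1
odds = in_favour / against        # 8 / 2  = 4.0  -> can be any non-negative value

print(f"Probability = {probability}, Odds = {odds}")
```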

Log of Odds (Logit)

  • We use the Log of Odds (the logit) because raw odds are asymmetric: odds above 1 (event more likely than not) can grow without bound, while odds below 1 are squeezed between 0 and 1.
  • Taking the log makes the magnitudes symmetric around 0, so favourable and unfavourable outcomes sit on a comparable scale.
  • The logit, log(p / (1 - p)), is the quantity that Logistic Regression models as a linear function of the inputs.
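
A minimal sketch of the logit as the natural log of the odds; the probabilities 0.8 and 0.2 are arbitrary values chosen only to show that the log puts "in favour" and "against" on a symmetric scale:

```python
import math

def log_odds(p):
    """Logit: natural log of the odds p / (1 - p)."""
    return math.log(p / (1 - p))

# Raw odds are asymmetric: p = 0.8 gives odds 4, p = 0.2 gives odds 0.25.
# On the log scale the two cases are mirror images around 0.
print(log_odds(0.8))   # ~ +1.386
print(log_odds(0.2))   # ~ -1.386
print(log_odds(0.5))   #    0.0  (even odds)
```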

The Sigmoid Function

Linear regression fits a straight line whose output is unbounded, which doesn't work well for classification, where we want an output between 0 and 1. Logistic Regression therefore passes the linear output through an S-shaped curve called the Sigmoid Function.

  • Formula: S(z) = 1 / (1 + e^-z), where z is the linear combination of the input features (the log-odds).
  • Output: Always between 0 and 1.
  • Usage:
    • If output > Threshold (e.g., 0.5) -> Classify as Positive (e.g., Presence of fish).
    • If output < Threshold -> Classify as Negative (e.g., Absence of fish).
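
A minimal sketch of the sigmoid and the threshold rule above, assuming a 0.5 threshold and the fish presence/absence labels; how to treat an output exactly at the threshold is a choice of this sketch, not part of the notes:

```python
import math

def sigmoid(z):
    """S(z) = 1 / (1 + e^-z): squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def classify(z, threshold=0.5):
    """Positive (fish present) if the sigmoid output exceeds the threshold, else Negative."""
    return "Positive" if sigmoid(z) > threshold else "Negative"

for z in (-3.0, 0.0, 2.5):
    print(z, round(sigmoid(z), 3), classify(z))
# -3.0 -> 0.047 -> Negative
#  0.0 -> 0.500 -> Negative (exactly at the threshold in this sketch)
#  2.5 -> 0.924 -> Positive
```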

Assumptions of Logistic Regression

  1. Independence of Errors: Observations should be independent of one another (no duplicated or repeated measurements of the same sample).
  2. Linearity in the Logit: Relationship between independent variables and log-odds is linear.
  3. Absence of Multicollinearity: Independent variables should not be highly correlated with each other.
  4. No Strong Outliers: Extreme values should not heavily influence the model.
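
A hedged sketch of one way to spot-check assumption 3 (multicollinearity) using a pairwise correlation matrix; the feature data and the 0.8 cutoff are made up for illustration, and a variance inflation factor (VIF) check is another common option:

```python
import numpy as np

# Made-up feature matrix: x3 is deliberately almost a copy of x1.
rng = np.random.default_rng(0)
x1 = rng.normal(size=100)
x2 = rng.normal(size=100)
x3 = 0.9 * x1 + 0.1 * rng.normal(size=100)
X = np.column_stack([x1, x2, x3])

# Pairwise correlations between features (columns).
corr = np.corrcoef(X, rowvar=False)

# Flag feature pairs whose absolute correlation exceeds an arbitrary 0.8 cutoff.
n = corr.shape[0]
for i in range(n):
    for j in range(i + 1, n):
        if abs(corr[i, j]) > 0.8:
            print(f"Features {i} and {j} are highly correlated: r = {corr[i, j]:.2f}")
```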