Logistic Regression

Logistic Regression is used for classification problems (predicting categories), even though it has "Regression" in its name.

Odds vs Probability

Probability

  • The chance of an event happening out of all possible outcomes.
  • Formula: Probability = (Events in favour) / (Total observations)
  • Range: 0 to 1.

Odds

  • The ratio of events happening to events not happening.
  • Formula: Odds = (Events in favour) / (Events NOT in favour)
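
A small worked example of the two formulas above; the counts (8 observations with a fish present, 2 without) are made-up numbers used only to show the arithmetic:

```python
# Hypothetical counts: 8 observations with a fish present, 2 without.
in_favour = 8
against = 2
total = in_favour + against

probability = in_favour / total   # 8 / 10 = 0.8  -> always between 0 and 1
odds = in_favour / against        # 8 / 2  = 4.0  -> can be any non-negative value

print(f"Probability = {probability}, Odds = {odds}")
```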

Log of Odds (Logit)

  • We use the Log of Odds (the logit) because raw odds are asymmetric: odds above 1 (event more likely than not) can grow without bound, while odds below 1 are squeezed between 0 and 1.
  • Taking the log makes the magnitudes symmetric around 0, so favourable and unfavourable outcomes sit on a comparable scale.
  • The logit, log(p / (1 - p)), is the quantity that Logistic Regression models as a linear function of the inputs.
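
A minimal sketch of the logit as the natural log of the odds; the probabilities 0.8 and 0.2 are arbitrary values chosen only to show that the log puts "in favour" and "against" on a symmetric scale:

```python
import math

def log_odds(p):
    """Logit: natural log of the odds p / (1 - p)."""
    return math.log(p / (1 - p))

# Raw odds are asymmetric: p = 0.8 gives odds 4, p = 0.2 gives odds 0.25.
# On the log scale the two cases are mirror images around 0.
print(log_odds(0.8))   # ~ +1.386
print(log_odds(0.2))   # ~ -1.386
print(log_odds(0.5))   #    0.0  (even odds)
```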

The Sigmoid Function

Linear regression fits a straight line whose output is unbounded, which doesn't work well for classification, where we want an output between 0 and 1. Logistic Regression therefore passes the linear output through an S-shaped curve called the Sigmoid Function.

  • Formula: S(z) = 1 / (1 + e^-z), where z is the linear combination of the input features (the log-odds).
  • Output: Always between 0 and 1.
  • Usage:
    • If output > Threshold (e.g., 0.5) -> Classify as Positive (e.g., Presence of fish).
    • If output < Threshold -> Classify as Negative (e.g., Absence of fish).
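
A minimal sketch of the sigmoid and the threshold rule above, assuming a 0.5 threshold and the fish presence/absence labels; how to treat an output exactly at the threshold is a choice of this sketch, not part of the notes:

```python
import math

def sigmoid(z):
    """S(z) = 1 / (1 + e^-z): squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def classify(z, threshold=0.5):
    """Positive (fish present) if the sigmoid output exceeds the threshold, else Negative."""
    return "Positive" if sigmoid(z) > threshold else "Negative"

for z in (-3.0, 0.0, 2.5):
    print(z, round(sigmoid(z), 3), classify(z))
# -3.0 -> 0.047 -> Negative
#  0.0 -> 0.500 -> Negative (exactly at the threshold in this sketch)
#  2.5 -> 0.924 -> Positive
```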

Assumptions of Logistic Regression

  1. Independence of Errors: Observations should be independent of one another (no duplicated or repeated measurements of the same sample).
  2. Linearity in the Logit: Relationship between independent variables and log-odds is linear.
  3. Absence of Multicollinearity: Independent variables should not be highly correlated with each other.
  4. No Strong Outliers: Extreme values should not heavily influence the model.
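
A hedged sketch of one way to spot-check assumption 3 (multicollinearity) using a pairwise correlation matrix; the feature data and the 0.8 cutoff are made up for illustration, and a variance inflation factor (VIF) check is another common option:

```python
import numpy as np

# Made-up feature matrix: x3 is deliberately almost a copy of x1.
rng = np.random.default_rng(0)
x1 = rng.normal(size=100)
x2 = rng.normal(size=100)
x3 = 0.9 * x1 + 0.1 * rng.normal(size=100)
X = np.column_stack([x1, x2, x3])

# Pairwise correlations between features (columns).
corr = np.corrcoef(X, rowvar=False)

# Flag feature pairs whose absolute correlation exceeds an arbitrary 0.8 cutoff.
n = corr.shape[0]
for i in range(n):
    for j in range(i + 1, n):
        if abs(corr[i, j]) > 0.8:
            print(f"Features {i} and {j} are highly correlated: r = {corr[i, j]:.2f}")
```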