# Logistic Regression

Logistic Regression is used for **classification** problems (predicting categories), even though it has "Regression" in its name.

## Odds vs Probability

### Probability

- The chance of an event happening out of **all** possibilities.
- **Formula**: `Probability = (Events in favour) / (Total observations)`
- Range: 0 to 1.

### Odds

- The ratio of events **happening** to events **not happening**.
- **Formula**: `Odds = (Events in favour) / (Events NOT in favour)`
- Range: 0 to infinity.

### Log of Odds (Logit)

- Odds have an asymmetric scale: odds against an event are squeezed into the interval 0 to 1, while odds in favour can grow without bound.
- Taking the log makes the scale symmetric around 0 and lets it range over all real numbers, which is what a linear combination of features can produce.
- **Formula**: `Logit(p) = log(p / (1 - p))`
- This is the link function used in Logistic Regression.

## The Sigmoid Function

Linear regression fits a straight line, which doesn't work well for classification (where we want an output between 0 and 1). Logistic regression instead uses an **S-shaped curve** called the **Sigmoid Function**, which is the inverse of the logit.

- **Formula**: `S(z) = 1 / (1 + e^-z)`
- **Output**: Always between **0 and 1**.
- **Usage**:
  - If output >= Threshold (e.g., 0.5) -> Classify as **Positive** (e.g., presence of fish).
  - If output < Threshold -> Classify as **Negative** (e.g., absence of fish).

## Assumptions of Logistic Regression

1. **Independence of Errors**: Observations are independent of one another (e.g., no duplicated samples or repeated measurements of the same subject).
2. **Linearity in the Logit**: The relationship between the independent variables and the log-odds is linear.
3. **Absence of Multicollinearity**: Independent variables should not be highly correlated with each other.
4. **No Strong Outliers**: Extreme values should not heavily influence the model.
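
## Worked Examples

The probability, odds, and log-odds formulas above can be sketched directly in code. This is a minimal illustration (the function names `probability`, `odds`, and `logit` are chosen here for clarity, not taken from any library):

```python
import math

def probability(events_in_favour, total_observations):
    # Probability = (Events in favour) / (Total observations)
    return events_in_favour / total_observations

def odds(events_in_favour, events_against):
    # Odds = (Events in favour) / (Events NOT in favour)
    return events_in_favour / events_against

def logit(p):
    # Log of odds: log(p / (1 - p))
    return math.log(p / (1 - p))

# Example: 8 events in favour out of 10 observations
p = probability(8, 10)   # 0.8
o = odds(8, 2)           # 4.0 -- note odds = p / (1 - p)
print(p, o, logit(p))    # logit(0.8) = log(4)
```

Note how the two formulas connect: `odds(8, 2)` and `p / (1 - p)` both give 4.0, so the logit is just the log of the odds.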
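Putting the pieces together, a tiny logistic regression can be fit by gradient descent on a toy one-feature dataset. This is only a sketch of the idea (the dataset, learning rate, and iteration count are made up for illustration; a real project would use a library such as scikit-learn):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Toy 1-feature dataset: small values -> class 0, large values -> class 1
X = [0.5, 1.0, 1.5, 3.0, 3.5, 4.0]
y = [0, 0, 0, 1, 1, 1]

# Fit weight w and bias b by gradient descent on the log-loss
w, b, lr = 0.0, 0.0, 0.1
for _ in range(5000):
    grad_w = grad_b = 0.0
    for xi, yi in zip(X, y):
        error = sigmoid(w * xi + b) - yi  # predicted probability minus label
        grad_w += error * xi
        grad_b += error
    w -= lr * grad_w / len(X)
    b -= lr * grad_b / len(X)

def predict(x):
    # Apply the usual 0.5 threshold to the sigmoid output
    return 1 if sigmoid(w * x + b) >= 0.5 else 0

print([predict(x) for x in X])  # should recover [0, 0, 0, 1, 1, 1]
```

The model learns a linear function of the feature (the logit), and the sigmoid turns that into a probability, which is exactly the "linearity in the logit" assumption from the list above.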