unit 2 added
unit 2/03_Logistic_Regression.md
@@ -0,0 +1,35 @@
# Logistic Regression

Logistic Regression is used for **classification** problems (predicting categories), even though it has "Regression" in its name.

## Odds vs Probability

### Probability

- The chance of an event happening out of **all** possibilities.
- **Formula**: `Probability = (Events in favour) / (Total observations)`
- Range: 0 to 1.

### Odds

- The ratio of events **happening** to events **not happening**.
- **Formula**: `Odds = (Events in favour) / (Events NOT in favour)`
- Equivalently, for a probability `p`: `Odds = p / (1 - p)`.
- Range: 0 to infinity.

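The two formulas can be compared side by side with a minimal sketch. The counts are made up for illustration (fish present in 8 of 10 sampled ponds), and the function names are hypothetical.

```python
def probability(in_favour: int, total: int) -> float:
    """Events in favour divided by all observations."""
    return in_favour / total


def odds(in_favour: int, not_in_favour: int) -> float:
    """Events in favour divided by events NOT in favour."""
    return in_favour / not_in_favour


# Hypothetical data: fish were present in 8 of 10 sampled ponds.
present, absent = 8, 2
p = probability(present, present + absent)
o = odds(present, absent)
print(p)  # 0.8
print(o)  # 4.0
```

The same counts give a probability of 0.8 but odds of 4 ("4 to 1"), which is why the two quantities should not be used interchangeably.
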
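### Log of Odds (Logit)

- We use the **Log of Odds** because odds are asymmetric: events less likely than not fall between 0 and 1, while events more likely than not range from 1 to infinity.
- Taking the log makes the scale symmetric around 0: odds of 4 and 1/4 become `+log(4)` and `-log(4)`, and the range becomes all real numbers.
- **Formula**: `logit(p) = log(p / (1 - p))`
- Logistic Regression models the log-odds as a linear function of the independent variables.
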
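A short sketch of the symmetry described above, using only the standard library. The function name `logit` and the sample probabilities are chosen here for illustration.

```python
import math


def logit(p: float) -> float:
    """Log of odds for a probability strictly between 0 and 1."""
    return math.log(p / (1 - p))


# Complementary probabilities have reciprocal odds (e.g. 4 and 1/4),
# but log-odds of equal size and opposite sign.
for p in (0.2, 0.5, 0.8):
    print(f"p={p}  odds={p / (1 - p):.2f}  log-odds={logit(p):+.3f}")
```

Note that `logit` is undefined at exactly 0 or 1, where the odds are 0 or infinite.
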
## The Sigmoid Function

Linear regression fits a straight line whose output is unbounded, so it doesn't work well for classification (where we want an output between 0 and 1).
Logistic regression instead uses an **S-shaped curve** called the **Sigmoid Function**, which is the inverse of the logit.

- **Formula**: `S(z) = 1 / (1 + e^(-z))`
- **Output**: Always between **0 and 1**, so it can be read as a probability.
- **Usage**:
    - If output > Threshold (e.g., 0.5) -> Classify as **Positive** (e.g., Presence of fish).
    - If output < Threshold -> Classify as **Negative** (e.g., Absence of fish).

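The formula and the threshold rule can be sketched as follows. The weight and bias are hypothetical values for a one-feature model, not fitted parameters, and treating an output exactly equal to the threshold as Negative is an assumption of this sketch, since the notes do not cover that case.

```python
import math


def sigmoid(z: float) -> float:
    """S(z) = 1 / (1 + e^(-z)), written to avoid overflow for very negative z."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # algebraically the same value, but exp() stays small
    return e / (1.0 + e)


def classify(z: float, threshold: float = 0.5) -> str:
    """Positive when the sigmoid output exceeds the threshold, else Negative."""
    return "Positive" if sigmoid(z) > threshold else "Negative"


# Hypothetical one-feature model: z = weight * x + bias.
weight, bias = 1.5, -3.0
for x in (0.0, 2.0, 4.0):
    z = weight * x + bias
    print(x, round(sigmoid(z), 3), classify(z))
```

Raising the threshold above 0.5 makes the model more conservative about predicting the positive class; lowering it does the opposite.
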
## Assumptions of Logistic Regression

1. **Independence of Errors**: Observations are independent of one another (e.g., no duplicated rows or repeated measurements of the same subject).
2. **Linearity in the Logit**: The relationship between the independent variables and the log-odds is linear.
3. **Absence of Multicollinearity**: Independent variables should not be highly correlated with each other.
4. **No Strong Outliers**: Extreme values should not heavily influence the model.
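
As one way to screen for the multicollinearity assumption, pairwise Pearson correlations between features can be checked before fitting. This is a rough sketch: the feature names and values are invented, and the 0.9 cutoff is an assumed rule of thumb rather than a fixed standard.

```python
import math


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equally long columns."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical features; depth_ft is just depth_m in different units.
features = {
    "depth_m": [1.0, 2.0, 3.0, 4.0, 5.0],
    "depth_ft": [3.3, 6.6, 9.8, 13.1, 16.4],
    "temperature_c": [18.0, 21.0, 17.0, 20.0, 19.0],
}

CUTOFF = 0.9  # assumed rule of thumb, not a fixed standard
names = list(features)
flagged = [
    (a, b)
    for i, a in enumerate(names)
    for b in names[i + 1:]
    if abs(pearson(features[a], features[b])) > CUTOFF
]
print(flagged)  # [('depth_m', 'depth_ft')]
```

Pairwise correlation is only a first check; it will not catch a feature that is a combination of several others.
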