Model Evaluation Metrics
How do we know if our classification model is good? We use several metrics.
Confusion Matrix
A table that compares Predicted values with Actual values.
| | Predicted Negative | Predicted Positive |
|---|---|---|
| Actual Negative | True Negative (TN) | False Positive (FP) |
| Actual Positive | False Negative (FN) | True Positive (TP) |
- TP: Correctly predicted positive.
- TN: Correctly predicted negative.
- FP: Incorrectly predicted positive (Type I Error).
- FN: Incorrectly predicted negative (Type II Error).
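As a minimal sketch (with made-up label vectors), the four counts can be tallied directly; scikit-learn's `confusion_matrix` would give the same numbers.

```python
# Sketch: tally a 2x2 confusion matrix by hand (toy labels, 1 = positive).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]   # actual classes
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]   # model's predicted classes

tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

print(f"TP={tp} TN={tn} FP={fp} FN={fn}")

# Equivalent with scikit-learn:
# from sklearn.metrics import confusion_matrix
# tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
```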
Key Metrics
1. Accuracy
- Fraction of all correct predictions.
- Formula:
(TP + TN) / (TP + TN + FP + FN)
- Problem: Not reliable if data is imbalanced (Accuracy Paradox).
2. Precision
- Out of all predicted positives, how many were actually positive?
- Formula:
TP / (TP + FP)
- Higher is better.
3. Recall (Sensitivity / TPR)
- Out of all actual positives, how many did we find?
- Formula:
TP / (TP + FN)
- Higher is better.
4. Specificity
- Out of all actual negatives, how many did we correctly identify?
- Formula:
TN / (TN + FP)
5. F1 Score
- The harmonic mean of Precision and Recall.
- Good for balancing precision and recall, especially with uneven classes.
- Formula:
2 * (Precision * Recall) / (Precision + Recall)
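A short sketch applying the formulas above to the toy counts from the previous snippet:

```python
# Sketch: compute the key metrics from raw confusion-matrix counts.
tp, tn, fp, fn = 3, 3, 1, 1           # toy counts from the example above

accuracy    = (tp + tn) / (tp + tn + fp + fn)
precision   = tp / (tp + fp)
recall      = tp / (tp + fn)          # a.k.a. sensitivity / TPR
specificity = tn / (tn + fp)
f1          = 2 * precision * recall / (precision + recall)

print(f"accuracy={accuracy:.2f} precision={precision:.2f} "
      f"recall={recall:.2f} specificity={specificity:.2f} f1={f1:.2f}")
```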
ROC and AUC
ROC Curve (Receiver Operating Characteristic)
- A plot of TPR (Recall) vs FPR (False Positive Rate, FP / (FP + TN) = 1 - Specificity).
- Shows how the model performs at different thresholds.
AUC (Area Under the Curve)
- Measures the entire area underneath the ROC curve.
- Range: 0 to 1 (0.5 corresponds to random guessing).
- Interpretation: Higher AUC means the model is better at distinguishing between the classes.
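As a sketch, the ROC points and AUC can be computed with scikit-learn's `roc_curve` and `roc_auc_score`; the scores below are purely illustrative predicted probabilities.

```python
# Sketch: ROC curve points and AUC with scikit-learn (toy scores).
from sklearn.metrics import roc_curve, roc_auc_score

y_true  = [1, 0, 1, 1, 0, 0, 1, 0]                   # actual classes
y_score = [0.9, 0.2, 0.4, 0.8, 0.3, 0.6, 0.7, 0.1]   # predicted P(positive)

fpr, tpr, thresholds = roc_curve(y_true, y_score)    # one (FPR, TPR) point per threshold
auc = roc_auc_score(y_true, y_score)

for f, t, thr in zip(fpr, tpr, thresholds):
    print(f"threshold={thr:.2f}  FPR={f:.2f}  TPR={t:.2f}")
print(f"AUC = {auc:.2f}")
```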