Regularization & Evaluation Metrics
Overfitting occurs when a model memorises the training data and fails to generalise. Regularization adds a penalty term to the loss function that discourages large weights.
L1 Regularization (Lasso)
L1 adds the sum of absolute weight values to the loss, $\lambda \sum_i |w_i|$. It drives many weights to exactly zero, producing sparse models.
L2 Regularization (Ridge)
L2 adds the sum of squared weights, $\lambda \sum_i w_i^2$. It shrinks all weights smoothly towards zero but rarely makes them exactly zero.
Elastic Net
A combination of the L1 and L2 penalties, each with its own strength:

$$\lambda_1 \sum_i |w_i| + \lambda_2 \sum_i w_i^2$$
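The three penalties above can be sketched in a few lines of Python. This is a minimal sketch assuming NumPy arrays; the function and parameter names (`lam`, `lam1`, `lam2`) are illustrative:

```python
import numpy as np

def l1_penalty(w, lam):
    """Lasso penalty: lam * sum of absolute weight values."""
    return lam * np.sum(np.abs(w))

def l2_penalty(w, lam):
    """Ridge penalty: lam * sum of squared weights."""
    return lam * np.sum(w ** 2)

def elastic_net_penalty(w, lam1, lam2):
    """Elastic net: L1 and L2 penalties added, each with its own strength."""
    return l1_penalty(w, lam1) + l2_penalty(w, lam2)

w = np.array([0.5, -2.0, 0.0])
print(l1_penalty(w, 0.1))              # 0.1 * (0.5 + 2.0 + 0.0) = 0.25
print(l2_penalty(w, 0.1))              # 0.1 * (0.25 + 4.0)      = 0.425
print(elastic_net_penalty(w, 0.1, 0.1))  # sum of the two
```

Note how the zero weight contributes nothing to either penalty, which is why L1's constant-magnitude pull (rather than L2's proportional pull) is what produces exact zeros during optimization.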
Ridge Gradient
When the L2 penalty $\lambda \sum_i w_i^2$ is added to the MSE loss, the gradient with respect to each weight $w_i$ gains an extra term:

$$\frac{\partial L}{\partial w_i} = \frac{\partial L_{\text{MSE}}}{\partial w_i} + 2\lambda w_i$$
This is why ridge regression is equivalent to weight decay: each weight is slightly shrunk at every update.
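The weight-decay equivalence can be checked numerically. In this sketch (illustrative names, NumPy assumed), one gradient step with the ridge gradient is compared against the explicit "shrink then step" weight-decay form:

```python
import numpy as np

def ridge_gradient(w, grad_w, lam):
    """Gradient of loss + lam * ||w||^2: each weight gains an extra 2*lam*w term."""
    return grad_w + 2 * lam * w

w = np.array([1.0, -0.5])      # current weights
grad = np.array([0.2, 0.2])    # gradient of the unregularized MSE loss
eta, lam = 0.1, 0.05           # learning rate and regularization strength

# One gradient-descent step using the ridge gradient:
#   w <- w - eta * (grad + 2*lam*w)
updated = w - eta * ridge_gradient(w, grad, lam)

# Algebraically identical weight-decay form:
#   w <- (1 - 2*eta*lam) * w - eta * grad
decayed = (1 - 2 * eta * lam) * w - eta * grad

assert np.allclose(updated, decayed)  # same update, two readings
```

The factor $(1 - 2\eta\lambda)$ is the "decay": every weight is multiplied by a number slightly below 1 before the ordinary gradient step is applied.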
Evaluation Metrics for Classification
Accuracy alone is misleading on imbalanced datasets (e.g., 99% negative class → a model that always predicts "negative" gets 99% accuracy). Instead, use the confusion matrix counts:
| | Predicted Positive | Predicted Negative |
|---|---|---|
| Actually Positive | TP (True Positive) | FN (False Negative) |
| Actually Negative | FP (False Positive) | TN (True Negative) |
From these we derive:
- Precision: Of all predicted positives, how many are correct? $\text{Precision} = \frac{TP}{TP + FP}$
- Recall (sensitivity): Of all actual positives, how many did we find? $\text{Recall} = \frac{TP}{TP + FN}$
- F1 Score: The harmonic mean of precision and recall: $F_1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$
F1 balances the trade-off: high precision with low recall (conservative model) vs. high recall with low precision (aggressive model). Return 0.0 when both precision and recall are zero.
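Putting the definitions together, the three metrics can be computed directly from the confusion-matrix counts. A minimal sketch assuming binary labels encoded as 0/1 in plain Python lists:

```python
def precision_recall_f1(y_true, y_pred):
    """Return (precision, recall, f1) for binary 0/1 labels."""
    # Confusion-matrix counts (TN is not needed for these three metrics).
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    # Guard against division by zero: no predicted/actual positives -> 0.0.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Imbalanced example: 1 TP, 1 FP, 1 FN, 3 TNs.
y_true = [1, 1, 0, 0, 0, 0]
y_pred = [1, 0, 1, 0, 0, 0]
print(precision_recall_f1(y_true, y_pred))  # (0.5, 0.5, 0.5)
```

Note that accuracy on this example is 4/6 ≈ 0.67 even though the model finds only half of the actual positives, which is exactly the imbalance problem described above.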
Your Task
Implement:
- `l1_penalty(w, lambda_)` → L1 penalty value
- `l2_penalty(w, lambda_)` → L2 penalty value
- `elastic_net(w, lambda1, lambda2)` → L1 + L2
- `ridge_gradient(w, grad_w, lambda_)` → element-wise gradient with the $2\lambda w$ term added
- `precision_recall_f1(y_true, y_pred)` → tuple of (precision, recall, f1)