Lesson 15 of 15


Regularization & Evaluation Metrics

Overfitting occurs when a model memorises the training data and fails to generalise. Regularization adds a penalty term to the loss function that discourages large weights.

L1 Regularization (Lasso)

$$\mathcal{L}_{\text{L1}} = \lambda \sum_{i} |w_i|$$

L1 drives many weights to exactly zero, producing sparse models.
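As a minimal sketch (using NumPy, and the `l1_penalty` signature named in the task section), the penalty is just the scaled sum of absolute values:

```python
import numpy as np

def l1_penalty(w, lambda_):
    """L1 (lasso) penalty: lambda times the sum of absolute weights."""
    return lambda_ * np.sum(np.abs(w))

# Zeroed weights contribute nothing, so sparse models are cheap under L1:
w = np.array([3.0, -4.0, 0.0])
print(l1_penalty(w, 0.1))  # 0.1 * (3 + 4 + 0) = 0.7, up to float rounding
```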

L2 Regularization (Ridge)

$$\mathcal{L}_{\text{L2}} = \lambda \sum_{i} w_i^2$$

L2 shrinks all weights smoothly towards zero but rarely makes them exactly zero.
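A corresponding NumPy sketch (again matching the task's `l2_penalty` signature) squares the weights instead of taking absolute values:

```python
import numpy as np

def l2_penalty(w, lambda_):
    """L2 (ridge) penalty: lambda times the sum of squared weights."""
    return lambda_ * np.sum(w ** 2)

# Squaring punishes large weights much harder than small ones:
w = np.array([3.0, -4.0, 0.0])
print(l2_penalty(w, 0.1))  # 0.1 * (9 + 16 + 0) = 2.5
```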

Elastic Net

A combination of L1 and L2:

$$\mathcal{L}_{\text{EN}} = \lambda_1 \sum_i |w_i| + \lambda_2 \sum_i w_i^2$$
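Since elastic net is literally the sum of the two penalties above, a sketch (using the task's `elastic_net` signature) can compute both terms directly:

```python
import numpy as np

def elastic_net(w, lambda1, lambda2):
    """Elastic net penalty: L1 term plus L2 term, each with its own strength."""
    return lambda1 * np.sum(np.abs(w)) + lambda2 * np.sum(w ** 2)

# With equal strengths, this is just l1_penalty + l2_penalty:
w = np.array([3.0, -4.0, 0.0])
print(elastic_net(w, 0.1, 0.1))  # 0.7 + 2.5 = 3.2, up to float rounding
```

Setting `lambda2 = 0` recovers pure lasso; `lambda1 = 0` recovers pure ridge.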

Ridge Gradient

When an L2 penalty is added to the MSE loss, the gradient with respect to each weight $w_i$ gains an extra term:

$$\nabla_{w_i} \left(\text{MSE} + \lambda \|\mathbf{w}\|_2^2\right) = \nabla_{w_i} \text{MSE} + 2\lambda w_i$$

This is why ridge regression is equivalent to weight decay: each weight is slightly shrunk at every update.
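To make the weight-decay view concrete, here is a sketch of one gradient-descent step using the task's `ridge_gradient` signature (the data-loss gradient `grad_w` is assumed to be given; the values below are illustrative):

```python
import numpy as np

def ridge_gradient(w, grad_w, lambda_):
    """Gradient of (loss + lambda * ||w||^2): add 2*lambda*w element-wise."""
    return np.asarray(grad_w) + 2 * lambda_ * np.asarray(w)

# One gradient-descent step. With a zero data-loss gradient, the penalty
# alone multiplies every weight by (1 - 2 * lr * lambda) -- i.e. weight decay.
w = np.array([1.0, -2.0])
grad_w = np.array([0.0, 0.0])  # pretend the data loss is already minimized
lr = 0.1
w_new = w - lr * ridge_gradient(w, grad_w, lambda_=0.5)
print(w_new)  # each weight shrunk by the factor 1 - 2*0.1*0.5 = 0.9
```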

Evaluation Metrics for Classification

Accuracy alone is misleading on imbalanced datasets (e.g., 99% negative class → a model that always predicts "negative" gets 99% accuracy). Instead, use the confusion matrix counts:

                     Predicted Positive    Predicted Negative
  Actually Positive  TP (True Positive)    FN (False Negative)
  Actually Negative  FP (False Positive)   TN (True Negative)

From these we derive:

  • Precision: Of all predicted positives, how many are correct? $P = \frac{\text{TP}}{\text{TP} + \text{FP}}$
  • Recall (sensitivity): Of all actual positives, how many did we find? $R = \frac{\text{TP}}{\text{TP} + \text{FN}}$
  • F1 Score: The harmonic mean of precision and recall: $F_1 = \frac{2PR}{P + R}$

F1 balances the trade-off between high precision with low recall (a conservative model) and high recall with low precision (an aggressive model). By convention, return 0.0 when both precision and recall are zero, to avoid division by zero.
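A plain-Python sketch of these three metrics (using the task's `precision_recall_f1` signature, with labels assumed to be 0/1):

```python
def precision_recall_f1(y_true, y_pred):
    """Precision, recall, and F1 for binary labels (1 = positive class)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    # Guard each denominator; by convention the metric is 0.0 when undefined.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# One true positive, one false positive, one false negative:
print(precision_recall_f1([1, 1, 0, 0], [1, 0, 1, 0]))  # (0.5, 0.5, 0.5)
```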

Your Task

Implement:

  • l1_penalty(w, lambda_) → $\lambda \sum_i |w_i|$
  • l2_penalty(w, lambda_) → $\lambda \sum_i w_i^2$
  • elastic_net(w, lambda1, lambda2) → L1 + L2
  • ridge_gradient(w, grad_w, lambda_) → element-wise $\nabla_{w_i}\text{MSE} + 2\lambda w_i$
  • precision_recall_f1(y_true, y_pred) → tuple of (precision, recall, f1)