Lesson 2 of 15

Gradient Descent

Gradient descent is the optimization algorithm that trains linear regression. It iteratively adjusts $w$ and $b$ to minimise the MSE loss.

The Gradients

Given predictions $\hat{y}_i = w x_i + b$ and true labels $y_i$, the gradients of MSE with respect to $w$ and $b$ are:

$$\frac{\partial \text{MSE}}{\partial w} = \frac{1}{n} \sum_{i=1}^{n} (\hat{y}_i - y_i)\, x_i$$

$$\frac{\partial \text{MSE}}{\partial b} = \frac{1}{n} \sum_{i=1}^{n} (\hat{y}_i - y_i)$$
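To see the formulas in action, here is a small worked example (a sketch with made-up toy data, assuming NumPy is available) that evaluates both gradients at $w = 0$, $b = 0$:

```python
import numpy as np

# Toy data: two points that lie on the line y = 2x
x = np.array([1.0, 2.0])
y_true = np.array([2.0, 4.0])

# Current guess: w = 0 and b = 0, so every prediction is 0
y_pred = np.zeros_like(x)

# Gradients straight from the formulas above
grad_w = np.mean((y_pred - y_true) * x)  # (1/n) * sum((y_hat_i - y_i) * x_i)
grad_b = np.mean(y_pred - y_true)        # (1/n) * sum(y_hat_i - y_i)

print(grad_w, grad_b)  # -5.0 -3.0
```

Both gradients are negative, which tells gradient descent to increase $w$ and $b$, moving the predictions toward the true labels.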

Parameter Update

After computing the gradients, each parameter is updated by a small step in the negative gradient direction:

$$w \leftarrow w - \alpha \cdot \nabla_w$$

$$b \leftarrow b - \alpha \cdot \nabla_b$$

where $\alpha$ is the learning rate, a small positive number like $0.01$.
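The update rule is one line of code. A minimal sketch (the name `step` and the default learning rate are illustrative, not prescribed by the lesson):

```python
def step(param, grad, lr=0.01):
    # Move the parameter a small amount against the gradient direction
    return param - lr * grad

# With param = 1.0, grad = 0.5, lr = 0.1 the result is approximately 0.95
print(step(1.0, 0.5, lr=0.1))
```

The same rule applies to $w$ and $b$ alike; only the gradient differs.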

Training Loop

Repeat for many iterations:

  1. Compute predictions $\hat{y}$
  2. Compute gradients $\nabla_w$, $\nabla_b$
  3. Update $w$ and $b$
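The three steps above can be sketched as one loop (a minimal NumPy version, assuming zero-initialised parameters and a fixed number of iterations):

```python
import numpy as np

def train(x, y, lr=0.01, iters=1000):
    w, b = 0.0, 0.0
    for _ in range(iters):
        y_pred = w * x + b                   # 1. predictions
        grad_w = np.mean((y_pred - y) * x)   # 2. gradients
        grad_b = np.mean(y_pred - y)
        w -= lr * grad_w                     # 3. update
        b -= lr * grad_b
    return w, b
```

Run on points drawn from $y = 2x + 1$, the loop recovers $w \approx 2$ and $b \approx 1$ after enough iterations.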

Your Task

Implement:

  • gradient_w(x, y_pred, y_true) — gradient of MSE w.r.t. $w$
  • gradient_b(y_pred, y_true) — gradient of MSE w.r.t. $b$
  • update_param(param, grad, lr) — one gradient descent step