Lesson 2 of 15

Gradient Descent

Gradient descent is the optimization algorithm that trains linear regression. It iteratively adjusts $w$ and $b$ to minimise the MSE loss.

The Gradients

Given predictions $\hat{y}_i = w x_i + b$ and true labels $y_i$, the gradients of MSE with respect to $w$ and $b$ are:

$$\frac{\partial \text{MSE}}{\partial w} = \frac{1}{n} \sum_{i=1}^{n} (\hat{y}_i - y_i)\, x_i$$

$$\frac{\partial \text{MSE}}{\partial b} = \frac{1}{n} \sum_{i=1}^{n} (\hat{y}_i - y_i)$$
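To see the formulas in action, here is a small worked example (a sketch with made-up toy data, assuming NumPy is available) that evaluates both gradients at $w = 0$, $b = 0$:

```python
import numpy as np

# Toy data: two points that lie on the line y = 2x
x = np.array([1.0, 2.0])
y_true = np.array([2.0, 4.0])

# Current guess: w = 0 and b = 0, so every prediction is 0
y_pred = np.zeros_like(x)

# Gradients straight from the formulas above
grad_w = np.mean((y_pred - y_true) * x)  # (1/n) * sum((y_hat_i - y_i) * x_i)
grad_b = np.mean(y_pred - y_true)        # (1/n) * sum(y_hat_i - y_i)

print(grad_w, grad_b)  # -5.0 -3.0
```

Both gradients are negative, which tells gradient descent to increase $w$ and $b$, moving the predictions toward the true labels.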

Parameter Update

After computing the gradients, each parameter is updated by a small step in the negative gradient direction:

$$w \leftarrow w - \alpha \cdot \nabla_w$$

$$b \leftarrow b - \alpha \cdot \nabla_b$$

where $\alpha$ is the learning rate, a small positive number like $0.01$.
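The update rule is one line of code. A minimal sketch (the name `step` and the default learning rate are illustrative, not prescribed by the lesson):

```python
def step(param, grad, lr=0.01):
    # Move the parameter a small amount against the gradient direction
    return param - lr * grad

# With param = 1.0, grad = 0.5, lr = 0.1 the result is approximately 0.95
print(step(1.0, 0.5, lr=0.1))
```

The same rule applies to $w$ and $b$ alike; only the gradient differs.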

Training Loop

Repeat for many iterations:

  1. Compute predictions $\hat{y}$
  2. Compute gradients $\nabla_w$, $\nabla_b$
  3. Update $w$ and $b$
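The three steps above can be sketched as one loop (a minimal NumPy version, assuming zero-initialised parameters and a fixed number of iterations):

```python
import numpy as np

def train(x, y, lr=0.01, iters=1000):
    w, b = 0.0, 0.0
    for _ in range(iters):
        y_pred = w * x + b                   # 1. predictions
        grad_w = np.mean((y_pred - y) * x)   # 2. gradients
        grad_b = np.mean(y_pred - y)
        w -= lr * grad_w                     # 3. update
        b -= lr * grad_b
    return w, b
```

Run on points drawn from $y = 2x + 1$, the loop recovers $w \approx 2$ and $b \approx 1$ after enough iterations.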

Your Task

Implement:

  • gradient_w(x, y_pred, y_true) — gradient of MSE w.r.t. $w$
  • gradient_b(y_pred, y_true) — gradient of MSE w.r.t. $b$
  • update_param(param, grad, lr) — one gradient descent step