Lesson 10 of 15

Gradient Descent Step

Updating the Weights

Once we have the gradients \frac{\partial \mathcal{L}}{\partial W} and \frac{\partial \mathcal{L}}{\partial \mathbf{b}}, we update the parameters by taking a step opposite to the gradient:

W \leftarrow W - \eta \cdot \frac{\partial \mathcal{L}}{\partial W}

\mathbf{b} \leftarrow \mathbf{b} - \eta \cdot \frac{\partial \mathcal{L}}{\partial \mathbf{b}}

Here \eta (eta) is the learning rate: how large a step to take.
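To see the rule in action, here is a single-parameter sketch with a made-up loss L(w) = (w - 3)^2, whose gradient is 2(w - 3); the values (starting point 0.0, learning rate 0.25) are chosen only for illustration:

```python
# Single-parameter illustration with a made-up loss L(w) = (w - 3)**2,
# whose gradient is dL/dw = 2 * (w - 3).
w = 0.0
eta = 0.25  # learning rate

for _ in range(3):
    grad = 2 * (w - 3)
    w = w - eta * grad  # step opposite to the gradient

# w moves toward the minimum at 3: 0.0 -> 1.5 -> 2.25 -> 2.625
```

Each step shrinks the distance to the minimum by the same factor, which is why the iterates approach 3 geometrically.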

Choosing the Learning Rate

  • Too large: loss oscillates or diverges (overshoot the minimum)
  • Too small: training is unnecessarily slow
  • Just right: loss decreases smoothly

Typical starting values: 0.1 for small networks, 10^{-3} for deep networks with Adam.
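The three regimes above can be observed on a toy problem. The sketch below (values made up) runs gradient descent on f(x) = x^2, whose gradient is 2x, with three different learning rates:

```python
def minimize(lr, steps=50, x0=1.0):
    """Gradient descent on f(x) = x**2 (gradient: 2 * x), starting from x0."""
    x = x0
    for _ in range(steps):
        x = x - lr * 2 * x  # step opposite to the gradient
    return x

print(abs(minimize(0.1)))    # near zero: converged
print(abs(minimize(0.001)))  # still close to 1.0: too slow
print(abs(minimize(1.1)))    # very large: overshot and diverged
```

With lr = 1.1 each step multiplies x by (1 - 2.2) = -1.2, so the iterates oscillate in sign while growing in magnitude, which is exactly the "oscillates or diverges" failure mode.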

Gradient Descent Update

For a layer with weight matrix W (shape m \times n) and bias vector \mathbf{b} (length m), the update applies elementwise:

W_{jk} \leftarrow W_{jk} - \eta \cdot \frac{\partial \mathcal{L}}{\partial W_{jk}}

b_j \leftarrow b_j - \eta \cdot \frac{\partial \mathcal{L}}{\partial b_j}

Your Task

Implement gradient_step(weights, biases, dw, db, lr) that returns updated (new_weights, new_biases).
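A minimal sketch of one possible solution, assuming the parameters and gradients are NumPy arrays of matching shapes and that fresh arrays should be returned rather than mutating the inputs:

```python
import numpy as np

def gradient_step(weights, biases, dw, db, lr):
    """One gradient-descent update; returns new arrays, leaving inputs unchanged."""
    new_weights = weights - lr * dw  # W <- W - eta * dL/dW
    new_biases = biases - lr * db    # b <- b - eta * dL/db
    return new_weights, new_biases

# Example usage with made-up shapes and gradient values
W, b = gradient_step(np.ones((2, 3)), np.zeros(2),
                     np.full((2, 3), 0.5), np.array([0.2, -0.2]),
                     lr=0.1)
```

Because NumPy subtraction is elementwise, this one vectorized line performs the per-entry updates W_{jk} and b_j from the previous section in a single operation.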
