Lesson 10 of 15

Gradient Descent Step

Updating the Weights

Once we have the gradients \frac{\partial \mathcal{L}}{\partial W} and \frac{\partial \mathcal{L}}{\partial \mathbf{b}}, we update the parameters by taking a step opposite to the gradient:

W \leftarrow W - \eta \cdot \frac{\partial \mathcal{L}}{\partial W}

\mathbf{b} \leftarrow \mathbf{b} - \eta \cdot \frac{\partial \mathcal{L}}{\partial \mathbf{b}}

Here \eta (eta) is the learning rate: how large a step to take.
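To see the rule in action, here is a single-parameter sketch with a made-up loss L(w) = (w - 3)^2, whose gradient is 2(w - 3); the values (starting point 0.0, learning rate 0.25) are chosen only for illustration:

```python
# Single-parameter illustration with a made-up loss L(w) = (w - 3)**2,
# whose gradient is dL/dw = 2 * (w - 3).
w = 0.0
eta = 0.25  # learning rate

for _ in range(3):
    grad = 2 * (w - 3)
    w = w - eta * grad  # step opposite to the gradient

# w moves toward the minimum at 3: 0.0 -> 1.5 -> 2.25 -> 2.625
```

Each step shrinks the distance to the minimum by the same factor, which is why the iterates approach 3 geometrically.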

Choosing the Learning Rate

  • Too large: loss oscillates or diverges (overshoot the minimum)
  • Too small: training is unnecessarily slow
  • Just right: loss decreases smoothly

Typical starting values: 0.1 for small networks, 10^{-3} for deep networks with Adam.
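The three regimes above can be observed on a toy problem. The sketch below (values made up) runs gradient descent on f(x) = x^2, whose gradient is 2x, with three different learning rates:

```python
def minimize(lr, steps=50, x0=1.0):
    """Gradient descent on f(x) = x**2 (gradient: 2 * x), starting from x0."""
    x = x0
    for _ in range(steps):
        x = x - lr * 2 * x  # step opposite to the gradient
    return x

print(abs(minimize(0.1)))    # near zero: converged
print(abs(minimize(0.001)))  # still close to 1.0: too slow
print(abs(minimize(1.1)))    # very large: overshot and diverged
```

With lr = 1.1 each step multiplies x by (1 - 2.2) = -1.2, so the iterates oscillate in sign while growing in magnitude, which is exactly the "oscillates or diverges" failure mode.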

Gradient Descent Update

For a layer with weight matrix W (shape m \times n) and bias vector \mathbf{b} (length m), the update applies elementwise:

W_{jk} \leftarrow W_{jk} - \eta \cdot \frac{\partial \mathcal{L}}{\partial W_{jk}}

b_j \leftarrow b_j - \eta \cdot \frac{\partial \mathcal{L}}{\partial b_j}

Your Task

Implement gradient_step(weights, biases, dw, db, lr) that returns updated (new_weights, new_biases).
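A minimal sketch of one possible solution, assuming the parameters and gradients are NumPy arrays of matching shapes and that fresh arrays should be returned rather than mutating the inputs:

```python
import numpy as np

def gradient_step(weights, biases, dw, db, lr):
    """One gradient-descent update; returns new arrays, leaving inputs unchanged."""
    new_weights = weights - lr * dw  # W <- W - eta * dL/dW
    new_biases = biases - lr * db    # b <- b - eta * dL/db
    return new_weights, new_biases

# Example usage with made-up shapes and gradient values
W, b = gradient_step(np.ones((2, 3)), np.zeros(2),
                     np.full((2, 3), 0.5), np.array([0.2, -0.2]),
                     lr=0.1)
```

Because NumPy subtraction is elementwise, this one vectorized line performs the per-entry updates W_{jk} and b_j from the previous section in a single operation.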
