Lesson 10 of 15
Gradient Descent Step
Updating the Weights
Once we have the gradients $\partial L / \partial W$ and $\partial L / \partial b$, we update the parameters by taking a step opposite to the gradient:

$$W \leftarrow W - \eta \frac{\partial L}{\partial W}, \qquad b \leftarrow b - \eta \frac{\partial L}{\partial b}$$

where $\eta$ (eta) is the learning rate, which controls how large a step to take.
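The update rule above can be seen in a single scalar step. This sketch uses the illustrative function $f(w) = (w - 3)^2$, whose gradient is $2(w - 3)$; the starting point and learning rate are arbitrary choices for the example:

```python
# One gradient descent step on f(w) = (w - 3)^2, gradient 2*(w - 3).
# eta and the starting w are illustrative values, not prescriptions.
w = 0.0
eta = 0.1
grad = 2 * (w - 3)   # gradient at w = 0 is -6.0
w = w - eta * grad   # step *opposite* the gradient
print(w)             # 0.6: moved toward the minimum at w = 3
```

Note the minus sign: because the gradient points uphill, subtracting it moves $w$ toward the minimum.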
Choosing the Learning Rate
- Too large: loss oscillates or diverges (overshoot the minimum)
- Too small: training is unnecessarily slow
- Just right: loss decreases smoothly
Typical starting values: $\eta$ around $10^{-2}$ to $10^{-1}$ for small networks trained with plain SGD, and $\eta = 10^{-3}$ (the common framework default) for deep networks with Adam.
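The three regimes above can be demonstrated on the simple quadratic $f(w) = w^2$ (gradient $2w$); the specific learning rates are chosen only to make each regime visible:

```python
def descend(eta, w=1.0, steps=20):
    """Run `steps` gradient descent updates on f(w) = w^2 and return the final w."""
    for _ in range(steps):
        w -= eta * 2 * w  # each step multiplies w by (1 - 2*eta)
    return w

print(descend(0.01))  # too small: w shrinks slowly, still far from 0
print(descend(0.4))   # just right: w is driven very close to 0
print(descend(1.1))   # too large: |w| grows every step and diverges
```

Each update multiplies $w$ by $(1 - 2\eta)$, so convergence requires $|1 - 2\eta| < 1$; with $\eta = 1.1$ that factor is $-1.2$, and the iterate overshoots to the other side of the minimum with growing magnitude.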
Gradient Descent Update
For a layer with weight matrix $W$ (shape $n_{\text{in}} \times n_{\text{out}}$) and bias vector $b$ (length $n_{\text{out}}$), the update is applied elementwise:

$$W \leftarrow W - \eta \frac{\partial L}{\partial W}, \qquad b \leftarrow b - \eta \frac{\partial L}{\partial b}$$
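In NumPy the elementwise update is a single array expression per parameter. The shapes and values below are illustrative, with `dW` and `db` standing in for gradients produced by backpropagation:

```python
import numpy as np

# Illustrative layer with n_in = 3, n_out = 2; dW and db are stand-ins
# for the gradients dL/dW and dL/db from backpropagation.
rng = np.random.default_rng(0)
W = rng.standard_normal((3, 2))
b = np.zeros(2)
dW = rng.standard_normal((3, 2))
db = rng.standard_normal(2)
lr = 0.01

W_new = W - lr * dW  # W <- W - eta * dL/dW, elementwise
b_new = b - lr * db  # b <- b - eta * dL/db, elementwise
print(W_new.shape, b_new.shape)  # (3, 2) (2,)
```

Because the gradient of a parameter always has the same shape as the parameter itself, no reshaping or broadcasting tricks are needed.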
Your Task
Implement `gradient_step(weights, biases, dw, db, lr)` that returns the updated parameters as a tuple `(new_weights, new_biases)`.