Lesson 4 of 15

Numerical Gradients

What Is a Gradient?

The gradient of a function $f$ at point $x$ tells us: if I increase $x$ by a tiny amount, how much does $f(x)$ change?

$$\frac{df}{dx} = \lim_{h \to 0} \frac{f(x+h) - f(x)}{h}$$

In neural networks, we need gradients of the loss with respect to every weight. These gradients point in the direction of steepest ascent. We move in the opposite direction to minimize loss.
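To make "move in the opposite direction" concrete, here is a minimal one-dimensional sketch of gradient descent on $f(x) = x^2$, whose derivative we know analytically. The step size 0.1 and starting point 3.0 are arbitrary choices for illustration.

```python
def f(x):
    return x ** 2

def df(x):
    return 2 * x  # analytical derivative of x**2

x = 3.0
for _ in range(100):
    x -= 0.1 * df(x)  # step against the gradient

print(round(x, 6))  # x approaches the minimum at 0
```

Each step shrinks $x$ by a factor of 0.8, so after 100 steps it is essentially at the minimum.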

The Central Difference Approximation

Instead of limits, we can estimate derivatives numerically using a small step $h$:

$$\frac{df}{dx} \approx \frac{f(x + h) - f(x - h)}{2h}$$

This central difference formula is more accurate than the one-sided version because its error is $O(h^2)$ rather than $O(h)$.

Using $h = 10^{-5}$ gives about 10 digits of precision for smooth functions, since the truncation error scales as $h^2 = 10^{-10}$.
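The accuracy gap is easy to observe. The sketch below differentiates `math.sin` at $x = 1$ (exact answer $\cos 1$) with both the one-sided and the central formula; the function names are our own.

```python
import math

def forward_diff(f, x, h=1e-5):
    # one-sided (forward) difference, error O(h)
    return (f(x + h) - f(x)) / h

def central_diff(f, x, h=1e-5):
    # central difference, error O(h^2)
    return (f(x + h) - f(x - h)) / (2 * h)

x = 1.0
exact = math.cos(x)  # d/dx sin(x) = cos(x)
err_forward = abs(forward_diff(math.sin, x) - exact)
err_central = abs(central_diff(math.sin, x) - exact)
```

With $h = 10^{-5}$, the forward error lands near $10^{-6}$ while the central error is around $10^{-11}$, several orders of magnitude smaller.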

Why Numerical Gradients Matter

Numerical gradients are used to verify analytical backpropagation implementations. This is called a gradient check. Before trusting your backprop code, perturb each parameter and compare the numerical and analytical gradients — the relative error should be below about $10^{-4}$.
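A gradient check for a single scalar parameter can be sketched as follows. The toy loss and the `analytic_grad`/`numerical_grad` names are illustrative, not part of any library; the exercise below asks for the general array version.

```python
def loss(w):
    # toy scalar "loss": L(w) = (2w - 1)^2
    return (2.0 * w - 1.0) ** 2

def analytic_grad(w):
    # dL/dw = 2 * (2w - 1) * 2, by the chain rule
    return 2.0 * (2.0 * w - 1.0) * 2.0

def numerical_grad(f, w, h=1e-5):
    # central difference approximation of df/dw
    return (f(w + h) - f(w - h)) / (2 * h)

w = 0.7
num = numerical_grad(loss, w)
ana = analytic_grad(w)
# relative error: robust to the overall scale of the gradients
rel_err = abs(num - ana) / max(abs(num), abs(ana))
print(rel_err < 1e-4)  # the check passes
```

In practice you would loop this check over every parameter of the network, perturbing one entry at a time.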

Your Task

Implement numerical_gradient(f, x, h=1e-5) using the central difference formula.
