Lesson 18 of 18

Gradient Descent

Gradient descent is the most widely used optimization algorithm in machine learning. It minimizes a function by repeatedly stepping in the direction of the negative gradient (steepest descent):

x_{n+1} = x_n - \alpha \cdot f'(x_n)

where \alpha is the learning rate (step size).

Intuition

Imagine standing on a hilly landscape, trying to reach the lowest point. At each step, you look at the slope (gradient) under your feet and take a step downhill. With a small enough step size, you converge to a local minimum.

Convergence

For the convex function f(x) = (x - x^*)^2, the update rule gives:

x_{n+1} = x_n - \alpha \cdot 2(x_n - x^*) = (1 - 2\alpha)(x_n - x^*) + x^*

The error x_n - x^* shrinks by a factor of |1 - 2\alpha| at each step. For \alpha = 0.1, the factor is 0.8, so the error roughly halves every 3 steps (0.8^3 \approx 0.51).

Example

Minimize f(x) = (x - 3)^2, whose derivative is f'(x) = 2(x - 3):

#include <stdio.h>

/* Derivative of f(x) = (x - 3)^2 */
double df(double x) { return 2.0 * (x - 3.0); }

int main(void) {
    double x = 0.0;                  /* starting point */
    for (int i = 0; i < 100; i++) {
        x = x - 0.1 * df(x);         /* step against the gradient */
    }
    printf("%.4f\n", x);             /* prints 3.0000 */
    return 0;
}

Choosing the Learning Rate

  • Too large: diverges (overshoots the minimum)
  • Too small: converges very slowly
  • Just right: converges reliably

Your Task

Implement double grad_descent(double (*df)(double), double start, double lr, int steps) that runs steps iterations of gradient descent on the function with derivative df, starting from start with learning rate lr, and returns the final value of x.
