Lesson 5 of 17
Backpropagation
The Backward Pass
Backpropagation applies the chain rule in reverse topological order. For each node, we multiply the gradient flowing in (from the output side) by each local gradient and accumulate it into the child's gradient.
The Chain Rule
If z = f(a, b), and we already know dL/dz (the gradient of the loss with respect to z), then:
dL/da = (dz/da) * (dL/dz) = local_grad_a * z.grad
dL/db = (dz/db) * (dL/dz) = local_grad_b * z.grad
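The two formulas above can be checked with plain floats before involving any autograd machinery. This is a minimal sketch for z = a * b; the upstream gradient dL_dz is a made-up value standing in for whatever flows back from the loss.

```python
# Chain rule by hand for z = a * b, with a hypothetical upstream gradient.
a, b = 2.0, 3.0
z = a * b            # forward pass: z = 6.0
dL_dz = 1.5          # assumed upstream gradient dL/dz

# local gradients of z = a * b
dz_da = b            # dz/da = b = 3.0
dz_db = a            # dz/db = a = 2.0

# chain rule: multiply local gradient by upstream gradient
dL_da = dz_da * dL_dz   # 3.0 * 1.5 = 4.5
dL_db = dz_db * dL_dz   # 2.0 * 1.5 = 3.0
```

The same pattern, local gradient times incoming gradient, is what backward() applies at every node.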
The backward() Method
- Build the topological order
- Set the root gradient to 1 (the loss w.r.t. itself is 1)
- Walk in reverse order: for each node, propagate its gradient to its children
```python
def backward(self):
    # 1. Build the topological order of the graph rooted at self
    topo = []
    visited = set()
    def build_topo(v):
        if v not in visited:
            visited.add(v)
            for child in v._children:
                build_topo(child)
            topo.append(v)
    build_topo(self)

    # 2. The gradient of the loss w.r.t. itself is 1
    self.grad = 1

    # 3. Walk in reverse order, propagating each node's gradient to its children
    for v in reversed(topo):
        for child, local_grad in zip(v._children, v._local_grads):
            child.grad += local_grad * v.grad
```
Note the +=: gradients accumulate, because the same Value may appear multiple times in the graph (e.g., a * a), and each occurrence contributes its own share.
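The accumulation can be seen with a self-contained sketch. This is an assumed minimal Value class matching the _children / _local_grads interface used in this lesson, just enough to run a * a through backward():

```python
class Value:
    """Minimal sketch of the Value interface assumed by this lesson."""
    def __init__(self, data, children=(), local_grads=()):
        self.data = data
        self.grad = 0.0
        self._children = children
        self._local_grads = local_grads

    def __mul__(self, other):
        # local gradients of a * b are (b, a)
        return Value(self.data * other.data, (self, other),
                     (other.data, self.data))

    def backward(self):
        topo, visited = [], set()
        def build_topo(v):
            if v not in visited:
                visited.add(v)
                for child in v._children:
                    build_topo(child)
                topo.append(v)
        build_topo(self)
        self.grad = 1
        for v in reversed(topo):
            for child, local_grad in zip(v._children, v._local_grads):
                child.grad += local_grad * v.grad

a = Value(3.0)
z = a * a        # a appears twice among z's children
z.backward()
print(a.grad)    # 6.0: each occurrence contributes 3.0, matching d(a^2)/da = 2a
```

Without +=, the second contribution would overwrite the first and a.grad would wrongly end up at 3.0.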
Verifying by Hand
For z = a * b:
- z._local_grads = (b.data, a.data)
- After z.backward(): a.grad = b.data * 1, b.grad = a.data * 1
For w = a + b:
- w._local_grads = (1, 1)
- After w.backward(): a.grad = 1, b.grad = 1
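A good habit is to double-check hand-derived gradients numerically. This sketch uses a central finite difference on plain floats (no Value class needed) to confirm the two cases above; the helper numeric_grad and the step size h are assumptions for illustration:

```python
# Finite-difference check of the hand-derived local gradients.
def numeric_grad(f, x, h=1e-6):
    # central difference approximation of df/dx
    return (f(x + h) - f(x - h)) / (2 * h)

a, b = 2.0, 5.0

# For z = a * b: dz/da should be b, dz/db should be a
da = numeric_grad(lambda x: x * b, a)
db = numeric_grad(lambda x: a * x, b)
assert abs(da - b) < 1e-4
assert abs(db - a) < 1e-4

# For w = a + b: both local gradients should be 1
dwa = numeric_grad(lambda x: x + b, a)
dwb = numeric_grad(lambda x: a + x, b)
assert abs(dwa - 1.0) < 1e-4
assert abs(dwb - 1.0) < 1e-4
```

The same trick scales to checking a full backward() implementation: perturb one input, re-run the forward pass, and compare against the stored .grad.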
Your Task
Add backward() to the Value class.