Lesson 6 of 15

Layer Backpropagation

Gradients Through One Layer

Consider a single neuron with sigmoid activation ($\sigma$ denotes the sigmoid function, not standard deviation) and MSE loss:

$$z = \mathbf{w} \cdot \mathbf{x} + b \qquad a = \sigma(z) \qquad \mathcal{L} = (a - y)^2$$
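A minimal sketch of this forward pass in plain Python (the function names `sigmoid` and `forward` are illustrative, not part of the exercise's API):

```python
import math

def sigmoid(z):
    # Plain sigmoid; fine for moderate |z|.
    return 1.0 / (1.0 + math.exp(-z))

def forward(inputs, weights, bias, target):
    # z = w . x + b
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    a = sigmoid(z)               # activation
    loss = (a - target) ** 2     # squared-error loss
    return z, a, loss
```

Keeping `z` and `a` around matters: the backward pass below reuses both.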

To update $\mathbf{w}$ and $b$, we need $\frac{\partial \mathcal{L}}{\partial \mathbf{w}}$ and $\frac{\partial \mathcal{L}}{\partial b}$.

Applying the Chain Rule

$$\frac{\partial \mathcal{L}}{\partial w_i} = \frac{\partial \mathcal{L}}{\partial a} \cdot \frac{\partial a}{\partial z} \cdot \frac{\partial z}{\partial w_i}$$

Each factor:

$$\frac{\partial \mathcal{L}}{\partial a} = 2(a - y)$$

$$\frac{\partial a}{\partial z} = \sigma'(z) = a(1-a)$$
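The identity $\sigma'(z) = a(1-a)$ is easy to sanity-check numerically with a central finite difference (a sketch; `eps` is just a small step size):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

z = 0.3
a = sigmoid(z)
analytic = a * (1 - a)  # the closed-form derivative a(1-a)

# Central difference approximation of d(sigmoid)/dz
eps = 1e-6
numeric = (sigmoid(z + eps) - sigmoid(z - eps)) / (2 * eps)
print(analytic, numeric)  # the two should agree to many decimal places
```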

$$\frac{\partial z}{\partial w_i} = x_i \qquad \frac{\partial z}{\partial b} = 1$$

Define the error signal $\delta = \frac{\partial \mathcal{L}}{\partial a} \cdot \frac{\partial a}{\partial z} = 2(a-y) \cdot a(1-a)$. Then:

$$\frac{\partial \mathcal{L}}{\partial w_i} = \delta \cdot x_i \qquad \frac{\partial \mathcal{L}}{\partial b} = \delta$$
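These formulas can be checked against a numerical gradient of the loss. The sketch below (all names and the sample values are illustrative) computes $\delta$ and $\partial \mathcal{L} / \partial w_0$ analytically, then compares against a central finite difference on $w_0$:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def loss(weights, bias, inputs, target):
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return (sigmoid(z) - target) ** 2

inputs, weights, bias, target = [0.5, -1.0], [0.8, 0.2], 0.1, 1.0

# Analytic gradients via the error signal delta
z = sum(w * x for w, x in zip(weights, inputs)) + bias
a = sigmoid(z)
delta = 2 * (a - target) * a * (1 - a)
dw = [delta * x for x in inputs]   # dL/dw_i = delta * x_i
db = delta                          # dL/db = delta

# Finite-difference check on w[0]
eps = 1e-6
w_plus = [weights[0] + eps, weights[1]]
w_minus = [weights[0] - eps, weights[1]]
numeric = (loss(w_plus, bias, inputs, target)
           - loss(w_minus, bias, inputs, target)) / (2 * eps)
print(dw[0], numeric)  # should match closely
```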

Your Task

Implement layer_backward(inputs, weights, bias, target) that:

  1. Performs the forward pass to compute $a$
  2. Computes $\delta = 2(a - y) \cdot a(1-a)$
  3. Returns `(dw, db)` where `dw[i] = delta * inputs[i]` and `db = delta`
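Try writing it yourself first. If you get stuck, the three steps above can be sketched like this (one possible implementation, not the only correct one):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def layer_backward(inputs, weights, bias, target):
    # 1. Forward pass: z = w . x + b, then a = sigmoid(z)
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    a = sigmoid(z)
    # 2. Error signal
    delta = 2 * (a - target) * a * (1 - a)
    # 3. Gradients: dL/dw_i = delta * x_i, dL/db = delta
    dw = [delta * x for x in inputs]
    db = delta
    return dw, db
```

A quick hand check: with `inputs=[1.0]`, `weights=[0.0]`, `bias=0.0`, `target=1.0`, the forward pass gives $z = 0$, $a = 0.5$, so $\delta = 2(0.5 - 1)(0.5)(0.5) = -0.25$, and both gradients come out to $-0.25$.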