Lesson 5 of 15
Activation Derivatives
Differentiating Activations
Backpropagation requires the derivative of each activation function. These derivatives appear in every gradient computation.
Sigmoid Derivative
The sigmoid has an elegant self-referential derivative (recall that $\sigma$ here denotes the sigmoid function, not the statistical standard deviation):

$$\sigma'(x) = \sigma(x)\,(1 - \sigma(x))$$
Derivation: Let $s = \sigma(x) = \dfrac{1}{1 + e^{-x}}$. Then

$$\frac{ds}{dx} = \frac{e^{-x}}{(1 + e^{-x})^2} = \frac{1}{1 + e^{-x}} \cdot \frac{e^{-x}}{1 + e^{-x}} = s\,(1 - s)$$
The maximum value is $1/4$, attained at $x = 0$, and the derivative shrinks toward zero as $\sigma(x)$ approaches 0 or 1. This "saturation" near 0 or 1 causes the vanishing gradient problem in deep networks.
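The self-referential form above can be sketched in a few lines of Python. This is an illustrative sketch, not the exercise solution — the helper names `sigmoid` and `sigmoid_deriv` are ours, and the branch in `sigmoid` is one common way to avoid overflow for large negative inputs:

```python
import math

def sigmoid(x):
    # Numerically stable sigmoid: avoid computing exp of a large positive number.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def sigmoid_deriv(x):
    # Self-referential form: sigma'(x) = sigma(x) * (1 - sigma(x)).
    s = sigmoid(x)
    return s * (1.0 - s)

print(sigmoid_deriv(0.0))   # 0.25 — the maximum, at x = 0
print(sigmoid_deriv(10.0))  # tiny: the saturation regime
```

Note how quickly the gradient collapses away from zero: at $x = 10$ the derivative is already around $4.5 \times 10^{-5}$, which is exactly the saturation that starves deep layers of gradient signal.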
ReLU Derivative
ReLU has a simple piecewise derivative:

$$\mathrm{ReLU}'(x) = \begin{cases} 1 & x > 0 \\ 0 & x < 0 \end{cases}$$

(The derivative is undefined at $x = 0$; implementations conventionally use 0 there.)
This is just an indicator function of $x > 0$. Units with $x \le 0$ contribute zero gradient — if a unit's pre-activation stays negative for every input, it never updates, the "dead ReLU" problem. But for active units, the gradient flows through unchanged.
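The indicator behavior can be sketched directly; `relu_deriv` is an illustrative name, and the choice of `0.0` at $x = 0$ reflects the convention mentioned above:

```python
def relu_deriv(x):
    # Indicator of x > 0; by convention, 0.0 at x == 0.
    return 1.0 if x > 0 else 0.0

# Gradient either passes through unchanged or is blocked entirely:
upstream_grad = 2.5
print(upstream_grad * relu_deriv(3.0))   # 2.5 — active unit, gradient flows
print(upstream_grad * relu_deriv(-1.0))  # 0.0 — inactive unit, no gradient
```

Because the derivative is exactly 1 for active units, ReLU does not attenuate gradients the way sigmoid's $\le 1/4$ factor does, which is one reason it trains deep networks more reliably.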
Your Task
Implement:

- `sigmoid_grad(x)` — derivative of the sigmoid at `x`
- `relu_grad(x)` — derivative of ReLU at `x` (return `0.0` when `x <= 0`)