Lesson 12 of 15

Activation Functions

Activation functions introduce non-linearity into neural networks. Without them, stacking linear layers is equivalent to a single linear layer — the network could not learn complex patterns.

ReLU

Rectified Linear Unit — the most widely used activation in modern deep learning:

\text{ReLU}(x) = \max(0, x)

It is cheap to compute and, because its gradient is 1 for all positive inputs, it helps avoid the vanishing gradient problem.
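As a minimal sketch, a scalar ReLU follows directly from the definition:

```python
def relu(x):
    # max(0, x): pass positive inputs through unchanged, zero out the rest
    return max(0.0, x)

print(relu(3.5))   # 3.5
print(relu(-2.0))  # 0.0
```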

Leaky ReLU

Fixes the "dying ReLU" problem by allowing a small gradient for negative inputs:

\text{LeakyReLU}(x) = \begin{cases} x & x > 0 \\ \alpha x & x \leq 0 \end{cases}

where \alpha is a small constant (default 0.01).
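A sketch of the piecewise definition, with `alpha` defaulting to 0.01 as above:

```python
def leaky_relu(x, alpha=0.01):
    # Negative inputs keep a small slope alpha instead of being zeroed out
    return x if x > 0 else alpha * x

print(leaky_relu(5.0))   # 5.0
print(leaky_relu(-2.0))  # -0.02
```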

Tanh

The hyperbolic tangent squashes inputs to (-1, 1):

\tanh(x) = \frac{e^{2x} - 1}{e^{2x} + 1}
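A direct transcription of this formula might look like the following sketch; note that `math.exp` overflows for very large inputs, so it assumes moderate x (the standard library's `math.tanh` handles the full range):

```python
import math

def tanh_activation(x):
    # (e^{2x} - 1) / (e^{2x} + 1), computing e^{2x} once
    e2x = math.exp(2.0 * x)
    return (e2x - 1.0) / (e2x + 1.0)

print(tanh_activation(0.0))  # 0.0
```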

Softmax

Used in the output layer for multi-class classification. Converts a vector of logits into a probability distribution:

\text{softmax}(x)_i = \frac{e^{x_i}}{\sum_j e^{x_j}}

All outputs are in (0, 1) and sum to 1.
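A sketch over a list of logits. Subtracting the maximum logit before exponentiating is not part of the formula itself; it is a standard numerical-stability trick that leaves the result unchanged while preventing overflow on large inputs:

```python
import math

def softmax(xs):
    # Shifting by max(xs) is mathematically a no-op (the factor cancels
    # in the ratio) but keeps math.exp from overflowing on large logits
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([1.0, 2.0, 3.0])
print(probs)       # largest logit gets the largest probability
print(sum(probs))  # sums to 1 (up to floating-point error)
```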

Your Task

Implement:

  • relu(x) → \max(0, x)
  • leaky_relu(x, alpha=0.01)
  • tanh_activation(x) → computed via (e^{2x} - 1)/(e^{2x} + 1)
  • softmax(x) → list of probabilities