Activation Functions
Why Activation Functions?
Without an activation function, stacking multiple layers collapses to a single linear transformation:

W2(W1 x + b1) + b2 = (W2 W1) x + (W2 b1 + b2)

The result is just another linear function, with weights W2 W1 and bias W2 b1 + b2. A deep network of linear layers is therefore no more powerful than a single layer. Non-linearity is what lets neural networks approximate arbitrary continuous functions.
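The collapse is easy to verify numerically. Below is a minimal sketch using NumPy (the layer shapes and random seed are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.standard_normal((3, 4)), rng.standard_normal(3)  # first linear layer
W2, b2 = rng.standard_normal((2, 3)), rng.standard_normal(2)  # second linear layer
x = rng.standard_normal(4)

# Two stacked linear layers with no activation in between...
two_layer = W2 @ (W1 @ x + b1) + b2
# ...equal one linear layer with weights W2 @ W1 and bias W2 @ b1 + b2.
one_layer = (W2 @ W1) @ x + (W2 @ b1 + b2)
print(np.allclose(two_layer, one_layer))  # True
```

Inserting any non-linearity between the two matrix multiplications breaks this equality, which is exactly the point of an activation function.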
The Three Classics
Sigmoid — maps any input to (0, 1), useful for probabilities:

σ(x) = 1 / (1 + e^(-x))
Notation note: Here σ denotes the sigmoid activation function. In statistics and finance, σ instead represents standard deviation or volatility.
ReLU (Rectified Linear Unit) — the most widely used, fast and sparse:

ReLU(x) = max(0, x)
Tanh — maps to (-1, 1), zero-centered (often better than sigmoid for hidden layers):

tanh(x) = (e^x - e^(-x)) / (e^x + e^(-x))
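One way to write the three classics as scalar Python functions, using only the standard library (a reference sketch — the function names match the task at the end of this lesson):

```python
import math

def sigmoid(x):
    # 1 / (1 + e^-x), split into two branches to avoid overflow
    # when math.exp(-x) would be huge for very negative x
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def relu(x):
    # max(0, x): passes positives through unchanged, zeroes out negatives
    return max(0.0, x)

def tanh_act(x):
    # (e^x - e^-x) / (e^x + e^-x); math.tanh is the stable built-in
    return math.tanh(x)

print(sigmoid(0.0), relu(-2.0), tanh_act(0.0))  # 0.5 0.0 0.0
```

Note the fixed points: sigmoid crosses 0.5 at the origin, while ReLU and tanh both pass through 0.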
Modern Activations
GELU (Gaussian Error Linear Unit) — used in GPT, BERT, and most modern transformers:

GELU(x) = x · Φ(x)

where Φ is the cumulative distribution function of the standard normal distribution.
Unlike ReLU's hard cutoff at 0, GELU provides a smooth, probabilistic gate — small negative inputs are attenuated rather than zeroed.
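The smooth gate is easy to see numerically: where ReLU maps a small negative input to exactly 0, GELU only attenuates it. A sketch of the exact form x · Φ(x), using the standard library's error function:

```python
import math

def gelu(x):
    # Exact GELU: x * Phi(x), where Phi is the standard normal CDF,
    # written via the error function: Phi(x) = 0.5 * (1 + erf(x / sqrt(2)))
    return x * 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# ReLU would map -1.0 to 0; GELU attenuates it instead of zeroing it.
print(round(gelu(-1.0), 4))  # -0.1587
```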
SiLU (Sigmoid Linear Unit, a.k.a. Swish) — used in EfficientNet, LLaMA, and many vision models:

SiLU(x) = x · σ(x)
SiLU is smooth and non-monotonic: it dips slightly below zero for negative inputs, reaching its minimum near x ≈ -1.28, which can help optimization.
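A sketch of SiLU, with a quick check of the non-monotonic dip (the specific inputs below are illustrative):

```python
import math

def silu(x):
    # SiLU / Swish: x * sigmoid(x)
    return x / (1.0 + math.exp(-x))

# Non-monotonic: the function dips below zero for negative inputs,
# bottoms out near x = -1.28, then rises back toward 0 as x -> -inf.
print(round(silu(-1.28), 3))   # -0.278
print(silu(-5.0) > silu(-1.28))  # True
```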
Choosing an Activation
- Hidden layers: ReLU (and its variants) dominates modern networks — fast to compute and resistant to vanishing gradients
- Transformer hidden layers: GELU is the standard choice (GPT, BERT)
- Output layer for binary classification: Sigmoid (output is a probability)
- Output layer for regression: No activation (linear output)
- Output layer for multi-class: Softmax (covered in later courses)
Your Task
Implement sigmoid(x), relu(x), tanh_act(x), gelu(x), and silu(x).
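Once your implementations are written, a few property checks catch most mistakes. This is a hypothetical helper, not part of the task itself — the expected values follow directly from the definitions above:

```python
def check_activations(sigmoid, relu, tanh_act, gelu, silu):
    # Known values at and around the origin, from the definitions above
    assert abs(sigmoid(0.0) - 0.5) < 1e-9          # sigmoid(0) = 1/2
    assert relu(-3.0) == 0.0 and relu(2.0) == 2.0  # hard cutoff at 0
    assert abs(tanh_act(0.0)) < 1e-9               # tanh(0) = 0
    assert abs(gelu(0.0)) < 1e-9                   # 0 * Phi(0) = 0
    assert silu(-1.0) < 0.0 < silu(1.0)            # SiLU dips below zero
    print("all checks passed")
```

Call it as `check_activations(sigmoid, relu, tanh_act, gelu, silu)` after defining your five functions.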