Lesson 10 of 15

Shannon Entropy

Information theory, founded by Claude Shannon in 1948, gives us a rigorous mathematical framework for quantifying uncertainty, randomness, and the capacity to communicate.

Shannon Entropy

Given a discrete probability distribution \{p_i\}, the Shannon entropy is:

H(X) = -\sum_i p_i \log_2 p_i \quad \text{(bits)}

  • A fair coin (p = 0.5, 0.5) has H = 1 bit
  • A fair die (p_i = 1/6 for i = 1..6) has H = \log_2 6 \approx 2.585 bits
  • A certain outcome (p = 1) has H = 0 bits
  • The maximum entropy for N outcomes is \log_2 N (achieved by the uniform distribution)

By convention, 0 \log_2 0 = 0, so zero-probability outcomes are simply skipped.
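As a sanity check, the formula and the examples above can be reproduced in a few lines of Python (a minimal sketch, not a complete solution to the task below):

```python
import math

def shannon_entropy(probs):
    """Entropy in bits; zero-probability entries are skipped by convention."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(shannon_entropy([0.5, 0.5]))   # fair coin -> 1.0 bit
print(shannon_entropy([1/6] * 6))    # fair die  -> log2(6) ~ 2.585 bits
print(shannon_entropy([1.0, 0.0]))   # certain outcome -> 0.0 (zero term skipped)
```

The `if p > 0` filter implements the 0 log 0 = 0 convention without any special-casing.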

Joint Entropy and Mutual Information

For a joint distribution p(x, y), the joint entropy is:

H(X, Y) = -\sum_{i,j} p(x_i, y_j) \log_2 p(x_i, y_j)

The marginal distributions are obtained by summing over the other variable:

p(x_i) = \sum_j p(x_i, y_j), \qquad p(y_j) = \sum_i p(x_i, y_j)
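With the joint distribution stored as a 2D list (rows indexed by x, columns by y — the layout the task below assumes), the marginals are just row and column sums. A sketch:

```python
# joint[i][j] = p(x_i, y_j); a uniform 2x2 joint as an illustrative example
joint = [
    [0.25, 0.25],
    [0.25, 0.25],
]

p_x = [sum(row) for row in joint]        # p(x_i) = sum_j p(x_i, y_j)
p_y = [sum(col) for col in zip(*joint)]  # p(y_j) = sum_i p(x_i, y_j)

print(p_x, p_y)  # [0.5, 0.5] [0.5, 0.5]
```

`zip(*joint)` transposes the table, turning the column sum into another row sum.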

Mutual information measures how much knowing X reduces uncertainty about Y:

I(X; Y) = H(X) + H(Y) - H(X, Y)

If X and Y are independent, H(X, Y) = H(X) + H(Y) and therefore I(X; Y) = 0.
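The identity above gives a direct way to compute mutual information from a 2D joint table: take the entropy of the flattened joint, subtract it from the sum of the two marginal entropies. A sketch (`_entropy` is an illustrative helper, not part of the task's required API):

```python
import math

def _entropy(probs):
    """Helper: entropy in bits, skipping zero entries."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def mutual_information(joint):
    """I(X;Y) = H(X) + H(Y) - H(X,Y), in bits."""
    h_xy = _entropy([p for row in joint for p in row])  # flatten for H(X,Y)
    h_x = _entropy([sum(row) for row in joint])         # marginal of X
    h_y = _entropy([sum(col) for col in zip(*joint)])   # marginal of Y
    return h_x + h_y - h_xy

# Independent variables: zero mutual information
print(mutual_information([[0.25, 0.25], [0.25, 0.25]]))  # 0.0
# Perfectly correlated variables: knowing X determines Y, so I = 1 bit
print(mutual_information([[0.5, 0.0], [0.0, 0.5]]))      # 1.0
```

In general the result is a float and should be compared with a tolerance; the two examples land exactly on 0.0 and 1.0 only because all probabilities are powers of two.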

KL Divergence

The Kullback-Leibler divergence (relative entropy) measures how different a distribution Q is from a reference P:

D_{\mathrm{KL}}(P \| Q) = \sum_i p_i \log_2 \frac{p_i}{q_i}

Properties:

  • D_{\mathrm{KL}}(P \| Q) \geq 0 always (equality iff P = Q)
  • Not symmetric: D_{\mathrm{KL}}(P \| Q) \neq D_{\mathrm{KL}}(Q \| P) in general
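Both properties are easy to see numerically. A sketch, using the same zero-skipping convention as the entropy formula (terms with p_i = 0 contribute nothing; a q_i = 0 paired with p_i > 0 would make the divergence infinite and is not handled here):

```python
import math

def kl_divergence(p, q):
    """D_KL(P || Q) in bits; skips terms where p_i = 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.5, 0.5]
q = [0.9, 0.1]

print(kl_divergence(p, q))  # positive: Q differs from P
print(kl_divergence(q, p))  # a different positive value: KL is not symmetric
print(kl_divergence(p, p))  # 0.0: divergence vanishes iff P = Q
```

Swapping the arguments changes the answer because KL weights the log-ratio by P: it penalizes regions where P has mass but Q does not, not the reverse.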

Your Task

Implement four functions:

  • shannon_entropy(probs) — compute H in bits; skip zero entries
  • joint_entropy(joint_probs) — compute H(X, Y) from a 2D list of joint probabilities
  • mutual_information(joint_probs) — compute I(X; Y) from a 2D list of joint probabilities
  • kl_divergence(p, q) — compute D_{\mathrm{KL}}(P \| Q) in bits