Lesson 10 of 15

Shannon Entropy

Information theory, founded by Claude Shannon in 1948, gives us a rigorous mathematical framework for quantifying uncertainty, randomness, and the capacity to communicate.

Shannon Entropy

Given a discrete probability distribution \{p_i\}, the Shannon entropy is:

H(X) = -\sum_i p_i \log_2 p_i \quad \text{(bits)}

  • A fair coin (p = 0.5, 0.5) has H = 1 bit
  • A fair die (p_i = 1/6 for i = 1..6) has H = \log_2 6 \approx 2.585 bits
  • A certain outcome (p = 1) has H = 0 bits
  • The maximum entropy for N outcomes is \log_2 N (achieved by the uniform distribution)

By convention, 0 \log_2 0 = 0, so zero-probability outcomes are simply skipped.
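As a sanity check, the formula and the examples above can be reproduced in a few lines of Python (a minimal sketch, not a complete solution to the task below):

```python
import math

def shannon_entropy(probs):
    """Entropy in bits; zero-probability entries are skipped by convention."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(shannon_entropy([0.5, 0.5]))   # fair coin -> 1.0 bit
print(shannon_entropy([1/6] * 6))    # fair die  -> log2(6) ~ 2.585 bits
print(shannon_entropy([1.0, 0.0]))   # certain outcome -> 0.0 (zero term skipped)
```

The `if p > 0` filter implements the 0 log 0 = 0 convention without any special-casing.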

Joint Entropy and Mutual Information

For a joint distribution p(x, y), the joint entropy is:

H(X, Y) = -\sum_{i,j} p(x_i, y_j) \log_2 p(x_i, y_j)

The marginal distributions are obtained by summing over the other variable:

p(x_i) = \sum_j p(x_i, y_j), \qquad p(y_j) = \sum_i p(x_i, y_j)
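With the joint distribution stored as a 2D list (rows indexed by x, columns by y — the layout the task below assumes), the marginals are just row and column sums. A sketch:

```python
# joint[i][j] = p(x_i, y_j); a uniform 2x2 joint as an illustrative example
joint = [
    [0.25, 0.25],
    [0.25, 0.25],
]

p_x = [sum(row) for row in joint]        # p(x_i) = sum_j p(x_i, y_j)
p_y = [sum(col) for col in zip(*joint)]  # p(y_j) = sum_i p(x_i, y_j)

print(p_x, p_y)  # [0.5, 0.5] [0.5, 0.5]
```

`zip(*joint)` transposes the table, turning the column sum into another row sum.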

Mutual information measures how much knowing X reduces uncertainty about Y:

I(X; Y) = H(X) + H(Y) - H(X, Y)

If X and Y are independent, H(X, Y) = H(X) + H(Y) and therefore I(X; Y) = 0.
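The identity above gives a direct way to compute mutual information from a 2D joint table: take the entropy of the flattened joint, subtract it from the sum of the two marginal entropies. A sketch (`_entropy` is an illustrative helper, not part of the task's required API):

```python
import math

def _entropy(probs):
    """Helper: entropy in bits, skipping zero entries."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def mutual_information(joint):
    """I(X;Y) = H(X) + H(Y) - H(X,Y), in bits."""
    h_xy = _entropy([p for row in joint for p in row])  # flatten for H(X,Y)
    h_x = _entropy([sum(row) for row in joint])         # marginal of X
    h_y = _entropy([sum(col) for col in zip(*joint)])   # marginal of Y
    return h_x + h_y - h_xy

# Independent variables: zero mutual information
print(mutual_information([[0.25, 0.25], [0.25, 0.25]]))  # 0.0
# Perfectly correlated variables: knowing X determines Y, so I = 1 bit
print(mutual_information([[0.5, 0.0], [0.0, 0.5]]))      # 1.0
```

In general the result is a float and should be compared with a tolerance; the two examples land exactly on 0.0 and 1.0 only because all probabilities are powers of two.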

KL Divergence

The Kullback-Leibler divergence (relative entropy) measures how different a distribution Q is from a reference P:

D_{\mathrm{KL}}(P \| Q) = \sum_i p_i \log_2 \frac{p_i}{q_i}

Properties:

  • D_{\mathrm{KL}}(P \| Q) \geq 0 always (equality iff P = Q)
  • Not symmetric: D_{\mathrm{KL}}(P \| Q) \neq D_{\mathrm{KL}}(Q \| P) in general
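Both properties are easy to see numerically. A sketch, using the same zero-skipping convention as the entropy formula (terms with p_i = 0 contribute nothing; a q_i = 0 paired with p_i > 0 would make the divergence infinite and is not handled here):

```python
import math

def kl_divergence(p, q):
    """D_KL(P || Q) in bits; skips terms where p_i = 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.5, 0.5]
q = [0.9, 0.1]

print(kl_divergence(p, q))  # positive: Q differs from P
print(kl_divergence(q, p))  # a different positive value: KL is not symmetric
print(kl_divergence(p, p))  # 0.0: divergence vanishes iff P = Q
```

Swapping the arguments changes the answer because KL weights the log-ratio by P: it penalizes regions where P has mass but Q does not, not the reverse.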

Your Task

Implement four functions:

  • shannon_entropy(probs) — compute H in bits; skip zero entries
  • joint_entropy(joint_probs) — compute H(X, Y) from a 2D list of joint probabilities
  • mutual_information(joint_probs) — compute I(X; Y) from a 2D list of joint probabilities
  • kl_divergence(p, q) — compute D_{\mathrm{KL}}(P \| Q) in bits