Lesson 4 of 15

Conditional Entropy

Conditional entropy H(Y|X) measures the average uncertainty remaining in Y after you observe X.

Chain Rule

The most practical way to compute conditional entropy is via the chain rule of entropy:

H(Y|X) = H(X,Y) - H(X)
H(X|Y) = H(X,Y) - H(Y)

Two sanity checks on the first identity:

  • If X perfectly predicts Y, then H(X,Y) = H(X), so H(Y|X) = 0
  • If X and Y are independent, then H(X,Y) = H(X) + H(Y), so H(Y|X) = H(Y) (knowing X tells you nothing about Y)
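The chain rule can be verified numerically. This sketch uses two independent fair bits (the joint distribution is chosen purely for illustration):

```python
import math

def entropy(probs):
    # Shannon entropy in bits, skipping zero-probability outcomes
    return sum(-p * math.log2(p) for p in probs if p > 0)

# Joint distribution of two independent fair bits:
# P(X=x, Y=y) = 1/4 for all four (x, y) pairs
joint = [[0.25, 0.25], [0.25, 0.25]]

h_xy = entropy([p for row in joint for p in row])  # H(X,Y) = 2 bits
h_x = entropy([sum(row) for row in joint])         # H(X)   = 1 bit
h_y_given_x = h_xy - h_x                           # chain rule: H(Y|X)

print(h_y_given_x)  # 1.0 — knowing X tells us nothing about Y
```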

Key Inequalities

0 ≤ H(Y|X) ≤ H(Y)

  • Lower bound 0: Y is fully determined by X (a deterministic channel)
  • Upper bound H(Y): X and Y are independent, so observing X removes no uncertainty
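The bounds can be checked on a correlated but non-deterministic joint distribution, where H(Y|X) lands strictly between the two extremes (the specific numbers are illustrative):

```python
import math

def entropy(probs):
    # Shannon entropy in bits, skipping zero-probability outcomes
    return sum(-p * math.log2(p) for p in probs if p > 0)

# A correlated joint distribution, chosen for illustration:
# X and Y agree with probability 0.8
joint = [[0.4, 0.1],
         [0.1, 0.4]]

h_xy = entropy([p for row in joint for p in row])
h_x = entropy([sum(row) for row in joint])            # row sums: X marginal
h_y = entropy([sum(col) for col in zip(*joint)])      # column sums: Y marginal
h_y_given_x = h_xy - h_x

# Conditioning can only reduce uncertainty: 0 <= H(Y|X) <= H(Y)
assert 0 <= h_y_given_x <= h_y
```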

Example

If X and Y are independent and uniform over 2 values each: H(Y|X) = H(X,Y) - H(X) = 2 - 1 = 1 bit

If Y = X deterministically, with X uniform over 2 values: H(Y|X) = H(X,Y) - H(X) = 1 - 1 = 0 bits

import math

def shannon_entropy(probs):
    # Shannon entropy in bits; zero-probability terms contribute nothing
    return sum(-p * math.log2(p) for p in probs if p > 0)

def joint_entropy(joint_probs):
    # Entropy of the joint distribution, given as a 2D list P[x][y]
    result = 0.0
    for row in joint_probs:
        for p in row:
            if p > 0:
                result += -p * math.log2(p)
    return result

def conditional_entropy_yx(joint):
    # H(Y|X) = H(X,Y) - H(X); the X marginal sums each row of the joint
    hxy = joint_entropy(joint)
    hx = shannon_entropy([sum(row) for row in joint])
    return hxy - hx
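Applied to the two worked examples above, the helper reproduces both results. The snippet below repeats the lesson's definitions so it runs on its own:

```python
import math

# Helpers repeated from the lesson so this snippet is self-contained
def shannon_entropy(probs):
    return sum(-p * math.log2(p) for p in probs if p > 0)

def joint_entropy(joint_probs):
    return shannon_entropy([p for row in joint_probs for p in row])

def conditional_entropy_yx(joint):
    # H(Y|X) = H(X,Y) - H(X)
    return joint_entropy(joint) - shannon_entropy([sum(row) for row in joint])

# Independent uniform bits: H(Y|X) = H(X,Y) - H(X) = 2 - 1 = 1 bit
print(conditional_entropy_yx([[0.25, 0.25], [0.25, 0.25]]))  # 1.0

# Y = X deterministically: H(Y|X) = 1 - 1 = 0 bits
print(conditional_entropy_yx([[0.5, 0.0], [0.0, 0.5]]))  # 0.0
```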

Your Task

Implement:

  • conditional_entropy_yx(joint) → H(Y|X) = H(X,Y) - H(X)
  • conditional_entropy_xy(joint) → H(X|Y) = H(X,Y) - H(Y)

Both take a 2D list joint where joint[x][y] = P(X = x, Y = y), and return entropy in bits.
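One hint for the second function: H(X|Y) needs the Y marginal, which sums each *column* of the joint matrix rather than each row. Transposing with zip is one way to do that (example numbers are illustrative):

```python
joint = [[0.4, 0.1],
         [0.1, 0.4]]

# Marginal of Y: sum each column of the joint matrix
p_y = [sum(col) for col in zip(*joint)]
print(p_y)  # [0.5, 0.5]
```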
