Lesson 4 of 15

Conditional Entropy

Conditional entropy H(Y|X) measures the average uncertainty remaining in Y after you observe X.

Chain Rule

The most practical way to compute conditional entropy is via the chain rule of entropy:

H(Y|X) = H(X,Y) - H(X)
H(X|Y) = H(X,Y) - H(Y)

Two sanity checks on the first identity:

  • If X perfectly predicts Y, then H(X,Y) = H(X), so H(Y|X) = 0
  • If X and Y are independent, then H(X,Y) = H(X) + H(Y), so H(Y|X) = H(Y) (knowing X tells you nothing about Y)
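The chain rule can be verified numerically. This sketch uses two independent fair bits (the joint distribution is chosen purely for illustration):

```python
import math

def entropy(probs):
    # Shannon entropy in bits, skipping zero-probability outcomes
    return sum(-p * math.log2(p) for p in probs if p > 0)

# Joint distribution of two independent fair bits:
# P(X=x, Y=y) = 1/4 for all four (x, y) pairs
joint = [[0.25, 0.25], [0.25, 0.25]]

h_xy = entropy([p for row in joint for p in row])  # H(X,Y) = 2 bits
h_x = entropy([sum(row) for row in joint])         # H(X)   = 1 bit
h_y_given_x = h_xy - h_x                           # chain rule: H(Y|X)

print(h_y_given_x)  # 1.0 — knowing X tells us nothing about Y
```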

Key Inequalities

0 ≤ H(Y|X) ≤ H(Y)

  • Lower bound 0: Y is fully determined by X (a deterministic channel)
  • Upper bound H(Y): X and Y are independent, so observing X removes no uncertainty
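The bounds can be checked on a correlated but non-deterministic joint distribution, where H(Y|X) lands strictly between the two extremes (the specific numbers are illustrative):

```python
import math

def entropy(probs):
    # Shannon entropy in bits, skipping zero-probability outcomes
    return sum(-p * math.log2(p) for p in probs if p > 0)

# A correlated joint distribution, chosen for illustration:
# X and Y agree with probability 0.8
joint = [[0.4, 0.1],
         [0.1, 0.4]]

h_xy = entropy([p for row in joint for p in row])
h_x = entropy([sum(row) for row in joint])            # row sums: X marginal
h_y = entropy([sum(col) for col in zip(*joint)])      # column sums: Y marginal
h_y_given_x = h_xy - h_x

# Conditioning can only reduce uncertainty: 0 <= H(Y|X) <= H(Y)
assert 0 <= h_y_given_x <= h_y
```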

Example

If X and Y are independent and uniform over 2 values each: H(Y|X) = H(X,Y) - H(X) = 2 - 1 = 1 bit

If Y = X deterministically, with X uniform over 2 values: H(Y|X) = H(X,Y) - H(X) = 1 - 1 = 0 bits

import math

def shannon_entropy(probs):
    # Shannon entropy in bits; zero-probability terms contribute nothing
    return sum(-p * math.log2(p) for p in probs if p > 0)

def joint_entropy(joint_probs):
    # Entropy of the joint distribution, given as a 2D list P[x][y]
    result = 0.0
    for row in joint_probs:
        for p in row:
            if p > 0:
                result += -p * math.log2(p)
    return result

def conditional_entropy_yx(joint):
    # H(Y|X) = H(X,Y) - H(X); the X marginal sums each row of the joint
    hxy = joint_entropy(joint)
    hx = shannon_entropy([sum(row) for row in joint])
    return hxy - hx
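Applied to the two worked examples above, the helper reproduces both results. The snippet below repeats the lesson's definitions so it runs on its own:

```python
import math

# Helpers repeated from the lesson so this snippet is self-contained
def shannon_entropy(probs):
    return sum(-p * math.log2(p) for p in probs if p > 0)

def joint_entropy(joint_probs):
    return shannon_entropy([p for row in joint_probs for p in row])

def conditional_entropy_yx(joint):
    # H(Y|X) = H(X,Y) - H(X)
    return joint_entropy(joint) - shannon_entropy([sum(row) for row in joint])

# Independent uniform bits: H(Y|X) = H(X,Y) - H(X) = 2 - 1 = 1 bit
print(conditional_entropy_yx([[0.25, 0.25], [0.25, 0.25]]))  # 1.0

# Y = X deterministically: H(Y|X) = 1 - 1 = 0 bits
print(conditional_entropy_yx([[0.5, 0.0], [0.0, 0.5]]))  # 0.0
```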

Your Task

Implement:

  • conditional_entropy_yx(joint) → H(Y|X) = H(X,Y) - H(X)
  • conditional_entropy_xy(joint) → H(X|Y) = H(X,Y) - H(Y)

Both take a 2D list joint where joint[x][y] = P(X = x, Y = y), and return entropy in bits.
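One hint for the second function: H(X|Y) needs the Y marginal, which sums each *column* of the joint matrix rather than each row. Transposing with zip is one way to do that (example numbers are illustrative):

```python
joint = [[0.4, 0.1],
         [0.1, 0.4]]

# Marginal of Y: sum each column of the joint matrix
p_y = [sum(col) for col in zip(*joint)]
print(p_y)  # [0.5, 0.5]
```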
