Lesson 5 of 15

Mutual Information

Mutual information I(X; Y) measures how much knowing one variable reduces uncertainty about the other.

I(X; Y) = H(X) + H(Y) - H(X, Y)

Equivalently: I(X; Y) = H(X) - H(X|Y) = H(Y) - H(Y|X)

Properties

  • Non-negative: I(X; Y) ≥ 0 always
  • Symmetric: I(X; Y) = I(Y; X)
  • Zero for independence: I(X; Y) = 0 iff X ⊥ Y
  • Maximum: I(X; Y) ≤ min(H(X), H(Y))

Normalized Mutual Information

Mutual information values depend on entropy magnitude, making comparisons across different datasets difficult. Normalized MI scales it to [0, 1]:

NMI(X, Y) = I(X; Y) / √(H(X) · H(Y))

If H(X) = 0 or H(Y) = 0, return 0.0 (no uncertainty to reduce).
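A direct translation of this definition can look like the sketch below. The `entropy` helper is a name introduced here for illustration, not part of the lesson's starter code:

```python
import math

def entropy(probs):
    # Shannon entropy in bits, skipping zero-probability entries
    return sum(-p * math.log2(p) for p in probs if p > 0)

def normalized_mi(joint):
    mx = [sum(row) for row in joint]        # marginal P(X): sum each row
    my = [sum(col) for col in zip(*joint)]  # marginal P(Y): sum each column
    hx, hy = entropy(mx), entropy(my)
    hxy = entropy(p for row in joint for p in row)
    denom = math.sqrt(hx * hy)
    if denom == 0:
        return 0.0  # no uncertainty to reduce
    return (hx + hy - hxy) / denom
```

The zero-denominator guard handles degenerate distributions where one variable is constant, in which case its entropy is zero and the ratio would otherwise divide by zero.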

Example

For independent uniform variables: H(X) = H(Y) = 1, H(X, Y) = 2, so I = 1 + 1 - 2 = 0.

For fully correlated variables (Y = X): H(X, Y) = H(X) = H(Y) = 1, so I = 1 + 1 - 1 = 1.

import math

def mutual_information(joint):
    # Marginals: P(X=i) sums each row, P(Y=j) sums each column
    mx = [sum(row) for row in joint]
    my = [sum(col) for col in zip(*joint)]
    # Shannon entropies in bits, skipping zero-probability entries
    hx = sum(-p * math.log2(p) for p in mx if p > 0)
    hy = sum(-p * math.log2(p) for p in my if p > 0)
    hxy = sum(-p * math.log2(p) for row in joint for p in row if p > 0)
    return hx + hy - hxy
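As a quick sanity check, the worked examples above can be reproduced numerically with this function (repeated here so the snippet runs on its own):

```python
import math

# mutual_information as defined above
def mutual_information(joint):
    mx = [sum(row) for row in joint]
    my = [sum(col) for col in zip(*joint)]
    hx = sum(-p * math.log2(p) for p in mx if p > 0)
    hy = sum(-p * math.log2(p) for p in my if p > 0)
    hxy = sum(-p * math.log2(p) for row in joint for p in row if p > 0)
    return hx + hy - hxy

# Independent uniform variables: I = 1 + 1 - 2 = 0
independent = [[0.25, 0.25],
               [0.25, 0.25]]

# Fully correlated (Y = X): I = 1 + 1 - 1 = 1
correlated = [[0.5, 0.0],
              [0.0, 0.5]]

print(mutual_information(independent))  # 0.0
print(mutual_information(correlated))   # 1.0
```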

Your Task

Implement:

  • mutual_information(joint): I(X; Y) = H(X) + H(Y) - H(X, Y)
  • normalized_mi(joint): I(X; Y) / √(H(X) · H(Y)); return 0.0 if the denominator is zero