Lesson 5 of 15

Mutual Information

Mutual information I(X; Y) measures how much knowing one variable reduces uncertainty about the other.

I(X; Y) = H(X) + H(Y) - H(X, Y)

Equivalently: I(X; Y) = H(X) - H(X|Y) = H(Y) - H(Y|X)

Properties

  • Non-negative: I(X; Y) ≥ 0 always
  • Symmetric: I(X; Y) = I(Y; X)
  • Zero for independence: I(X; Y) = 0 iff X ⊥ Y
  • Maximum: I(X; Y) ≤ min(H(X), H(Y))

Normalized Mutual Information

Mutual information values depend on entropy magnitude, making comparisons across different datasets difficult. Normalized MI scales it to [0, 1]:

NMI(X, Y) = I(X; Y) / √(H(X) · H(Y))

If H(X) = 0 or H(Y) = 0, return 0.0 (no uncertainty to reduce).
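A direct translation of this definition can look like the sketch below. The `entropy` helper is a name introduced here for illustration, not part of the lesson's starter code:

```python
import math

def entropy(probs):
    # Shannon entropy in bits, skipping zero-probability entries
    return sum(-p * math.log2(p) for p in probs if p > 0)

def normalized_mi(joint):
    mx = [sum(row) for row in joint]        # marginal P(X): sum each row
    my = [sum(col) for col in zip(*joint)]  # marginal P(Y): sum each column
    hx, hy = entropy(mx), entropy(my)
    hxy = entropy(p for row in joint for p in row)
    denom = math.sqrt(hx * hy)
    if denom == 0:
        return 0.0  # no uncertainty to reduce
    return (hx + hy - hxy) / denom
```

The zero-denominator guard handles degenerate distributions where one variable is constant, in which case its entropy is zero and the ratio would otherwise divide by zero.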

Example

For independent uniform variables: H(X) = H(Y) = 1, H(X, Y) = 2, so I = 1 + 1 - 2 = 0.

For fully correlated variables (Y = X): H(X, Y) = H(X) = H(Y) = 1, so I = 1 + 1 - 1 = 1.

import math

def mutual_information(joint):
    # Marginals: P(X=i) sums each row, P(Y=j) sums each column
    mx = [sum(row) for row in joint]
    my = [sum(col) for col in zip(*joint)]
    # Shannon entropies in bits, skipping zero-probability entries
    hx = sum(-p * math.log2(p) for p in mx if p > 0)
    hy = sum(-p * math.log2(p) for p in my if p > 0)
    hxy = sum(-p * math.log2(p) for row in joint for p in row if p > 0)
    return hx + hy - hxy
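As a quick sanity check, the worked examples above can be reproduced numerically with this function (repeated here so the snippet runs on its own):

```python
import math

# mutual_information as defined above
def mutual_information(joint):
    mx = [sum(row) for row in joint]
    my = [sum(col) for col in zip(*joint)]
    hx = sum(-p * math.log2(p) for p in mx if p > 0)
    hy = sum(-p * math.log2(p) for p in my if p > 0)
    hxy = sum(-p * math.log2(p) for row in joint for p in row if p > 0)
    return hx + hy - hxy

# Independent uniform variables: I = 1 + 1 - 2 = 0
independent = [[0.25, 0.25],
               [0.25, 0.25]]

# Fully correlated (Y = X): I = 1 + 1 - 1 = 1
correlated = [[0.5, 0.0],
              [0.0, 0.5]]

print(mutual_information(independent))  # 0.0
print(mutual_information(correlated))   # 1.0
```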

Your Task

Implement:

  • mutual_information(joint): I(X; Y) = H(X) + H(Y) - H(X, Y)
  • normalized_mi(joint): I(X; Y) / √(H(X) · H(Y)); return 0.0 if the denominator is zero