
KL Divergence

Kullback-Leibler Divergence

KL divergence (also called relative entropy) measures how different a distribution Q is from a reference distribution P:

D_{KL}(P \| Q) = \sum_i p_i \log_2 \frac{p_i}{q_i}

Terms where p_i = 0 are skipped (since 0 \log 0 = 0 by convention).
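
For example, with P = (0.8, 0.2) and Q = (0.6, 0.4) (the same distributions used in the code example below):

D_{KL}(P \| Q) = 0.8 \log_2 \frac{0.8}{0.6} + 0.2 \log_2 \frac{0.2}{0.4} \approx 0.332 - 0.200 \approx 0.132 \text{ bits}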

Critical Property: Asymmetry

D_{KL}(P \| Q) \neq D_{KL}(Q \| P) \text{ in general}

KL divergence is not a true distance metric, since it violates both symmetry and the triangle inequality. The asymmetry matters in machine learning: the direction P \| Q heavily penalizes Q for placing low probability mass where P has high mass.
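
One way to see the asymmetry concretely is to evaluate both directions on a small pair of distributions. Below is a minimal sketch; the kl_bits helper is just an inline version of the kl_divergence function built later in this lesson, applied to the same P and Q as the main example.

import math

def kl_bits(p, q):
    # Sum p_i * log2(p_i / q_i) over the terms where p_i > 0.
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.8, 0.2]
q = [0.6, 0.4]
print(round(kl_bits(p, q), 4))  # 0.132
print(round(kl_bits(q, p), 4))  # 0.151

The two values differ, which is exactly the asymmetry described above.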

Non-Negativity (Gibbs' Inequality)

D_{KL}(P \| Q) \geq 0

with equality if and only if P = Q everywhere.
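
This follows from Jensen's inequality applied to the concave function \log_2, with the sum taken over the support of P (all i with p_i > 0):

-D_{KL}(P \| Q) = \sum_i p_i \log_2 \frac{q_i}{p_i} \leq \log_2 \sum_i p_i \cdot \frac{q_i}{p_i} = \log_2 \sum_i q_i \leq \log_2 1 = 0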

Numerical Stability

When q_i = 0 but p_i > 0, D_{KL} is technically infinite. In practice, add a small \varepsilon = 10^{-15} to each q_i to avoid division by zero:

D_{KL}(P \| Q) \approx \sum_{p_i > 0} p_i \log_2 \frac{p_i}{q_i + \varepsilon}

import math

def kl_divergence(p, q):
    # Small offset keeps q[i] + epsilon > 0 so the log never divides by zero.
    epsilon = 1e-15
    # Sum p[i] * log2(p[i] / q[i]) only over terms where p[i] > 0.
    return sum(p[i] * math.log2(p[i] / (q[i] + epsilon))
               for i in range(len(p)) if p[i] > 0)

p = [0.8, 0.2]
q = [0.6, 0.4]
print(round(kl_divergence(p, q), 4))  # 0.132
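
If SciPy is available, the result can be cross-checked against scipy.stats.entropy, which returns the KL divergence when given two distributions (a sketch assuming SciPy is installed; pass base=2 to get the answer in bits):

from scipy.stats import entropy

p = [0.8, 0.2]
q = [0.6, 0.4]
# With two arguments, entropy(p, q) returns D_KL(p || q); base=2 gives bits.
print(round(entropy(p, q, base=2), 4))  # 0.132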

Your Task

Implement kl_divergence(p, q):

  • Add \varepsilon = 10^{-15} to each q_i
  • Sum only over terms where pi>0p_i > 0
  • Return result in bits