
KL Divergence

Kullback-Leibler Divergence

KL divergence (also called relative entropy) measures how different a distribution Q is from a reference distribution P:

D_{KL}(P \| Q) = \sum_i p_i \log_2 \frac{p_i}{q_i}

Terms where p_i = 0 are skipped (since 0 \log 0 = 0 by convention).
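
For example, with P = (0.8, 0.2) and Q = (0.6, 0.4) (the same distributions used in the code example below):

D_{KL}(P \| Q) = 0.8 \log_2 \frac{0.8}{0.6} + 0.2 \log_2 \frac{0.2}{0.4} \approx 0.332 - 0.200 \approx 0.132 \text{ bits}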

Critical Property: Asymmetry

D_{KL}(P \| Q) \neq D_{KL}(Q \| P) \text{ in general}

KL divergence is not a true distance metric, since it violates both symmetry and the triangle inequality. The asymmetry matters in machine learning: the direction P \| Q heavily penalizes Q for placing low probability mass where P has high mass.
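
One way to see the asymmetry concretely is to evaluate both directions on a small pair of distributions. Below is a minimal sketch; the kl_bits helper is just an inline version of the kl_divergence function built later in this lesson, applied to the same P and Q as the main example.

import math

def kl_bits(p, q):
    # Sum p_i * log2(p_i / q_i) over the terms where p_i > 0.
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.8, 0.2]
q = [0.6, 0.4]
print(round(kl_bits(p, q), 4))  # 0.132
print(round(kl_bits(q, p), 4))  # 0.151

The two values differ, which is exactly the asymmetry described above.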

Non-Negativity (Gibbs' Inequality)

D_{KL}(P \| Q) \geq 0

with equality if and only if P = Q everywhere.
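
This follows from Jensen's inequality applied to the concave function \log_2, with the sum taken over the support of P (all i with p_i > 0):

-D_{KL}(P \| Q) = \sum_i p_i \log_2 \frac{q_i}{p_i} \leq \log_2 \sum_i p_i \cdot \frac{q_i}{p_i} = \log_2 \sum_i q_i \leq \log_2 1 = 0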

Numerical Stability

When q_i = 0 but p_i > 0, D_{KL} is technically infinite. In practice, add a small \varepsilon = 10^{-15} to each q_i to avoid division by zero:

D_{KL}(P \| Q) \approx \sum_{p_i > 0} p_i \log_2 \frac{p_i}{q_i + \varepsilon}

import math

def kl_divergence(p, q):
    # Small offset keeps q[i] + epsilon > 0 so the log never divides by zero.
    epsilon = 1e-15
    # Sum p[i] * log2(p[i] / q[i]) only over terms where p[i] > 0.
    return sum(p[i] * math.log2(p[i] / (q[i] + epsilon))
               for i in range(len(p)) if p[i] > 0)

p = [0.8, 0.2]
q = [0.6, 0.4]
print(round(kl_divergence(p, q), 4))  # 0.132
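
If SciPy is available, the result can be cross-checked against scipy.stats.entropy, which returns the KL divergence when given two distributions (a sketch assuming SciPy is installed; pass base=2 to get the answer in bits):

from scipy.stats import entropy

p = [0.8, 0.2]
q = [0.6, 0.4]
# With two arguments, entropy(p, q) returns D_KL(p || q); base=2 gives bits.
print(round(entropy(p, q, base=2), 4))  # 0.132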

Your Task

Implement kl_divergence(p, q):

  • Add \varepsilon = 10^{-15} to each q_i
  • Sum only over terms where pi>0p_i > 0
  • Return result in bits