Lesson 6 of 15
Kullback-Leibler Divergence
KL divergence (also called relative entropy) measures how different a distribution \(P\) is from a reference distribution \(Q\):

\[ D_{\mathrm{KL}}(P \parallel Q) = \sum_i p_i \log_2 \frac{p_i}{q_i} \]

Terms where \(p_i = 0\) are skipped (since \(\lim_{x \to 0} x \log_2 x = 0\)).
Critical Property: Asymmetry
KL divergence is not a true distance metric: it violates symmetry (\(D_{\mathrm{KL}}(P \parallel Q) \ne D_{\mathrm{KL}}(Q \parallel P)\) in general) and the triangle inequality. This asymmetry matters in machine learning: the direction \(D_{\mathrm{KL}}(P \parallel Q)\) penalizes \(Q\) for placing low probability mass where \(P\) has high mass.
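To see the asymmetry concretely, here is a minimal sketch (the helper `kl_bits` is illustrative, not part of the lesson's exercise) that computes both directions for one pair of distributions:

```python
import math

def kl_bits(p, q):
    # KL divergence in bits; terms with p[i] == 0 are skipped
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.9, 0.1]
q = [0.5, 0.5]

forward = kl_bits(p, q)  # D_KL(P || Q)
reverse = kl_bits(q, p)  # D_KL(Q || P)
print(round(forward, 4), round(reverse, 4))  # 0.531 0.737
```

The two directions disagree because `forward` heavily penalizes the outcome where `p` concentrates mass (0.9) and `q` does not, while `reverse` penalizes the opposite mismatch.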
Non-Negativity (Gibbs' Inequality)
\(D_{\mathrm{KL}}(P \parallel Q) \ge 0\), with equality if and only if \(p_i = q_i\) everywhere.
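As a quick numerical check of Gibbs' inequality (a sketch, assuming the same illustrative `kl_bits` helper as above), the divergence stays non-negative on random distributions and is exactly zero when the two distributions coincide:

```python
import math
import random

def kl_bits(p, q):
    # KL divergence in bits; terms with p[i] == 0 are skipped
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

random.seed(0)
for _ in range(1000):
    raw_p = [random.random() for _ in range(4)]
    raw_q = [random.random() for _ in range(4)]
    p = [x / sum(raw_p) for x in raw_p]  # normalize to a valid distribution
    q = [x / sum(raw_q) for x in raw_q]
    # small negative tolerance only to absorb floating-point rounding
    assert kl_bits(p, q) >= -1e-12

p = [0.25, 0.25, 0.5]
print(kl_bits(p, p))  # identical distributions -> 0.0
```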
Numerical Stability
When \(q_i = 0\) but \(p_i > 0\), \(D_{\mathrm{KL}}(P \parallel Q)\) is technically infinite. In practice, add a small \(\epsilon\) to each \(q_i\) to avoid division by zero:
import math

def kl_divergence(p, q):
    epsilon = 1e-15  # guards against division by zero when q[i] == 0
    return sum(p[i] * math.log2(p[i] / (q[i] + epsilon))
               for i in range(len(p)) if p[i] > 0)

p = [0.8, 0.2]
q = [0.6, 0.4]
print(round(kl_divergence(p, q), 4))  # 0.132
Your Task
Implement kl_divergence(p, q):
- Add a small \(\epsilon\) (e.g. \(10^{-15}\)) to each \(q_i\)
- Sum only over terms where \(p_i > 0\)
- Return the result in bits (use base-2 logarithms)