Lesson 9 of 15

Total Variation and Hellinger Distance

Total Variation Distance

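The total variation (TV) distance is the largest difference in probability that $P$ and $Q$ can assign to any single event. For discrete distributions, this maximum equals half the $L^1$ distance between the two probability vectors: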

$$\text{TV}(P, Q) = \frac{1}{2} \sum_i |p_i - q_i|$$

The factor of $1/2$ ensures $\text{TV} \in [0, 1]$: because $\sum_i |p_i - q_i| \leq \sum_i p_i + \sum_i q_i = 2$, halving the sum caps it at 1.
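The link between the "largest gap over any event" description and the half-$L^1$ formula can be checked by brute force on a small example. This is only an illustrative sketch: the helper names `tv_by_events` and `tv_half_l1` and the example distributions are assumptions, not part of the lesson.

```python
from itertools import combinations

def tv_by_events(p, q):
    # Brute force: try every event (subset of outcomes) and keep the
    # largest gap |P(A) - Q(A)|. Exponential in len(p), so demo only.
    n = len(p)
    best = 0.0
    for size in range(n + 1):
        for event in combinations(range(n), size):
            gap = abs(sum(p[i] for i in event) - sum(q[i] for i in event))
            best = max(best, gap)
    return best

def tv_half_l1(p, q):
    # Closed form from the lesson: half the L1 distance.
    return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))

p = [0.5, 0.3, 0.2]  # hypothetical example distributions
q = [0.2, 0.3, 0.5]
print(abs(tv_by_events(p, q) - tv_half_l1(p, q)) < 1e-12)  # True
```

The maximizing event is the set of outcomes where $p_i > q_i$ (here, the first outcome alone), which is why summing the positive gaps gives the same number as halving the total absolute gap.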

Properties

  • Metric: TV satisfies symmetry, non-negativity, and the triangle inequality
  • Bounded: $0 \leq \text{TV}(P, Q) \leq 1$
  • TV = 0 iff $P = Q$; TV = 1 iff $P$ and $Q$ have disjoint supports
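These properties can be spot-checked on a few hand-picked distributions. The sketch below is not a proof, and the vectors `a`, `b`, and `c` are hypothetical examples chosen so that `a` and `b` share no outcome.

```python
def total_variation(p, q):
    # Half the L1 distance between two probability vectors.
    return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))

a = [1.0, 0.0, 0.0]
b = [0.0, 0.5, 0.5]  # disjoint support from a
c = [0.2, 0.5, 0.3]

print(total_variation(a, a))  # 0.0 (identical distributions)
print(total_variation(a, b))  # 1.0 (disjoint supports)

# Symmetry: swapping the arguments does not change the value.
print(total_variation(a, c) == total_variation(c, a))  # True

# Triangle inequality, with a small tolerance for floating-point rounding.
print(total_variation(a, b)
      <= total_variation(a, c) + total_variation(c, b) + 1e-12)  # True
```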

Hellinger Distance

The Hellinger distance is another metric on probability distributions. It is the $L^2$ distance between the square-root probability vectors, scaled by $1/\sqrt{2}$:

$$H(P, Q) = \sqrt{\frac{1}{2} \sum_i \left(\sqrt{p_i} - \sqrt{q_i}\right)^2}$$

It is also bounded in $[0, 1]$ and has useful relationships to both TV distance and KL divergence.
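The lesson does not spell these relationships out. The standard ones, given here as background, are $H^2 = 1 - \sum_i \sqrt{p_i q_i}$, $H^2 \leq \text{TV} \leq \sqrt{2}\,H$, and $\text{KL}(P \| Q) \geq 2H^2$ when KL is measured in nats. The sketch below checks them numerically on one example. The helper `kl_divergence` is an assumption added for illustration, and it requires $q_i > 0$ wherever $p_i > 0$.

```python
import math

def total_variation(p, q):
    return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))

def hellinger_distance(p, q):
    return math.sqrt(0.5 * sum((math.sqrt(pi) - math.sqrt(qi)) ** 2
                               for pi, qi in zip(p, q)))

def kl_divergence(p, q):
    # Assumed helper: KL(P || Q) in nats; needs q_i > 0 wherever p_i > 0.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.7, 0.3]
q = [0.4, 0.6]
tv = total_variation(p, q)
h = hellinger_distance(p, q)
bc = sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))  # Bhattacharyya coefficient

print(abs(h ** 2 - (1 - bc)) < 1e-12)       # True
print(h ** 2 <= tv <= math.sqrt(2) * h)     # True
print(kl_divergence(p, q) >= 2 * h ** 2)    # True
```

A single example only illustrates the inequalities. They hold in general, but that requires a proof rather than a numeric check.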

Comparison of Distances

| Distance | Formula | Range |
|---|---|---|
| Total Variation | $\frac{1}{2}\sum \lvert p_i - q_i \rvert$ | $[0, 1]$ |
| Hellinger | $\sqrt{\frac{1}{2}\sum(\sqrt{p_i}-\sqrt{q_i})^2}$ | $[0, 1]$ |
| JS Distance | $\sqrt{\text{JSD}(P,Q)}$ | $[0, 1]$ |

```python
import math

def total_variation(p, q):
    # Half the L1 distance between the two probability vectors.
    return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))

def hellinger_distance(p, q):
    # L2 distance between the square-root vectors, scaled by 1/sqrt(2).
    return math.sqrt(0.5 * sum((math.sqrt(pi) - math.sqrt(qi)) ** 2
                               for pi, qi in zip(p, q)))

p = [0.7, 0.3]
q = [0.4, 0.6]
print(round(total_variation(p, q), 4))    # 0.3
print(round(hellinger_distance(p, q), 4)) # 0.2158
```
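
The comparison table also lists the JS distance, which the lesson's code does not cover. As a rough sketch, it can be computed as the square root of the Jensen-Shannon divergence. The helpers `kl_bits` and `js_distance` are assumptions for illustration, and the base-2 logarithm is chosen here because the $[0, 1]$ range in the table relies on it.

```python
import math

def kl_bits(p, q):
    # KL(P || Q) in bits; terms with p_i == 0 contribute nothing.
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_distance(p, q):
    # Mixture M = (P + Q) / 2 is positive wherever P or Q is positive,
    # so both KL terms below are finite.
    m = [0.5 * (pi + qi) for pi, qi in zip(p, q)]
    jsd = 0.5 * kl_bits(p, m) + 0.5 * kl_bits(q, m)
    return math.sqrt(max(jsd, 0.0))  # clamp tiny negative rounding error

print(js_distance([1.0, 0.0], [0.0, 1.0]))  # 1.0 (disjoint supports)
print(js_distance([0.7, 0.3], [0.7, 0.3]))  # 0.0 (identical distributions)
print(js_distance([0.7, 0.3], [0.4, 0.6]))  # strictly between 0 and 1
```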

Your Task

Implement:

  • `total_variation(p, q)`: $\frac{1}{2} \sum_i |p_i - q_i|$
  • `hellinger_distance(p, q)`: $\sqrt{\frac{1}{2} \sum_i (\sqrt{p_i} - \sqrt{q_i})^2}$