Lesson 7 of 15

Cross-Entropy

Cross-entropy H(P, Q) measures the average number of bits needed to encode events drawn from distribution P using a code optimized for distribution Q:

H(P, Q) = -\sum_i p_i \log_2 q_i
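For a concrete feel, take P = (1/2, 1/2) and Q = (1/4, 3/4) (values chosen purely for illustration):

```latex
H(P, Q) = -\left( \tfrac{1}{2}\log_2\tfrac{1}{4} + \tfrac{1}{2}\log_2\tfrac{3}{4} \right) \approx 1 + 0.208 = 1.208 \text{ bits}
```

An optimal code for P needs exactly 1 bit per event, so encoding P with a code built for Q costs about 0.21 extra bits on average.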

Relationship to KL Divergence

H(P, Q) = H(P) + D_{KL}(P \| Q)

This decomposition shows:

  • H(P) — the irreducible entropy of P (the minimum bits needed)
  • D_{KL}(P \| Q) — the extra bits wasted by using the wrong model Q

Since D_{KL} \geq 0, we always have H(P, Q) \geq H(P).
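The decomposition is easy to verify numerically. Here is a minimal sketch (the helper names are my own, not part of the exercise):

```python
import math

def entropy(p):
    """Shannon entropy H(P) in bits."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def cross_entropy(p, q):
    """Cross-entropy H(P, Q) in bits."""
    return -sum(pi * math.log2(qi) for pi, qi in zip(p, q) if pi > 0)

def kl_divergence(p, q):
    """KL divergence D_KL(P || Q) in bits."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.8, 0.2]
q = [0.6, 0.4]

# Check H(P, Q) = H(P) + D_KL(P || Q) up to floating-point noise
print(abs(cross_entropy(p, q) - (entropy(p) + kl_divergence(p, q))) < 1e-12)  # True
```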

In Machine Learning

Cross-entropy is the standard loss function for classification. With true labels P (one-hot vectors) and model predictions Q (softmax outputs):

\mathcal{L} = -\sum_i y_i \log_2 \hat{y}_i

When P is one-hot (true class probability 1, all others 0), this simplifies to just -\log_2(\hat{y}_{\text{true}}). (ML frameworks typically use the natural log rather than \log_2; the two differ only by a constant factor, so the minimizer is the same.)
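A quick numeric check of that simplification (the example values here are my own):

```python
import math

def cross_entropy_loss(y_true, y_pred):
    """Cross-entropy loss in bits; epsilon guards against log2(0)."""
    epsilon = 1e-15
    return -sum(t * math.log2(p + epsilon)
                for t, p in zip(y_true, y_pred) if t > 0)

y_true = [0, 1, 0]        # one-hot: the true class is index 1
y_pred = [0.2, 0.7, 0.1]  # softmax output

# With a one-hot target, the sum collapses to -log2 of the predicted
# probability of the true class
print(abs(cross_entropy_loss(y_true, y_pred) - (-math.log2(0.7 + 1e-15))) < 1e-12)  # True
```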

import math

def cross_entropy(p, q):
    """Cross-entropy H(P, Q) in bits; epsilon guards against log2(0)."""
    epsilon = 1e-15
    return -sum(p[i] * math.log2(q[i] + epsilon)
                for i in range(len(p)) if p[i] > 0)

p = [0.8, 0.2]
q = [0.6, 0.4]
print(round(cross_entropy(p, q), 4))  # 0.854

Your Task

Implement:

  • cross_entropy(p, q) — computes H(P, Q) = -\sum p_i \log_2(q_i + \varepsilon), summing only over indices where p_i > 0
  • cross_entropy_loss(y_true, y_pred) — same function, ML naming convention