Lesson 10 of 15
Huffman Code Lengths
Huffman Code Lengths
Huffman coding is a lossless data compression scheme that assigns shorter codes to more frequent symbols and longer codes to rarer ones.
Optimal Code Length Formula
For a symbol with probability , the optimal code length (in bits) is:
A simplified but instructive greedy approximation uses the floor:
The minimum of 1 bit ensures every symbol gets at least one bit.
Average Code Length
Once lengths are assigned, the average code length is:
Shannon's Source Coding Theorem
For any uniquely decodable code:
Huffman coding achieves this bound: the average length is within 1 bit of the entropy.
Example
For probabilities :
- Lengths:
- Average: bits
- Entropy: bits (exactly matched here!)
import math
def huffman_code_lengths(probs):
return [max(1, math.floor(-math.log2(p))) for p in probs]
def average_code_length(probs, lengths):
return sum(p * l for p, l in zip(probs, lengths))
probs = [0.5, 0.25, 0.125, 0.125]
lengths = huffman_code_lengths(probs)
print(lengths) # [1, 2, 3, 3]
print(average_code_length(probs, lengths)) # 1.75
Your Task
Implement:
huffman_code_lengths(probs)— returns list of lengths usingaverage_code_length(probs, lengths)— returns
Python runtime loading...
Loading...
Click "Run" to execute your code.