Lesson 12 of 15

Birthday Paradox & Hash Collisions

Birthday Paradox and Hash Collisions

How many people must be in a room before two of them share a birthday? Surprisingly, only 23 people are needed for a >50% chance. This counter-intuitive result — the birthday paradox — directly determines the security of hash functions.

Birthday Attack on Hash Functions

If a hash function produces nn-bit outputs (2n2^n possible values), how many random inputs do we need before finding two with the same hash?

The birthday paradox gives us:

P(collision after k hashes)1ek(k1)/(22n)P(\text{collision after } k \text{ hashes}) \approx 1 - e^{-k(k-1)/(2 \cdot 2^n)}

For a 50% collision probability, the threshold is approximately:

k22nln21.1772n/2k \approx \sqrt{2 \cdot 2^n \cdot \ln 2} \approx 1.177 \cdot 2^{n/2}

Implications for Hash Security

Hash sizeCollision thresholdSecurity level
32 bits~77,163Toy — trivially broken
64 bits~5 billionWeak — birthday in seconds
128 bits~2642^{64}MD5 range — compromised
256 bits~21282^{128}SHA-256 — considered secure

Rule of thumb: an nn-bit hash provides only n/2n/2 bits of collision resistance.

Your Task

Implement:

  • birthday_probability(n, hash_bits) — probability of at least one collision among nn hashes of size hash_bits bits: 1en(n1)/(22hash_bits)1 - e^{-n(n-1)/(2 \cdot 2^{\text{hash\_bits}})}
  • collision_threshold(hash_bits, probability=0.5) — minimum nn for collision probability p\geq p: 22hash_bitsln(1/(1p))\lceil \sqrt{2 \cdot 2^{\text{hash\_bits}} \cdot \ln(1/(1-p))} \rceil

Use import math for math.exp, math.log, math.sqrt, math.ceil.

Python runtime loading...
Loading...
Click "Run" to execute your code.