Path: blob/main/notebooks/published/birthday_paradox/birthday_paradox.ipynb
51 views
The Birthday Paradox: A Probabilistic Analysis
Introduction
The Birthday Paradox (also known as the Birthday Problem) is a classic problem in probability theory that demonstrates how human intuition often fails when dealing with probabilities. The paradox states that in a group of just 23 people, there is a greater than 50% probability that at least two people share the same birthday.
This counterintuitive result arises because we tend to think about the probability of sharing our own birthday with someone else, rather than considering all possible pairs of people in the group.
Mathematical Framework
Assumptions
We make the following simplifying assumptions:
There are exactly 365 days in a year (ignoring leap years)
All birthdays are equally likely (uniform distribution)
Birthdays are independent events
Analytical Solution
Let denote the probability that at least two people in a group of share a birthday. It is easier to compute the complement , the probability that all people have different birthdays.
For the first person, any birthday is valid:
For the second person, they must avoid the first person's birthday:
For the third person:
Continuing this pattern, the probability that all people have different birthdays is:
This can be written more compactly as:
Therefore, the probability of at least one shared birthday is:
Approximation Using Taylor Series
For small , we have . Using this approximation:
Thus:
Critical Threshold
Setting and solving for :
Solving the quadratic gives , which is the famous result.
Analytical Computation
We first compute the exact probability using the analytical formula derived above.
Monte Carlo Simulation
To verify our analytical results, we perform a Monte Carlo simulation. For each group size, we generate many random groups and count how often at least two people share a birthday.
Visualization
We now create a comprehensive visualization showing:
The exact analytical probability
The exponential approximation
Monte Carlo simulation results
Key probability thresholds
Analysis of Results
Key Findings
The 50% threshold occurs at n = 23: With just 23 people, there is a 50.73% probability that at least two share a birthday.
Rapid growth: The probability grows quickly:
P(10) ≈ 11.7%
P(23) ≈ 50.7%
P(50) ≈ 97.0%
P(70) ≈ 99.9%
Why the paradox works: The key insight is the number of pairs. With 23 people, there are unique pairs to compare. Each pair has a 1/365 chance of matching, and with 253 independent comparisons, matches become likely.
Approximation accuracy: The exponential approximation is remarkably accurate, with maximum error around 1.5%.
Generalizations
The birthday paradox generalizes to collision problems in computer science:
Hash collisions: In a hash table with slots, expect collisions after approximately insertions
Cryptographic attacks: Birthday attacks on hash functions require only operations instead of
Conclusion
The Birthday Paradox elegantly demonstrates that human intuition about probability can be misleading. The counterintuitive result—that only 23 people are needed for a >50% chance of a shared birthday—arises from the combinatorial explosion of pairwise comparisons.
This principle has profound implications in:
Cryptography: Designing hash functions resistant to birthday attacks
Statistics: Understanding collision probabilities in sampling
Computer Science: Analyzing hash table performance
The mathematical framework presented here—using both exact calculation and exponential approximation—provides tools for analyzing similar collision problems across many domains.