Bootstrap Resampling
Resampling to Estimate Uncertainty
Bootstrap resampling estimates the uncertainty of almost any statistic directly from the data, without deriving a formula for its sampling distribution.
The idea: repeatedly sample your data with replacement, compute your statistic each time, and look at the distribution of results.
import random
data = [1, 2, 3, 4, 5]
rng = random.Random(42)
# Draw 1000 bootstrap samples and compute means
boot_means = [
    sum(rng.choices(data, k=len(data))) / len(data)
    for _ in range(1000)
]
boot_means.sort()
# 95% bootstrap confidence interval (percentile method)
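def pct(p):
    # Fractional position of the p-th percentile in the sorted list
    i = (p / 100) * (len(boot_means) - 1)
    lo = int(i)
    hi = min(lo + 1, len(boot_means) - 1)
    # Linear interpolation between the two neighbouring values
    return boot_means[lo] + (i - lo) * (boot_means[hi] - boot_means[lo])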
print(round(pct(2.5), 1), round(pct(97.5), 1))
Why Bootstrap?
- Works for any statistic (median, correlation, custom metrics)
- No parametric assumptions about the underlying distribution (the sample is still assumed to be representative of it)
- Especially useful for small samples or unusual statistics
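The first point can be illustrated with a statistic other than the mean. The sketch below bootstraps the median; the helper name `bootstrap_statistic`, the sample data, the seed, and the nearest-rank way of reading off the interval are all assumptions made for illustration, not part of the lesson's reference solution.

```python
import random
import statistics


def bootstrap_statistic(data, stat, n_samples=1000, seed=0):
    """Return the sorted bootstrap distribution of `stat` over `data`.

    Hypothetical helper: `stat` is any function that maps a list to a number.
    """
    rng = random.Random(seed)
    n = len(data)
    return sorted(stat(rng.choices(data, k=n)) for _ in range(n_samples))


# Illustrative data with one outlier, where the median is a natural choice
data = [1, 2, 3, 4, 5, 40]
boot_medians = bootstrap_statistic(data, statistics.median)

# Rough 95% interval by nearest rank (no interpolation)
lower = boot_medians[int(0.025 * len(boot_medians))]
upper = boot_medians[int(0.975 * len(boot_medians)) - 1]
print(lower, upper)
```

Swapping `statistics.median` for another function (for example `statistics.pstdev`) changes the statistic without touching the resampling loop.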
With Replacement
Sampling with replacement means each draw comes from the full original dataset, so a value that has already been drawn can be drawn again. In any one resample, some values appear several times and others not at all. This mimics the randomness of collecting new samples.
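A single resample makes this concrete. The snippet below is a small sketch that draws one bootstrap sample and reports which values repeat and which are missing; the seed is an arbitrary choice, so the exact resample will differ with another seed.

```python
import random
from collections import Counter

data = [1, 2, 3, 4, 5]
rng = random.Random(0)  # arbitrary seed, chosen only for reproducibility

# random.choices draws with replacement, so duplicates are allowed
resample = rng.choices(data, k=len(data))

counts = Counter(resample)                       # how often each value was drawn
missing = [x for x in data if x not in counts]   # values never drawn

print("resample:   ", resample)
print("counts:     ", dict(counts))
print("never drawn:", missing)
```

For a dataset of size n, a given point appears in a resample with probability 1 - (1 - 1/n)^n, which is about 0.67 for n = 5 and approaches 1 - 1/e (about 0.632) as n grows.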
Percentile Method
The percentile method extracts the CI directly from the bootstrap distribution:
- Lower bound: 2.5th percentile of bootstrap means
- Upper bound: 97.5th percentile of bootstrap means
The 95% bootstrap CI is $[\hat{\theta}_{2.5\%},\, \hat{\theta}_{97.5\%}]$, where $\hat{\theta}$ is the statistic computed on each bootstrap sample and the subscript denotes a percentile of its bootstrap distribution.
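As a cross-check, the same two percentiles can be read from the standard library. This sketch assumes Python 3.8 or later, where `statistics.quantiles` is available; with `n=40` the cut points fall every 2.5%, and `method="inclusive"` uses the same linear interpolation as the `pct` helper shown earlier.

```python
import random
import statistics

data = [1, 2, 3, 4, 5]
rng = random.Random(42)

# Sorted bootstrap distribution of the mean (1000 resamples)
boot_means = sorted(
    sum(rng.choices(data, k=len(data))) / len(data) for _ in range(1000)
)


def pct(p):
    # Hand-rolled percentile with linear interpolation, as in the earlier code
    i = (p / 100) * (len(boot_means) - 1)
    lo = int(i)
    hi = min(lo + 1, len(boot_means) - 1)
    return boot_means[lo] + (i - lo) * (boot_means[hi] - boot_means[lo])


# 39 cut points at 2.5%, 5%, ..., 97.5%; keep the first and the last
cuts = statistics.quantiles(boot_means, n=40, method="inclusive")
lower, upper = cuts[0], cuts[-1]

print(lower, upper)
print(pct(2.5), pct(97.5))  # should agree up to floating-point error
```

The default `method="exclusive"` interpolates on a slightly different scale, so it can give marginally different bounds.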
Your Task
Implement bootstrap_ci(data, n_samples, seed) that returns a tuple (lower, upper) representing the 95% bootstrap confidence interval of the mean, rounded to 2 decimal places.