Sampling Distributions

IMS, Ch. 12

Benjamin S. Baumer

Smith College

Mar 6, 2026

Distribution of random variables

Let’s make up some random variables

Recall that a random variable (r.v.) is a function \(X: S \rightarrow \mathbb{R}\) from the sample space \(S\) to the real numbers

Roll the 🎲 10 times
\(n = 10\) is our sample size
Record in our spreadsheet:
1. \(X_{10}\): the mean number of pips
2. \(Y_{10}\): the median number of pips
3. \(Z_{10}\): the number of rolls with an even number of pips
4. \(W_{10}\): the number of sixes
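To make the four definitions concrete, here is a minimal base-R sketch of a single trial. It assumes a fair six-sided die, which the class's physical dice may not be, and the seed is arbitrary.

```r
# One simulated trial: roll a fair die (an assumption) n = 10 times
set.seed(2026)
n <- 10
rolls <- sample(1:6, size = n, replace = TRUE)

X <- mean(rolls)            # X_10: mean number of pips
Y <- median(rolls)          # Y_10: median number of pips (can end in .5 when n is even)
Z <- sum(rolls %% 2 == 0)   # Z_10: number of rolls with an even number of pips
W <- sum(rolls == 6)        # W_10: number of sixes
c(X = X, Y = Y, Z = Z, W = W)
```

Because a six is even, every six also counts toward \(Z_{10}\), so \(W_{10} \le Z_{10}\) in every trial.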

Read data from Google Sheet

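library(tidyverse)
library(googlesheets4)
# the sheet is public, so skip Google authentication
gs4_deauth()
# one row per trial of 10 rolls; add a row index as an id
dice <- read_sheet("https://docs.google.com/spreadsheets/d/1FvmWcm0ObURhJl9KhwNIM5sKMsbyFoF1Wie7fU6P_Sk/") |>
  mutate(id = row_number())
dice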

# A tibble: 35 × 6
       X     Y     Z     W color    id
   <dbl> <dbl> <dbl> <dbl> <chr> <int>
 1   6     6      10    10 black     1
 2   6     6      10    10 black     2
 3   5.7   6       9     9 black     3
 4   3.6   3.5     4     1 white     4
 5   5.1   6       9     7 black     5
 6   5.9   6       9     9 black     6
 7   5.7   6       9     9 black     7
 8   6     6      10    10 black     8
 9   5.6   6      10     9 black     9
10   3.2   4       5     2 white    10
# ℹ 25 more rows

Compute point estimates

# point estimates of the expected value of each r.v., by die color
dice |>
  group_by(color) |>
  summarize(
    n = n(),
    x_bar = mean(X),
    y_bar = mean(Y),
    z_bar = mean(Z),
    w_bar = mean(W)
  )

# A tibble: 2 × 6
  color     n x_bar y_bar z_bar w_bar
  <chr> <int> <dbl> <dbl> <dbl> <dbl>
1 black    23  5.57  5.91  9.48  8.30
2 white    12  3.54  3.71  4.83  1.92

\(\bar{x}\) is our best estimate of \(\mathbb{E}[X_{10}] = \mu_{X_{10}}\)

Key idea

Every random variable has a distribution
Every random variable has an expected value
- (possibly infinite, or even undefined, as for the Cauchy distribution)
Every random variable has a variance
- (possibly infinite or undefined)
What is the distribution of \(X_{10}\)? \(Y_{10}\)? \(Z_{10}\)?

View distribution of r.v.’s

# one row per (trial, r.v.) pair, then one panel per r.v.
dice |>
  pivot_longer(-c(color, id), names_to = "r_v", values_to = "value") |>
  ggplot(aes(x = value, fill = color)) +
  geom_histogram() +
  facet_wrap(vars(r_v))

Your turn

What do you observe about the distribution of \(X_{10}\)?
- Center, shape, and spread?
Are the black and white 🎲 different?
- How do you know?
What do you suspect the distribution of \(X_{20}\) looks like?

Segue

The distribution of \(X_{10}\) is a sampling distribution
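If we are willing to assume a fair die (an assumption; the black dice in our data clearly are not fair), we can approximate this sampling distribution by simulation instead of by hand. A minimal base-R sketch:

```r
set.seed(2026)
# 10,000 simulated values of X_10, each the mean of 10 fair-die rolls
x10 <- replicate(10000, mean(sample(1:6, size = 10, replace = TRUE)))

mean(x10)  # center: should be close to E[X_10] = 3.5
var(x10)   # spread: should be close to (35/12) / 10, about 0.29
```

Calling `hist(x10)` shows the shape: roughly symmetric and bell-shaped, centered near 3.5.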

Sampling distributions

For the white dice

We’ve observed \(X_{10}\) 12 times
The sample mean is still our best point estimate
- \(\bar{x} \approx 3.54\)
What can we say about our uncertainty around that point estimate?

Sampling distribution

The bootstrap

What if we resample from our sample?

The bootstrap

Developed by Brad Efron in 1979
Resample from your sample with replacement!
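The bootstrap distribution approximates the shape and spread of the sampling distribution (but it is centered at the observed statistic, not at the true parameter)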
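The resampling idea can be sketched in base R without the infer package. The data here are hypothetical: 12 simulated values of \(X_{10}\) from a fair die stand in for the white-dice column, which is not reproduced in full on these slides.

```r
set.seed(1979)
# Hypothetical stand-in for the 12 white-dice observations of X_10
x <- replicate(12, mean(sample(1:6, size = 10, replace = TRUE)))

# One bootstrap resample: 12 draws from x, with replacement
resample <- sample(x, size = length(x), replace = TRUE)

# Bootstrap distribution: the mean of each of 1,000 resamples
boot_means <- replicate(1000, mean(sample(x, size = length(x), replace = TRUE)))

mean(boot_means)  # centered near mean(x), the observed statistic
var(boot_means)   # estimate of the variance of the sampling distribution of x-bar
```

Each resample reuses only values already in the sample, so every bootstrap mean lies between `min(x)` and `max(x)`.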

In practice

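library(infer)
# bootstrap distribution of the mean of X_10 for the white dice
pips10_bstrap <- dice |>
  filter(color == "white") |>
  specify(response = X) |>
  # 1,000 resamples of the 12 rows, drawn with replacement
  generate(reps = 1000, type = "bootstrap") |>
  # mean of X in each resample
  calculate(stat = "mean")
pips10_bstrap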

Response: X (numeric)
# A tibble: 1,000 × 2
   replicate  stat
       <int> <dbl>
 1         1  3.51
 2         2  3.86
 3         3  3.52
 4         4  3.63
 5         5  3.7 
 6         6  3.47
 7         7  3.5 
 8         8  3.22
 9         9  3.73
10        10  3.66
# ℹ 990 more rows

A confidence interval

# 95% percentile interval (the get_ci() defaults, made explicit)
ci <- pips10_bstrap |>
  get_ci(level = 0.95, type = "percentile")
ci

# A tibble: 1 × 2
  lower_ci upper_ci
     <dbl>    <dbl>
1     3.25     3.89

The bootstrap distribution

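# histogram of the bootstrap means with the CI shaded,
# plus a dotted line at 3.5, the expected value for a fair die
pips10_bstrap |>
  visualize() +
  shade_ci(endpoints = ci) +
  geom_vline(xintercept = 3.5, linetype = 3)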

Characteristics of bootstrap dist.

pips10_bstrap |>
  summarize(
    num_replicates = n(),
    mean = mean(stat),
    var = var(stat),
    pct025 = quantile(stat, 0.025),
    pct975 = quantile(stat, 0.975)
  )

# A tibble: 1 × 5
  num_replicates  mean    var pct025 pct975
           <int> <dbl>  <dbl>  <dbl>  <dbl>
1           1000  3.55 0.0265   3.25   3.89

The variance of the bootstrap distribution is an excellent estimate of the variance of the sampling distribution
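As a rough check, assume the white dice are fair and the rolls independent (an assumption, not something the data prove). Then the variance of the sampling distribution of \(\bar{x}\), the mean of the 12 observed values of \(X_{10}\), can be computed exactly:

\[
\mathrm{Var}(\text{one roll}) = \frac{1^2 + 2^2 + \cdots + 6^2}{6} - 3.5^2 = \frac{91}{6} - \frac{49}{4} = \frac{35}{12} \approx 2.917
\]

\[
\mathrm{Var}(X_{10}) = \frac{35/12}{10} = \frac{35}{120} \approx 0.292,
\qquad
\mathrm{Var}(\bar{x}) = \frac{35/120}{12} = \frac{35}{1440} \approx 0.0243
\]

This is close to the bootstrap variance of 0.0265 reported above.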