Sampling Distributions

IMS, Ch. 12

Smith College

Mar 6, 2026

Distribution of random variables

Let’s make up some random variables

Recall that a random variable (r.v.) is a function \(X: S \rightarrow \mathbb{R}\) from the sample space \(S\) to the real numbers

  • Roll the 🎲 10 times
  • \(n = 10\) is our sample size
  • Record in our spreadsheet:
    1. \(X_{10}\): the mean number of pips
    2. \(Y_{10}\): the median number of pips
    3. \(Z_{10}\): the number of rolls with an even number of pips
    4. \(W_{10}\): the number of sixes
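
A minimal sketch of how one row of the spreadsheet could be produced, assuming a fair six-sided die simulated with `sample()`. The seed and the name `rolls` are illustrative assumptions, not part of the class data.

```r
# Simulate one trial: roll a fair six-sided die n = 10 times.
set.seed(12)                      # arbitrary seed, for reproducibility only
n <- 10
rolls <- sample(1:6, size = n, replace = TRUE)

# The four random variables recorded for this trial
X <- mean(rolls)                  # mean number of pips
Y <- median(rolls)                # median number of pips
Z <- sum(rolls %% 2 == 0)         # rolls showing an even number of pips
W <- sum(rolls == 6)              # number of sixes

c(X = X, Y = Y, Z = Z, W = W)
```

Changing the seed gives a different row: each of the four quantities is a function of the random outcome, which is what makes it a random variable.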

Read data from Google Sheet

library(tidyverse)
library(googlesheets4)
gs4_deauth()
dice <- read_sheet("https://docs.google.com/spreadsheets/d/1FvmWcm0ObURhJl9KhwNIM5sKMsbyFoF1Wie7fU6P_Sk/") |>
  mutate(id = row_number())
dice
# A tibble: 35 × 6
       X     Y     Z     W color    id
   <dbl> <dbl> <dbl> <dbl> <chr> <int>
 1   6     6      10    10 black     1
 2   6     6      10    10 black     2
 3   5.7   6       9     9 black     3
 4   3.6   3.5     4     1 white     4
 5   5.1   6       9     7 black     5
 6   5.9   6       9     9 black     6
 7   5.7   6       9     9 black     7
 8   6     6      10    10 black     8
 9   5.6   6      10     9 black     9
10   3.2   4       5     2 white    10
# ℹ 25 more rows

Compute point estimates

dice |>
  group_by(color) |>
  summarize(
    n = n(),
    x_bar = mean(X),
    y_bar = mean(Y),
    z_bar = mean(Z),
    w_bar = mean(W),
  )
# A tibble: 2 × 6
  color     n x_bar y_bar z_bar w_bar
  <chr> <int> <dbl> <dbl> <dbl> <dbl>
1 black    23  5.57  5.91  9.48  8.30
2 white    12  3.54  3.71  4.83  1.92
  • \(\bar{x}\) is our best estimate of \(\mathbb{E}[X_{10}] = \mu_{X_{10}}\)
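
For comparison, some of these expected values can be worked out directly if we assume a fair die. That is an assumption made for this sketch only; the dice used in class need not be fair. The names `faces`, `mu_X`, `mu_Z`, and `mu_W` are illustrative.

```r
# Expected values under the assumption of a fair six-sided die
faces <- 1:6
n <- 10

mu_X <- mean(faces)                  # E[X_10]: the mean of n rolls has the same mean as one roll
mu_Z <- n * mean(faces %% 2 == 0)    # E[Z_10] = n * P(even) = 10 * 1/2
mu_W <- n * mean(faces == 6)         # E[W_10] = n * P(six)  = 10 * 1/6

c(mu_X = mu_X, mu_Z = mu_Z, mu_W = mu_W)
```

The white dice averages in the table above (3.54, 4.83, 1.92) sit reasonably close to these fair-die values; the black dice averages do not.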

Key idea

  • Every random variable has a distribution

  • Every random variable has an expected value

    • (possibly infinite; for a few heavy-tailed distributions, such as the Cauchy, it is undefined)
  • Every random variable has a variance

    • (possibly infinite)
  • What is the distribution of \(X_{10}\)? \(Y_{10}\)? \(Z_{10}\)?

View distribution of r.v.’s

dice |>
  pivot_longer(-c(color, id), names_to = "r_v", values_to = "value") |>
  ggplot(aes(x = value, fill = color)) +
  geom_histogram() +
  facet_wrap(vars(r_v))

Your turn

  • What do you observe about the distribution of \(X_{10}\)?
    • Center, shape, and spread?
  • Are the black and white 🎲 different?
    • How do you know?
  • What do you suspect the distribution of \(X_{20}\) looks like?

Segue

The distribution of \(X_{10}\) is a sampling distribution
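
To make the idea concrete, here is a sketch that approximates the sampling distribution of \(X_{10}\) and \(X_{20}\) by simulation. It assumes a fair die, which we cannot verify for the class dice, and the helper `simulate_means()` is a hypothetical name introduced for illustration.

```r
set.seed(2026)   # arbitrary seed
reps <- 10000    # number of simulated trials per sample size

# Draw `reps` independent values of the mean of n fair-die rolls
simulate_means <- function(n) {
  replicate(reps, mean(sample(1:6, size = n, replace = TRUE)))
}

x10 <- simulate_means(10)
x20 <- simulate_means(20)

# Theory for a fair die: Var(one roll) = 35/12,
# so SD(X_n) = sqrt(35 / (12 * n))
data.frame(
  n         = c(10, 20),
  sim_mean  = c(mean(x10), mean(x20)),
  sim_sd    = c(sd(x10), sd(x20)),
  theory_sd = sqrt(35 / (12 * c(10, 20)))
)
```

Both simulated distributions are centered near 3.5, and doubling the sample size shrinks the spread by a factor of about \(\sqrt{2}\).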

Sampling distributions

For the white dice

  • We’ve observed \(X_{10}\) 12 times
  • The sample mean is still our best point estimate
    • \(\bar{x} \approx 3.54\)
  • What can we say about our uncertainty around that point estimate?

Sampling distribution

The bootstrap

What if we resample from our sample?

The bootstrap

  • Developed by Brad Efron in 1979

  • Resample from your sample with replacement!

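  • The bootstrap distribution approximates the shape and spread of the sampling distribution, but it is centered at the sample statistic rather than at the population parameter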
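
The resampling step can be sketched in base R without any packages. The sample `x` below is a simulated stand-in (12 values of \(X_{10}\) from a fair die), not the class data, and the names `resample` and `boot_means` are illustrative.

```r
set.seed(1979)   # arbitrary seed

# Stand-in sample: 12 simulated values of X_10 from a fair die
# (NOT the class data, which lives in the Google Sheet)
x <- replicate(12, mean(sample(1:6, size = 10, replace = TRUE)))

# One bootstrap resample: same size as the sample, drawn with replacement,
# so some observations repeat and others are left out
resample <- sample(x, size = length(x), replace = TRUE)

# Repeat many times, keeping the mean of each resample
boot_means <- replicate(1000, mean(sample(x, size = length(x), replace = TRUE)))

mean(x)                                  # point estimate from the sample
quantile(boot_means, c(0.025, 0.975))    # 95% percentile interval
```

Every bootstrap mean is built only from values already in `x`, which is why the bootstrap distribution is centered near `mean(x)`.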

In practice

library(infer)
pips10_bstrap <- dice |>
  filter(color == "white") |>
  specify(response = X) |>
  generate(reps = 1000, type = "bootstrap") |>
  calculate(stat = "mean")
pips10_bstrap
Response: X (numeric)
# A tibble: 1,000 × 2
   replicate  stat
       <int> <dbl>
 1         1  3.51
 2         2  3.86
 3         3  3.52
 4         4  3.63
 5         5  3.7 
 6         6  3.47
 7         7  3.5 
 8         8  3.22
 9         9  3.73
10        10  3.66
# ℹ 990 more rows

A confidence interval

# With no arguments, get_ci() returns a 95% percentile interval
ci <- pips10_bstrap |>
  get_ci()
ci
# A tibble: 1 × 2
  lower_ci upper_ci
     <dbl>    <dbl>
1     3.25     3.89

The bootstrap distribution

pips10_bstrap |>
  visualize() +
  shade_ci(ci) +
  # dotted reference line at 3.5, the expected mean for a fair die
  geom_vline(xintercept = 3.5, linetype = 3)

Characteristics of bootstrap dist.

pips10_bstrap |>
  summarize(
    num_replicates = n(),
    mean = mean(stat),
    var = var(stat),
    pct025 = quantile(stat, 0.025),
    pct975 = quantile(stat, 0.975)
  )
# A tibble: 1 × 5
  num_replicates  mean    var pct025 pct975
           <int> <dbl>  <dbl>  <dbl>  <dbl>
1           1000  3.55 0.0265   3.25   3.89
  • The variance of the bootstrap distribution estimates the variance of the sampling distribution; the estimate improves as the sample size grows, so treat it with some caution when \(n = 12\)
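
As a rough check on that idea, the sketch below compares the bootstrap variance with the classical estimate \(s^2 / n\) and with the true variance. The truth is known here only because the data are simulated from a fair die; the sample `x` is a stand-in, not the class data, and all variable names are illustrative.

```r
set.seed(6)   # arbitrary seed

# Stand-in sample: 12 simulated values of X_10 from a fair die
x <- replicate(12, mean(sample(1:6, size = 10, replace = TRUE)))
n <- length(x)

boot_means <- replicate(5000, mean(sample(x, size = n, replace = TRUE)))

boot_var    <- var(boot_means)       # bootstrap estimate of Var(x-bar)
formula_var <- var(x) / n            # classical estimate, s^2 / n
true_var    <- (35 / 12) / 10 / n    # Var(one roll) / 10 rolls / 12 trials

c(boot = boot_var, formula = formula_var, truth = true_var)
```

The bootstrap variance tracks \(s^2 / n\) closely (up to a factor of about \((n - 1) / n\)), but with only 12 observations both can miss the true value noticeably from one sample to the next.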