Inference for a single mean

IMS, Ch. 19

Smith College

Apr 15, 2026

Inference for a single mean

| Method | null dist. | sampling dist. |
|:---|:---|:---|
| 1: probability | ?? | ?? |
| 2: simulation | ?? | bootstrap (centered at \(\bar{x}\)) |
| 3: \(t\)-approx. | \(\frac{\bar{x} - \mu_0}{s / \sqrt{n}} \sim t(d.f.)\) | \(\frac{\bar{x} - \mu}{s / \sqrt{n}} \sim t(d.f.)\) |
  • See IMS, Chapter 19

UCLA book prices

A sample of courses was collected at UCLA in Fall 2018, and the corresponding textbook prices were recorded from both the UCLA bookstore and Amazon.

library(tidyverse)
library(openintro)
books_w_prices <- ucla_textbooks_f18 |>
  filter(!is.na(bookstore_new)) 

Note

  • Find a 95% confidence interval for the mean price of a textbook at the UCLA bookstore

Observed statistics

obs <- books_w_prices |>
  summarize(
    num_books = n(),
    mean_price = mean(bookstore_new),
    median_price = median(bookstore_new),
    sd_price = sd(bookstore_new)
  )
obs
# A tibble: 1 × 4
  num_books mean_price median_price sd_price
      <int>      <dbl>        <dbl>    <dbl>
1        89       82.1         55.5     70.0

Distribution of book prices

ggplot(books_w_prices, aes(x = bookstore_new)) +
  geom_density(fill = "lightgray") +
  geom_vline(xintercept = obs$median_price, linetype = 3) +
  geom_vline(xintercept = obs$mean_price, linetype = 3)

Method 2: Sampling distribution via bootstrap

Bootstrap distribution for mean

library(infer)
book_mean_bstrap <- books_w_prices |>
  specify(response = bookstore_new) |>
  generate(reps = 2000, type = "bootstrap") |>
  calculate(stat = "mean")
mean_ci <- book_mean_bstrap |>
  get_ci()
mean_ci
# A tibble: 1 × 2
  lower_ci upper_ci
     <dbl>    <dbl>
1     67.4     97.4

Confidence interval for the mean

book_mean_bstrap |>
  visualize() +
  shade_ci(mean_ci)

Bootstrap distribution for median!

library(infer)
book_median_bstrap <- books_w_prices |>
  specify(response = bookstore_new) |>
  generate(reps = 2000, type = "bootstrap") |>
  calculate(stat = "median")

median_ci <- book_median_bstrap |>
  get_ci()
median_ci
# A tibble: 1 × 2
  lower_ci upper_ci
     <dbl>    <dbl>
1     39.9     90.0

Confidence interval for a median

book_median_bstrap |>
  visualize() +
  shade_ci(median_ci)

Note

  • We have no Central Limit Theorem for medians!

  • But the bootstrap should work anyway!

Method 3: Sampling distribution via approximation

Recall for a proportion

  • Let \(X \sim Bernoulli(p)\)

  • Then the sample proportion of \(n\) instances of \(X\) is: \[ \hat{p} = \frac{X_1 + \cdots + X_n}{n} \]

  • Binomial theory implies \[ Var(\hat{p}) = \frac{p (1 - p)}{n} = \frac{Var(X)}{n} \] (in practice we estimate \(p\) with \(\hat{p}\))

  • So variance of sample proportion (a.k.a., the square of the standard error) is equal to variance of underlying r.v., divided by sample size
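A quick simulation can confirm this relationship; the sketch below (not from the slides; \(n = 89\) and \(p = 0.3\) are illustrative choices) compares the empirical standard deviation of many simulated sample proportions with \(\sqrt{p(1-p)/n}\):

```r
set.seed(1234)
# 5000 sample proportions, each from n = 89 Bernoulli(0.3) draws
p_hats <- replicate(5000, mean(rbinom(89, size = 1, prob = 0.3)))
sd(p_hats)                  # empirical SE of p-hat
sqrt(0.3 * (1 - 0.3) / 89)  # theoretical SE: sqrt(p(1-p)/n)
```

The two numbers should agree closely, and the agreement improves as the number of replications grows.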

Now for a mean

  • Let \(X\) be any r.v. Then the sample mean of i.i.d. r.v.’s \(X_1, \ldots , X_n\) is: \[ \bar{x} = \frac{X_1 + \cdots + X_n}{n} \]

  • By independence, the variances add, so \[ Var(\bar{x}) = \frac{1}{n^2} \cdot n \cdot Var(X) = \frac{Var(X)}{n} \]

  • So variance of sample mean (a.k.a., the square of the standard error) is equal to variance of underlying r.v., divided by sample size
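The same check works for the mean. This sketch (again not from the slides) draws samples from a skewed distribution, loosely mimicking book prices, and compares the empirical SE of the sample mean with \(\sigma_X / \sqrt{n}\):

```r
set.seed(1234)
# 5000 sample means, each from n = 89 Exponential draws with sd(X) = 80
x_bars <- replicate(5000, mean(rexp(89, rate = 1 / 80)))
sd(x_bars)     # empirical SE of the sample mean
80 / sqrt(89)  # theoretical SE: sd(X) / sqrt(n)
```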

CLT

  • CLT implies that sampling distribution of \(\bar{x}\) approaches normal as \(n \rightarrow \infty\)

  • But we don’t know \(Var(X) \equiv \sigma_X^2\)!!

  • We have to estimate \(\sigma_X\) with \(s_X\) (the sample standard deviation)

  • This introduces extra uncertainty

  • Statisticians have shown that this extra uncertainty is captured by the \(t\)-distribution (rather than the Normal)

\(t\)-distribution

  • Like the standard normal but with fatter tails
  • 1 parameter: d.f.
t_plot <- ggplot() +
  stat_function(fun = dnorm, geom = "area", fill = "darkgray") +
  lims(x = c(-4, 4)) +
  stat_function(fun = dt, args = list(df = 2), color = "red") + 
  stat_function(fun = dt, args = list(df = 4), color = "orange") +
  stat_function(fun = dt, args = list(df = 8), color = "yellow") + 
  stat_function(fun = dt, args = list(df = 16), color = "green") +
  stat_function(fun = dt, args = list(df = 32), color = "blue") + 
  stat_function(fun = dt, args = list(df = 64), color = "purple")

It looks like this

t_plot

How do we use it?

  • Critical value \(t^*_{\alpha/2}\) (e.g., \(\alpha = 0.05\) for a 95% CI):
qt(0.975, df = 10)
[1] 2.228139
# compare with z*
qnorm(0.975)
[1] 1.959964
  • Margin of error: \(t^*_{\alpha/2} \cdot SE_{\bar{x}}\)
  • Confidence Interval: \(\bar{x} \pm t^*_{\alpha/2} \cdot SE_{\bar{x}}\)

Book prices

  • Standard error
se_book <- obs$sd_price / sqrt(obs$num_books)
se_book
[1] 7.422503
  • Compare with bootstrap SE
book_mean_bstrap |>
  summarize(se_bstrap = sd(stat))
# A tibble: 1 × 1
  se_bstrap
      <dbl>
1      7.46

Book prices

  • Confidence interval
moe <- qt(0.975, df = obs$num_books - 1) * se_book
moe
[1] 14.75066
obs$mean_price + c(-moe, moe)
[1] 67.39214 96.89347
  • Compare with bootstrap CI
mean_ci
# A tibble: 1 × 2
  lower_ci upper_ci
     <dbl>    <dbl>
1     67.4     97.4
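As one more cross-check (not in the original slides), base R’s t.test() computes the same \(t\)-based interval directly:

```r
# 95% t-interval for the mean bookstore price
t.test(books_w_prices$bookstore_new, conf.level = 0.95)$conf.int
```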

Your turn

See handout