Inference for a single mean

IMS, Ch. 19

Smith College

Apr 15, 2026

Inference for a single mean

| Method | null dist. | sampling dist. |
|:---|:---|:---|
| 1: probability | ?? | ?? |
| 2: simulation | ?? | bootstrap (centered at \(\bar{x}\)) |
| 3: \(t\)-approx. | \(\frac{\bar{x} - \mu_0}{s / \sqrt{n}} \sim t(d.f.)\) | \(\frac{\bar{x} - \mu}{s / \sqrt{n}} \sim t(d.f.)\) |
  • See IMS, Chapter 19

UCLA book prices

A sample of courses was collected at UCLA in Fall 2018, and the corresponding textbook prices were recorded from both the UCLA bookstore and Amazon.

library(tidyverse)
library(openintro)
books_w_prices <- ucla_textbooks_f18 |>
  filter(!is.na(bookstore_new)) 

Note

  • Find a 95% confidence interval for the mean price of a textbook at the UCLA bookstore

Observed statistics

obs <- books_w_prices |>
  summarize(
    num_books = n(),
    mean_price = mean(bookstore_new),
    median_price = median(bookstore_new),
    sd_price = sd(bookstore_new)
  )
obs
# A tibble: 1 × 4
  num_books mean_price median_price sd_price
      <int>      <dbl>        <dbl>    <dbl>
1        89       82.1         55.5     70.0

Distribution of book prices

ggplot(books_w_prices, aes(x = bookstore_new)) +
  geom_density(fill = "lightgray") +
  geom_vline(xintercept = obs$median_price, linetype = 3) +
  geom_vline(xintercept = obs$mean_price, linetype = 3)

Method 2: Sampling distribution via bootstrap

Bootstrap distribution for mean

library(infer)
book_mean_bstrap <- books_w_prices |>
  specify(response = bookstore_new) |>
  generate(reps = 2000, type = "bootstrap") |>
  calculate(stat = "mean")
mean_ci <- book_mean_bstrap |>
  get_ci()
mean_ci
# A tibble: 1 × 2
  lower_ci upper_ci
     <dbl>    <dbl>
1     67.4     97.4

Confidence interval for the mean

book_mean_bstrap |>
  visualize() +
  shade_ci(mean_ci)

Bootstrap distribution for median!

library(infer)
book_median_bstrap <- books_w_prices |>
  specify(response = bookstore_new) |>
  generate(reps = 2000, type = "bootstrap") |>
  calculate(stat = "median")

median_ci <- book_median_bstrap |>
  get_ci()
median_ci
# A tibble: 1 × 2
  lower_ci upper_ci
     <dbl>    <dbl>
1     39.9     90.0

Confidence interval for a median

book_median_bstrap |>
  visualize() +
  shade_ci(median_ci)

Note

  • We have no Central Limit Theorem for medians!

  • But the bootstrap should work anyway!

Method 3: Sampling distribution via approximation

Recall for a proportion

  • Let \(X \sim Bernoulli(p)\)

  • Then the sample proportion of \(n\) instances of \(X\) is: \[ \hat{p} = \frac{X_1 + \cdots + X_n}{n} \]

  • Binomial theory implies \[ Var(\hat{p}) = \frac{p (1 - p)}{n} = \frac{Var(X)}{n} \] (in practice we estimate \(p\) with \(\hat{p}\))

  • So variance of sample proportion (a.k.a., the square of the standard error) is equal to variance of underlying r.v., divided by sample size
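A quick simulation can confirm this relationship; the sketch below (not from the slides; \(n = 89\) and \(p = 0.3\) are illustrative choices) compares the empirical standard deviation of many simulated sample proportions with \(\sqrt{p(1-p)/n}\):

```r
set.seed(1234)
# 5000 sample proportions, each from n = 89 Bernoulli(0.3) draws
p_hats <- replicate(5000, mean(rbinom(89, size = 1, prob = 0.3)))
sd(p_hats)                  # empirical SE of p-hat
sqrt(0.3 * (1 - 0.3) / 89)  # theoretical SE: sqrt(p(1-p)/n)
```

The two numbers should agree closely, and the agreement improves as the number of replications grows.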

Now for a mean

  • Let \(X\) be any r.v. Then the sample mean of i.i.d. r.v.’s \(X_1, \ldots , X_n\) is: \[ \bar{x} = \frac{X_1 + \cdots + X_n}{n} \]

  • By independence, the variances add, so \[ Var(\bar{x}) = \frac{1}{n^2} \cdot n \cdot Var(X) = \frac{Var(X)}{n} \]

  • So variance of sample mean (a.k.a., the square of the standard error) is equal to variance of underlying r.v., divided by sample size
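The same check works for the mean. This sketch (again not from the slides) draws samples from a skewed distribution, loosely mimicking book prices, and compares the empirical SE of the sample mean with \(\sigma_X / \sqrt{n}\):

```r
set.seed(1234)
# 5000 sample means, each from n = 89 Exponential draws with sd(X) = 80
x_bars <- replicate(5000, mean(rexp(89, rate = 1 / 80)))
sd(x_bars)     # empirical SE of the sample mean
80 / sqrt(89)  # theoretical SE: sd(X) / sqrt(n)
```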

CLT

  • CLT implies that sampling distribution of \(\bar{x}\) approaches normal as \(n \rightarrow \infty\)

  • But we don’t know \(Var(X) \equiv \sigma_X^2\)!!

  • We have to estimate \(\sigma_X\) with \(s_X\) (the sample standard deviation)

  • This introduces extra uncertainty

  • Statisticians have shown that this extra uncertainty is captured by the \(t\)-distribution (rather than the Normal)

\(t\)-distribution

  • Like the standard normal but with fatter tails
  • 1 parameter: d.f.
t_plot <- ggplot() +
  stat_function(fun = dnorm, geom = "area", fill = "darkgray") +
  lims(x = c(-4, 4)) +
  stat_function(fun = dt, args = list(df = 2), color = "red") + 
  stat_function(fun = dt, args = list(df = 4), color = "orange") +
  stat_function(fun = dt, args = list(df = 8), color = "yellow") + 
  stat_function(fun = dt, args = list(df = 16), color = "green") +
  stat_function(fun = dt, args = list(df = 32), color = "blue") + 
  stat_function(fun = dt, args = list(df = 64), color = "purple")

It looks like this

t_plot

How do we use it?

  • Critical value \(t^*_{\alpha/2}\) (e.g., \(\alpha = 0.05\) for a 95% CI):
qt(0.975, df = 10)
[1] 2.228139
# compare with z*
qnorm(0.975)
[1] 1.959964
  • Margin of error: \(t^*_{\alpha/2} \cdot SE_{\bar{x}}\)
  • Confidence Interval: \(\bar{x} \pm t^*_{\alpha/2} \cdot SE_{\bar{x}}\)

Book prices

  • Standard error
se_book <- obs$sd_price / sqrt(obs$num_books)
se_book
[1] 7.422503
  • Compare with bootstrap SE
book_mean_bstrap |>
  summarize(se_bstrap = sd(stat))
# A tibble: 1 × 1
  se_bstrap
      <dbl>
1      7.46

Book prices

  • Confidence interval
moe <- qt(0.975, df = obs$num_books - 1) * se_book
moe
[1] 14.75066
obs$mean_price + c(-moe, moe)
[1] 67.39214 96.89347
  • Compare with bootstrap CI
mean_ci
# A tibble: 1 × 2
  lower_ci upper_ci
     <dbl>    <dbl>
1     67.4     97.4
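As one more cross-check (not in the original slides), base R’s t.test() computes the same \(t\)-based interval directly:

```r
# 95% t-interval for the mean bookstore price
t.test(books_w_prices$bookstore_new, conf.level = 0.95)$conf.int
```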

Your turn

See handout