Inference for two-way tables

IMS, Ch. 18

Smith College

Apr 10, 2026

Recall: Inference for two proportions

Method | null dist. | sampling dist.
1: probability | ? | ?
2: simulation | randomization test (centered at \(0\)) | two bootstraps (centered at \(\hat{p}_1 - \hat{p}_2\))
3: normal approx. | \(N(0, SE_{pool})\) | \(N \left( \hat{p}_1 - \hat{p}_2, SE_{\hat{p}_1 - \hat{p}_2} \right)\)
  • See IMS, Chapter 17

Beyond the binary

  • Difference in proportions:
    • binary response
    • binary explanatory
  • Two-way tables:
    • categorical response
    • categorical explanatory
    • 🎊 either/both can have more than two levels!

Inference for two-way tables

Method | null dist. | sampling dist.
1: probability | hypergeometric (Fisher’s exact test) | NA
2: simulation | permutation test (starting at \(0\)) | NA
3: \(\chi^2\) approx. | \(\chi^2 (k = d.f.)\) | NA
  • See IMS, Chapter 18

R.A. Fisher

  • “a genius who almost single-handedly created the foundations for modern statistical science”

  • “the single most important figure in 20th century statistics”

  • a racist, eugenicist, and Nazi sympathizer

  • renaming of COPSS Award

How Eugenics Shaped Statistics

Statistical thinking and eugenicist thinking are, in fact, deeply intertwined, and many of the theoretical problems with methods like significance testing—first developed to identify racial differences—are remnants of their original purpose, to support eugenics.

The DREAM Act

library(tidyverse)
library(openintro)
dream |>
  janitor::tabyl(ideology, stance) |>
  janitor::adorn_totals(where = c("row", "col"))
     ideology  No Not sure Yes Total
 Conservative 151       35 186   372
      Liberal  52        9 114   175
     Moderate 161       28 174   363
        Total 364       72 474   910

Setup

  • \(H_0\): stance independent of ideology
  • \(H_A\): stance not independent of ideology
  • \(\alpha = 0.05\)
  • What is the test statistic???

Logic

  • If \(H_0\) is true, then joint probabilities equal product of marginal probabilities

  • If \(A, B\) are independent, then \(\Pr(A \cap B) = \Pr(A) \cdot \Pr(B)\)

  • Consider this test statistic:

\[ X^2 = \sum_{i,j} \frac{(observed_{ij} - expected_{ij})^2}{expected_{ij}} \]
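As a minimal sketch of where the expected counts come from: under independence, each expected count is (row total × column total) / grand total. The block below rebuilds the DREAM Act table by hand from the counts printed above (the names `counts`, `expected`, and `X2_manual` are illustrative, not from the original slides) and evaluates the statistic directly.

```r
# Observed counts, typed in from the two-way table shown earlier
counts <- matrix(
  c(151, 35, 186,
     52,  9, 114,
    161, 28, 174),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    ideology = c("Conservative", "Liberal", "Moderate"),
    stance   = c("No", "Not sure", "Yes")
  )
)

n <- sum(counts)

# Expected counts under independence:
# (row total * column total) / grand total, for every cell
expected <- outer(rowSums(counts), colSums(counts)) / n

# Sum of (observed - expected)^2 / expected over all cells
X2_manual <- sum((counts - expected)^2 / expected)

round(expected, 1)
X2_manual
```

If the counts are transcribed correctly, `X2_manual` should agree with the value that `infer::observe()` reports on the next slide.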

Test statistic

library(infer)
X2 <- dream |>
  observe(stance ~ ideology, null = "independence", stat = "Chisq") |>
  pull(stat)
X2
X-squared 
 16.38749 

How to construct the null dist?

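  • If \(H_0\) is true, then the observed counts should be close to the expected counts, so \(X^2\) should be close to 0

  • But \(X^2 \geq 0\), so it can only deviate from 0 in one direction

  • So what is the distribution of \(X^2\) when \(H_0\) is true?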

Null distribution

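library(infer)
# Simulate the null distribution: shuffle stance across respondents
# 1000 times and recompute the chi-squared statistic each time
dream_null <- dream |>
  specify(stance ~ ideology) |>
  hypothesize(null = "independence") |>
  generate(reps = 1000, type = "permute") |>
  calculate(stat = "Chisq")
# Dotted line at 0 (the lower bound); dashed line at the observed statistic
dream_plot <- dream_null |>
  ggplot(aes(x = stat)) +
  geom_density(fill = "darkgray") +
  geom_vline(xintercept = 0, linetype = 3) +
  geom_vline(xintercept = X2, linetype = 2)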

Permutation test

dream_plot
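To make the permutation idea concrete, here is a rough base-R sketch of what a permutation test for independence does. It is not the `infer` implementation. It assumes the counts in the table above, and the helper `chisq_stat()` and the data frame `people` are hypothetical names introduced for illustration.

```r
# Observed counts, typed in from the two-way table shown earlier
counts <- matrix(
  c(151, 35, 186,
     52,  9, 114,
    161, 28, 174),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    ideology = c("Conservative", "Liberal", "Moderate"),
    stance   = c("No", "Not sure", "Yes")
  )
)

# Expand the table to one row per respondent
long   <- as.data.frame(as.table(counts))
people <- long[rep(seq_len(nrow(long)), long$Freq), c("ideology", "stance")]

# Chi-squared statistic for two categorical vectors (hypothetical helper)
chisq_stat <- function(x, y) {
  obs  <- table(x, y)
  expd <- outer(rowSums(obs), colSums(obs)) / sum(obs)
  sum((obs - expd)^2 / expd)
}

obs_stat <- chisq_stat(people$ideology, people$stance)

# Shuffling stance breaks any link with ideology, which mimics H_0
set.seed(18)
null_stats <- replicate(
  1000,
  chisq_stat(people$ideology, sample(people$stance))
)

# Right-tailed p-value: share of shuffles at least as extreme as observed
p_perm <- mean(null_stats >= obs_stat)
p_perm
```

Because the shuffles are random, `p_perm` will vary slightly with the seed and the number of replicates.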

Chi-squared test

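  • Statisticians have shown that, if \(H_0\) is true and the expected counts are large enough (a common rule of thumb: at least 5 in every cell), \(X^2\) approximately follows a \(\chi^2(k)\) distribution, where

  • \(k\) is the number of degrees of freedom

  • \(k\) = (number of levels in response - 1) \(\cdot\) (number of levels in explanatory - 1)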

  • In this case, \(k = (3-1) \cdot (3-1) = 4\)
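The degrees-of-freedom arithmetic can be sketched in a few lines of R, along with the cutoff that \(\alpha = 0.05\) implies for a \(\chi^2(4)\) distribution. The variable names here are illustrative only.

```r
# Number of levels in each variable (stance: 3, ideology: 3)
n_response    <- 3
n_explanatory <- 3

# Degrees of freedom for a two-way table
k <- (n_response - 1) * (n_explanatory - 1)
k

# Critical value: reject H_0 at alpha = 0.05 if X^2 exceeds this
crit <- qchisq(1 - 0.05, df = k)
crit  # about 9.49
```

The observed statistic of about 16.39 lies beyond this cutoff, which is consistent with the small p-values reported on the p-values slide.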

Null approximation

dream_plot +
  stat_function(fun = dchisq, args = list(df = 4), color = "red")

p-values

# permutation test
dream_null |>
  get_p_value(obs_stat = X2, direction = "right")
# A tibble: 1 × 1
  p_value
    <dbl>
1   0.002
# chi-squared test
pchisq(q = X2, df = 4, lower.tail = FALSE)
  X-squared 
0.002540935 
  • Both p-values are well below \(\alpha = 0.05\), so we reject \(H_0\): there is a statistically discernible association between stance on the DREAM Act and political ideology
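As a cross-check, base R's `chisq.test()` runs the same \(\chi^2\) approximation in one call, and `fisher.test()` covers the probability-based method from the overview table. This is a sketch assuming the counts shown earlier. For a table this large, the Fisher p-value is estimated here by Monte Carlo simulation (`simulate.p.value = TRUE`) rather than computed exactly, which is a simplification chosen for this example.

```r
# Observed counts, typed in from the two-way table shown earlier
counts <- matrix(
  c(151, 35, 186,
     52,  9, 114,
    161, 28, 174),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    ideology = c("Conservative", "Liberal", "Moderate"),
    stance   = c("No", "Not sure", "Yes")
  )
)

# Chi-squared test of independence (no continuity correction for 3 x 3)
res <- chisq.test(counts)
res$statistic   # X-squared
res$parameter   # degrees of freedom
res$p.value

# Smallest expected count, to check the large-sample condition
min(res$expected)

# Fisher's test with a simulated p-value (assumption: Monte Carlo
# approximation instead of the exact hypergeometric calculation)
set.seed(18)
fish <- fisher.test(counts, simulate.p.value = TRUE, B = 10000)
fish$p.value
```

The `chisq.test()` p-value should match the `pchisq()` result above; the simulated Fisher p-value will vary slightly with the seed.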

Your turn

  • See handout