Who’s the Fairest of Them All?
Smith College
Sep 27, 2025
| race | two_year_recid | risk_score | n |
|---|---|---|---|
| African-American | 0 | High | 345 |
| African-American | 0 | Low | 1169 |
| African-American | 1 | High | 843 |
| African-American | 1 | Low | 818 |
| Caucasian | 0 | High | 106 |
| Caucasian | 0 | Low | 1175 |
| Caucasian | 1 | High | 230 |
| Caucasian | 1 | Low | 592 |
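For concreteness, a count table like the one above can be tabulated with dplyr. This is a minimal sketch: the data frame name `compas`, the `decile_score` column, and the cutoff used to dichotomize it into High/Low are assumptions (modeled on ProPublica's published COMPAS data), not something shown on this slide.

```r
library(dplyr)

# Sketch only: `compas`, `decile_score`, and the >= 5 cutoff are assumptions.
compas |>
  filter(race %in% c("African-American", "Caucasian")) |>
  mutate(risk_score = if_else(decile_score >= 5, "High", "Low")) |>
  count(race, two_year_recid, risk_score)
```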
Prediction Fails Differently for Black Defendants
| Error Type | White (%) | African-American (%) |
|---|---|---|
| Labeled Higher Risk, But Didn't Re-Offend | 23.5 | 44.9 |
| Labeled Lower Risk, Yet Did Re-Offend | 47.7 | 28.0 |
\[ \Pr{(\neg Y | \hat{Y}, W)} = 0.409 \approx \Pr{(\neg Y | \hat{Y}, B)} = 0.370 \\ \Pr{(Y | \neg \hat{Y}, W)} = 0.288 \approx \Pr{(Y | \neg \hat{Y}, B)} = 0.448 \]
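These reversed conditionals, \(\Pr(Y \mid \hat{Y})\) rather than \(\Pr(\hat{Y} \mid Y)\), can be computed directly from a count table like the one above. A sketch, assuming those counts sit in a data frame `tab` with the four columns shown (the name `tab` is mine); the slide's probabilities may come from a different sample cut, so the output need not match the numbers quoted here.

```r
library(dplyr)

# Pr(two_year_recid = 1 | risk_score, race): the share of each
# race x risk_score cell that actually re-offended.
tab |>
  group_by(race, risk_score) |>
  summarize(p_reoffend = sum(n[two_year_recid == 1]) / sum(n),
            .groups = "drop")
```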
“Dozens” of statistical criteria for fairness boil down to 3:
Independence
\[ \left| \Pr(\hat{Y} | W) - \Pr(\hat{Y} | B) \right| < \epsilon \,, \] where \(\epsilon\) is a small positive constant (typically \(\epsilon < 0.2\))
\[ \left| 0.348 - 0.588 \right| = 0.240 > \epsilon \,, \]
Separation
\[ \left| \Pr(\hat{Y} | Y, W) - \Pr(\hat{Y} | Y, B) \right| < \epsilon \\ \left| \Pr(\hat{Y} | \neg Y, W) - \Pr(\hat{Y} | \neg Y, B) \right| < \epsilon \]
\[ \left| 0.523 - 0.720 \right| = 0.197 < \epsilon \\ \left| 0.235 - 0.488 \right| = 0.253 > \epsilon \]
Sufficiency
\[ \left| \Pr(Y | \hat{Y}, W) - \Pr(Y | \hat{Y}, B) \right| < \epsilon \,, \]
\[ \left| 0.591 - 0.630 \right| = 0.039 < \epsilon \]
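To make the three criteria concrete, here is an illustrative base-R helper that computes all three gaps for a binary classifier. It is a sketch, not faireR's `fairness_cube()` implementation; the function and argument names (`fairness_gaps`, `y`, `yhat`, `g`) are mine.

```r
# y: 0/1 observed outcome; yhat: 0/1 prediction; g: two-level group factor.
fairness_gaps <- function(y, yhat, g) {
  gap  <- function(r) abs(r[[1]] - r[[2]])          # |rate_W - rate_B|
  cond <- function(keep, of) tapply(of[keep], g[keep], mean)
  c(
    independence   = gap(tapply(yhat, g, mean)),    # Pr(yhat | group)
    separation_tpr = gap(cond(y == 1, yhat)),       # Pr(yhat | y, group)
    separation_fpr = gap(cond(y == 0, yhat)),       # Pr(yhat | !y, group)
    sufficiency    = gap(cond(yhat == 1, y))        # Pr(y | yhat, group)
  )
}
```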
Unless the base rates are the same across the groups, you can’t satisfy all three criteria simultaneously (Kleinberg, Mullainathan, and Raghavan 2016).
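One way to see why, using only Bayes’ rule (writing \(p = \Pr(Y)\) for a group’s base rate):

\[ \Pr(Y \mid \hat{Y}) = \frac{\Pr(\hat{Y} \mid Y)\, p}{\Pr(\hat{Y} \mid Y)\, p + \Pr(\hat{Y} \mid \neg Y)\,(1 - p)} \]

If separation holds, both groups share \(\Pr(\hat{Y} \mid Y)\) and \(\Pr(\hat{Y} \mid \neg Y)\); if base rates \(p\) differ, the right-hand side then differs across groups, so sufficiency must fail (outside the degenerate case of a perfect classifier).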
faireR: a tidyverse-friendly complement to yardstick and mlr3fairness

People with similar qualifications should be treated similarly
People of equal ability and ambition should have similar chances
Adjust for past injustice (that caused the differences in qualifications) at the time of opportunity
Instead of:

> two equal pairs should have as close to an equal chance of winning as possible. (Pollard, Noble, and Pollard 2022)

Instead of:

> A swimmer with no arms should be able to compete against a swimmer with one or two arms… If two athletes with the same disability compete, the more skilled and fitter should win. (Bartneck and Moltchanova 2024)

Instead of:

> We present two methods to distribute prize money across gender based on the individual performances w.r.t. gender-specific records. We suppose these “across gender distributions” to be fair, as they suitably respect that women generally are slower than men. (Martens and Starflinger 2022)
```
# A tibble: 1 × 3
  independence separation sufficiency
         <dbl>      <dbl>       <dbl>
1      0.00322    0.00520      0.0323
```
✅
```r
library(dplyr)   # pipeline verbs: mutate(), filter(), group_by()
library(faireR)  # fairness_cube(), as introduced above

hof2025 |>
  mutate(
    is_pitcher = tSO > 100,  # pitchers: more than 100 total strikeouts
    is_reliever = tSV > 50   # relievers: more than 50 total saves
  ) |>
  filter(is_pitcher) |>
  group_by(is_reliever) |>   # compare relievers vs. starters
  fairness_cube()
```

```
# A tibble: 1 × 3
  independence separation sufficiency
         <dbl>      <dbl>       <dbl>
1      0.00446      0.275      0.0821
```
❌
Award prizes to:

```
# A tibble: 1 × 3
  independence separation sufficiency
         <dbl>      <dbl>       <dbl>
1        0.101         NA          NA
```

Separation and sufficiency come back `NA` here, presumably because there is no observed outcome \(Y\) to condition on when awarding prizes; only independence can be checked.