A brief history of sabermetrics

SDS 355

Prof. Baumer

September 10, 2025

What is sabermetrics?

Sabermetrics

  • statistics
  • computing
  • math
  • baseball
  • physics

A brief history of sabermetrics

Early 20th century

Early 20th century: data

  • Boxscores
    • game level
    • dozens of rows

Early 20th century: tools

Early 20th century: methods

Early 20th century: what did we learn?

  • Batting
    • batting average
  • Pitching
    • earned run average (ERA)
  • Fielding
    • fielding percentage
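
For reference, these three rates are conventionally defined as follows, where \(H\) is hits, \(AB\) at-bats, \(ER\) earned runs, \(IP\) innings pitched, \(PO\) putouts, \(A\) assists, and \(E\) errors. The abbreviations AVG and FPCT are labels chosen here, not taken from the slides.

\[
\mathrm{AVG} = \frac{H}{AB}, \qquad
\mathrm{ERA} = 9 \cdot \frac{ER}{IP}, \qquad
\mathrm{FPCT} = \frac{PO + A}{PO + A + E}
\]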

Branch Rickey era

Branch Rickey era: data

Branch Rickey era: tools

Branch Rickey era: methods

Roth on math

The figures concerned in baseball statistical work don’t call for integral calculus or even advanced algebra.

–Allan Roth

Branch Rickey era: what did we learn?

Bill James era

Bill James era: data

  • Compiled:
    • seasonal, some batter vs. pitcher
    • thousands of rows
  • 1969: “Big Mac”

Bill James era: tools

Bill James era: methods

Bill James era: what did we learn?

  • Pythagorean Expectation
  • Win Shares
  • Batting
    • Runs Created
  • Pitching
    • Game Score
  • Fielding
    • Range Factor
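
Two of these metrics reduce to short formulas, so a minimal sketch in base R may help. It assumes the simplest commonly cited forms (Pythagorean expectation with exponent 2, and the basic Runs Created formula) and uses the 1961 Los Angeles Angels totals that appear in the Lahman output later in this deck. The function and variable names are illustrative only.

```r
# Team totals copied from the 1961 Los Angeles Angels row of the
# Lahman Teams output shown later in this deck.
laa_1961 <- list(
  W = 70, L = 91, R = 744, RA = 784,
  AB = 5424, H = 1331, X2B = 218, X3B = 22, HR = 189, BB = 681
)

# Pythagorean expectation; exponent = 2 is the classic choice (an assumption here).
pythag_wpct <- function(runs_scored, runs_allowed, exponent = 2) {
  runs_scored^exponent / (runs_scored^exponent + runs_allowed^exponent)
}

# Basic Runs Created: (H + BB) * TB / (AB + BB).
runs_created <- function(H, BB, TB, AB) {
  (H + BB) * TB / (AB + BB)
}

# Total bases: every hit counts once, plus extra bases for 2B, 3B, and HR.
total_bases <- with(laa_1961, H + X2B + 2 * X3B + 3 * HR)

expected_wpct <- with(laa_1961, pythag_wpct(R, RA))
actual_wpct   <- with(laa_1961, W / (W + L))
rc            <- with(laa_1961, runs_created(H, BB, total_bases, AB))

round(c(expected = expected_wpct, actual = actual_wpct), 3)
round(c(runs_created = rc, runs_scored = laa_1961$R), 1)
```

Comparing the expected and actual values for a single team season shows how each formula is meant to be read: as an estimate of wins or runs from component totals, not an exact accounting.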

Internet era

Internet era: data

  • Seasonal
    • player-team-season
    • 100,000 rows
  • 1995: Sean Lahman publishes first online database

Internet era: tools

Internet era: methods

  • correlation
  • regression
  • Markov chains
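
As a small illustration of the first two methods, the sketch below correlates run differential with winning percentage and fits a simple linear regression. It uses only the five Angels seasons (1961-1965) that appear in the Lahman output later in this deck, so the fit is far too small to be meaningful and is shown for the mechanics only. The column names `wpct` and `run_diff` are chosen for this example.

```r
# Five Angels seasons (1961-1965), copied from the Lahman Teams output
# shown later in this deck.
angels <- data.frame(
  yearID = 1961:1965,
  W  = c(70, 86, 70, 82, 75),
  L  = c(91, 76, 91, 80, 87),
  R  = c(744, 718, 597, 544, 527),
  RA = c(784, 706, 660, 551, 569)
)
angels$wpct     <- angels$W / (angels$W + angels$L)
angels$run_diff <- angels$R - angels$RA

# Correlation between run differential and winning percentage.
r <- cor(angels$run_diff, angels$wpct)

# Simple linear regression of winning percentage on run differential.
fit <- lm(wpct ~ run_diff, data = angels)

r
coef(fit)
```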

Internet era: what did we learn?

There is little if any difference among major-league pitchers in their ability to prevent hits on balls hit in the field of play.

–Voros McCracken

Moneyball era: data

  • Play-by-play
    • batter vs. pitcher
    • several million rows
  • 1996: Retrosheet starts publishing play-by-play data online

Moneyball era: tools

play,1,0,hendr001,00,X,9/F9S
play,1,0,whitd001,22,BBFFC,K/C
play,1,0,alomr001,32,BBCFBB,W
play,1,0,cartj001,00,X,4/P4
play,1,1,raint001,00,X,43/G34
play,1,1,coraj001,01,CX,4/P4MD
play,1,1,thomf001,30,BBBB,W
play,1,1,ventr001,00,B,WP.1-2
play,1,1,ventr001,11,B.CB,WP.2-3
play,1,1,ventr001,32,B.CB.SFFBS,K
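
Each `play` record is a comma-separated line: record type, inning, a home-team flag (0 = visiting team batting, 1 = home team batting), the batter's player ID, the count, the pitch sequence, and the event code. A minimal sketch of reading these ten lines into a data frame with base R follows. The column names are labels chosen for this example, and the regular expressions handle only the event codes that occur in this sample.

```r
# The ten play records shown above, verbatim.
event_lines <- c(
  "play,1,0,hendr001,00,X,9/F9S",
  "play,1,0,whitd001,22,BBFFC,K/C",
  "play,1,0,alomr001,32,BBCFBB,W",
  "play,1,0,cartj001,00,X,4/P4",
  "play,1,1,raint001,00,X,43/G34",
  "play,1,1,coraj001,01,CX,4/P4MD",
  "play,1,1,thomf001,30,BBBB,W",
  "play,1,1,ventr001,00,B,WP.1-2",
  "play,1,1,ventr001,11,B.CB,WP.2-3",
  "play,1,1,ventr001,32,B.CB.SFFBS,K"
)

# Column names are descriptive labels chosen for this sketch.
plays <- read.csv(
  text = paste(event_lines, collapse = "\n"),
  header = FALSE,
  col.names = c("record", "inning", "is_home", "batter_id",
                "count", "pitches", "event"),
  colClasses = "character"  # keep counts such as "00" as text
)
plays$inning  <- as.integer(plays$inning)
plays$is_home <- plays$is_home == "1"

# Classify a few event codes: K = strikeout, W = walk, WP = wild pitch.
plays$strikeout  <- grepl("^K", plays$event)
plays$walk       <- grepl("^W([.+]|$)", plays$event)
plays$wild_pitch <- grepl("^WP", plays$event)

# Strikeouts by half-inning.
table(half = ifelse(plays$is_home, "bottom", "top"),
      strikeout = plays$strikeout)
```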

Moneyball era: methods

  • regression
  • clustering
  • \(k\)-Nearest Neighbor

Moneyball era: what did we learn?

  • Batting
    • Linear Weights
    • Value Over Replacement Player
    • Projection systems
  • Pitching
    • Leverage
  • Fielding

PITCHf/x era: data

  • Pitch-by-pitch
    • each pitch
    • several million rows per season

PITCHf/x era: tools

  • 2000s: SQL can handle tens of millions of rows on modest hardware
  • 2007: Excel 12.0 can handle at most 1M rows

PITCHf/x era: methods

  • mixed models
  • Bayesian methods
  • generalized linear models

PITCHf/x era: what did we learn?

  • WAR
  • Pitching
    • Spin rates
    • Catcher framing

Statcast era: data

  • 2015: Statcast deployed
  • Player-tracking
    • every 1/15th of a second
    • several million rows per game

Statcast era: tools

Statcast era: methods

  • machine learning
    • regularized regression
    • decision trees
    • random forests
    • neural networks
    • support vector machines

Statcast era: what have we learned?

  • Batting:
    • exit velocity
    • launch angle
  • Fielding:
    • route efficiency
    • pop time

Summary

data             rows       tool            method           result
boxscores        dozens     paper           arithmetic       rates
tabulated        hundreds   adding machine  arithmetic       formulas
compiled         thousands  calculator      algebra          metrics
seasonal         100 k      spreadsheet     regression       metrics
play-by-play     1 M        SQL (laptop)    regression       fitted models
pitch-by-pitch   10 M       SQL (server)    mixed, Bayesian  statistical models
player tracking  billions   Spark           ML               ?

Baseball data

Lahman

  • Seasonal data, 1871-2025

library(Lahman)
Teams |>
  head()
  yearID lgID teamID franchID divID Rank   G Ghome  W  L DivWin WCWin LgWin
1   1884   UA    ALT      ALT  <NA>   10  25    NA  6 19   <NA>  <NA>     N
2   1961   AL    LAA      ANA  <NA>    8 162    82 70 91   <NA>  <NA>     N
3   1962   AL    LAA      ANA  <NA>    3 162    81 86 76   <NA>  <NA>     N
4   1963   AL    LAA      ANA  <NA>    9 161    81 70 91   <NA>  <NA>     N
5   1964   AL    LAA      ANA  <NA>    5 162    81 82 80   <NA>  <NA>     N
6   1965   AL    CAL      ANA  <NA>    7 162    80 75 87   <NA>  <NA>     N
  WSWin   R   AB    H X2B X3B  HR  BB   SO  SB CS HBP SF  RA  ER  ERA CG SHO SV
1  <NA>  90  899  223  30   6   2  22  130  NA NA  NA NA 216 114 4.67 20   0  0
2     N 744 5424 1331 218  22 189 681 1068  37 28  NA NA 784 689 4.31 25   5 34
3     N 718 5499 1377 232  35 137 602  917  46 27  NA NA 706 603 3.70 23  15 47
4     N 597 5506 1378 208  38  95 448  916  43 30  NA NA 660 569 3.52 30  13 31
5     N 544 5362 1297 186  27 102 472  920  49 39  NA NA 551 469 2.91 30  28 41
6     N 527 5354 1279 200  36  92 443  973 107 59  NA NA 569 508 3.17 39  14 33
  IPouts   HA HRA BBA SOA   E  DP    FP                  name
1    659  292   3  52  93 156   4 0.862 Altoona Mountain City
2   4314 1391 180 713 973 192 154 0.969    Los Angeles Angels
3   4398 1412 118 616 858 175 153 0.972    Los Angeles Angels
4   4365 1317 120 578 889 163 155 0.974    Los Angeles Angels
5   4350 1273 100 530 965 138 168 0.978    Los Angeles Angels
6   4323 1259  91 563 847 123 149 0.981     California Angels
                park attendance BPF PPF teamIDBR teamIDlahman45 teamIDretro
1               <NA>         NA 101 109      ALT            ALT         ALT
2 Wrigley Field (LA)     603510 111 112      LAA            LAA         LAA
3     Dodger Stadium    1144063  97  97      LAA            LAA         LAA
4     Dodger Stadium     821015  94  94      LAA            LAA         LAA
5     Dodger Stadium     760439  90  90      LAA            LAA         LAA
6     Dodger Stadium     566727  97  98      CAL            CAL         CAL
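
The early rate statistics can be recomputed directly from these columns. The sketch below copies a few columns from the six rows printed above into a plain data frame, so it runs without the Lahman package, and checks the recomputed ERA against the `ERA` column. The names `AVG` and `ERA_check` are introduced here for illustration.

```r
# Columns copied from the six Teams rows printed above.
teams <- data.frame(
  yearID = c(1884, 1961, 1962, 1963, 1964, 1965),
  teamID = c("ALT", "LAA", "LAA", "LAA", "LAA", "CAL"),
  AB     = c(899, 5424, 5499, 5506, 5362, 5354),
  H      = c(223, 1331, 1377, 1378, 1297, 1279),
  ER     = c(114, 689, 603, 569, 469, 508),
  IPouts = c(659, 4314, 4398, 4365, 4350, 4323),
  ERA    = c(4.67, 4.31, 3.70, 3.52, 2.91, 3.17)
)

# Batting average: hits per at-bat.
teams$AVG <- round(teams$H / teams$AB, 3)

# ERA: earned runs per nine innings. IPouts counts outs recorded,
# so innings pitched = IPouts / 3.
teams$ERA_check <- round(9 * teams$ER / (teams$IPouts / 3), 2)

teams[, c("yearID", "teamID", "AVG", "ERA", "ERA_check")]
```

With the Lahman package loaded, the same two expressions should work on the full `Teams` table, since they rely only on the `H`, `AB`, `ER`, and `IPouts` columns shown in the output.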

Retrosheet

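library(retrosheet)
# Game logs for 1986. The output covers every team, not just "NYN": the
# team argument is documented as applying only to type = "play".
get_retrosheet("game", 1986, "NYN") |>
  head()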
        Date DblHdr Day VisTm VisTmLg VisTmGNum HmTm HmTmLg HmTmGNum VisRuns
1 1986-04-07      0 Mon   CLE      AL         1  BAL     AL        1       6
2 1986-04-07      0 Mon   MIL      AL         1  CHA     AL        1       5
3 1986-04-07      0 Mon   BOS      AL         1  DET     AL        1       5
4 1986-04-07      0 Mon   PHI      NL         1  CIN     NL        1       4
5 1986-04-07      0 Mon   SDN      NL         1  LAN     NL        1       1
6 1986-04-08      0 Tue   KCA      AL         1  NYA     AL        1       2
  HmRuns NumOuts DayNight Completion Forfeit Protest ParkID Attendance Duration
1      4      54        D       <NA>      NA    <NA>  BAL11      52292      184
2      3      54        D       <NA>      NA    <NA>  CHI10      42265      168
3      6      51        D       <NA>      NA    <NA>  DET04      51437      175
4      7      51        D       <NA>      NA    <NA>  CIN08      54960      141
5      2      51        D       <NA>      NA    <NA>  LOS03      49444      135
6      4      51        D       <NA>      NA    <NA>  NYC16      55602      146
    VisLine    HmLine VisAB VisH VisD VisT VisHR VisRBI VisSH VisSF VisHBP
1 122000010 000020110    34    9    3    0     0      5     0     1      0
2 010202000 010010001    35   10    1    0     2      5     0     0      0
3 101000300 00103020x    39   12    1    1     4      5     0     0      0
4 301000000 04102000x    32    7    4    0     1      4     1     1      0
5 000000001 01000010x    33   10    0    0     0      1     0     0      0
6 000200000 03001000x    33    5    1    0     1      2     0     0      1
  VisBB VisIBB VisK VisSB VisCS VisGDP VisCI VisLOB VisPs VisER VisTER VisWP
1     5      0    3     0     1      0     0      7     3     2      2     1
2     0      0    7     1     0      2     0      3     3     3      3     0
3     1      0    6     0     0      0     0      8     4     6      6     0
4     3      1    2     1     0      1     0      6     3     7      7     0
5     1      0    9     1     2      1     0      6     2     2      2     0
6     3      0   10     0     0      0     0      8     2     4      4     0
  VisBalks VisPO VisA VisE VisPassed VisDB VisTP HmAB HmH HmD HmT HmHR HmRBI
1        0    27   11    3         0     1     0   35   8   2   0    0     2
2        0    27    9    0         0     0     0   34   9   2   0    0     3
3        0    24    4    0         0     0     0   35  13   1   1    2     6
4        0    24   16    0         0     2     0   35  13   2   0    2     7
5        0    24   12    0         0     1     0   28   5   1   0    1     2
6        0    24   11    1         0     2     0   30   8   0   0    1     4
  HmSH HmSF HmHBP HmBB HmIBB HmK HmSB HmCS HmGDP HmCI HmLOB HmPs HmER HmTER
1    0    1     0    2     0   5    1    0     1    0     7    3    6     6
2    0    2     0    5     0   6    1    0     0    0    11    4    5     5
3    0    0     0    3     0   6    2    1     0    0     8    3    5     5
4    0    0     0    2     0   1    1    1     1    0     6    2    1     1
5    0    0     0    2     0   0    0    0     0    0     4    1    1     1
6    0    0     0    2     0   3    2    1     1    0     4    3    2     2
  HmWP HmBalks HmPO HmA HmE HmPass HmDB HmTP   UmpHID        UmpHNm  Ump1BID
1    0       0   27  13   2      0    1    0 brinj901  Joe Brinkman coont901
2    0       0   27  11   0      0    2    0 mckej901    Jim McKean clara901
3    0       0   27  13   0      0    0    0 barnl901 Larry Barnett bremn901
4    0       0   27  12   1      0    1    0 kiblj901   John Kibler froeb901
5    0       0   27  15   1      0    1    0 willb901 Bill Williams mcshj901
6    0       0   27  10   1      0    0    0 garcr901   Rich Garcia koscg901
          Ump1BNm  Ump2BID       Ump2BNm  Ump3BID         Ump3BNm UmpLFID
1    Terry Cooney pales901 Steve Palermo kaisk901      Ken Kaiser      NA
2        Al Clark shulj901  John Shulock morrd901    Dan Morrison      NA
3   Nick Bremigan roe-r901     Rocky Roe hirsj901 John Hirschbeck      NA
4 Bruce Froemming marsr901   Randy Marsh davib902    Bob Davidson      NA
5   John McSherry pullf901   Frank Pulli westj901        Joe West      NA
6       Greg Kosc reedr901     Rick Reed fordd901       Dale Ford      NA
  UmpLFNm UmpRFID UmpRFNm VisMgrID         VisMgrNm  HmMgrID         HmMgrNm
1  (none)      NA  (none) corrp102     Pat Corrales weave801     Earl Weaver
2  (none)      NA  (none) bambg101 George Bamberger larut101    Tony LaRussa
3  (none)      NA  (none) mcnaj801    John McNamara andes101 Sparky Anderson
4  (none)      NA  (none) felsj101      John Felske rosep001       Pete Rose
5  (none)      NA  (none) boros101      Steve Boros lasot101     Tom Lasorda
6  (none)      NA  (none) howsd101      Dick Howser pinil001    Lou Piniella
    WinPID              WinPNm      PID         PNAme  SavePID
1 schrk001          Ken Schrom flanm001 Mike Flanagan camae001
2 higut001       Teddy Higuera seavt001    Tom Seaver cleam001
3 morrj001         Jack Morris stews001 Sammy Stewart hernw001
4 sotom001          Mario Soto carls001 Steve Carlton robir001
5 valef001 Fernando Valenzuela showe001     Eric Show     <NA>
6 guidr001          Ron Guidry blacb001   Buddy Black righd001
              SavePNm GWinRBIID      GWinRBINm VisStPchID    VisStPchNm
1       Ernie Camacho  thora001 Andre Thornton   schrk001    Ken Schrom
2          Mark Clear  rilee001    Ernie Riles   higut001 Teddy Higuera
3 Guillermo Hernandez  gibsk001    Kirk Gibson   hursb001   Bruce Hurst
4        Ron Robinson  diazb001        Bo Diaz   carls001 Steve Carlton
5              (none)  sax-s001      Steve Sax   showe001     Eric Show
6       Dave Righetti  wyneb001  Butch Wynegar   blacb001   Buddy Black
  HmStPchID           HmStPchNm VisBat1ID    VisBat1Nm VisBat1Pos VisBat2ID
1  flanm001       Mike Flanagan  butlb001 Brett Butler          8  cartj001
2  seavt001          Tom Seaver  feldm001  Mike Felder          7  younr001
3  morrj001         Jack Morris  evand002 Dwight Evans          9  boggw001
4  sotom001          Mario Soto  redug001   Gary Redus          7  thomm001
5  valef001 Fernando Valenzuela  robeb002  Bip Roberts          4  gwynt001
6  guidr001          Ron Guidry  smitl002 Lonnie Smith          7  wilsw001
      VisBat2Nm VisBat2Pos VisBat3ID        VisBat3Nm VisBat3Pos VisBat4ID
1    Joe Carter          9  franj002     Julio Franco          6  thora001
2   Robin Yount          8  molip001     Paul Molitor          5  robib001
3    Wade Boggs          5  buckb001     Bill Buckner          3  ricej001
4 Milt Thompson          8  samuj001      Juan Samuel          4  schmm001
5    Tony Gwynn          9  mcrek001 Kevin McReynolds          8  garvs001
6 Willie Wilson          8  bretg001     George Brett          5  mcrah001
          VisBat4Nm VisBat4Pos VisBat5ID        VisBat5Nm VisBat5Pos VisBat6ID
1    Andre Thornton         10  jacob001     Brook Jacoby          5  tablp001
2 Billy Jo Robidoux          3  rilee001      Ernie Riles          6  deerr001
3          Jim Rice          7  bayld001       Don Baylor         10  gedmr001
4      Mike Schmidt          5  wilsg001     Glenn Wilson          9  hayev001
5      Steve Garvey          3  martc001 Carmelo Martinez          7  tempg001
6         Hal McRae         10  whitf001      Frank White          4  balbs001
        VisBat6Nm VisBat6Pos VisBat7ID        VisBat7Nm VisBat7Pos VisBat8ID
1      Pat Tabler          3  bernt001   Tony Bernazard          4  hallm001
2        Rob Deer          9  housp001 Paul Householder         10  gantj001
3     Rich Gedman          2  armat001       Tony Armas          8  barrm001
4       Von Hayes          3  dauld001   Darren Daulton          2  jelts001
5 Garry Templeton          6  kennt001    Terry Kennedy          2  roysj001
6   Steve Balboni          3  motld001    Darryl Motley          9  sundj001
      VisBat8Nm VisBat8Pos VisBat9ID     VisBat9Nm VisBat9Pos HmBat1ID
1      Mel Hall          7  allaa001 Andy Allanson          2 wigga001
2   Jim Gantner          4  ceror001   Rick Cerone          2 cangj001
3 Marty Barrett          4  hoffg001 Glenn Hoffman          6 colld001
4   Steve Jeltz          6  carls001 Steve Carlton          1 davie001
5 Jerry Royster          5  showe001     Eric Show          1 duncm001
6  Jim Sundberg          2  salaa001 Angel Salazar          6 hendr001
          HmBat1Nm HmBat1Pos HmBat2ID        HmBat2Nm HmBat2Pos HmBat3ID
1     Alan Wiggins         4 lacyl001        Lee Lacy         9 ripkc001
2   John Cangelosi         8 tollw001  Wayne Tolleson         5 bainh001
3     Dave Collins        10 whitl001    Lou Whitaker         4 gibsk001
4       Eric Davis         8 bellb001      Buddy Bell         5 parkd001
5   Mariano Duncan         6 landk001   Ken Landreaux         8 madlb001
6 Rickey Henderson         8 randw001 Willie Randolph         4 mattd001
       HmBat3Nm HmBat3Pos HmBat4ID      HmBat4Nm HmBat4Pos HmBat5ID
1    Cal Ripken         6 murre001  Eddie Murray         3 lynnf001
2 Harold Baines         9 fiskc001  Carlton Fisk         7 kittr001
3   Kirk Gibson         9 parrl001 Lance Parrish         2 lemoc001
4   Dave Parker         9 esasn001   Nick Esasky         3 jonet001
5  Bill Madlock         5 brocg001    Greg Brock         3 marsm001
6 Don Mattingly         3 winfd001 Dave Winfield         9 roeng001
       HmBat5Nm HmBat5Pos HmBat6ID        HmBat6Nm HmBat6Pos HmBat7ID
1     Fred Lynn         8 younm002      Mike Young        10 shelj001
2    Ron Kittle        10 walkg001     Greg Walker         3 hulet001
3    Chet Lemon         8 trama001   Alan Trammell         6 hernl001
4   Tracy Jones         7 concd001 Dave Concepcion         6 diazb001
5 Mike Marshall         9 stubf001 Franklin Stubbs         7 sciom001
6 Gary Roenicke        10 cotth001     Henry Cotto         7 berrd001
       HmBat7Nm HmBat7Pos HmBat8ID      HmBat8Nm HmBat8Pos HmBat9ID
1   John Shelby         7 dempr001  Rick Dempsey         2 gutij001
2    Tim Hulett         4 skinj001  Joel Skinner         2 guilo001
3 Larry Herndon         7 evand001 Darrell Evans         3 coled001
4       Bo Diaz         2 oestr001    Ron Oester         4 sotom001
5 Mike Scioscia         2 sax-s001     Steve Sax         4 valef001
6    Dale Berra         5 wyneb001 Butch Wynegar         2 meacb001
             HmBat9Nm HmBat9Pos Additional Acquisition
1    Jackie Gutierrez         5       <NA>           Y
2       Ozzie Guillen         6       <NA>           Y
3       Darnell Coles         5       <NA>           Y
4          Mario Soto         1       <NA>           Y
5 Fernando Valenzuela         1       <NA>           Y
6         Bob Meacham         6       <NA>           Y

Statcast via baseballr

# remotes::install_github("BillPetti/baseballr")
library(baseballr)
statcast_search_pitchers(
  start_date = "2025-08-30", 
  end_date = Sys.Date(), 
  pitcherid = 804636
)
# A tibble: 274 × 118
   pitch_type game_date  release_speed release_pos_x release_pos_z player_name
   <chr>      <date>             <dbl>         <dbl>         <dbl> <chr>      
 1 CH         2025-09-24          84.3         -2.42          5.91 Tong, Jonah
 2 FF         2025-09-24          94.1         -2.43          6.01 Tong, Jonah
 3 FF         2025-09-24          96.4         -2.32          6.1  Tong, Jonah
 4 FF         2025-09-24          94.4         -2.39          6.01 Tong, Jonah
 5 CH         2025-09-24          86           -2.55          5.88 Tong, Jonah
 6 FF         2025-09-24          94.3         -2.42          6.04 Tong, Jonah
 7 FF         2025-09-24          94.7         -2.38          6.05 Tong, Jonah
 8 CH         2025-09-24          85.5         -2.48          5.88 Tong, Jonah
 9 FF         2025-09-24          94.6         -2.49          5.95 Tong, Jonah
10 CU         2025-09-24          75.5         -2.58          5.92 Tong, Jonah
# ℹ 264 more rows
# ℹ 112 more variables: batter <dbl>, pitcher <dbl>, events <chr>,
#   description <chr>, spin_dir <lgl>, spin_rate_deprecated <lgl>,
#   break_angle_deprecated <lgl>, break_length_deprecated <lgl>, zone <dbl>,
#   des <chr>, game_type <chr>, stand <chr>, p_throws <chr>, home_team <chr>,
#   away_team <chr>, type <chr>, hit_location <int>, bb_type <chr>,
#   balls <int>, strikes <int>, game_year <int>, pfx_x <dbl>, pfx_z <dbl>, …
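
A typical first step with pitch-level data is to summarize by pitch type. The sketch below copies `pitch_type` and `release_speed` from the ten rows printed above into a small data frame, so it runs without a network request, and computes counts and mean release speed per pitch type. The object names are chosen for this example.

```r
# pitch_type and release_speed copied from the ten rows printed above.
pitches <- data.frame(
  pitch_type    = c("CH", "FF", "FF", "FF", "CH", "FF", "FF", "CH", "FF", "CU"),
  release_speed = c(84.3, 94.1, 96.4, 94.4, 86, 94.3, 94.7, 85.5, 94.6, 75.5)
)

# Number of pitches of each type.
pitch_counts <- table(pitches$pitch_type)

# Mean release speed (mph) for each pitch type.
mean_speed <- tapply(pitches$release_speed, pitches$pitch_type, mean)

pitch_counts
round(mean_speed, 1)
```

The same `tapply()` call should apply to the full tibble returned by `statcast_search_pitchers()`, assuming it is stored in a variable first.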