About this Book

Authors and Affiliations

Jim Albert, Bowling Green State University

Benjamin S. Baumer, Smith College

Max Marchi, Cleveland Guardians

About the Authors

Jim Albert is Emeritus Distinguished University Professor in the Department of Mathematics and Statistics at Bowling Green State University. Jim is the author of Teaching Statistics Using Baseball and Visualizing Baseball, and coauthor (with Jay Bennett) of Curve Ball. In 2003, Jim received the Significant Contributor to Statistics in Sports award, given by the Section on Statistics in Sports of the American Statistical Association.

Max Marchi is a Baseball Analytics Analyst for the Cleveland Guardians. Max was a regular contributor to The Hardball Times and Baseball Prospectus websites and previously consulted for other MLB clubs.

Benjamin S. Baumer is a professor in the Statistical & Data Sciences program at Smith College. He has been a practicing data scientist since 2004, when he became the first full-time statistical analyst for the New York Mets. Ben is a co-author of Modern Data Science with R, The Sabermetric Revolution, and Analyzing Baseball Data with R. Ben has received the Waller Education Award from the ASA Section on Statistics and Data Science Education, the Significant Contributor Award from the ASA Section on Statistics in Sports, and the Contemporary Baseball Analysis Award from the Society for American Baseball Research.

About our Computing Environment

We used R version 4.3.3 (R Core Team 2024) and the following R packages: abdwr3edata v. 0.0.2 (Baumer and Albert 2024), arrow v. 15.0.1 (Richardson et al. 2024), baseballr v. 1.6.0 (Petti and Gilani 2024), bench v. 1.1.3 (Hester and Vaughan 2023), broom v. 1.0.5 (Robinson, Hayes, and Couch 2023), dbplyr v. 2.5.0 (Wickham, Girlich, and Ruiz 2024), downlit v. 0.4.3 (Wickham 2023a), dplyr v. 1.1.4 (Wickham et al. 2023), duckdb v. 0.10.0 (Mühleisen and Raasveldt 2024), fs v. 1.6.3 (Hester, Wickham, and Csárdi 2023), ggplot2 v. 3.5.0 (Wickham 2016), ggrepel v. 0.9.5 (Slowikowski 2024), grateful v. 0.2.4 (Rodriguez-Sanchez and Jackson 2023), here v. 1.0.1 (Müller 2020), Hmisc v. 5.1.2 (Harrell Jr 2024), kableExtra v. 1.4.0 (Zhu 2024), knitr v. 1.45 (Xie 2014, 2015, 2023), Lahman v. 11.0.0 (Friendly et al. 2023), latex2exp v. 0.9.6 (Meschiari 2022), LearnBayes v. 2.15.1 (Albert 2018), lme4 v. 1.1.35.2 (Bates et al. 2015), lobstr v. 1.1.2 (Wickham 2022), lubridate v. 1.9.3 (Grolemund and Wickham 2011), mdsr v. 0.2.7 (Baumer, Kaplan, and Horton 2021), metR v. 0.15.0 (Campitelli 2021), mgcv v. 1.9.1 (Wood 2003, 2004, 2011, 2017; Wood, Pya, and Säfken 2016), mlbplotR v. 1.1.0 (Carl and Kay 2023), modelr v. 0.1.11 (Wickham 2023b), patchwork v. 1.2.0 (Pedersen 2024), readr v. 2.1.5 (Wickham, Hester, and Bryan 2024), remotes v. 2.5.0 (Csárdi et al. 2024), RMariaDB v. 1.3.1 (Müller et al. 2023), rmarkdown v. 2.26 (Xie, Allaire, and Grolemund 2018; Xie, Dervieux, and Riederer 2020; Allaire et al. 2024), RSQLite v. 2.3.5 (Müller et al. 2024), rvest v. 1.0.4 (Wickham 2024), shiny v. 1.8.1 (Chang et al. 2024), skimr v. 2.1.5 (Waring et al. 2022), stringr v. 1.5.1 (Wickham 2023c), tidyverse v. 2.0.0 (Wickham et al. 2019), xml2 v. 1.3.6 (Wickham, Hester, and Ooms 2023), xtable v. 1.8.4 (Dahl et al. 2019), zoo v. 1.8.12 (Zeileis and Grothendieck 2005).

Many of the data graphics in this book use a specific shade of blue favored by CRC Press, which our R code stores in the variable crcblue. To make your graphs match ours, define crcblue as follows.

crcblue <- "#2905a1"
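Once crcblue is defined, it can be passed to any color argument. The following is a minimal sketch, assuming the ggplot2 package is installed; the built-in mtcars data set and the variables plotted are placeholders chosen for illustration, not data used in the book.

```r
library(ggplot2)

# Shade of blue defined in the text above
crcblue <- "#2905a1"

# mtcars ships with R and stands in for real baseball data here
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(color = crcblue)

p
```

Because the color is set outside of aes(), every point is drawn in the same shade rather than being mapped to a variable.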

In this full-color version of the book, we also use a predefined color-blind-safe diverging color palette, stored in the variable crc_fc.

crc_fc
[1] "#2905a1" "#e41a1c" "#4daf4a" "#984ea3"
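If crc_fc is not already available in your session, one way to recreate it is to copy the four values printed above into a character vector. The sketch below does that and applies the palette with scale_color_manual() from ggplot2; the built-in iris data set is a placeholder for illustration only, and this is not necessarily how the palette is defined in the book's own scripts.

```r
library(ggplot2)

# Values copied from the printed output above
crc_fc <- c("#2905a1", "#e41a1c", "#4daf4a", "#984ea3")

# iris ships with R; Species has three levels, so the first three
# palette colors are used
p <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, color = Species)) +
  geom_point() +
  scale_color_manual(values = crc_fc)

p
```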

Our working directory is set using the here() function from the here package. That directory contains three subdirectories that are referenced in our code: data, data_large, and scripts. The data directory contains small data files that are available in our GitHub repository, as are the R scripts in scripts. The files in data_large, while necessary to compile the book, are too big to host in the GitHub repository. Instructions for creating these data files locally appear in the relevant places in the book, most notably Chapter 12, Working with Large Data, and Appendix A, Retrosheet Files Reference.
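The directory layout described above can be sketched as follows, assuming the here package is installed and the code is run from inside a project directory. The file name example.csv is hypothetical and serves only to show how a path is assembled.

```r
library(here)

# here() with no arguments returns the project root it detected
root <- here()

# Additional arguments are appended as path components
data_dir    <- here("data")
large_dir   <- here("data_large")
scripts_dir <- here("scripts")

# Report which of the three subdirectories exist locally
dirs <- c(data = data_dir, data_large = large_dir, scripts = scripts_dir)
found <- dir.exists(dirs)
names(found) <- names(dirs)
found

# Path to a hypothetical file inside the data directory
example_path <- here("data", "example.csv")
```

Building paths with here() rather than setwd() keeps the code independent of where the project lives on a particular machine.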