# Research

I am a data scientist. Data science is an emerging field commonly described as “the practice of deriving valuable insights from data,” and this thread runs through all of my work. My scholarly contributions have come in five main areas:

- sports analytics:
  - Learning about how sports (particularly baseball) work through the analysis of data

- statistics and data science education:
  - What, how, and why are we teaching? What, how, and why *should* we be teaching?

- data science:
  - Building tools to make data-based research easier and more reproducible

- network science and analysis of algorithms:
  - Theoretical work about properties of networks and graph algorithms

- statistics and data science consulting:
  - Aiding *your* research through statistical modeling and data visualization


Subfields of interest to me include network science, applied statistics, sabermetrics, sports analytics, statistical modeling, analysis of algorithms, combinatorial optimization, data visualization, graph theory, and combinatorics. My Erdős number is 3, as I have co-authored a paper with Amotz Bar-Noy, who has co-authored a paper with Noga Alon, who has co-authored a paper with Paul Erdős.
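An Erdős number is the length of a shortest co-authorship path to Paul Erdős, so it can be computed with a breadth-first search over a collaboration graph. The minimal sketch below assumes a toy graph that contains only the three co-authorships named above (the node labels are illustrative). On the full collaboration graph, a chain like this one is only an upper bound, and BFS would confirm whether any shorter path exists.

```python
from collections import deque

# Toy collaboration graph containing only the co-authorships named above.
# A real Erdős-number computation would need the full co-authorship graph.
COAUTHORS = {
    "Baumer": ["Bar-Noy"],
    "Bar-Noy": ["Baumer", "Alon"],
    "Alon": ["Bar-Noy", "Erdős"],
    "Erdős": ["Alon"],
}


def erdos_number(graph, author, target="Erdős"):
    """Return the shortest co-authorship distance from author to target, or None."""
    distance = {author: 0}
    queue = deque([author])
    while queue:
        current = queue.popleft()
        if current == target:
            return distance[current]
        for coauthor in graph.get(current, []):
            if coauthor not in distance:
                # The first time BFS reaches a node is along a shortest path.
                distance[coauthor] = distance[current] + 1
                queue.append(coauthor)
    return None  # no path: the Erdős number is undefined (often written as infinity)


print(erdos_number(COAUTHORS, "Baumer"))  # 3
```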

My background is academically diverse: my undergraduate degree is in economics (my first declared major was English), my doctorate is in mathematics, my thesis advisor is in computer science, and my professional experience is in statistics. As such, my research tends to be interdisciplinary, with an emphasis on applying available techniques from any discipline to address the question of interest.

In 2012, I completed my Ph.D. in Mathematics at the Graduate Center of the City University of New York, where my advisor was Amotz Bar-Noy, also of Brooklyn College. Previously, I earned an M.A. in Applied Mathematics from the University of California, San Diego, and a B.A. in Economics from Wesleyan University.

In 2019, I won the Significant Contributor Award from the Section on Statistics in Sports of the American Statistical Association.

Please see my C.V. for complete details on my work.

## Books

*Analyzing Baseball Data with R*, 3rd edition

*Analyzing Baseball Data with R, 3rd Edition* introduces R to sabermetricians, baseball enthusiasts, and students interested in exploring the richness of baseball data. It equips you with the necessary skills and software tools to perform all the analysis steps, from importing the data to transforming them into an appropriate format to visualizing the data via graphs to performing a statistical analysis.

Read the 3rd edition


*Modern Data Science with R*, 2nd edition

Contemporary data science uses both statistical modeling and computer programming to extract meaning from data. It requires a tight integration of knowledge from statistics, computer science, mathematics, and a domain of application. This book, which is intended for readers with some background in statistics and modest prior experience with coding, helps them develop and practice the appropriate skills to tackle complex data science projects. Most of the examples are done in R, but SQL, Python, and other cutting-edge tools are discussed as well.

Read the:

- 2nd edition
- lightly revised 3rd edition


*The Sabermetric Revolution*

*The Sabermetric Revolution: Assessing the Growth of Analytics in Baseball* is co-authored with leading sports economist Andrew Zimbalist. We examine the evolution of sabermetrics in baseball and other sports since the publication of *Moneyball*, summarize the current state of sabermetric thinking, and address the question of whether there is any evidence that sabermetrics has actually worked. The book was published by the University of Pennsylvania Press in 2014.


## Recent Projects

### DSC-WAV

I was the PI on a nine-institution, $1.2 million workforce development project funded by the National Science Foundation.

The DSC-WAV project simultaneously addressed two problems: 1) the inability of community-based and non-profit organizations to tackle data science problems; and 2) the lack of real-world experience gained by students studying data science.

This project addressed both issues by deploying teams of data science students to assist local organizations, thereby increasing the long-term capacity of the data science workforce.

### OpenIntro

I developed the Introductory Statistics with R sequence of courses for DataCamp, an interactive platform for learning R and data science. Mine Çetinkaya-Rundel (Duke), Andrew Bray (Reed), and Jo Hardin (Pomona) collaborated with me on these courses. We are horrified by the recent sexual harassment scandal at DataCamp and the ensuing cover-up.

Much of that content is now available through interactive tutorials, developed with the `learnr` package, supporting the textbook *OpenIntro::Introduction to Modern Statistics Tutorials*.

### ETL packages for R

`etl` is an R package to facilitate Extract-Transform-Load (ETL) operations for **medium data**. The end result is generally a populated SQL database, but the user interaction takes place solely within R.
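The `etl` package itself is written in R, and its API is not reproduced here. As a language-neutral illustration of the three stages it wraps, the minimal sketch below uses Python's standard-library `sqlite3` and `csv` modules. The raw CSV text, the table name `flights`, and its columns are hypothetical stand-ins for a real data source.

```python
import csv
import io
import sqlite3

# Hypothetical raw data standing in for a downloaded file (input to Extract).
RAW_CSV = """carrier,dep_delay
AA,5
UA,
AA,-3
DL,12
"""


def extract(raw_text):
    """Extract: parse raw CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(raw_text)))


def transform(rows):
    """Transform: drop rows with a missing delay and cast delays to int."""
    return [
        (row["carrier"], int(row["dep_delay"]))
        for row in rows
        if row["dep_delay"] != ""
    ]


def load(records, conn):
    """Load: create the target table if needed and insert the cleaned records."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS flights (carrier TEXT, dep_delay INTEGER)"
    )
    conn.executemany("INSERT INTO flights VALUES (?, ?)", records)
    conn.commit()


conn = sqlite3.connect(":memory:")  # in-memory database keeps the sketch self-contained
load(transform(extract(RAW_CSV)), conn)
print(conn.execute(
    "SELECT carrier, AVG(dep_delay) FROM flights GROUP BY carrier ORDER BY carrier"
).fetchall())  # [('AA', 1.0), ('DL', 12.0)]
```

Because the end result is an ordinary SQL table, later analysis can query it directly instead of re-parsing the raw files, which is the same outcome `etl` targets while keeping user interaction inside R.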

## Publication List

Search for me on:

## References

*Journal of Quantitative Analysis in Sports*, vol. 20, no. 1, pp. 1–3, 2024.

*Harvard Data Science Review*, vol. 5, no. 1, 2023.

*Journal of Open Source Software*, vol. 6, no. 65, p. 3661, Sep. 2021.

*Foundations of Data Science*, vol. 5, no. 2, pp. 244–265, 2023.

*Harvard Data Science Review*, vol. 3, no. 1, pp. 1–8, Feb. 2021 [Online]. Available: https://hdsr.mitpress.mit.edu/pub/nvflcexe

*Journal of Computational and Graphical Statistics*, vol. 26, no. 4, pp. 781–783, 2017.

*Journal of Statistics and Data Science Education*, vol. 30, no. 1, pp. 15–28, 2022.

*Analyzing baseball data with R*, 2nd ed. Boca Raton, FL: CRC Press, 2018, p. 342 [Online]. Available: https://www.crcpress.com/Analyzing-Baseball-Data-with-R-Second-Edition/Marchi-Albert-Baumer/p/book/9780815353515

*Analyzing baseball data with R*, 3rd ed. Boca Raton, FL: CRC Press, 2024, p. 430 [Online]. Available: https://www.routledge.com/Analyzing-Baseball-Data-with-R/Albert-Baumer-Marchi/p/book/9781032668093

*International Journal of Financial Studies*, vol. 7, no. 2, p. 19, 2019.

*PLOS Neglected Tropical Diseases*, vol. 12, no. 1, pp. 1–17, Jan. 2018.

*Developmental Biology*, vol. 460, no. 2, pp. 115–138, Apr. 2020.

*Annual Review of Statistics and Its Application*, vol. 4, no. 1, pp. 1–16, 2017.

*Stat*, vol. 10, no. 1, p. e332, Dec. 2020.

*The American Statistician*, vol. 72, no. 1, pp. 66–71, 2018.

*PeerJ Preprints*, vol. 5, p. e3160v2, Aug. 2017.

*Wiley Interdisciplinary Reviews: Computational Statistics*, vol. 15, no. 6, p. e1612, 2023.

*Annals of Applied Statistics*, vol. 12, no. 4, pp. 2483–2516, 2018.

*Technology Innovations in Statistics Education*, vol. 14, no. 1, 2022.

*Journal of Statistics Education*, vol. 28, no. 1, Mar. 2020.

*Journal of Computational and Graphical Statistics*, vol. 28, no. 2, pp. 256–264, 2019.

*AMSTAT News*, no. 444, p. 1, Sep. 2022 [Online]. Available: https://magazine.amstat.org/blog/2022/09/01/syposium_mass/

*AMSTAT News*, no. 444, pp. 17–19, 2014 [Online]. Available: http://magazine.amstat.org/blog/2014/06/01/datafest/

*Math Horizons*, vol. 22, no. 1, pp. 18–20, 2014.

*The great analytics rankings*, R. Webb, Ed. ESPN.com, 2015 [Online]. Available: http://espn.go.com/espn/feature/story/_/id/12331388/the-great-analytics-rankings#!mlb

*The American Statistician*, vol. 72, no. 3. Taylor & Francis, pp. 297–298, 2018.

*International Statistical Review*, vol. 82, no. 2. Wiley Online Library, pp. 313–315, Aug. 2014.

*JSM proceedings*, 2014.

*Wiley Interdisciplinary Reviews: Computational Statistics*, vol. 7, no. 3, pp. 167–177, 2015.

*CHANCE*, vol. 28, no. 3, pp. 40–50, 2015 [Online]. Available: http://chance.amstat.org/2015/04/setting-the-stage/

*CHANCE*, vol. 27, no. 3, pp. 41–44, 2014 [Online]. Available: http://chance.amstat.org/2014/09/avoiding-war/

*Handbook of statistical methods and analyses in sports*, J. Albert, M. E. Glickman, T. B. Swartz, and R. H. Koning, Eds. Boca Raton, FL: Chapman & Hall/CRC Press, 2016, pp. 1–37 [Online]. Available: https://www.crcpress.com/Handbook-of-Statistical-Methods-and-Analyses-in-Sports/Albert-Glickman-Swartz-Koning/p/book/9781498737364

*Discussiones Mathematicae Graph Theory*, vol. 36, no. 3, pp. 577–602, 2016.

*Modern Data Science with R*. Boca Raton, FL: Chapman & Hall/CRC Press, 2017, p. 551 [Online]. Available: https://www.crcpress.com/Modern-Data-Science-with-R/Baumer-Kaplan-Horton/9781498724487

*Modern Data Science with R*, 2nd ed. Boca Raton, FL: Chapman & Hall/CRC Press, 2021, pp. 1–673 [Online]. Available: https://www.routledge.com/Modern-Data-Science-with-R/Baumer-Kaplan-Horton/p/book/9780367191498

*The American Statistician*, vol. 69, no. 4, pp. 343–353, 2015.

*The American Statistician*, vol. 69, no. 4, pp. 334–342, 2015.

*Technology Innovations in Statistics Education*, vol. 8, no. 1, 2014.

*Algorithmica*, vol. 72, no. 1, pp. 148–166, 2015.

*Eastern Economic Journal*, vol. 40, pp. 488–498, Dec. 2014.

*Theoretical Computer Science*, vol. 610, pp. 135–148, 2016.

*Journal of Quantitative Analysis in Sports*, vol. 11, no. 2, pp. 69–84, 2015.

*The Sabermetric Revolution: Assessing the Growth of Analytics in Baseball*. University of Pennsylvania Press, 2014, p. 240 [Online]. Available: http://www.upenn.edu/pennpress/book/15168.html

*Journal of Quantitative Analysis in Sports*, vol. 8, no. 2, pp. 1–17, 2012.

*ALGOSENSORS*, 2011, vol. 7111, pp. 28–41.

*MSWiM*, 2011, pp. 341–350.

*Algorithmica*, vol. 76, no. 2, pp. 1–19, Oct. 2016.

*JSM proceedings*, 2010.

*JSM proceedings*, 2010.

*JSM proceedings*, 2009.

*Journal of Quantitative Analysis in Sports*, vol. 5, no. 2, pp. 1–16, 2009.

*Journal of Quantitative Analysis in Sports*, vol. 4, no. 2, pp. 1–11, 2008.