Syllabus

About the Course

Instructor

Important

Student hours:

  • Tuesdays and Wednesdays from 2:45 pm – 3:30 pm ET in McConnell 213
  • Fridays from 10:00 am – 11:30 am ET in McConnell 213
  • by appointment:
    • either in McConnell 213
    • or via Zoom

Description

In sports analytics, we apply methods from the statistical and data sciences to sports to address fundamental questions of interest to players, coaches, team executives, journalists, and fans alike. Simple questions (e.g., who are the best players?) are complicated by the interdependent nature of team sports, the omnipresence of randomness (i.e., luck), and frequent changes to personnel, rules, equipment, league alignments, and other structures. However, in many ways sports provides an ideal laboratory for applied statistical analysis, as many sports generate copious amounts of data under regularized conditions. In this course, students will explore the big ideas in sports analytics (e.g., expected points, win probabilities, and team strengths) and how they manifest across a variety of sports. They will develop a working knowledge of the most prominent statistical models for sports analytics and apply them to a variety of public sources of sports data. The course will conclude with a research project on a topic and sport of the student’s choosing.

Prerequisites

SDS 192 and SDS 210 (or the equivalent)

Learning goals

  • Demonstrate technical understanding of the major concepts in sports analytics
  • Use statistical software to access and analyze public data about sports
  • Translate analytical frameworks across different sports, where applicable
  • Conduct in-depth quantitative research about sports and contextualize the results

Textbooks

Required

Analyzing Baseball Data with R, 3rd edition, by Albert, Baumer, and Marchi. Free to read online

Supplemental resources:

  • Posit Connect is an interactive web application framework for R. Available for free via our Posit Connect Server.

Classes

Classes meet on Mondays and Wednesdays from 9:25-10:40 am ET in Burton 307.

Content

Expectations

This is a 4-credit course, meaning that by federal guidelines, it should consume about 12 hours per week of your time. We meet for 2.5 hours per week. That means you should be spending about 9.5 hours per week, or nearly 2 hours per weekday, on this course outside of class.

Warning

You should be spending about 9.5 hours per week on this course outside of class.

Grading & Assignments

The learning and assessments in this course are centered around five main components:

  • (10%) Engagement: Attending class meetings, being active on the course Slack channel, participating in class discussions, and offering ideas and questions in class or on Slack are all ways of demonstrating engagement. Your engagement grade will be determined by me in consultation with periodic self-assessments completed by you.
  • (15%) Homework: These assignments will include problems that allow you to work with data and analyze the results.
  • (20%) Midterm exam: a closed-book, self-scheduled written exam
  • (20%) Colloquium presentation: You will choose a published research paper in sports analytics, and during the second half of the semester, you and a partner will present the paper to the class.
  • (35%) Research project: a 10-page Quarto Manuscript and a 3-minute class presentation.

Please see the schedule for the most current due dates.

Extensions

I value your ability to meet deadlines and manage your own workload. I am also a reasonable person who understands that life happens and this is not always possible. Extensions up to 48 hours will typically be granted when requested at least 48 hours in advance without requiring a reason or an explanation. Longer extensions, or those requested within 48 hours of a deadline, will typically not be granted. Please plan accordingly. Please note that because many of the assignments in this class are collaborative, individual extensions for group assignments will be problematic. All extended deadlines will appear on Moodle.

Academic Integrity

I expect that you will maintain your academic integrity in this class. Please read Smith’s Academic Integrity Statement. Please pay particular attention to this sentence in the “Examples of Violations” section:

If you are finding ways to avoid the “thinking” component of your coursework, you should stop to ask yourself whether you are compromising your academic integrity.

Policies

Inclusion

I am committed to fostering a classroom environment where all students thrive. I am committed to affirming the identities, realities, and voices of all students, especially those from historically marginalized or underrepresented backgrounds. I am dedicated to creating a space where everyone in the class is respected, is free from discrimination based on race, ethnicity, sexual orientation, religion, gender identity, disability status, and other identities, and feels welcome and ready to learn at their highest potential.

If you have any concerns or suggestions for how to make this class more inclusive, please reach out to me. I am here to support your learning and growth as data scientists and people!

Accommodations

Smith is committed to providing support services and reasonable accommodations to all students with disabilities. To request an accommodation, please register with the Accessibility Resource Center (ARC) at the beginning of the semester. To contact ARC, please email .

Attendance

You choose whether you will attend class. If you choose to attend, I expect your full attention. If you choose not to attend, you accept responsibility for any lost educational value.

Important

We hope it goes without saying that during class, you should not use your computer or cell phone for personal email, web browsing, social media, or any activity that’s not related to the class.

Collaboration

Much of this course will operate on a collaborative basis, and you are expected and encouraged to work together with a partner or in small groups to study, complete assignments, and work on projects. However, all work that you submit for credit must be your own.

Important

Copying and pasting sentences, paragraphs, or blocks of code from another student or from online sources is not acceptable and will receive no credit.

No interaction with anyone but the instructors is allowed on any exams or quizzes. All students, staff and faculty are bound by the Smith College Honor Code, which Smith has had since 1944.

Generative AI

Generative AI and Academic Integrity

Please read the Smith Academic Integrity Board’s statement on Generative Artificial Intelligence & Your Academic Integrity

I draw your attention to the following excerpt:

Tip

Any time you are using AI in a way that is substituting for the “thinking work” that you should be doing for a course, you should stop.

My perspective on AI and your learning

My goal is to help you achieve the learning goals for this course using only the mental model you have built of the material we have covered, and without the aid of generative AI. While I accept the ubiquity of generative AI, I believe that helping you build your mental model of this material is where I can best contribute to your education. To that end, much of our time in class and many of our assessments will take place in AI-free environments. Other learning will take place outside of class, wherein you are free to use AI in whatever fashion you want (unless otherwise noted), provided that it is in compliance with Smith’s Academic Integrity policies.

Please understand that while AI may be helpful to you in building your mental model of the material, it will not be available to you during many of our assessments, and I am comparatively less interested in your ability to complete tasks while using AI than I am in your ability to demonstrate knowledge using only your brain (and body).

This perspective applies to all course content, including mathematical equations and R code.

Usage

Important
  • Use of generative AI is expressly prohibited on exams and oral presentations.
  • Use of generative AI is generally prohibited inside the classroom, although there may be exceptions.
  • Unless otherwise noted (such as the above), generative AI can be used whenever you are outside of class.

Please read my perspective on learning and generative AI. Please understand that careless and excessive use of generative AI will likely impede your ability to achieve the course learning goals.

Warning

Remember that generative AI is not intelligent, doesn’t think, and has no idea what is true or false. You are solely responsible for the veracity of anything (e.g., code or text) you submit.

Code of Conduct

As the instructor and assistants for this course, we are committed to making participation in this course a harassment-free experience for everyone, regardless of level of experience, gender, gender identity and expression, sexual orientation, disability, personal appearance, body size, race, ethnicity, age, or religion. Examples of unacceptable behavior by participants in this course include the use of sexual language or imagery, derogatory comments or personal attacks, deliberate misgendering or use of “dead” names, trolling, public or private harassment, insults, or other unprofessional conduct.

As the instructor and assistants, we have the right and responsibility to point out and stop behavior that is not aligned with this Code of Conduct. Participants who do not follow the Code of Conduct may be reprimanded for such behavior. Instances of abusive, harassing, or otherwise unacceptable behavior may be reported by contacting the instructor.

Important

All students, the instructor, the lab instructor, and all assistants are expected to adhere to this Code of Conduct in all settings for this course: lectures, labs, office hours, tutoring hours, and over Slack.

This Code of Conduct is adapted from the Contributor Covenant, version 1.0.0, available here.

Resources

Moodle and course website

The course website and Moodle will be updated regularly with lecture handouts, project information, assignments, and other course resources. Homework and projected grades will be submitted to Moodle. Please check both regularly.

Computing

The use of the R statistical computing environment with the RStudio interface is thoroughly integrated into the course. You have two options for using RStudio:

  • The server version of Posit Workbench on the web. The advantage of using the server version is that all of your work will be stored in the cloud, where it is automatically saved and backed up. This means that you can access your work from any computer with a web browser (Firefox is recommended) and an Internet connection.
  • A desktop version of RStudio IDE installed on your machine. The downside to this approach is that your work is only stored locally, and you will have to manage your own installation.

Note that you do not have to choose one or the other; you may use both. However, it is important that you understand the distinction so that you can keep track of your work. Both R and RStudio are free and open-source, and are installed in most computer labs on campus.

Unless otherwise noted, you should assume that it will be helpful to bring a laptop to class. It is not required, but since there are no workstations in the classroom, we will need a critical mass of computers (i.e., at least 12) in the classroom pretty much every day.

Communication

  • Slack is the primary forum for course-related discussions of all kinds. Please do not email me with course-related questions! Instead, post them in #questions on Slack. If discretion is absolutely necessary, private message me on Slack.

Writing

Your ability to communicate results (which may be technical in nature) to your audience (which is likely to be non-technical) is critical to your success as a data scientist. The assignments in this class will place an emphasis on the clarity of your writing.

This course is part of Smith College’s Writing Enriched Curriculum. As such, the course supports the Writing Plan of the Program in Statistical & Data Sciences.

Please read the SDS Writing Plan for more information.

Tentative Schedule

Please see the detailed schedule for more specific information about readings and assignments.

Reading list

Albert, J. 2015. “Player Evaluation Using Win Probabilities in Sports Competitions.” Wiley Interdisciplinary Reviews: Computational Statistics 7 (5): 316–25. https://doi.org/10.1002/wics.1358.
Albert, Jim, Max Marchi, and Benjamin S. Baumer. 2024. Analyzing Baseball Data with R. 3rd ed. Boca Raton, FL: CRC Press. https://beanumber.github.io/abdwr3e/.
Baumer, Benjamin S, Gregory J Matthews, and Quang Nguyen. 2023. “Big Ideas in Sports Analytics and Statistical Tools for Their Investigation.” Wiley Interdisciplinary Reviews: Computational Statistics 15 (6): e1612. https://doi.org/10.1002/wics.1612.
Bradley, Ralph Allan, and Milton E. Terry. 1952. “Rank Analysis of Incomplete Block Designs: I. The Method of Paired Comparisons.” Biometrika 39 (3/4): 324–45. https://doi.org/10.2307/2334029.
Cervone, Daniel, Alex D’Amour, Luke Bornn, and Kirk Goldsberry. 2016. “A Multiresolution Stochastic Process Model for Predicting Basketball Possession Outcomes.” Journal of the American Statistical Association 111 (514): 585–99. https://doi.org/10.1080/01621459.2016.1141685.
Deshpande, Sameer K., and Shane T. Jensen. 2016. “Estimating an NBA Player’s Impact on His Team’s Chances of Winning.” Journal of Quantitative Analysis in Sports 12 (2). https://doi.org/10.1515/jqas-2015-0027.
Elo, A. E. 1978. The Rating of Chessplayers, Past and Present. New York: Arco Publishing.
Kovalchik, Stephanie A. 2016. “Searching for the GOAT of Tennis Win Prediction.” Journal of Quantitative Analysis in Sports 12 (3): 127–38. https://doi.org/10.1515/jqas-2015-0059.
Lindsey, George R. 1963. “An Investigation of Strategies in Baseball.” Operations Research 11 (4): 477–501. https://doi.org/10.1287/opre.11.4.477.
Lopez, Michael J. 2020. “Bigger Data, Better Questions, and a Return to Fourth down Behavior: An Introduction to a Special Issue on Tracking Data in the National Football League.” Journal of Quantitative Analysis in Sports 16 (2): 73–79. https://doi.org/10.1515/jqas-2020-0057.
Lopez, Michael J, Gregory J Matthews, and Benjamin S Baumer. 2018. “How Often Does the Best Team Win? A Unified Approach to Understanding Randomness in North American Sport.” The Annals of Applied Statistics 12 (4): 2483–2516. https://doi.org/10.1214/18-aoas1165.
Romer, D. 2006. “Do Firms Maximize? Evidence from Professional Football.” Journal of Political Economy 114 (2): 340–65. https://doi.org/10.1086/501171.
Yam, Derrick R., and Michael J. Lopez. 2019. “What Was Lost? A Causal Estimate of Fourth down Behavior in the National Football League.” Journal of Sports Analytics 5 (3): 153–67. https://doi.org/10.3233/jsa-190294.
Yurko, Ronald, Samuel Ventura, and Maksim Horowitz. 2019. “nflWAR: A Reproducible Method for Offensive Player Evaluation in Football.” Journal of Quantitative Analysis in Sports 15 (3): 163–83. https://doi.org/10.1515/jqas-2018-0010.