HDSR 2025-143 response to reviewers


Author

Baumer et al

Published

2025-12-17

CC: datasciencereview@harvard.edu, editorinchief_datasciencereview@harvard.edu, “Benjamin S Baumer” bbaumer@smith.edu, “Brianna Heggeseth” bheggese@macalester.edu, “Leslie Myint” lmyint@macalester.edu, “Michael A Posner” michael.posner@villanova.edu, “Paul Roback” roback@stolaf.edu

Ref.: Ms. No. HDSR-2025-143 Emerging Models for a Second Undergraduate Course in Data Science

Dear Nick (and All),

Thanks for sending such a timely proposal, which certainly has been inspiring, to say the least — see below. Looking forward to your full submission!

Cheers to all,

Xiao-Li


Dear Dr. Horton,

Thank you for submitting your proposal, Emerging Models for a Second Undergraduate Course in Data Science, for consideration in the Harvard Data Science Review. After our screening process, I am pleased to invite you to submit your full manuscript for review. When you prepare the full manuscript, please be sure to address the comments below provided by the screening editor if any.

Please note that your manuscript will undergo full review and that this decision to invite your paper does not guarantee acceptance. An advisory editor still has the flexibility to make a decision to reject the paper when considering the full manuscript. Please submit within SIX MONTHS from today.

To submit final files, go to https://www.editorialmanager.com/hdsr/ and log in as an Author. Please do the following:

  1. Change the article type from “Proposal” to “Invited Manuscript”
  2. Supply any additional requested information
  3. If supplying a cover letter and/or response to reviewers, upload as separate files from the manuscript text.

We look forward to reviewing your manuscript. Thank you again for submitting your work to HDSR.

Kindest regards,

Xiao-Li Meng
Founding Editor-in-Chief
Harvard Data Science Review


Screening comments to author:

I welcome this attempt to set out the details of a course that is perhaps the most common among undergraduate programs labeled data science. I especially look forward to seeing the paper’s analysis and comparison of five such courses. It is not clear from the proposal whether these are courses that exist in various programs or whether they have been designed by the authors for this paper; I hope we can start with the former.

RESPONSE: All four courses described in this paper are existing courses that have been taught at the authors' institutions. We have clarified this in the Introduction with the sentence: “In this paper, we explicate several recently developed versions of a second course in data science that further develop data science skills.”

The proposal is thoughtful and detailed. However, no short proposal can be detailed enough to leave the reader with no questions. Here are some that I hope the authors will consider.

  1. Is there a common understanding of what is a “data science” curriculum at the undergraduate level? If so, what is it? We must address this question before examining Courses 1, 2, etc in a program, especially since some Statistics departments are simply being renamed Data Science without much change, while some Computer Science departments now offer what they consider to be data science.

RESPONSE: We do not think there is a single common understanding of what an undergraduate data science curriculum is, but there have been several notable attempts to articulate such curricula, and we cite and discuss several of them. At present, the shape of a full undergraduate data science curriculum still varies across institutions. Although Statistics and Computer Science departments may incorporate data science into their respective curricula in different ways, our analysis of published, program-level data science learning outcomes does reveal some commonalities. To shed more light on this area, we have added to the Introduction a discussion of the MASDER project, which describes the breadth of topic and skill coverage across diverse data science curricula using a tree-based structure that we map to the courses discussed in this paper. ADD MORE DETAILS ABOUT MASDER ONCE THE MASDER SECTION IN THE INTRO IS DONE.

  2. Is there a common understanding of what should be in a first course in data science, the course the authors might call DS1? This is not clear at all; indeed many first courses in data science resemble the one that is being described in the proposal as DS2, especially if they arise in the computer science world. See for example CS 109 at Harvard. I suggest calling it something like an “intermediate” data science course instead of giving it a place in a chronological sequence. Students enter data science at various levels, just as they enter statistics. Some start in a freshman-level class with no prerequisites. Others start in sophomore or junior year after they have acquired some background in math and computing. For example, in statistics they might start with a calculus-based probability class followed by theoretical statistics. In data science it might well be in a class like the described DS2, with some programming and math prerequisites.

RESPONSE: While there is still debate about some aspects of a first course in data science (e.g., whether or not to cover statistical inference), there seems to be a consensus that data wrangling and visualization are core topics in this first course. We discuss this further in the Introduction section “A first course in data science.” We chose the label “DS2” to signal a degree of sequencing: we view the core topics of data wrangling and visualization as prerequisite knowledge for the courses that we describe in the paper. The suggestion to call “DS2” an “intermediate” course instead is well taken; in fact, two of the courses presented in the paper are called “Intermediate Data Science.” We address this further in the paper by detailing the prerequisites for each course. CS 109 at Harvard is an intermediate data science course, in part because it assumes prior programming experience. However, unlike Harvard, most of our institutions offer an explicit DS1 course that does not require prior programming experience.

  3. The proposal identifies an important practical benefit of a standardized data science curriculum, namely transferability across programs. But what are the intellectual and pedagogical benefits of such standardization? Computer science has done very well by starting out with a base of computational thinking, programming, and data structures, and then letting students choose from a large grab bag depending on their interests and goals. Mathematics and statistics programs tend to have longer lists of required courses. Is one approach necessarily better than the other? This is an important question in data science because data scientists are even more diverse in their backgrounds and interests than statisticians and computer scientists.

RESPONSE: We do not believe that there is a single best approach to shaping a data science curriculum. With that in mind, the “Future work” section of the Discussion lays out our vision for sharing materials at this stage in the evolution of the data science curriculum. As data science programs expand across and within institutions, we hope that a repository of materials will reduce our collective need to reinvent the wheel and free up time to tailor courses to the specific goals of our programs. Further, transferability across programs has the benefit of shedding light on core topics and skills, which facilitates collective pedagogical innovation across institutions.

  4. Continuing Question 3, what is inherently wrong with “a grab bag of courses from other programs”? Does it not depend on what is being offered in those other programs, and whether the other programs are themselves evolving because of the increased interest in data science? Computer science programs have certainly developed new areas of emphasis related to data science. Some data science programs have a curricular arc or a thread that ties together some of their courses. Others are more of a smorgasbord without explicit “scaffolding and connection” across courses. It is not clear that this diversity is a problem. Perhaps what the authors are advocating is a common foundation and understanding of what constitutes data science. That would be welcome indeed.

RESPONSE: What we are advocating for is data science curriculum design that helps students make connections across courses. “Grab bags” of courses from various programs can meet that need if there is some mechanism that helps students synthesize and connect their learning across courses. The courses presented in this paper are all part of educational sequences that were designed to cultivate students’ data acumen in a scaffolded manner, so we describe how these second/intermediate data science courses fit into our overall curricula, in particular how they build on our “DS1” courses.

We have also added to the “Future work” section a discussion of how the courses in this paper themselves represent a “grab bag of topics” approach. This approach could save instructors considerable preparation effort if solid materials emerge for core topics, giving them more time to tailor second data science courses to the specific needs of their programs.

  5. What is the place of modeling and statistical inference in the proposed course? Will students examine data beyond wrangling and visualization, important though those are? This gap in the proposal is startling given the list of authors. It is probably just due to the restricted length of the proposal. I hope the full paper will describe the statistical ideas that the authors are assuming to be part of a first course in data science, and how those ideas will be further developed at the next stage.

RESPONSE: At most of our institutions, statistical modeling and inference are taught in courses that are distinct from the DS1 and DS2 courses that we outline in this paper. We discuss these distinctions and content overlaps in Sections 7 and 8 of the paper.

  6. Finally, is data science “matur[ing] as a discipline” or is it perpetually in a state of rapid change both in content and pedagogy? All data science programs have to face this question in this era of AI. In some ways the proposed course might be a moving target. But it must have some fundamental principles and ideas that transcend the tools available at a given moment. I look forward to the authors’ take on this.

RESPONSE: The question of where data science stands in an era increasingly defined by AI is an important one, and many papers in HDSR have considered it. To address this comment, we have noted the importance of AI in the manuscript (e.g., Section 8.2).