Ben Baumer

Student hours:

  • Tuesdays and Thursdays from 9:30 am – 10:30 am ET

  • by appointment

  • all office hours via Zoom


Osman Keshawarz, Data Research and Statistics Counselor

Drop-in hours:

  • Tuesdays and Thursdays from 6:00 pm – 8:20 pm ET

  • by appointment

  • all office hours via Zoom


An introduction to data science using Python, R, and SQL. Students will learn how to scrape, process, and clean data from the web; manipulate data in a variety of formats; contextualize variation in data; construct point and interval estimates using resampling techniques; visualize multidimensional data; design accurate, clear, and appropriate data graphics; create data maps and perform basic spatial analysis; and query large relational databases. No prerequisites, but a willingness to write code is necessary.


Suggested as supplementary references:


Classes meet Monday–Friday from 1:40 pm – 2:55 pm ET via Zoom. All classes will be recorded for asynchronous viewing. Attendance will not be taken. It is your decision to come to class or not. If you do come to class, please be prepared to participate fully. Class time will mostly consist of discussions, Q&A sessions, live coding demonstrations, and group work.

Evening sessions will take place Tuesdays and Thursdays from 7:05 – 8:20 pm ET. These sessions have no agenda, and are reserved for collaborative studying, group project work, and other support mechanisms. Evening sessions will be led by Osman Keshawarz, the Data Research and Statistics Counselor at the Spinelli Center.

This is a 4-credit course, meaning that, by federal guidelines, it should consume about 28 hours per week of your time. We meet for 7 hours per week. That means you should be spending about 21 hours per week, or 3 hours per day, on this course outside of class.


Smith is committed to providing support services and reasonable accommodations to all students with disabilities. To request an accommodation, please register with the Disability Services Office at the beginning of the semester. To do so, call (413) 585-2071 to arrange an appointment with Laura Rauscher, Director of Disability Services.



I am committed to fostering a classroom environment where all students thrive. I am committed to affirming the identities, realities, and voices of all students, especially those from historically marginalized or underrepresented backgrounds. I am dedicated to creating a space where everyone in the class is respected; is free from discrimination based on race, ethnicity, sexual orientation, religion, gender identity, disability status, and other identities; and feels welcome and ready to learn at their highest potential.

If you have any concerns or suggestions for how to make this class more inclusive, please reach out to me. I am here to support your learning and growth as data scientists and people!


You choose whether you want to attend class. If you choose to attend, I expect your full attention. There is no penalty for not attending class.


Much of this course will operate on a collaborative basis, and you are expected and encouraged to work together with a partner or in small groups to study, complete homework assignments, and prepare for exams. However, all work that you submit for credit must be your own. Copying and pasting sentences, paragraphs, or blocks of code from another student or from online sources is not acceptable and will receive no credit. No interaction with anyone but the instructors is allowed on any exams or quizzes. All students, staff and faculty are bound by the Smith College Honor Code, which Smith has had since 1944.

Academic Honor Code Statement

Smith College expects all students to be honest and committed to the principles of academic and intellectual integrity in their preparation and submission of course work and examinations. Students and faculty at Smith are part of an academic community defined by its commitment to scholarship, which depends on scrupulous and attentive acknowledgement of all sources of information, and honest and respectful use of college resources.

Cases of dishonesty, plagiarism, etc., will be reported to the Academic Honor Board.

Code of Conduct

As the instructor and assistants for this course, we are committed to making participation in this course a harassment-free experience for everyone, regardless of level of experience, gender, gender identity and expression, sexual orientation, disability, personal appearance, body size, race, ethnicity, age, or religion. Examples of unacceptable behavior by participants in this course include the use of sexual language or imagery, derogatory comments or personal attacks, deliberate misgendering or use of “dead” names, trolling, public or private harassment, insults, or other unprofessional conduct.

As the instructor and assistants we have the right and responsibility to point out and stop behavior that is not aligned to this Code of Conduct. Participants who do not follow the Code of Conduct may be reprimanded for such behavior. Instances of abusive, harassing, or otherwise unacceptable behavior may be reported by contacting the instructor.

All students, the instructor, the lab instructor, and all assistants are expected to adhere to this Code of Conduct in all settings for this course: lectures, labs, office hours, tutoring hours, and over Slack.

This Code of Conduct is adapted from the Contributor Covenant, version 1.0.0.


This course will employ standards-based assessment. There are 13 standards. Each standard will be assessed in both a group and individual setting. In each setting, each standard that is met earns a student one point. Each standard that is mastered earns a student two points. Points are redeemed for grades as follows:

| Grade | Points | Comment |
|-------|--------|---------|
| A+    | 50–53  | let me show you these PhD programs… 🎉 |
| A     | 46–49  | you’re applying for data science jobs, right? |
| A-    | 44–45  | have you thought about being a Data Assistant? |
| B+    | 42–43  | we should talk about an SDS major |
| B     | 40–41  | well-prepared for the next challenge |
| B-    | 36–39  | |
| C     | 26–35  | sufficient progress to continue in SDS |
| D     | 20–25  | |
| E     | 0–19   | |

Group points are earned by meeting standards on mini-projects, for which you will work primarily in pairs. Individual points are earned by meeting standards on quizzes, which will be administered via Moodle. One bonus point is available for exceptional active engagement with the class. Thus, there are 53 total points possible (13 standards \(\times\) 2 settings \(\times\) 2 possible points + 1 possible engagement point). Please see Assignments below for more information.
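
The point arithmetic and the grade table above can be sketched in Python. This is an illustration only, not the tool used to compute grades in this course: the function names, and the encoding of each standard as 0 (not met), 1 (met), or 2 (mastered), are assumptions made for the example.

```python
# Illustrative sketch only. The names and the 0/1/2 encoding are assumptions,
# not the course's actual grading tool.
N_STANDARDS = 13
SETTINGS = ("group", "individual")
MAX_PER_STANDARD = 2   # 1 point if a standard is met, 2 if it is mastered
ENGAGEMENT_BONUS = 1

# 13 standards x 2 settings x 2 points + 1 engagement point
MAX_POINTS = N_STANDARDS * len(SETTINGS) * MAX_PER_STANDARD + ENGAGEMENT_BONUS

# (minimum points, grade), highest first, copied from the grade table
GRADE_FLOORS = [
    (50, "A+"), (46, "A"), (44, "A-"), (42, "B+"), (40, "B"),
    (36, "B-"), (26, "C"), (20, "D"), (0, "E"),
]


def total_points(group, individual, engagement=False):
    """Sum the points earned across both settings, plus the optional bonus."""
    for levels in (group, individual):
        if len(levels) != N_STANDARDS or any(x not in (0, 1, 2) for x in levels):
            raise ValueError("expected 13 levels, each 0, 1, or 2")
    return sum(group) + sum(individual) + (ENGAGEMENT_BONUS if engagement else 0)


def letter_grade(points):
    """Return the letter grade whose range contains the given point total."""
    if not 0 <= points <= MAX_POINTS:
        raise ValueError(f"points must be between 0 and {MAX_POINTS}")
    for floor, grade in GRADE_FLOORS:
        if points >= floor:
            return grade
    raise AssertionError("unreachable: the last floor is 0")


if __name__ == "__main__":
    # Mastered every group standard, met every individual standard, earned the bonus.
    pts = total_points([2] * 13, [1] * 13, engagement=True)
    print(pts, letter_grade(pts))  # 40 B
```

In this hypothetical case, mastering every standard in one setting and only meeting each standard in the other lands at 40 points even with the engagement bonus, which is the bottom of the B range.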

Please read some guidelines for group work.



These assignments are mandatory and graded.

  1. Mini-Projects: You will work on five group projects over the course of the semester. These projects will be structured, but fairly open-ended to allow you to be creative. Mini-projects are the mechanism through which you meet group standards, and are in many ways the most important part of the course.

  2. Quizzes: Quizzes are the mechanism through which you meet individual standards. Each standard will have one quiz administered via Moodle. Quizzes will involve computational assignments in R and/or SQL, possibly with written explanations. A student may attempt a quiz up to three times.

Other ways to earn points

These assignments may or may not be collected or affect your grade.

  1. Engagement: Students can earn one bonus point through active participation in class, answering questions on Slack or GitHub, or exceptional engagement with group work. This is not a given: only about one-quarter of students will earn this point.

Uncollected assignments

These assignments are not collected and will not affect your grade. However, I still expect that you will do them! These assignments are not optional and are a major component of your learning in this course.

  1. Screencasts: Each topic will have an associated screencast. You should watch the screencasts.

  2. Readings: Each topic will be associated with a different section of the book. You should read those sections of the book.

  3. Labs: Each topic will be accompanied by a lab assignment. You should attempt these lab assignments. Students are strongly encouraged to work in pairs during these labs.


Extensions of up to 24 hours will typically be granted when requested at least 24 hours in advance. Longer extensions, or those requested within 24 hours of a deadline, will typically not be granted. Please plan accordingly. Please note that because many of the assignments in this class are collaborative, individual extensions for group assignments will be problematic. All extended deadlines will appear on Moodle.


You will complete all of your homework assignments in R Markdown. Unless otherwise noted, you should expect to submit your assignment by uploading an R Markdown file (with a .Rmd file extension) to Moodle. You should use the template from the sds192 package for all assignments.


Moodle and course website

The course website and Moodle will be updated regularly with lecture handouts, project information, assignments, and other course resources. Homework will be submitted through Moodle, and grades will be posted there. Please check both regularly.


The use of the R statistical computing environment with the RStudio interface is thoroughly integrated into the course. Both R and RStudio are free and open-source, and are installed in most computer labs on campus. Please see the Resources page for help with R. If you have a Chromebook, you should be able to complete the assignments using the RStudio Server. Please see me if you don’t already have an account.

Unless otherwise noted, you should assume that it will be helpful to bring a laptop to class. If you do not have a laptop, there are loaner laptops available – please contact me if you need one.


  • GitHub Organization
  • Slack is the primary mechanism for course-related discussions of all kinds. Please do not email me with course-related questions! Instead, post these on #questions on Slack. If discretion is absolutely necessary, private message me on Slack.


Your ability to communicate results—which may be technical in nature—to your audience—which is likely to be non-technical—is critical to your success as a data analyst. The assignments in this class will place an emphasis on the clarity of your writing.

The Spinelli Center

The Spinelli Center (now in Seelye 207) supports students doing quantitative work across the curriculum. In particular, they employ:

  • Data assistants who will visit our class regularly. These are students who have taken this class before and are trained to help you with this material. Don’t be shy about flagging them down!
  • Statistics TAs available from 7:00–9:00pm on Sunday–Thursday evenings in Burton 301. These students are trained to help you with your statistics questions, but may or may not be able to help you with your R questions.
  • A Data Research and Statistics Counselor (Osman Keshawarz) who keeps both drop-in hours and appointments. Students are welcome to email to make an appointment with either the Data Counselor or one of the Data Assistants.

Your fellow students are also an excellent source for explanations, tips, etc.

Tentative Schedule

The following outline gives a basic description of the course. Please see the detailed schedule for more specific information about readings and assignments.

| Week | Topic | Reading | Assignments |
|------|-------|---------|-------------|
| 1    | data visualization | MDSR, Ch. 1–2 | |
| 2    | grammar of graphics | MDSR, Ch. 3, 8 | mini-project |
| 3    | grammar of data wrangling | MDSR, Ch. 4–5 | mini-project |
| 4    | iteration | MDSR, Ch. 6–7 | mini-project |
| 5    | spatial data | MDSR, Ch. 17–18 | mini-project |
| 6    | database querying | MDSR, Ch. 15–16 | mini-project |