Syllabus

About the Course

Instructor

  • Ben Baumer (bbaumer@smith.edu, McConnell 214, 413-585-3440).

Student hours:

  • Tuesdays from 1:45 pm – 3:15 pm ET in McConnell 213
  • Fridays from 9:45 am – 11:15 am ET in McConnell 213
  • by appointment (either in McConnell 213 or via Zoom)

Description

Advanced programming techniques for data science using R. This course is not about data analysis; rather, students will learn the R programming language at a deep level. Topics may include data structures, control flow, regular expressions, functions, environments, functional programming, object-oriented programming, debugging, testing, version control, documentation, literate programming, code review, and package development. The major goal of the course is to contribute to a viable, collaborative, open-source, publishable R package.

Note

This course satisfies the programming depth requirement for the SDS major.

Prerequisites

SDS 192 and CSC 111

Learning goals

  • Contribute to an open-source software project using version control
  • Write a robust, encapsulated software package in the R programming language
  • Write and debug sophisticated functions in R

See also Reinhart and Genovese (2020).

Textbooks

Required

Advanced R, 2nd edition, Hadley Wickham, CRC Press, 2019. Available free online.

R Packages, 2nd edition, Hadley Wickham and Jenny Bryan, O’Reilly, 2023. Available free online.

Suggested as supplementary references

Classes

Classes meet Tuesdays and Thursdays from 10:50 am – 12:05 pm ET in Seelye 301.

Accommodations

Smith is committed to providing support services and reasonable accommodations to all students with disabilities. To request an accommodation, please register with the Accessibility Resource Center (ARC) at the beginning of the semester. To contact ARC, please email .

Policies

Inclusion

I am committed to fostering a classroom environment where all students thrive. I am committed to affirming the identities, realities, and voices of all students, especially those from historically marginalized or underrepresented backgrounds. I am dedicated to creating a space where everyone in the class is respected, is free from discrimination based on race, ethnicity, sexual orientation, religion, gender identity, disability status, and other identities, and feels welcome and ready to learn at their highest potential.

If you have any concerns or suggestions for how to make this class more inclusive, please reach out to me. I am here to support your learning and growth as data scientists and people!

Attendance

You choose whether you will attend class. If you choose to attend, I expect your full attention. If you choose not to attend, you accept responsibility for any lost educational value.

Important

We hope it goes without saying that during class, you should not use your computer or cell phone for personal email, web browsing, Facebook, or any activity that’s not related to the class.

Collaboration

Much of this course will operate on a collaborative basis, and you are expected and encouraged to work together with a partner or in small groups to study, complete homework assignments, and prepare for exams. However, all work that you submit for credit must be your own.

Important

Copying and pasting sentences, paragraphs, or blocks of code from another student or from online sources is not acceptable and will receive no credit.

No interaction with anyone but the instructors is allowed on any exams or quizzes. All students, staff and faculty are bound by the Smith College Honor Code, which Smith has had since 1944.

Use of Generative AI

This course has a flexible policy towards the use of generative AI tools.

  • Use:
    • In writing code: The use of generative AI is permitted in this course as long as you properly cite the AI-generated content and use it responsibly. Specific assignments may have more restrictive use policies.
    • In writing text: The use of generative AI tools is limited to pre-writing activities (e.g., brainstorming, gathering information, organizing an outline, etc.). AI tools are specifically not permitted for writing entire sentences, paragraphs, drafts, or papers to complete class assignments. (Please see the policy on Abuse below.)
  • Abuse: Attempts to pass off AI-generated content as your own (including but not limited to failure to properly cite generative AI tools) are considered plagiarism and could be a violation of Smith’s Academic Honor Code.
  • Disclosure: If you choose to use generative AI as a learning aid, it is essential to disclose its use on your assignments to maintain academic integrity. If you use generative AI, make sure to add a “Generative AI Disclosure:” callout block at the bottom of your assignment (see below). Your disclosure should state what program you used and how you used it, including links to the specific prompts you used, if possible. Properly citing the AI-generated content allows me to understand your process better and gives credit to the assistance received from these tools.
Sample Generative AI Disclosure Statement

Generative AI Disclosure: This assignment was supported by use of the AI platform ChatGPT. Specifically, I used GPT-3.5 to assist in the title creation (link here), although the final title was modified slightly. I also used ChatGPT to help me plan my outline (link here). I implemented the chatbot’s recommendations.

Warning

Remember that generative AI is not intelligent, doesn’t think, and has no idea what is true or false. You are solely responsible for the veracity of anything (e.g., code or text) you submit.

Academic Honor Code Statement

Smith College expects all students to be honest and committed to the principles of academic and intellectual integrity in their preparation and submission of course work and examinations.

Students and faculty at Smith are part of an academic community defined by its commitment to scholarship, which depends on scrupulous and attentive acknowledgement of all sources of information, and honest and respectful use of college resources.

Cases of dishonesty, plagiarism, etc., will be reported to the Academic Honor Board.

Code of Conduct

As the instructor and assistants for this course, we are committed to making participation in this course a harassment-free experience for everyone, regardless of level of experience, gender, gender identity and expression, sexual orientation, disability, personal appearance, body size, race, ethnicity, age, or religion. Examples of unacceptable behavior by participants in this course include the use of sexual language or imagery, derogatory comments or personal attacks, deliberate misgendering or use of “dead” names, trolling, public or private harassment, insults, or other unprofessional conduct.

As the instructor and assistants, we have the right and responsibility to point out and stop behavior that is not aligned with this Code of Conduct. Participants who do not follow the Code of Conduct may be reprimanded for such behavior. Instances of abusive, harassing, or otherwise unacceptable behavior may be reported by contacting the instructor.

Important

All students, the instructor, the lab instructor, and all assistants are expected to adhere to this Code of Conduct in all settings for this course: lectures, labs, office hours, tutoring hours, and over Slack.

This Code of Conduct is adapted from the Contributor Covenant, version 1.0.0, available here.

Content

Expectations

This is a 4-credit course, meaning that by federal guidelines it should consume about 12 hours per week of your time. We meet for 2.5 hours per week. That means you should be spending about 9.5 hours per week, or nearly 2 hours per weekday, on this course outside of class.

Warning

You should be spending about 9.5 hours per week on this course outside of class.

Grading

This course is ungraded. You will still receive a letter grade for this course, but that grade will be the result of a series of conversations that you and I have over the course of the semester. Three times during the semester, you will submit a written reflection of your learning in the course to date to Moodle, and I will offer a written response. Then we’ll talk one-on-one. At the end of the semester, you will propose a grade to me. Most likely, I will accept your proposal and that is the grade that you will receive. However, I reserve the right to be the final arbiter of grades. Because it will be based on a series of conversations that we have, your grade will not be a surprise to you and you will (almost certainly) be satisfied with it.

Assignments

The learning that takes place in this class will be centered around three main components:

  • Lectures, Quizzes, and Labs: Lectures will challenge and clarify your understanding of the reading material. Weekly reading quizzes will assess that understanding. Labs will offer daily practice programming assignments in R. These assignments will help you retain information from the readings and lectures, and translate your knowledge into new practical abilities.
  • Pull request: You will submit a pull request to an active R project on GitHub, and continue to update it until it is accepted. You will receive detailed, line-by-line feedback on your pull request. Through this iterative process, you will understand how to contribute to a live open source software project.
  • R Package: You will work in a small group to develop a new R package (or substantially extend an existing one). You will receive feedback on the functionality, clarity, robustness, and readability of your code and documentation. This project will deepen your understanding of R as a programming language and provide you with software development experience using industry standard practices.

Extensions

I value your ability to meet deadlines and manage your own workload. I am also a reasonable person who understands that life happens and this is not always possible. Extensions of up to 48 hours will typically be granted, without requiring a reason or an explanation, when requested at least 48 hours in advance. Longer extensions, or those requested within 48 hours of a deadline, will typically not be granted. Please plan accordingly. Please note that because many of the assignments in this class are collaborative, individual extensions for group assignments will be problematic. All extended deadlines will appear on Moodle.

Workflow

This course will expose you to the practice of data science through the use of modern workflows. These practices will increase the readability and reproducibility of your work. You should complete all of your homework assignments in Quarto. Unless otherwise noted, you should expect to submit your assignment by uploading a Quarto file (with a .qmd file extension) to Moodle.

Resources

Moodle and course website

The course website and Moodle will be updated regularly with lecture handouts, project information, assignments, and other course resources. You will submit homework on Moodle, and grades will be recorded there. Please check both regularly.

Computing

The use of the R statistical computing environment with the RStudio interface is thoroughly integrated into the course. Both R and RStudio are free and open-source, and are installed in most computer labs on campus. Please see the Resources page for help with R. If you have a Chromebook, you should be able to complete the assignments using Posit Workbench. Please see me if you don’t already have an account.

Unless otherwise noted, you should assume that it will be helpful to bring a laptop to class. If you do not have a laptop, there are loaner laptops available – please contact me if you need one.

Communication

  • Slack is the primary forum for course-related discussions of all kinds. Please do not email me with course-related questions! Instead, post them in the #questions channel on Slack. If discretion is absolutely necessary, private message me on Slack.
  • GitHub will host all of the code for projects associated with this course. All repositories are private by default.

Writing

Your ability to communicate results, which may be technical in nature, to an audience that is likely to be non-technical is critical to your success as a data analyst. The assignments in this class will place an emphasis on the clarity of your writing.

The Spinelli Center

The Spinelli Center (now in Seelye 207) supports students doing quantitative work across the curriculum. In particular, they employ:

Your fellow students are also an excellent source for explanations, tips, etc.

Tentative Schedule

The following outline gives a basic description of the course. Please see the detailed schedule for more specific information about readings and assignments.

Tentative schedule
Week  Topic                                    Reading                          Assignment
1     Names, Values, and Vectors               Adv R, Ch. 1–3                   reading quiz
2     GitHub | Subsetting & Control Flow       R pkgs, Ch. 13 | Adv R, Ch. 4–5  reading quiz
3     Functions                                Adv R, Ch. 6                     reading quiz
4     Regular Expressions | Package structure  R pkgs, Ch. 2–4                  reading quiz
5     Environments                             Adv R, Ch. 7                     Div I project
6     Conditions                               Adv R, Ch. 8                     reading quiz
7     Functionals (purrr)                      Adv R, Ch. 9                     reading quiz
8     Testing, R CMD check                     R pkgs, Ch. 14                   reading quiz
9     S3 classes                               Adv R, Ch. 13                    reading quiz
10    Debugging                                Adv R, Ch. 22                    Div II project
11    Metaprogramming                          Adv R, Ch. 17–20                 reading quiz
12    Project work                                                              reading quiz
13    Project work                                                              Div III project

References

Reinhart, Alex, and Christopher R Genovese. 2020. “Expanding the Scope of Statistical Computing: Training Statisticians to Be Software Engineers.” Journal of Statistics Education, 1–23. https://doi.org/10.1080/10691898.2020.1845109.