Integrating Data Science Ethics Into an Undergraduate Major

A Case Study

Benjamin S. Baumer and Katherine M. Kinnaird
Statistical & Data Sciences
Smith College

“The potential consequences of the ethical implications of data science cannot be overstated.”

— National Academies of Sciences, Engineering, and Medicine (2018)

Why do we need to teach data science ethics?

“I’m just an engineer”

Algorithms reflect the biases of their creators

A piece of data itself has no positive or negative moral value, but the way we manipulate it does. It’s hard to imagine a more contentious project than programming ethics into our algorithms; to do otherwise, however, and allow algorithms to monitor themselves, is to invite the quicksand of moral equivalence.

NAS report

Ethics is a topic that, given the nature of data science, students should learn and practice throughout their education. Academic institutions should ensure that ethics is woven into the data science curriculum from the beginning and throughout.

The data science community should adopt a code of ethics; such a code should be affirmed by members of professional societies, included in professional development programs and curricula, and conveyed through educational programs. The code should be reevaluated often in light of new developments.

Enhancement, not distraction

Data science ethics

statistical ethics

Something old, something new…

Old ethical notions:

  • How to Lie with Statistics

  • The Belmont Report (human subjects research)

  • reproducibility and replicability

  • p-hacking

  • conflicts of interest

Newer ethical notions:

  • algorithmic bias

  • web scraping (ToS)

  • doxxing

  • de-identifying personal data

  • re-identifying personal data

Our take

These ethical areas are obviously informed by longstanding ethical principles, but are distinct in the way that computers, the Internet, and databases have transformed the way we live.

What are the learning goals?

Latest SDS learning goal

Assess the ethical implications to society of data-based research, analyses, and technology in an informed manner.

Use resources, such as professional guidelines, institutional review boards, and published research, to inform ethical responsibilities.

What’s included?

Topic                         Categories              Bloom level
OkCupid                       data                    Application
StitchFix                     algorithms              Application
Grey’s Anatomy                practices, data         Application
Copyrighting music            practices               Evaluation
Coding race                   practices, data         Synthesis
Weapons of Math Destruction   practices, algorithms   Evaluation

Copyrighting Music

Debate: Should there be an academic license for music data?

Before Class

  • Students are assigned to a “side” and then prepare a position paper (3-5 pages, double-spaced)

Beginning of Class

  • The two groups split off and decide on a strategy

During Class

  • The debate is a timed, back-and-forth set of arguments and rebuttals
  • Break before closing arguments

More on debate

  • Growing divide in data access and development between industrial/corporate research labs and academic environments.

  • What data access do you have working somewhere like Spotify, Apple Music, or Pandora? How does that access compare with the access at a research university or at a small liberal arts college?

  • Discussion about how laws designed to protect sales and artists (e.g., copyright) differ from laws designed to protect privacy (e.g., HIPAA or FERPA), and how ethical considerations vary when law, rather than morality, dictates “right and wrong”

Late-breaking changes

SDS 100: Reproducible Scientific Computing with Data

The practice of data science rests upon computing environments that foster responsible uses of data and reproducible scientific inquiries. This course develops students’ ability to engage in data science work using modern workflows, open-source tools, and ethical practices. Students will learn how to author a scientific report written in a lightweight markup language (e.g., markdown) that includes code (e.g., R), data, graphics, text, and other media. Students will also learn to reason about ethical practices in data science.

What resources are available?

Books

More Books

Videos

Movies

Conferences

Teaching materials

Codes of ethics

Read the paper!

https://doi.org/10.1080/26939169.2022.2038041

THANK YOU!