Resume screening

Learning goals

After completing this assignment, students should be able to:

Identify privilege hazard and give examples of how it works in practice
Describe how machine learning algorithms for resume screening work, and why they are likely biased

Content

Read (mandatory):
Read (optional but recommended):
- Deborah Raji
- Safiya Noble
- Weapons of Math Destruction, Introduction
- Weapons of Math Destruction, Ch. 6 - Ineligible to Serve

In-class activity

Possible discussion prompts:

Hard-coding discrimination

How do models like decision trees hard-code discrimination?:

Social scientist Kate Crawford has advanced the idea that the biggest threat from artificial intelligence systems is not that they will become smarter than humans, but rather that they will hard-code sexism, racism, and other forms of discrimination into the digital infrastructure of our societies. (DF, p. 29)

An outsized proportion of these people were nonwhite. The human being had also rejected female applicants. The machine naturally, did the same. (WMD, p. 117)

An algorithm reflects the biases of its creators:

What we choose to measure is a statement of what we value in health,” he explains. We might edit his statement to add that it’s a measure of who we value in health, too. (DF, p. 23)

Reliability vs. accuracy

The result of these programs, much as with college admissions, is that those with the money and resources to prepare their resumes come out on top. (WMD, p. 114)

Awareness and engagement

Ethics education and identity taxation?:

If things were different—if the 79 percent of engineers at Google who are male were specifically trained in structural oppression before building their data systems (as social workers are before they undertake social work)—then their overrepresentation might be very slightly less of a problem. But in the meantime, the onus falls on the individuals who already feel the adverse effects of those systems of power to prove, over and over again, that racism and sexism exist—in datasets, in data systems, and in data science, as in everywhere else. (DF, p. 31)

Data scientist as public intellectual:

Taken together, Buolamwini’s various initiatives demonstrate how any “solution” to bias in algorithms and datasets must tackle more than technical limitations. In addition, they present a compelling model for the data scientist as public intellectual—who, yes, works on technical audits and fixes, but also works on cultural, legal, and political efforts too. (DF, p. 32)

Data science and capitalism:

How did we get to the point where data science is used almost exclusively in the service of profit (for a few), surveillance (of the minoritized), and efficiency (amidst scarcity)? It’s worth stepping back to make an observation about the organization of the data economy: data are expensive and resource-intensive, so only already powerful institutions—corporations, governments, and elite research universities—have the means to work with them at scale. These resource requirements result in data science that serves the primary goals of the institutions themselves. We can think of these goals as the three Ss: science (universities), surveillance (governments), and selling (corporations). This is not a normative judgment (e.g., “all science is bad”) but rather an observation about the organization of resources. If science, surveillance, and selling are the main goals that data are serving, because that’s who has the money, then what other goals and purposes are going underserved? (DF, p. 41)

Compare the ideas in this quotation from DF with those in Lenin’s “Who Stands to Gain?” essay:

This often means asking uncomfortable questions: who is doing the work of data science (and who is not)? Whose goals are prioritized in data science (and whose are not)? And who benefits from data science (and who is either overlooked or actively harmed)? These questions are uncomfortable because they unmask the inconvenient truth that there are groups of people who are disproportionately benefitting from data science, and there are groups of people who are disproportionately harmed. Asking these who questions allows us, as data scientists ourselves, to start to see how privilege is baked into our data practices and our data products. (DF, p. 26)

Accountability

Pseudoscience

Phrenology was a model that relied on pseudoscience nonsense to make authoritative pronouncements, and for decades it went untested. (WMD, p. 122)

Impact of tech on climate change:

A 2017 Greenpeace report estimated that the global IT sector, which is largely US-based, accounted for around 7 percent of the world’s energy use. This is more than some of largest countries in the world, including Russia, Brazil, and Japan. Unless that energy comes from renewable sources (which the Greenpeace report shows that it does not), the cloud has a significant accelerating impact on global climate change. (DF, p. 41)

Lack of democratic accountability in corporate America (e.g., Twitter, Facebook):

And those goals are neither neutral nor democratic—in the sense of having undergone any kind of participatory, public process. On the contrary, focusing on those three Ss—science, surveillance, and selling—to the exclusion of other possible objectives results in significant oversights with life-altering consequences. (DF, p. ??)

Assignment

Respond to the following prompts on a single piece of paper.

In one paragraph, explain the notion of privilege hazard (DF, pg. 29) and give an example of how it is manifest in data science.
In one paragraph, discuss how machine learning algorithms for resume screening (such as those at Kronos or Amazon) value attendance at a women’s college, as described in D’Ignazio and Klein (2020) and O’Neil (2016).
In one paragraph, reflect on how you feel about your attendance at a women’s college in light of your response to the previous question.

Submission

Hand in before leaving the classroom.

Rubric

Resume screening rubric
Criteria	Improvable	Good
Overall Quality	Incomplete or missing submission. Student did not make an earnest effort to complete the assignment. Formatting errors make the submission unreadable.	Student accurately summarizes the content of the documentary. Student’s reflection is thoughtful and genuine.

References

D’Ignazio, Catherine, and Lauren F Klein. 2020. Data Feminism. MIT Press. https://mitpress.mit.edu/books/data-feminism.

O’Neil, Cathy. 2016. Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy. Crown. https://www.penguinrandomhouse.com/books/241363/weapons-of-math-destruction-by-cathy-oneil/.