Week 2: Machine Learning, Training Data, and Bias

DSAN 5450: Data Ethics and Policy
Spring 2024, Georgetown University

Class Sessions

Jeff Jacobs


Wednesday, January 24, 2024

Overview: Slouching Towards Fairness

  • First half: Remaining high-level issues!
  • Second half: you’ll start to understand why I kept maniacally pointing to \(p \implies q\) on the board last lecture!
  • “Rules” for fairness are not “rules” at all! They’re statements of the form “If we accept ethical framework \(x\), then our algorithms ought to satisfy condition \(y\)”:

\[ \underbrace{p(x)}_{\substack{\text{Accept ethical} \\ \text{framework }x}} \implies \underbrace{q(y)}_{\substack{\text{Algorithms should} \\ \text{satisfy condition }y}} \]

  • Last week: very broad intro to possible ethical frameworks (values for \(x\))
  • Today: very broad intro to possible fairness criteria (values for \(y\))
  • End of today: HW1: Nuts and Bolts for Evaluating Fairness
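
A minimal sketch of the conditional form above, assuming the \(\implies\) on the board is the standard material implication from propositional logic. The function name `implies` and the variable `truth_table` are made up for this illustration:

```python
def implies(p: bool, q: bool) -> bool:
    """Material implication: false only when p is true and q is false."""
    return (not p) or q

# Enumerate all four combinations of
#   p = "we accept ethical framework x"
#   q = "our algorithms ought to satisfy condition y"
truth_table = [(p, q, implies(p, q)) for p in (True, False) for q in (True, False)]

for p, q, result in truth_table:
    print(f"p={p!s:<5} q={q!s:<5} p=>q: {result}")
```

Only the row where \(p\) is true and \(q\) is false violates the statement. If we do not accept framework \(x\), the statement by itself places no requirement on condition \(y\).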

Ethical Issues in Data Science

  • Data Science for Who(m)?
  • Operationalization
  • Fair Comparisons
  • Implementation

Data Science for Who(m)?

  • What are the processes by which data is measured, recorded, and distributed?

The Library of Missing Datasets. From D’Ignazio and Klein (2020)

Example: Measuring “Freedom” and “Human Rights”

  • Freedom House Ratings are the most common measure of “freedom” in a country, across social science literature
  • US State Dept. Country Reports on Human Rights Practices are the most common measure of “human rights” in a country, across social science literature
  • …So what’s the issue? (What is Jeff whining about this time?)

Example: Measuring “Freedom” and “Human Rights”


  • Think of common claims made on the basis of “data”:
    • Markets create economic prosperity
    • A glass of wine in the evening prevents cancer
    • Policing makes communities safer
  • How exactly are “prosperity”, “preventing cancer”, “policing”, “community safety” being measured?

Thumbnail from full video (Quarto crashes when I embed it directly 😑)

Stiglitz, Sen, and Fitoussi (2010)

What Is Being Compared?

  • Are countries with 1 billion people comparable to countries with 10 million people?
  • Are countries which were colonized comparable to the colonizing countries?
  • When did the colonized countries gain independence?

Drèze and Sen (1991)


From D’Ignazio and Klein (2020), Ch. 6 (see also)

From Lerman and Weaver (2014)

Fairness… 🧐

Figure 1: From Lily Hu, Direct Effects: How Should We Measure Racial Discrimination?, Phenomenal World, 25 September 2020
Figure 2: From Kasy and Abebe (2021)

…And INVERSE Fairness 🤯

From Machine Learning What Policymakers Value (Björkegren, Blumenstock, and Knight 2022)

Ethical Issues in Applying Data Science

Facial Recognition Algorithms

Facia.ai (2023)

Wellcome Collection (1890)

Ouz (2023)

Wang and Kosinski (2018)

Large Language Models

Figure 3: From Schiebinger et al. (2020)
Figure 4: From DeepLearning.AI’s Deep Learning course

Military and Police Applications of AI

Ayyub (2019)

McNeil (2022)

Machine Learning at 30,000 Feet

Three Component Parts of Machine Learning

  1. A cool algorithm 😎😍
  2. [Possibly benign but possibly biased] Training data ❓🧐
  3. Exploitation of below-minimum-wage human labor 😞🤐 (Dube et al. 2020, like and subscribe yall, get those ❤️s goin)

A Cool Algorithm 😎😍

Training Data With Acknowledged Bias

  • One potentially fruitful approach to fairness: since we can’t eliminate bias from training data, bring it out into the open and study it!
    • This can, at the very least, help us brainstorm how we might “correct” for it (next slides!)

From Gendered Innovations in Science, Health & Medicine, Engineering, and Environment

Word Embeddings

Bolukbasi et al. (2016)
  • Notice how the \(x\)-axis has been selected by the researcher specifically to draw out (one) gendered dimension of language!
    • \(\overrightarrow{\texttt{she}}\) mapped to \(\langle -1,0\rangle\), \(\overrightarrow{\texttt{he}}\) mapped to \(\langle 1,0 \rangle\), others projected onto this dimension
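
One way to get the mapping described above (she at \(-1\), he at \(+1\), everything else projected onto that axis) can be sketched in plain Python. This is an illustration under stated assumptions, not necessarily the exact procedure in Bolukbasi et al. (2016): the function `gender_coordinate` is a hypothetical helper, and the three-dimensional vectors are hand-picked toy numbers, not real embeddings.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def gender_coordinate(word_vec, she_vec, he_vec):
    """x-coordinate of word_vec on the she-he axis, scaled so she -> -1, he -> +1."""
    # Midpoint between the two anchor words becomes the origin of the axis.
    midpoint = [(s + h) / 2 for s, h in zip(she_vec, he_vec)]
    # Half of (he - she): the vector from the midpoint to "he".
    half_diff = [(h - s) / 2 for s, h in zip(she_vec, he_vec)]
    centered = [w - m for w, m in zip(word_vec, midpoint)]
    # Scalar projection onto half_diff, in units of its own length.
    return dot(centered, half_diff) / dot(half_diff, half_diff)

# Toy vectors (assumed values, chosen only to make the arithmetic easy to follow).
she = [0.0, 1.0, 0.0]
he = [2.0, 1.0, 0.0]
word_a = [0.5, 0.3, 0.9]   # hypothetical word lying closer to "she" on this axis
word_b = [1.0, 2.0, -1.0]  # hypothetical word equidistant from both anchors

for name, vec in [("she", she), ("he", he), ("word_a", word_a), ("word_b", word_b)]:
    print(name, gender_coordinate(vec, she, he))
```

The choice of anchor pair is the researcher's: swapping in a different pair of words for `she` and `he` defines a different axis and so draws out a different dimension of the same embedding space.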

Removing vs. Studying Biases


