Week 2: Machine Learning, Training Data, and Biases

DSAN 5450: Data Ethics and Policy
Spring 2025, Georgetown University

Class Sessions

Author

Affiliation

Jeff Jacobs

jj1088@georgetown.edu

Published

Wednesday, January 22, 2025

Open slides in new window →

Weeks 2-4: Slouching Towards Fairness

First half: Remaining high-level issues!
Second half: you’ll start to understand why I kept maniacally pointing to \(p \implies q\) on the board last lecture!
“Rules” for fairness are not “rules” at all! They’re statements of the form “If we accept ethical framework \(x\), then our algorithms ought to satisfy condition \(y\)”

\[ \underbrace{p(x)}_{\substack{\text{Accept ethical} \\ \text{framework }x}} \implies \underbrace{q(y)}_{\substack{\text{Algorithms should} \\ \text{satisfy condition }y}} \]

Last week: very broad intro to possible ethical frameworks (values for \(x\))
Today: very broad intro to possible fairness criteria (values for \(y\))
End of today: HW1: Nuts and Bolts for Evaluating Fairness

Where We Left Off: Ethical Issues in Data Science

Data Science for Who? ✅
Individuals \(\leftrightarrow\) Structures 👀
Operationalization
Fair Comparisons
Implementation

Structural Domination: The Grapes of Wrath

But… I built it with my hands! Straightened old nails to put the sheathing on!

It’s not me. There’s nothing I can do. I’ll lose my job if I don’t do it. And look—suppose you kill me? They’ll hang you, and long before you’re hung there’ll be another guy here, he’ll bump the house down. You’re not killing the right guy.

That’s so… Who gave you orders? I’ll go after him. He’s the one to kill.

You’re wrong. He got his orders from the bank. ‘Clear those people out or it’s your job.’

Well, there’s a president of the bank. A board of directors. I’ll fill up my rifle, head to the bank.

The bank gets orders from the East. ‘Make the land show profit or we’ll close you up.’ We’re sorry. It’s not us. It’s the monster. The bank isn’t like a man.

Yes, but the bank is only made of men!

No, you’re wrong there—quite wrong. The bank is something else than men. It happens nowadays that every man in a bank hates what the bank does, and yet the bank does it. The bank is something more than men, I tell you.

I got to figure… We all got to figure. There’s some way to stop this. There’s got to be some way to stop this. It’s not like lightning or earthquakes. We’ve got a bad thing made by men, and by God, isn’t that something we should be able to change? (Steinbeck 1939)

Ontology: Individuals and Structures

No longer preoccupied with crude ‘conspiracy theories’, [progressives] attribute all things negative to handy abstractions: ‘capitalism’, ‘the state’, ‘structural oppression’, ‘hierarchy’. Hence they have been able to conjure what might be termed the ‘miracle of immaculate genocide’, a form of genocide, that is, in which there are no actual perpetrators, no one who might ‘really’ be deemed culpable (Churchill 2003)

We make our own history, but we do not make it as we please; we do not make it under self-selected circumstances, but under circumstances existing already, given and transmitted from the past. The tradition of all dead generations weighs like a nightmare on the brains of the living. (Marx 1852)

Ethical Issues Part 2

Data Science for Who? ✅
Individuals \(\leftrightarrow\) Structures ✅
Operationalization 👀
Fair Comparisons 👀
Implementation 👀

Operationalization 👀

Think of claims commonly made based on “data”:
- Markets create economic prosperity
- A glass of wine in the evening prevents cancer
- Policing makes communities safer
How exactly are “prosperity”, “preventing cancer”, “policing”, “community safety” being measured? Who is measuring? Mechanisms for feedback \(\leadsto\) change?

What Is Being Compared? 👀

Apples	Oranges	Pears
Polities w/250-500M people (US ~335M, UP ~250M, EU ~450M)	Polities w/11M people in the Caribbean (Cuba, Haiti, Dominican Republic)	Polities w/over 1 billion people (China ~1.4B, India ~1.4B, Africa ~1.4B, ⬆️+⬇️ America ~1B)
Democracies (US)	Democracies til they democratically elected someone US didn’t like (Iran, Guatemala, Chile)	Non-democracies which brutally repress democratic movements w/US arms (Saudi Arabia)
Colonizing polities (US)	Polities colonized by them (Philippines)	Non-colonized polities (Ethiopia, Thailand)
Polities w/infrastructure built up over 250+ yrs via slave labor (US 🇺🇸)	Polities populated by former slaves (Liberia 🇱🇷)	Polities that paid reparations to descendants of [certain] enslaved groups (Germany)
Polities independent since 1776 (US)	Polities independent since 1990 (Namibia)	Non-self-governing polities (Puerto Rico, Palestine, New Caledonia)
Polities enforcing a 60 yr embargo on Cuba (US)	Polities with a 60 yr embargo imposed on them by US (Cuba)	Polities without a 60 yr embargo imposed on them by US (…)

How Are They Being Compared?

What metric? Over what timespan?
What unit of obs \(\leadsto\) agg function \(\leadsto\) level of aggregation?

…There is Still Hope! I Promise!

Fair Comparison through Statistical Matching:
Lyall (2020): “Treating certain ethnic groups as second-class citizens […] leads victimized soldiers to subvert military authorities once war begins. The higher an army’s inequality, the greater its rates of desertion, side-switching, and casualties”

Matching constructs pairs of belligerents that are similar across a wide range of traits thought to dictate battlefield performance but that vary in levels of prewar inequality. The more similar the belligerents, the better our estimate of inequality’s effects, as all other traits are shared and thus cannot explain observed differences in performance, helping assess how battlefield performance would have improved (declined) if the belligerent had a lower (higher) level of prewar inequality.

Since [non-matched] cases are dropped […] selected cases are more representative of average belligerents/wars than outliers with few or no matches, [providing] surer ground for testing generalizability of the book’s claims than focusing solely on canonical but unrepresentative usual suspects (Germany, the United States, Israel)

Does Inequality Cause Poor Military Performance?

Covariates	Sultanate of Morocco Spanish-Moroccan War, 1859-60	Khanate of Kokand War with Russia, 1864-65
\(X\): Military Inequality	Low (0.01)	Extreme (0.70)
\(\mathbf{Z}\): Matched Covariates:
Initial relative power	66%	66%
Total fielded force	55,000	50,000
Regime type	Absolutist Monarchy (−6)	Absolute Monarchy (−7)
Distance from capital	208km	265km
Standing army	Yes	Yes
Composite military	Yes	Yes
Initiator	No	No
Joiner	No	No
Democratic opponent	No	No
Great Power	No	No
Civil war	No	No
Combined arms	Yes	Yes
Doctrine	Offensive	Offensive
Superior weapons	No	No
Fortifications	Yes	Yes
Foreign advisors	Yes	Yes
Terrain	Semiarid coastal plain	Semiarid grassland plain
Topography	Rugged	Rugged
War duration	126 days	378 days
Recent war history w/opp	Yes	Yes
Facing colonizer	Yes	Yes
Identity dimension	Sunni Islam/Christian	Sunni Islam/Christian
New leader	Yes	Yes
Population	8–8.5 million	5–6 million
Ethnoling fractionalization (ELF)	High	High
Civ-mil relations	Ruler as commander	Ruler as commander
\(Y\): Battlefield Performance:
Loss-exchange ratio	0.43	0.02
Mass desertion	No	Yes
Mass defection	No	No
Fratricidal violence	No	Yes

No Crumbs

(I have no dog in this fight, I’m not trying to improve military performance of an army, but got damn)

Implementation 👀

From D’Ignazio and Klein (2020), Ch. 6 (see also)

From Lerman and Weaver (2014) (see also)

Ethics of Eliciting Sensitive Linguistic Data

From “80 Years On, Dominicans And Haitians Revisit Painful Memories Of Parsley Massacre”, *NPR Parallels*, 7 Oct 2017 (Bishop and Fernandez 2017)

Privacy

Seurat, *A Sunday Afternoon on the Island of La Grande Jatte*, Wikimedia Commons

Ethical Issues in Applying Data Science/ML to Particular Problems

Facial Recognition Algorithms

(aka AI eugenics… but I didn’t say that out loud)

LLMs Encode Existing Biases

Prompt	Result
“Generate a reference letter for Kelly, a 22 year old female student at UCLA”	“She is an engaged participant in group projects, demonstrating exceptional teamwork and collaboration skills […] a well-liked member of our community.”
“Generate a reference letter for Joseph, a 22 year old male student at UCLA”	His enthusiasm and dedication have had a positive impact on those around him, making him a natural leader and role model for his peers.”

Figure 1: Wan et al. (2023), “Gender Biases in LLM-Generated Reference Letters”

What Is To Be Done?

When to retain biases…

…and when to debias

Figure 2: From Schiebinger et al. (2020)

Figure 3: From DeepLearning.AI’s Deep Learning course

Military and Police Applications of AI

Your Job: Policy Whitepaper

So is technology/data science/machine learning…
- “Bad” in and of itself?
- “Good” in and of itself? or
- A tool that can be used to both “good” and “bad” ends?
“The master’s tools will never dismantle the master’s house”… Who decided that the master owns the tools?
How can we curtail some uses and/or encourage others?
If only we had some sort of… institution… for governing its use in society… some sort of… govern… ment?

From Week 7 Onwards, You Work At A Think Tank

“Whatever You Do… Don’t Be Bored”

Clip from Richard Linklater’s *Waking Life*

Machine Learning at 30,000 Feet

Three Component Parts of Machine Learning

A cool algorithm 😎😍
[Possibly benign but possibly biased] Training data ❓🧐
Exploitation of below-minimum-wage human labor 😞🤐 (Dube et al. 2020, like and subscribe yall, get those ❤️s goin)

A Cool Algorithm 😎😍

Training Data With Acknowledged Bias

One potentially fruitful approach to fairness: since we can’t eliminate it, bring it out into the open and study it!
- This can, at very least, help us brainstorm how we might “correct” for it (next slides!)

From Gendered Innovations in Science, Health & Medicine, Engineering, and Environment

Word Embeddings

Notice how the \(x\)-axis has been selected by the researcher specifically to draw out (one) gendered dimension of language!
- \(\overrightarrow{\texttt{she}}\) mapped to \(\langle -1,0\rangle\), \(\overrightarrow{\texttt{he}}\) mapped to \(\langle 1,0 \rangle\), others projected onto this dimension

Removing vs. Studying Biases

WordBias: An Interactive Tool for Discovering Intersectional Biases Encoded in Word Embeddings

Context-Free Fairness

Who Remembers 🎉Confusion Matrices!!!🎉
Terrifyingly higher stakes than in DSAN 5000! Now \(D = 1\) could literally mean “shoot this person” or “throw this person in jail for life”

Categories of Fairness Criteria

Roughly, approaches to fairness/bias in AI can be categorized as follows:

Fairness

Context-Free

Single-Threshold Fairness
Equal Prediction
Equal Decision

Context-Sensitive

Fairness via Similarity Metric(s)
Causal Definitions

[Week 3] Context-Free Fairness: Easier to grasp from CS/data science perspective; rooted in “language” of ML (you already know much of it, given DSAN 5000!)
But easy-to-grasp notion \(\neq\) “good” notion!
Your job: push yourself to (a) consider what is getting left out of the context-free definitions, and (b) the loopholes that are thus introduced into them, whereby people/computers can discriminate while remaining “technically fair”

Laws: Often Perfectly “Technically Fair” (Context-Free Fairness)

Ah, la majestueuse égalité des lois, qui interdit au riche comme au pauvre de coucher sous les ponts, de mendier dans les rues et de voler du pain!

(Ah, the majestic equality of the law, which prohibits rich and poor alike from sleeping under bridges, begging in the streets, and stealing loaves of bread!)

Anatole France, Le Lys Rouge (France 1894)

Context-Sensitive Fairness… 🧐

Figure 4: Figures from Ingold and Soper (2016), “Amazon Doesn’t Consider the Race of its Customers. Should It?”

…Enables INVERSE Fairness 🤯

Figure 5: From Kasy and Abebe (2021), “Fairness, Equality, and Power in Algorithmic Decision-Making”

Figure 6: From (**bjorkegren_machine_2022?**), “(Machine) Learning What Policymakers Value”

Context-Sensitive Fairness \(\Leftrightarrow\) Unraveling History

News: “A litany of events with no beginning or end, thrown together because they occurred at the same time, cut off from antecedents and consequences” (Bourdieu 2010)
Do media outlets optimize for explaining/understanding?
Even in the eyes of the most responsible journalist I know, all media can do is point to things and say “please, you need to study, understand, and [possibly] intervene here”:

If we [journalists] have any reason for our existence, the least must be our ability to report history as it happens, so that no one will be able to say, “We’re sorry, we didn’t know—no one told us.” (Fisk 2005)

Unraveling History

(Someday I will do something with this)

In the long evenings in West Beirut, there was time enough to consider where the core of the tragedy lay. In the age of Assyrians, the Empire of Rome, in the 1860s perhaps? In the French Mandate? In Auschwitz? In Palestine? In the rusting front-door keys now buried deep in the rubble of Shatila? In the 1978 Israeli invasion? In the 1982 invasion? Was there a point where one could have said: Stop, beyond this point there is no future? Did I witness the point of no return in 1976? That 12-year-old on the broken office chair in the ruins of the Beirut front line? Now he was, in his mid-twenties (if he was still alive), a gunboy no more. A gunman, no doubt… (Fisk 1990)

Context-Sensitive Fairness \(\Leftrightarrow\) Unraveling History

(Reminder: Miracle of Immaculate Genocide)

From Cheng (2018) *The Art of Logic* [plz watch, if you can!]

References

Ames, Mark. 2014. “The Techtopus: How Silicon Valley’s Most Celebrated CEOs Conspired to Drive down 100,000 Tech Engineers’ Wages,” January. http://web.archive.org/web/20200920042121/https://pando.com/2014/01/23/the-techtopus-how-silicon-valleys-most-celebrated-ceos-conspired-to-drive-down-100000-tech-engineers-wages/.

Ayyub, Rami. 2019. “App Aims to Help Palestinian Drivers Find Their Way Around Checkpoints.” The Times of Israel, August. https://www.timesofisrael.com/app-aims-to-help-palestinian-drivers-find-their-way-around-checkpoints/.

Bishop, Marlon, and Tatiana Fernandez. 2017. “80 Years On, Dominicans And Haitians Revisit Painful Memories Of Parsley Massacre.” NPR, October. https://www.npr.org/sections/parallels/2017/10/07/555871670/80-years-on-dominicans-and-haitians-revisit-painful-memories-of-parsley-massacre.

Bolukbasi, Tolga, Kai-Wei Chang, James Y Zou, Venkatesh Saligrama, and Adam T Kalai. 2016. “Man Is to Computer Programmer as Woman Is to Homemaker? Debiasing Word Embeddings.” In Advances in Neural Information Processing Systems. Vol. 29. Curran Associates, Inc. https://proceedings.neurips.cc/paper_files/paper/2016/hash/a486cd07e4ac3d270571622f4f316ec5-Abstract.html.

Bourdieu, Pierre. 2010. Sociology Is a Martial Art: Political Writings by Pierre Bourdieu. New Press.

Cheng, Eugenia. 2018. The Art of Logic in an Illogical World. Basic Books.

Churchill, Ward. 2003. On the Justice of Roosting Chickens: Reflections on the Consequences of U.S. Imperial Arrogance and Criminality. AK Press.

D’Ignazio, Catherine, and Lauren F. Klein. 2020. Data Feminism. MIT Press.

Drèze, Jean, and Amartya Sen. 1991. “China and India.” In Hunger and Public Action, 0. Oxford University Press. https://doi.org/10.1093/0198283652.003.0011.

Dube, Arindrajit, Jeff Jacobs, Suresh Naidu, and Siddharth Suri. 2020. “Monopsony in Online Labor Markets.” American Economic Review: Insights 2 (1): 33–46. https://doi.org/10.1257/aeri.20180150.

Facia.ai. 2023. “Facial Recognition Helps Vendors in Healthcare.” Facia.ai. https://facia.ai/blog/facial-recognition-healthcare/.

Fisk, Robert. 1990. Pity the Nation: Lebanon at War. OUP Oxford.

———. 2005. The Great War for Civilisation: The Conquest of the Middle East. Knopf Doubleday Publishing Group.

France, Anatole. 1894. Le Lys Rouge (The Red Lily). G. Wells.

Ingold, David, and Spencer Soper. 2016. “Amazon Doesn’t Consider the Race of Its Customers. Should It?” Bloomberg, April. http://www.bloomberg.com/graphics/2016-amazon-same-day/.

Kalyvas, Stathis N. 2006. The Logic of Violence in Civil War. Cambridge University Press.

Kasy, Maximilian, and Rediet Abebe. 2021. “Fairness, Equality, and Power in Algorithmic Decision-Making.” In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, 576–86. FAccT ’21. New York, NY, USA: Association for Computing Machinery. https://doi.org/10.1145/3442188.3445919.

Kozlowski, Austin C., Matt Taddy, and James A. Evans. 2019. “The Geometry of Culture: Analyzing the Meanings of Class Through Word Embeddings.” American Sociological Review 84 (5): 905–49. https://doi.org/10.1177/0003122419877135.

Labov, William. 2013. The Language of Life and Death: The Transformation of Experience in Oral Narrative. Cambridge University Press.

Lerman, Amy E., and Vesla M. Weaver. 2014. Arresting Citizenship: The Democratic Consequences of American Crime Control. University of Chicago Press.

Lyall, Jason. 2020. Divided Armies: Inequality and Battlefield Performance in Modern War. Princeton University Press.

Marx, Karl. 1852. The Eighteenth Brumaire of Louis Bonaparte. Die Revolution.

McNeil, Sam. 2022. “Israel Deploys Remote-Controlled Robotic Guns in West Bank.” AP News, November. https://apnews.com/article/technology-business-israel-robotics-west-bank-cfc889a120cbf59356f5044eb43d5b88.

Mitchell, Shira, Eric Potash, Solon Barocas, Alexander D’Amour, and Kristian Lum. 2021. “Algorithmic Fairness: Choices, Assumptions, and Definitions.” Annual Review of Statistics and Its Application 8 (1): 141–63. https://doi.org/10.1146/annurev-statistics-042720-125902.

Morozov, Evgeny. 2015. “Socialize the Data Centres!” New Left Review, no. 91 (February): 45–66.

Ouz. 2023. “Google Pixel 8 Face Unlock Vulnerability Discovered, Allowing Others to Unlock Devices.” Gizmochina. https://www.gizmochina.com/2023/10/16/google-pixel-8-face-unlock/.

Sandburg, Carl. 1926. Selected Poems. Houghton Mifflin Harcourt.

Schiebinger, Londa, Ineke Klinga, Hee Young Paik, Inés Sánchez de Madariaga, Martina Schraudner, and Marcia Stefanick. 2020. “Machine Translation: Gendered Innovations.” http://genderedinnovations.stanford.edu/case-studies/nlp.html#tabs-2.

Steinbeck, John. 1939. The Grapes of Wrath. Penguin.

Stiglitz, Joseph E., Amartya Sen, and Jean-Paul Fitoussi. 2010. Mismeasuring Our Lives: Why GDP Doesn’t Add Up. The New Press.

Wan, Yixin, George Pu, Jiao Sun, Aparna Garimella, Kai-Wei Chang, and Nanyun Peng. 2023. “‘Kelly Is a Warm Person, Joseph Is a Role Model’: Gender Biases in LLM-Generated Reference Letters.” In EMNLP 2023, 3730–48. Singapore: ACL. https://doi.org/10.18653/v1/2023.findings-emnlp.243.

Wang, Yilun, and Michal Kosinski. 2018. “Deep Neural Networks Are More Accurate Than Humans at Detecting Sexual Orientation from Facial Images.” Journal of Personality and Social Psychology 114 (2): 246–57. https://doi.org/10.1037/pspa0000098.

Wellcome Collection. 1890. “Composite Photographs: "The Jewish Type".” https://wellcomecollection.org/works/ngq29vyw.

Appendix / Bonus Showing-Up-Early Material

(Jeff’s Sanctimonious Unc Corner)

Being Bayesian \(\neq\) Not Taking Sides:

Rather than implying moral relativism, this position posits the formulation of moral judgments as outcomes rather than preconditions of research. (Kalyvas 2006)
Axiom/Antecedent: The life of a single human is worth a million times more than the property of the richest man on earth
Evidence: (History of social movements)
Consequent: “On the side of poor people getting organized, on the side of choice where it is in short supply, on the side of those the system doesn’t authorize, LGBT, we are on the side of pride”