DSAN 5450: Data Ethics and Policy
Spring 2025, Georgetown University
Wednesday, February 12, 2025
Today’s Planned Schedule:
| | Start | End | Topic |
|---|---|---|---|
| Lecture | 6:30pm | 7:00pm | Setting the Table: HW1 \(\leadsto\) HW2 → |
| | 7:00pm | 7:15pm | Issues with Context-Free Fairness → |
| | 7:15pm | 7:30pm | Bringing in Context → |
| | 7:30pm | 8:00pm | Similarity-Based Fairness → |
| Break! | 8:00pm | 8:10pm | |
| Lecture | 8:10pm | 9:00pm | Causal Fairness Building Blocks → |
\[ \DeclareMathOperator*{\argmax}{argmax} \DeclareMathOperator*{\argmin}{argmin} \newcommand{\bigexp}[1]{\exp\mkern-4mu\left[ #1 \right]} \newcommand{\bigexpect}[1]{\mathbb{E}\mkern-4mu \left[ #1 \right]} \newcommand{\definedas}{\overset{\small\text{def}}{=}} \newcommand{\definedalign}{\overset{\phantom{\text{defn}}}{=}} \newcommand{\eqeventual}{\overset{\text{eventually}}{=}} \newcommand{\Err}{\text{Err}} \newcommand{\expect}[1]{\mathbb{E}[#1]} \newcommand{\expectsq}[1]{\mathbb{E}^2[#1]} \newcommand{\fw}[1]{\texttt{#1}} \newcommand{\given}{\mid} \newcommand{\green}[1]{\color{green}{#1}} \newcommand{\heads}{\outcome{heads}} \newcommand{\iid}{\overset{\text{\small{iid}}}{\sim}} \newcommand{\lik}{\mathcal{L}} \newcommand{\loglik}{\ell} \DeclareMathOperator*{\maximize}{maximize} \DeclareMathOperator*{\minimize}{minimize} \newcommand{\mle}{\textsf{ML}} \newcommand{\nimplies}{\;\not\!\!\!\!\implies} \newcommand{\orange}[1]{\color{orange}{#1}} \newcommand{\outcome}[1]{\textsf{#1}} \newcommand{\param}[1]{{\color{purple} #1}} \newcommand{\pgsamplespace}{\{\green{1},\green{2},\green{3},\purp{4},\purp{5},\purp{6}\}} \newcommand{\prob}[1]{P\left( #1 \right)} \newcommand{\purp}[1]{\color{purple}{#1}} \newcommand{\sign}{\text{Sign}} \newcommand{\spacecap}{\; \cap \;} \newcommand{\spacewedge}{\; \wedge \;} \newcommand{\tails}{\outcome{tails}} \newcommand{\Var}[1]{\text{Var}[#1]} \newcommand{\bigVar}[1]{\text{Var}\mkern-4mu \left[ #1 \right]} \]
Just one (❗️) “primary” axiom required for first-order predicate logic!
But, to make it more digestible, we can define some “helper axioms”:
| \(p\) | \(\neg p\) |
|---|---|
| 0 | 1 |
| 1 | 0 |
| \(p\) | \(q\) | \(p \wedge q\) |
|---|---|---|
| 0 | 0 | 0 |
| 0 | 1 | 0 |
| 1 | 0 | 0 |
| 1 | 1 | 1 |
[Axiom (NAND)] A binary operator “\(\barwedge\)”, defined (for human consumption) s.t. \(a \barwedge b \triangleq \neg(a \wedge b)\), but defined precisely as the binary operator which maps the \(p\) and \(q\) columns into the \(p \barwedge q\) column:
| \(p\) | \(q\) | \(p \underset{\small\text{or}}{\vee} q\) | \(p \underset{\small\text{xor}}{\otimes} q\) | \(p \underset{\small\text{and}}{\wedge} q\) | \(p \underset{\small\text{nand}}{\barwedge} q\) |
|---|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 0 | 1 |
| 0 | 1 | 1 | 1 | 0 | 1 |
| 1 | 0 | 1 | 1 | 0 | 1 |
| 1 | 1 | 1 | 0 | 1 | 0 |
Given the NAND axiom, we can derive the negation (“helper”) axiom as a theorem (meaning, it does not need to be included as an axiom!):
\[ [\text{Theorem 1}] \; ~ p \barwedge p \text{ satisfies all properties of }\neg p \]
Therefore, define \(\neg p \triangleq p \barwedge p\)
Given the NAND axiom and Theorem 1, we can derive the conjunction (“helper”) axiom as a theorem:
\[ [\text{Theorem 2}] ~ \neg(p \barwedge q) \text{ satisfies all properties of }p \wedge q \]
Therefore, define \(p \wedge q \triangleq \neg(p \barwedge q)\)
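Since the whole construction lives in small truth tables, it can be checked mechanically. Below is a minimal Python sketch (not from the slides) that takes only `nand` as a primitive, builds `neg` and `conj` exactly as in Theorems 1 and 2, and confirms by brute force that they reproduce the \(\neg\) and \(\wedge\) tables above:

```python
from itertools import product

# The single primitive: NAND (the "Sheffer stroke")
def nand(a: bool, b: bool) -> bool:
    return not (a and b)

# Theorem 1: negation, defined purely in terms of NAND
def neg(p: bool) -> bool:
    return nand(p, p)

# Theorem 2: conjunction, defined purely in terms of NAND
def conj(p: bool, q: bool) -> bool:
    return neg(nand(p, q))          # i.e., (p ⊼ q) ⊼ (p ⊼ q)

# Exhaustively check both theorems over all truth assignments
for p, q in product([False, True], repeat=2):
    assert neg(p) == (not p)        # matches the ¬ truth table
    assert conj(p, q) == (p and q)  # matches the ∧ truth table
print("NAND alone reproduces ¬ and ∧ on every row of the truth tables")
```

The same exhaustive check extends to the \(\vee\) and \(\otimes\) columns, which is why a single connective can serve as the lone primitive.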
These are, at root, logical connectives: given “atomic” (non-implicational) logical predicates \(p\) and \(q\), we can form implicational predicates (here “\(\equiv\)” means “will always have the same logical value (T or F) as”):
| English | Logical Form | \(\equiv\) | Contrapositive Form | If True Then… |
|---|---|---|---|---|
| “If \(p\) then \(q\)” | \(p \Rightarrow q\) or \(q \Leftarrow p\) | \(\equiv\) | \(\neg q \Rightarrow \neg p\) | \(p\) sufficient for \(q\) |
| “If \(q\) then \(p\)” | \(q \Rightarrow p\) or \(p \Leftarrow q\) | \(\equiv\) | \(\neg p \Rightarrow \neg q\) | \(p\) necessary for \(q\) |
The truth table for \(\Rightarrow\) looks like:
| \(p\) | \(q\) | \(p \Rightarrow q\) | \(\neg p \vee q\) |
|---|---|---|---|
| 0 | 0 | 1 | 1 |
| 0 | 1 | 1 | 1 |
| 1 | 0 | 0 | 0 |
| 1 | 1 | 1 | 1 |
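The same brute-force check works here. The sketch below (again just an illustration, not from the slides) encodes the \(\Rightarrow\) column as a lookup table and verifies that it matches both \(\neg p \vee q\) and the contrapositive \(\neg q \Rightarrow \neg p\) on every row:

```python
from itertools import product

# The ⇒ column from the table above, written out row by row
IMPLIES_TABLE = {(0, 0): 1, (0, 1): 1, (1, 0): 0, (1, 1): 1}

def implies(p: int, q: int) -> int:
    return IMPLIES_TABLE[(p, q)]

for p, q in product([0, 1], repeat=2):
    # p ⇒ q agrees with ¬p ∨ q on every row...
    assert implies(p, q) == int((not p) or q)
    # ...and with its contrapositive ¬q ⇒ ¬p
    assert implies(p, q) == implies(int(not q), int(not p))
print("p ⇒ q ≡ ¬p ∨ q ≡ (¬q ⇒ ¬p) on all four rows")
```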
Which means, finally, we can “plug in” English statements for \(p\) and \(q\):
And evaluate based on \(p\) and \(q\) (board time!)
When defendants are booked into jail in Broward County, Florida, they are asked to respond to a COMPAS questionnaire with 137 questions, including “Was one of your parents ever sent to jail or prison?,” “How many of your friends/acquaintances are taking drugs illegally?,” and “How often did you get into fights at school?” Arrestees are also asked to agree or disagree with the statements “A hungry person has the right to steal” and “If people make me angry or I lose my temper, I can be dangerous.” Answers are fed into the COMPAS model, which generates an individual risk score that is reported in three buckets: “low risk” (1 to 4), “medium risk” (5 to 7), and “high risk” (8 to 10).
ProPublica accused COMPAS of racism: “There’s software used across the country to predict future criminals. And it’s biased against blacks,” read the subheading on the article. ProPublica found that COMPAS’s error rates—the rate at which the model got it wrong—were unequal across racial groups. COMPAS’s predictions were more likely to incorrectly label African Americans as high risk and more likely to incorrectly label white Americans as low risk. “In the criminal justice context,” said Julia Angwin, coauthor of the ProPublica article, “false findings can have far-reaching effects on the lives of the people charged with crimes.
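To make “unequal error rates” concrete, here is a small sketch of the two quantities ProPublica compared across groups: the false positive rate (labeled high risk but did not reoffend) and the false negative rate (labeled low risk but did reoffend). The counts below are entirely invented for illustration; they are not Broward County data.

```python
import pandas as pd

# Hypothetical toy data -- the values here are made up for illustration only.
df = pd.DataFrame({
    "group":      ["A"] * 4 + ["B"] * 4,
    "high_risk":  [1, 1, 0, 0,  1, 1, 0, 0],   # COMPAS-style binary prediction
    "reoffended": [1, 0, 1, 0,  1, 0, 0, 0],   # observed outcome
})

def error_rates(g: pd.DataFrame) -> pd.Series:
    fp = ((g.high_risk == 1) & (g.reoffended == 0)).sum()
    fn = ((g.high_risk == 0) & (g.reoffended == 1)).sum()
    return pd.Series({
        "FPR": fp / (g.reoffended == 0).sum(),  # high risk, but did not reoffend
        "FNR": fn / (g.reoffended == 1).sum(),  # low risk, but did reoffend
    })

# ProPublica's comparison: do these error rates differ by group?
print(df.groupby("group")[["high_risk", "reoffended"]].apply(error_rates))
```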
One option: argue about which of the two definitions is “better” for the next 100 years (what is the best way to give food to the poor?)
It appears to reveal an unfortunate but inexorable fact about our world: we must choose between two intuitively appealing ways to understand fairness in ML. Many scholars have done just that, defending either ProPublica’s or Northpointe’s definitions against what they see as the misguided alternative. (Simons 2023)
Another option: study and then work to ameliorate the social conditions which force us into this realm of mathematical impossibility (why do the poor have no food?)
The impossibility result is about much more than math. [It occurs because] the underlying outcome is distributed unevenly in society. This is a fact about society, not mathematics, and requires engaging with a complex, checkered history of systemic racism in the US. Predicting an outcome whose distribution is shaped by this history requires tradeoffs because the inequalities and injustices are encoded in data—in this case, because America has criminalized Blackness for as long as America has existed.
The Distributional Hypothesis (Firth 1968, 179)
You shall know a word by the company it keeps!
The Distributional [Fairness] Hypothesis
You shall know “fairness” by the company it keeps [i.e., the context it incorporates].
Fairness Through Awareness (Dwork et al. 2011)
Individuals who are similar with respect to a task should be classified similarly.
An algorithm is individually fair if, for all individuals \(x\) and \(y\), we have
\[ \textsf{dist}(r(x), r(y)) \leq \textsf{dist}(x, y) \]
\(\implies\) an advertising system must show similar sets of ads to similar users.
It achieves group fairness-through-parity for two groups of users \(S\) and \(T\) when:
\[ \textsf{dist}(\mathbb{E}_{s \in S}[r(s)], \mathbb{E}_{t \in T}[r(t)]) \leq \varepsilon \]
where \(\mathbb{E}_{s \in S}\) and \(\mathbb{E}_{t \in T}\) denote the expectation of ads seen by an individual chosen uniformly among \(S\) and \(T\). This definition implies that the difference in probability between two groups of seeing a particular ad will be bounded by \(\varepsilon\).
Given these definitions: Individual fairness \(\nimplies\) group fairness, and vice versa! (Riederer and Chaintreau 2017)
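A toy example makes the non-implication concrete. In the sketch below (all numbers invented, with \(\textsf{dist}\) taken to be plain absolute difference and \(r\) a trivially 1-Lipschitz score), the Dwork et al. condition holds for every pair of individuals, yet the two groups’ expected scores differ by far more than \(\varepsilon\), so parity fails:

```python
import numpy as np
from itertools import combinations

# Toy setup (all numbers invented): x is a 1-D "feature", r(x) a score in [0, 1],
# and dist(.,.) is plain absolute difference on both features and scores.
rng = np.random.default_rng(5450)
S = rng.uniform(0.0, 0.4, size=50)    # group S's features
T = rng.uniform(0.6, 1.0, size=50)    # group T's features

def r(x):
    return x  # a trivially 1-Lipschitz scoring rule

everyone = np.concatenate([S, T])

# Individual fairness (Dwork et al.): dist(r(x), r(y)) <= dist(x, y) for all pairs
individually_fair = all(
    abs(r(x) - r(y)) <= abs(x - y) for x, y in combinations(everyone, 2)
)

# Group fairness-through-parity: |E_S[r] - E_T[r]| <= epsilon
epsilon = 0.1
group_fair = abs(r(S).mean() - r(T).mean()) <= epsilon

print(f"individually fair: {individually_fair}")    # True
print(f"group fair (eps={epsilon}): {group_fair}")  # False: group means differ a lot
```

The reverse direction fails for a similar reason: a rule can equalize the two groups’ average scores while still scoring near-identical individuals very differently.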
(Data from Spurious Correlations, Tyler Vigen)
The only workable definition of “\(X\) causes \(Y\)”:
Defining Causality
\(X\) causes \(Y\) if and only if:
What “research” “says” about identifying people who might commit mass shootings
DSAN 5450 Week 5: Context-Sensitive Fairness