DSAN 5450: Data Ethics and Policy
Spring 2024, Georgetown University
Wednesday, February 14, 2024
\[ \DeclareMathOperator*{\argmax}{argmax} \DeclareMathOperator*{\argmin}{argmin} \newcommand{\nimplies}{\;\;\not\!\!\!\!\implies} \]
| | Run | Jump | Hurdle | Weights |
|---|---|---|---|---|
| Aziza | 10.1” | 6.0’ | 40” | 150 lb |
| Bogdan | 9.2” | 5.9’ | 42” | 140 lb |
| Charles | 10.0” | 6.1’ | 39” | 145 lb |
It appears to reveal an unfortunate but inexorable fact about our world: we must choose between two intuitively appealing ways to understand fairness in ML. Many scholars have done just that, defending either ProPublica’s or Northpointe’s definitions against what they see as the misguided alternative. (Simons 2023)
The impossibility result is about much more than math. [It occurs because] the underlying outcome is distributed unevenly in society. This is a fact about society, not mathematics, and requires engaging with a complex, checkered history of systemic racism in the US. Predicting an outcome whose distribution is shaped by this history requires tradeoffs because the inequalities and injustices are encoded in data—in this case, because America has criminalized Blackness for as long as America has existed.
The Distributional Hypothesis (Firth 1968, 179)
You shall know a word by the company it keeps!
The Distributional [Fairness] Hypothesis
You shall know “fairness” by the company it keeps [i.e., the context it incorporates].
Fairness Through Awareness (Dwork et al. 2011)
Individuals who are similar with respect to a task should be classified similarly.
An algorithm is individually fair if, for all individuals \(x\) and \(y\), we have
\[ \textsf{dist}(r(x), r(y)) \leq \textsf{dist}(x, y) \]
\(\implies\) an advertising system must show similar sets of ads to similar users.
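To make the Lipschitz-style condition concrete, here is a minimal sketch (in Python) of checking individual fairness over a small set of users. The Euclidean distances and the toy ad-serving rule are illustrative assumptions for this sketch, not part of Dwork et al.'s definition, which leaves the task-specific similarity metric open.

```python
# Minimal sketch: check dist(r(x), r(y)) <= dist(x, y) for all pairs of users.
# Euclidean distances and the toy rule r are assumptions for illustration only.
import itertools
import numpy as np

def is_individually_fair(X, r, dist_x=None, dist_r=None):
    """True iff dist(r(x), r(y)) <= dist(x, y) for every pair of individuals in X."""
    dist_x = dist_x or (lambda a, b: np.linalg.norm(a - b))
    dist_r = dist_r or (lambda a, b: np.linalg.norm(a - b))
    return all(
        dist_r(r(x), r(y)) <= dist_x(x, y)
        for x, y in itertools.combinations(X, 2)
    )

# Toy users (feature vectors) and a hypothetical ad-serving rule r that maps
# a user to a distribution over two ads
users = [np.array([0.0, 1.0]), np.array([0.1, 0.9]), np.array([0.8, 0.2])]
r = lambda x: np.array([0.5 * x[0], 1 - 0.5 * x[0]])

print(is_individually_fair(users, r))  # True for this toy rule
```

Any task-specific similarity metric could be swapped in for `dist_x`; the hard (and normatively loaded) part of the definition is choosing that metric, not running the check.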
The advertising system achieves group fairness-through-parity for two groups of users \(S\) and \(T\) when:
\[ \textsf{dist}(\mathbb{E}_{s \in S}[r(s)], \mathbb{E}_{t \in T}[r(t)]) \leq \varepsilon \]
where \(\mathbb{E}_{s \in S}\) and \(\mathbb{E}_{t \in T}\) denote expectations over the ads seen by an individual chosen uniformly from \(S\) and from \(T\), respectively. This definition implies that the difference between the two groups’ probabilities of seeing a particular ad is bounded by \(\varepsilon\).
Given these definitions: Individual fairness \(\nimplies\) group fairness, and vice versa! (Riederer and Chaintreau 2017)
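A similar sketch for the group-parity check, together with a toy case illustrating the non-implication: here the rule \(r\) is the identity map (trivially 1-Lipschitz, hence individually fair), but the two groups sit far apart in feature space, so their expected ad distributions differ by far more than \(\varepsilon\). The distance on mean ad distributions and all numbers are invented for the example.

```python
# Sketch of the group fairness-through-parity check for two groups S and T,
# using a total-variation-style distance between mean ad distributions
# (the metric choice is illustrative).
import numpy as np

def group_parity_gap(S_outputs, T_outputs):
    """dist( E_{s in S}[r(s)], E_{t in T}[r(t)] ) as 0.5 * L1 distance."""
    mean_S = np.mean(S_outputs, axis=0)
    mean_T = np.mean(T_outputs, axis=0)
    return 0.5 * np.abs(mean_S - mean_T).sum()

# Toy case: r is the identity map (individually fair), so r(s) is just s's
# feature vector -- but S and T are far apart, so parity fails.
S_outputs = [np.array([0.9, 0.1]), np.array([0.8, 0.2])]  # r(s) for s in S
T_outputs = [np.array([0.1, 0.9]), np.array([0.2, 0.8])]  # r(t) for t in T

eps = 0.05
gap = group_parity_gap(S_outputs, T_outputs)
print(f"parity gap = {gap:.2f}, group-fair at eps = {eps}? {gap <= eps}")
```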
Since it’s impossible to eliminate information about sensitive attributes like race/gender/etc. from our ML algorithms, fairness should instead be defined on the basis of how this sensitive information “flows” through the causal chain of decisions which lead to a given (observed) outcome.
This approach is promising because, once we have a model of the causal connections among the variables we care about (socially/normatively) and among the variables used by a Machine Learning algorithm, we can use techniques developed by statisticians who study causal inference to block the “causal pathways” we deem normatively unjustifiable, while allowing the pathways we deem justifiable.
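As a sketch of what “blocking causal pathways” can look like in practice, one can enumerate the directed paths from the sensitive attribute to the outcome in a causal graph and label each one as normatively justifiable or not. The graph, the variable names, and the choice of which path to block below are all hypothetical, chosen only to illustrate the idea.

```python
# Hypothetical causal graph:
#   race -> zip_code -> loan_decision   (proxy path we deem unjustifiable)
#   race -> income   -> loan_decision   (path we deem justifiable, for the example)
import networkx as nx

G = nx.DiGraph([
    ("race", "zip_code"), ("zip_code", "loan_decision"),
    ("race", "income"),   ("income", "loan_decision"),
])

# Mediators on pathways we have (normatively) decided to block -- illustrative only
blocked_mediators = {"zip_code"}

for path in nx.all_simple_paths(G, "race", "loan_decision"):
    status = "BLOCK" if blocked_mediators & set(path[1:-1]) else "allow"
    print(f"{status}: {' -> '.join(path)}")
```

The code only labels paths; actual path-specific fairness methods go further and estimate or constrain the effect transmitted along each labeled path.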
Given this, the first subpart of this portion of the assignment focuses on helping you develop intuition for the jump from the correlational approach used throughout statistics and probability (and in DSAN 5100 specifically!) to the causal approach, which builds on the correlational approach but imposes a stricter standard for determining whether two or more random variables are related to one another. Then, in the second subpart, you will take this intuition and use it to evaluate fairness in a real-world setting!
(Based on Spurious Correlations, Tyler Vigen)
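To see the correlational-vs-causal distinction in simulated data (variable names and coefficients invented for the example), a confounder \(Z\) can drive both \(X\) and \(Y\): the two are strongly correlated in observational data, yet the association vanishes once \(X\) is set independently of \(Z\).

```python
# Simulated spurious correlation: Z causes both X and Y, so corr(X, Y) is high
# even though X has no causal effect on Y (all numbers are illustrative).
import numpy as np

rng = np.random.default_rng(5450)
n = 10_000

Z = rng.normal(size=n)              # common cause (confounder)
X = 2 * Z + rng.normal(size=n)      # X depends on Z, not on Y
Y = -3 * Z + rng.normal(size=n)     # Y depends on Z, not on X

print("observational corr(X, Y):", round(np.corrcoef(X, Y)[0, 1], 3))

# "Intervention": set X independently of Z (do(X)); Y is unaffected
X_do = rng.normal(size=n)
print("interventional corr(do(X), Y):", round(np.corrcoef(X_do, Y)[0, 1], 3))
```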
The only workable definition of “\(X\) causes \(Y\)”:
Defining Causality
\(X\) causes \(Y\) if and only if:
DSAN 5450 Week 5: Context-Sensitive Fairness