Week 6: Causality in Ethics and Policy

DSAN 5450: Data Ethics and Policy
Spring 2025, Georgetown University

Class Sessions
Author
Affiliation

Jeff Jacobs

Published

Wednesday, February 19, 2025

Open slides in new window →

Recap / Eureka Moment (for Midterm Prep Purposes)

  • I totally forgot to mention: John Stuart Mill, the progenitor of what we today identify as utilitarianism, was himself tortured mercilessly, by his father John Mill (bffs with Jeremy Bentham) for the “greater good of society”!

Blasting Off Into Causality!

DGPs and the Emergence of Order

  • Who (besides Aaron) can guess the state of this process after 10 steps, with 1 person?
  • 10 people? 50? 100? (If they find themselves on the same spot, they stand on each other’s heads)
  • 100 steps? 1000?

The Result: 16 Steps

Code
library(tibble)
library(ggplot2)
library(ggExtra)
library(dplyr)

Attaching package: 'dplyr'
The following objects are masked from 'package:stats':

    filter, lag
The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union
Code
library(tidyr)
# From McElreath!
gen_histo <- function(reps, num_steps) {
  support <- c(-1,1)
  pos <-replicate(reps, sum(sample(support,num_steps,replace=TRUE,prob=c(0.5,0.5))))
  #print(mean(pos))
  #print(var(pos))
  pos_df <- tibble(x=pos)
  clt_distr <- function(x) dnorm(x, 0, sqrt(num_steps))
  plot <- ggplot(pos_df, aes(x=x)) +
    geom_histogram(aes(y = after_stat(density)), fill=cbPalette[1], binwidth = 2) +
    stat_function(fun = clt_distr) +
    theme_dsan("quarter") +
    theme(title=element_text(size=16)) +
    labs(
      title=paste0(reps," Random Walks, ",num_steps," Steps")
    )
  return(plot)
}
gen_walkplot <- function(num_people, num_steps, opacity=0.15) {
  support <- c(-1, 1)
  # Unique id for each person
  pid <- seq(1, num_people)
  pid_tib <- tibble(pid)
  pos_df <- tibble()
  end_df <- tibble()
  all_steps <- t(replicate(num_people, sample(support, num_steps, replace = TRUE, prob = c(0.5, 0.5))))
  csums <- t(apply(all_steps, 1, cumsum))
  csums <- cbind(0, csums)
  # Last col is the ending positions
  ending_pos <- csums[, dim(csums)[2]]
  end_tib <- tibble(pid = seq(1, num_people), endpos = ending_pos, x = num_steps)
  # Now convert to tibble
  ctib <- as_tibble(csums, name_repair = "none")
  merged_tib <- bind_cols(pid_tib, ctib)
  long_tib <- merged_tib %>% pivot_longer(!pid)
  # Convert name -> step_num
  long_tib <- long_tib %>% mutate(step_num = strtoi(gsub("V", "", name)) - 1)
  # print(end_df)
  grid_color <- rgb(0, 0, 0, 0.1)

  # And plot!
  walkplot <- ggplot(
      long_tib,
      aes(
          x = step_num,
          y = value,
          group = pid,
          # color=factor(label)
      )
  ) +
      geom_line(linewidth = g_linesize, alpha = opacity, color = cbPalette[1]) +
      geom_point(data = end_tib, aes(x = x, y = endpos), alpha = 0) +
      scale_x_continuous(breaks = seq(0, num_steps, num_steps / 4)) +
      scale_y_continuous(breaks = seq(-20, 20, 10)) +
      theme_dsan("quarter") +
      theme(
          legend.position = "none",
          title = element_text(size = 16)
      ) +
      theme(
          panel.grid.major.y = element_line(color = grid_color, linewidth = 1, linetype = 1)
      ) +
      labs(
          title = paste0(num_people, " Random Walks, ", num_steps, " Steps"),
          x = "Number of Steps",
          y = "Position"
      )
}
wp1 <- gen_walkplot(500, 16, 0.05)
Warning: The `x` argument of `as_tibble.matrix()` must have unique column names if
`.name_repair` is omitted as of tibble 2.0.0.
ℹ Using compatibility `.name_repair`.
Code
ggMarginal(wp1, margins = "y", type = "histogram", yparams = list(binwidth = 1))

The Result: 64 Steps

Code
library(ggExtra)
wp2 <- gen_walkplot(5000,64,0.008) +
  ylim(-30,30)
Scale for y is already present.
Adding another scale for y, which will replace the existing scale.
Code
ggMarginal(wp2, margins = "y", type = "histogram", yparams = list(binwidth = 1))
Warning: Removed 2 rows containing missing values or values outside the scale range
(`geom_line()`).
Warning: Removed 1 row containing missing values or values outside the scale range
(`geom_point()`).
Warning: Removed 2 rows containing missing values or values outside the scale range
(`geom_line()`).
Warning: Removed 1 row containing missing values or values outside the scale range
(`geom_point()`).

“Mathematical/Scientific Modeling”

  • Thing we observe (poking out of water): data
  • Hidden but possibly discoverable through deeper investigation (ecosystem under surface): model / DGP

So What’s the Problem?

  • Non-probabilistic models: High potential for being garbage (see: Bayesianism)
  • Probabilistic models: Getting there, still “surface” phenomena
    • Of the \(N = 100\) times we observed event \(X\) occurring, event \(Y\) also occurred \(90\) of those times
    • \(\implies \Pr(Y \mid X) = \frac{\#[X, Y]}{\#[X]} = \frac{90}{100} = 0.9\)
  • Causal models: Does \(Y\) happen because of \(X\) happening? For that, need to start modeling what’s happening under the surface that make \(X\) and \(Y\) “pop up” together so often

Causal Inference

The Intuitive Problem of Inferring Causality

library(dplyr)
library(ggplot2)
ga_lawyers <- c(21362, 22254, 23134, 23698, 24367, 24930, 25632, 26459, 27227, 27457)
ski_df <- tibble::tribble(
  ~year, ~varname, ~value,
  2000, "ski_revenue", 1551,
  2001, "ski_revenue", 1635,
  2002, "ski_revenue", 1801,
  2003, "ski_revenue", 1827,
  2004, "ski_revenue", 1956,
  2005, "ski_revenue", 1989,
  2006, "ski_revenue", 2178,
  2007, "ski_revenue", 2257,
  2008, "ski_revenue", 2476,
  2009, "ski_revenue", 2438,
)
ski_mean <- mean(ski_df$value)
ski_sd <- sd(ski_df$value)
ski_df <- ski_df %>% mutate(val_scaled = 12*value, val_norm = (value - ski_mean)/ski_sd)
law_df <- tibble::tibble(year=2000:2009, varname="ga_lawyers", value=ga_lawyers)
law_mean <- mean(law_df$value)
law_sd <- sd(law_df$value)
law_df <- law_df %>% mutate(val_norm = (value - law_mean)/law_sd)
spur_df <- dplyr::bind_rows(ski_df, law_df)
ggplot(spur_df, aes(x=year, y=val_norm, color=factor(varname, labels = c("Ski Revenue","Lawyers in Georgia")))) +
  stat_smooth(method="loess", se=FALSE) +
  geom_point(size=g_pointsize/1.5) +
  labs(
    fill="",
    title="Ski Revenue vs. Georgia Lawyers",
    x="Year",
    color="Correlation: 99.2%",
    linetype=NULL
  ) +
  theme_dsan() +
  scale_x_continuous(
    breaks=seq(from=2000, to=2014, by=2)
  ) +
  #scale_y_continuous(
  #  name="Total Revenue, Ski Facilities (Million USD)",
  #  sec.axis = sec_axis(~ . * law_sd + law_mean, name = "Number of Lawyers in Georgia")
  #) +
  scale_y_continuous(breaks = -1:1,
    labels = ~ . * round(ski_sd,1) + round(ski_mean,1),
    name="Total Revenue, Ski Facilities (Million USD)",
    sec.axis = sec_axis(~ . * law_sd + law_mean, name = "Number of Lawyers in Georgia")) +
  expand_limits(x=2010) +
  #geom_hline(aes(yintercept=x, color="Mean Values"), as.data.frame(list(x=0)), linewidth=0.75, alpha=1.0, show.legend = TRUE) +
  scale_color_manual(
    breaks=c('Ski Revenue', 'Lawyers in Georgia'),
    values=c('Ski Revenue'=cbPalette[1], 'Lawyers in Georgia'=cbPalette[2])
  )
`geom_smooth()` using formula = 'y ~ x'

cor(ski_df$value, law_df$value)
[1] 0.9921178

(Based on Spurious Correlations, Tyler Vigen)

  • This, however, is only a mini-boss. Beyond it lies the truly invincible FINAL BOSS… 🙀

The Fundamental Problem of Causal Inference

The only workable definition of “\(X\) causes \(Y\)”:

Defining Causality

\(X\) causes \(Y\) if and only if:

  1. \(X\) temporally precedes \(Y\) and
    • In two worlds \(W_0\) and \(W_1\) where
    • everything is exactly the same except that \(X = 0\) in \(W_0\) and \(X = 1\) in \(W_1\),
    • \(Y = 0\) in \(W_0\) and \(Y = 1\) in \(W_1\).

(Hume 1739, ruining everything as usual 😤)

  • The problem? We live in one world, not two identical worlds simultaneously 😭

What Is To Be Done?

Face Everything And Rise: Controlled, Randomized Experiment Paradigm

  • Find good comparison cases: Treatment and Control
  • Without a control group, you cannot make inferences!
  • Selecting on the dependent variable…

Selecting on the Dependent Variable

  • Jeff’s rant: If you care about actually solving social issues, this should infuriate you

Complications: Selection

  • Tldr: Why did this person (unit) end up in the treatment group? Why did this other person (unit) end up in the control group?
  • Are there systematic differences?
  • “““Vietnam”“” “““War”“” Draft: Why can’t we just study [men who join the military] versus [men who don’t], and take the difference as a causal estimate?

The Solution: Matching

  • Controlled experiment: we can ensure (since we have control over the assignment mechanism) the only systematic difference between \(C\) and \(T\) is: \(T\) received treatment, \(C\) did not
  • In an observational study, we’re “too late”! Thus, we no longer refer to assignment but to selection
  • Our job is to figure out (reverse engineer!) the selection mechanism, then correct for its non-randomness. Spoiler: “transform” observational \(\rightarrow\) experimental via weighting.
  • That’s the gold at end of rainbow. The rainbow itself is…

Do-Calculus

Our Data-Generating Process

  • \(Y\): Future success, \(\mathcal{R}_Y = \{0, 1\}\)
  • \(E\): Private school education, \(\mathcal{R}_E = \{0, 1\}\)
  • \(V\): Born into poverty, \(\mathcal{R}_V = \{0, 1\}\)
The Private School \(\leadsto\) Success Pipeline 🤑
  1. Sample independent RVs \(U_1 \sim \mathcal{B}(1/2)\), \(U_2 \sim \mathcal{B}(1/3)\), \(U_3 \sim \mathcal{B}(1/3)\)
  2. \(V \leftarrow U_1\)
  3. \(E \leftarrow \textsf{if }(V = 1)\textsf{ then } 0\textsf{ else }U_2\)
  4. \(Y \leftarrow \textsf{if }(V = 1)\textsf{ then }0\textsf{ else }U_3\)

Chalkboard Time…

  • \(\Pr(Y = 1) = \; ?\)
  • \(\Pr(Y = 1 \mid E = 1) = \; ?\)

Top Secret Answers Slide (Don’t Peek)

  • \(\Pr(Y = 1) = \frac{1}{6}\)
  • \(\Pr(Y = 1 \mid E = 1) = \frac{1}{3}\)
  • \(\overset{✅}{\implies}\) One out of every three private-school graduates is successful, vs. one out of every six graduates overall
  • \(\overset{❓}{\implies}\) Private school education doubles likelihood of success!
  • The latter is only true if intervening/changing/doing \(E = 0 \leadsto E = 1\) is what moves \(\Pr(Y = 1)\) from \(\frac{1}{6}\) to \(\frac{1}{3}\)!

Chalkboard Time 2: Electric Boogaloo

  • \(\Pr(Y = 1) = \frac{1}{6}\)
  • \(\Pr(Y = 1 \mid E = 1) = \frac{1}{3}\)
  • \(\Pr(Y = 1 \mid \textsf{do}(E = 1)) = \; ?\)
  • Here, \(\textsf{do}(E = 1)\) means diving into the DGP below the surface and changing it so that \(E = 1\)Setting \(E\) to be \(1\)
\(\text{DGP}(Y \mid \textsf{do}(E = 1))\)
  1. Sample independent RVs \(U_1 \sim \mathcal{B}(1/2)\), \(U_2 \sim \mathcal{B}(1/3)\), \(U_3 \sim \mathcal{B}(1/3)\)
  2. \(V \leftarrow U_1\)
  3. \(E \leftarrow \textsf{if }(V = 1)\textsf{ then } 0\textsf{ else }U_2\)
  4. \(Y \leftarrow \textsf{if }(V = 1)\textsf{ then }0\textsf{ else }U_3\)

Double Quadruple Secret Answer Slide

  • \(\Pr(Y = 1) = \frac{1}{6}\)
  • \(\Pr(Y = 1 \mid E = 1) = \frac{1}{3}\)
  • \(\Pr(Y = 1 \mid \textsf{do}(E = 1)) = \frac{1}{6}\)

References

Hume, David. 1739. A Treatise of Human Nature: Being an Attempt to Introduce the Experimental Method of Reasoning Into Moral Subjects; and Dialogues Concerning Natural Religion. Longmans, Green.