Week 4: Clearing the Path from Cause to Effect

DSAN 5650: Causal Inference for Computational Social Science
Summer 2026, Georgetown University

Class Sessions
Author
Affiliation

Jeff Jacobs

Published

Wednesday, June 10, 2026

Open slides in new window →

Schedule

Today’s Planned Schedule:

Start End Topic
Lecture 6:30pm 7:10pm PGM as Modeling Language →
7:10pm 7:30pm The Ladder of Causal Inference →
7:30pm 7:50pm Elemental Confounds I: Forks and Chains →
Break! 7:50pm 8:00pm
8:00pm 8:50pm Elemental Confounds II: ⚠️Colliders⚠️ →
8:50pm 9:00pm Elemental Confounds III: Proxies →

Roadmap

5300 → Now

  • In e.g. 5300, you learned a bunch of ad hoc models: Linear Regression, Decision Trees, SVMs
  • PGMs provide a formalized modeling language for “writing out” models unambiguously in a way your computer understands: specifying exactly how to estimate parameters from data

Linear Regression as a PGM (Source)

Now → August: Class splits into two themes, running in parallel!

  • What kinds of cool comp social sci models are unlocked, that we can now implement in this language? [HW2]
  • How can we expand PGM vocabulary to incorporate causality? [Midterm]

From PGMs to SWIGs (Bezuidenhout et al. 2024)

Why Take the Time to Learn a Modeling Language (vs. Individual Models)?

  • My answer: allows you to adapt to specifics/idiosyncrasies of your problem!
  • Language metaphor: Learning models vs. learning modeling language \(\Leftrightarrow\) Learning phrases in a language vs. learning to speak the language
  • “Hello, one hamburger please” is good, but what if you…
  • Are allergic to ketchup and need to make sure it’s removed
  • Want to replace sesame seed bun with poppy seed bun, if they have it
  • Prefer spicy, but not too spicy, mustard Bun only Animal style

Languages give us a syntax

S \(\rightarrow\) NP VP
NP \(\rightarrow\) DetP N | AdjP NP
VP \(\rightarrow\) V NP
AdjP \(\rightarrow\) Adj | Adv AdjP
N \(\rightarrow\) frog | tadpole
V \(\rightarrow\) sees | likes
Adj \(\rightarrow\) big | small
Adv \(\rightarrow\) very
DetP \(\rightarrow\) a | the

…For expressing arbitrary (infinitely many!) sentences

Example 1: Multilevel Tadpoles (McElreath, Ch. 13)

Need a language that can communicate the following info to estimation algorithm:

  • Unit of observation is tadpole, but unit of analysis is tank
  • Ultimately, I care about \(Y =\) survival rate (dependent var), as function of \(X =\) tank properties (independent var)
  • …But the \(n_i = 48\) tanks actually come in \(n_j = 3\) types: small (10 bois), medium (25), large (35) (Bonus: What if there are different numbers of tanks per type?)
  • I need it to account for impact of tank size, then pool info across tank sizes

From McElreath (2020)

Example 2: Dissertation Nightmare

Above: Data from Soviet archives; Above Right: US Military archives; Below Right: NATO archives

Nightmarish Without a Modeling Language!

  • Modeling language \(\Rightarrow\) Unambiguously “encode” idiosyncratic domain knowledge
  • Dissertation: Cold War \(\times\) “Third World” \(\leadsto\) Cuban 🇨🇺 trans-continental operations1
  • Main narrative (for estimation): 1975 (South Africa invades Angola, 14 Oct → 🇨🇺 intervention, 4 Nov) to 1979 (USSR requests 🇨🇺 troops to Ethiopia for Ogaden War)
  • [Ontology] Fix 1979 geographic entities at National level (as modeling choice, like fixing 2000 USD to measure inflation): \(\textsf{Cuba}_{1979}\), \(\textsf{Angola}_{1979}\), \(\textsf{PDRY}_{1979}\), \(\textsf{YAR}_{1979}\)
  • Different tokens (Think NLP: "Congo", "DRC", "Republic of Congo") can then be contextualized: can “track” and link data appropriately despite splits, merges, name changes
  • Say we have data on “Number of Communist Militants in \(X\)” (Hoover Yearbook)…
Entity Data from 1947-1971 at... Data from 1971-Present at...
\(\textsf{Pakistan}_{1979}\) National Level: \(\frac{62}{62+70} \times\) “Pakistan” National Level: “Pakistan”
Subnational Level: “West Pakistan” Subnational Level: \(\sum_{i \in \text{Regions}}\text{data}_i\)
\(\textsf{Bangladesh}_{1979}\) National Level: \(\frac{70}{62+70} \times\) “Pakistan” National Level: “Bangladesh”
Subnational Level: “East Pakistan” Subnational Level: \(\sum_{i \in \text{Regions}}\text{data}_i\)

The Ladder of Causal Inference

Counterfactuals: What would have happened, if history was slightly different…
\(\Pr(Y_{M=M_0} \mid \textsf{do}(X)) - \Pr(Y_{M=M_0} \mid \textsf{do}(\neg X))\)
Intervention: What happens if I…
\(\Pr(Y \mid \textsf{do}(X)) - \Pr(Y \mid \textsf{do}(\neg X))\)
Association: What happened?
\(\Pr(Y \mid X) - \Pr(Y \mid \neg X)\)
  • \(\leadsto\) Stuff we add to probability theory in 5650 is to combat confounding: to “fix” whatever is making \(\Pr(Y \mid X) \neq \Pr(Y \mid \textsf{do}(X))\)!

Recap: The Four Elemental Confounds

From Richard McElreath’s Statistical Rethinking Lectures
Code
library(tidyverse) # For ggplot
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.2.1     ✔ readr     2.2.0
✔ forcats   1.0.1     ✔ stringr   1.6.0
✔ lubridate 1.9.5     ✔ tibble    3.3.1
✔ purrr     1.2.1     ✔ tidyr     1.3.2
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
Code
library(extraDistr) # For rbern()

Attaching package: 'extraDistr'

The following object is masked from 'package:purrr':

    rdunif
Code
library(patchwork) # For side-by-side plotting
library(ggtext) # For colors in titles
library(rethinking)
Loading required package: cmdstanr
This is cmdstanr version 0.9.0
- CmdStanR documentation and vignettes: mc-stan.org/cmdstanr
- CmdStan path: /Users/jpj/.cmdstan/cmdstan-2.36.0
- CmdStan version: 2.36.0

A newer version of CmdStan is available. See ?install_cmdstan() to install it.
To disable this check set option or environment variable cmdstanr_no_ver_check=TRUE.
Loading required package: posterior
This is posterior version 1.7.0

Attaching package: 'posterior'

The following objects are masked from 'package:stats':

    mad, sd, var

The following objects are masked from 'package:base':

    %in%, match

Loading required package: parallel
rethinking (Version 2.42)

Attaching package: 'rethinking'

The following objects are masked from 'package:extraDistr':

    dbern, dlaplace, dpareto, rbern, rlaplace, rpareto

The following object is masked from 'package:purrr':

    map

The following object is masked from 'package:stats':

    rstudent
Code
library(dagitty)
n_d <- 10000 # For discrete RVs
n_c <- 300 # For continuous RVs

Pipes \(X \rightarrow Z \rightarrow Y\): Conditioning = Blocking

Code
library(rethinking)
library(dagitty)
library(ggdag)
pipe_dag <-dagitty("dag{
X[exposure]
Y[outcome]
X -> Y
X -> Z
Z -> Y
}")
coordinates(pipe_dag) <- list(
    x=c(X=0, Z=0.5, Y=1),
    y=c(X=1, Z=0.5, Y=1)
)
drawdag(pipe_dag, cex=4, lwd=5, radius=10)
drawopenpaths(pipe_dag, lwd=5)
adj_sets <- adjustmentSets(
    pipe_dag, effect="direct"
)
writeLines("Adjustment sets (direct effect):")
adj_sets
set.seed(5650)
cpipe_df <- tibble(
    X = rnorm(n_c),
    Z = rbern(n_c, plogis(X)),
    Y = rnorm(n_c, 2 * Z - 1)
)
cpipe_lm <- lm(Y ~ X, data=cpipe_df)
cpipe_slope <- round(cpipe_lm$coef['X'], 3)
cpipe_z0_lm <- lm(Y ~ X, data=cpipe_df |> filter(Z == 0))
cpipe_z0_slope <- round(cpipe_z0_lm$coef['X'], 2)
cpipe_z0_label <- paste0("<span style='color: #e69f00;'>Slope<sub>Z=0</sub> = ",cpipe_z0_slope,"</span>")
cpipe_z1_lm <- lm(Y ~ X, data=cpipe_df |> filter(Z == 1))
cpipe_z1_slope <- round(cpipe_z1_lm$coef['X'], 2)
cpipe_z1_label <- paste0("<span style='color: #56b4e9;'>Slope<sub>Z=1</sub> = ",cpipe_z1_slope,"</span>")
cpipe_z_texlabel <- paste0(cpipe_z0_label, " | ", cpipe_z1_label)
cpipe_xmin <- min(cpipe_df$X)
cpipe_xmax <- max(cpipe_df$X)
ggplot() +
  # Points
  geom_point(
    data=cpipe_df |> filter(Y > -3),
    aes(x=X, y=Y, color=factor(Z)),
    size=0.4*g_pointsize,
    alpha=0.8
  ) +
  # Overall lm
  geom_smooth(
    data=cpipe_df, aes(x=X, y=Y),
    method = lm, se = FALSE,
    linewidth = 3, color='white'
  ) +
  geom_smooth(
    data=cpipe_df, aes(x=X, y=Y),
    method = lm, se = FALSE,
    linewidth = 2.5, color='black'
  ) +
  theme_dsan(base_size=18) +
  theme(
    plot.title = element_text(size=18),
    plot.subtitle = element_markdown(size=16)
  ) +
  coord_equal() +
  labs(
    title = paste0(
      "Unstratified Slope = ",cpipe_slope
    ),
    x = "X", y = "Y", color = "Z"
  )


Attaching package: 'ggdag'
The following object is masked from 'package:stats':

    filter

Adjustment sets (direct effect):
{ Z }
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'

Code
library(rethinking)
library(dagitty)
library(ggdag)
pipe_dag_closed <-dagitty("dag{
X[exposure]
Y[outcome]
Z[adjustedNode]
X -> Y
X -> Z
Z -> Y
}")
coordinates(pipe_dag_closed) <- list(
    x=c(X=0, Z=0.5, Y=1),
    y=c(X=1, Z=0.5, Y=1)
)
drawdag(pipe_dag_closed, cex=4, lwd=5, radius=10)
drawopenpaths(pipe_dag_closed, Z="Z", lwd=5)
adj_sets_closed <- adjustmentSets(
    pipe_dag_closed
)
writeLines("Adjustment sets (direct effect):")
adj_sets_closed
set.seed(5650)
cpipe_df <- tibble(
    X = rnorm(n_c),
    Z = rbern(n_c, plogis(X)),
    Y = rnorm(n_c, 2 * Z - 1)
)
cpipe_lm <- lm(Y ~ X, data=cpipe_df)
cpipe_slope <- round(cpipe_lm$coef['X'], 3)
cpipe_z0_lm <- lm(Y ~ X, data=cpipe_df |> filter(Z == 0))
cpipe_z0_slope <- round(cpipe_z0_lm$coef['X'], 2)
cpipe_z0_label <- paste0("<span style='color: #e69f00;'>Slope<sub>Z=0</sub> = ",cpipe_z0_slope,"</span>")
cpipe_z1_lm <- lm(Y ~ X, data=cpipe_df |> filter(Z == 1))
cpipe_z1_slope <- round(cpipe_z1_lm$coef['X'], 2)
cpipe_z1_label <- paste0("<span style='color: #56b4e9;'>Slope<sub>Z=1</sub> = ",cpipe_z1_slope,"</span>")
cpipe_z_texlabel <- paste0(cpipe_z0_label, " | ", cpipe_z1_label)
cpipe_xmin <- min(cpipe_df$X)
cpipe_xmax <- max(cpipe_df$X)
ggplot() +
  # Points
  geom_point(
    data=cpipe_df |> filter(Y > -3),
    aes(x=X, y=Y, color=factor(Z)),
    size=0.6*g_pointsize,
    alpha=0.8
  ) +
  # Stratified lm
  # (slightly larger black lines)
  geom_smooth(
    data=cpipe_df,
    aes(x=X, y=Y, group=factor(Z)),
    method=lm, se=FALSE, fullrange=TRUE,
    linewidth=2.75, color='black'
  ) +
  # (Colored lines)
  geom_smooth(
    data=cpipe_df,
    aes(x=X, y=Y, color=factor(Z)),
    method=lm, se=FALSE, fullrange=TRUE,
    linewidth=2
  ) +
  theme_dsan(base_size=18) +
  theme(
    plot.title = element_markdown(size=18),
    plot.subtitle = element_markdown(size=16)
  ) +
  coord_equal() +
  labs(
    title=cpipe_z_texlabel,
    x = "X", y = "Y", color = "Z"
  )

Adjustment sets (direct effect):
{}
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'

Forks \(X \leftarrow Z \rightarrow Y\): Conditioning = Blocking

Code
library(rethinking)
library(dagitty)
library(ggdag)
pipe_dag <-dagitty("dag{
X[exposure]
Y[outcome]
X -> Y
Z -> X
Z -> Y
}")
coordinates(pipe_dag) <- list(
    x=c(X=0, Z=0.5, Y=1),
    y=c(X=1, Z=0.5, Y=1)
)
drawdag(pipe_dag, cex=4, lwd=5, radius=10)
drawopenpaths(pipe_dag, lwd=5)
adj_sets <- adjustmentSets(
    pipe_dag, effect="direct"
)
writeLines("Adjustment sets (direct effect):")
adj_sets
library(ggtext)
set.seed(5650)
cfork_df <- tibble(
    Z = rbern(n_c),
    X = rnorm(n_c, 2 * Z - 1),
    Y = rnorm(n_c, 2 * Z - 1)
)
library(latex2exp)
overall_lm <- lm(Y ~ X, data=cfork_df)
overall_slope <- round(overall_lm$coef['X'], 3)
z0_lm <- lm(Y ~ X, data=cfork_df |> filter(Z == 0))
z0_slope <- round(z0_lm$coef['X'], 2)
z0_label <- paste0("<span style='color: #e69f00;'>Slope<sub>Z=0</sub> = ",z0_slope,"</span>")
z1_lm <- lm(Y ~ X, data=cfork_df |> filter(Z == 1))
z1_slope <- round(z1_lm$coef['X'], 2)
z1_label <- paste0("<span style='color: #56b4e9;'>Slope<sub>Z=1</sub> = ",z1_slope,"</span>")
z_texlabel <- paste0(z0_label, " | ", z1_label)
cfork_xmin <- min(cfork_df$X)
cfork_xmax <- max(cfork_df$X)
ggplot() +
  # Points
  geom_point(
    data=cfork_df,
    aes(x=X, y=Y, color=factor(Z)),
    size=0.6*g_pointsize,
    alpha=0.8
  ) +
  # Overall lm
  geom_smooth(
    data=cfork_df, aes(x=X, y=Y),
    method = lm, se = FALSE,
    linewidth = 2.5, color='black'
  ) +
  theme_dsan(base_size=18) +
  theme(
    plot.title = element_text(size=18),
    plot.subtitle = element_markdown(size=16)
  ) +
  coord_equal() +
  labs(
    title = paste0(
      "Unstratified Slope = ",overall_slope
    ),
    x = "X", y = "Y", color = "Z"
  )

Adjustment sets (direct effect):
{ Z }
`geom_smooth()` using formula = 'y ~ x'

Code
library(rethinking)
library(dagitty)
library(ggdag)
fork_dag_closed <-dagitty("dag{
X[exposure]
Y[outcome]
Z[adjustedNode]
X -> Y
Z -> X
Z -> Y
}")
coordinates(fork_dag_closed) <- list(
    x=c(X=0, Z=0.5, Y=1),
    y=c(X=1, Z=0.5, Y=1)
)
fork_dag_closed <- setVariableStatus(fork_dag_closed, "adjustedNode", "Z")
drawdag(fork_dag_closed, cex=4, lwd=5, radius=10)
drawopenpaths(fork_dag_closed, Z="Z", lwd=5)
library(ggtext)
set.seed(5650)
cfork_df <- tibble(
    Z = rbern(n_c),
    X = rnorm(n_c, 2 * Z - 1),
    Y = rnorm(n_c, 2 * Z - 1)
)
library(latex2exp)
overall_lm <- lm(Y ~ X, data=cfork_df)
overall_slope <- round(overall_lm$coef['X'], 3)
z0_lm <- lm(Y ~ X, data=cfork_df |> filter(Z == 0))
z0_slope <- round(z0_lm$coef['X'], 2)
z0_label <- paste0("<span style='color: #e69f00;'>Slope<sub>Z=0</sub> = ",z0_slope,"</span>")
z1_lm <- lm(Y ~ X, data=cfork_df |> filter(Z == 1))
z1_slope <- round(z1_lm$coef['X'], 2)
z1_label <- paste0("<span style='color: #56b4e9;'>Slope<sub>Z=1</sub> = ",z1_slope,"</span>")
z_texlabel <- paste0(z0_label, " | ", z1_label)
cfork_xmin <- min(cfork_df$X)
cfork_xmax <- max(cfork_df$X)
ggplot() +
  # Points
  geom_point(
    data=cfork_df,
    aes(x=X, y=Y, color=factor(Z)),
    size=0.6*g_pointsize,
    alpha=0.8
  ) +
  # Overall lm
  geom_smooth(
    data=cfork_df, aes(x=X, y=Y),
    method = lm, se = FALSE,
    linewidth = 2.5, color='black'
  ) +
  # Stratified lm
  # (slightly larger black lines)
  geom_smooth(
    data=cfork_df,
    aes(x=X, y=Y, group=factor(Z)),
    method=lm, se=FALSE, fullrange=TRUE,
    linewidth=2.75, color='black'
  ) +
  # (Colored lines)
  geom_smooth(
    data=cfork_df,
    aes(x=X, y=Y, color=factor(Z)),
    method=lm, se=FALSE, fullrange=TRUE,
    linewidth=2
  ) +
  theme_dsan(base_size=18) +
  theme(
    plot.title = element_text(size=18),
    plot.subtitle = element_markdown(size=16)
  ) +
  coord_equal() +
  labs(
    title = paste0(
      "Unstratified Slope = ",overall_slope
    ),
    subtitle=z_texlabel,
    x = "X", y = "Y", color = "Z"
  )

`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'

Conditioning on a Proxy for \(Z\)

  • With just \(X \rightarrow Z \rightarrow Y\), we’d have a pipe
  • Observing \(A \Rightarrow\) some (not all!) info about \(Z\)

Code
library(tidyverse)
library(extraDistr)
library(latex2exp)
set.seed(5650)
cprox_df <- tibble(
    X = rnorm(n_c),
    Z = rbern(n_c, plogis(X)),
    Y = rnorm(n_c, 2 * Z - 1),
    A = rbern(n_c, (1-Z)*0.86 + Z*0.14)
)
cprox_lm <- lm(Y ~ X, data=cprox_df)
cprox_slope <- round(cprox_lm$coef['X'], 3)
cprox_a0_lm <- lm(Y ~ X, data=cprox_df |> filter(A == 0))
cprox_a0_slope <- round(cprox_a0_lm$coef['X'], 2)
cprox_a0_label <- paste0("<span style='color: #e69f00;'>Slope<sub>Z=0</sub> = ",cprox_a0_slope,"</span>")
# A == 1 lm
cprox_a1_lm <- lm(Y ~ X, data=cprox_df |> filter(A == 1))
cprox_a1_slope <- round(cprox_a1_lm$coef['X'], 2)
cprox_a1_label <- paste0("<span style='color: #56b4e9;'>Slope<sub>Z=1</sub> = ",cprox_a1_slope,"</span>")
cprox_a_texlabel <- paste0(cprox_a0_label, " | ", cprox_a1_label)
cprox_xmin <- min(cprox_df$X)
cprox_xmax <- max(cprox_df$X)
ggplot() +
  # Points
  geom_point(
    data=cprox_df |> filter(Y > -3),
    aes(x=X, y=Y, color=factor(A)),
    size=0.5*g_pointsize,
    alpha=0.8
  ) +
  # Overall lm
  geom_smooth(
    data=cprox_df, aes(x=X, y=Y),
    method = lm, se = FALSE,
    linewidth = 2.5, color='black'
  ) +
  # Stratified lm
  # (slightly larger black lines)
  geom_smooth(
    data=cprox_df,
    aes(x=X, y=Y, group=factor(A)),
    method=lm, se=FALSE, fullrange=TRUE,
    linewidth=2.75, color='black'
  ) +
  # (Colored lines)
  geom_smooth(
    data=cprox_df,
    aes(x=X, y=Y, color=factor(A)),
    method=lm, se=FALSE, fullrange=TRUE,
    linewidth=2
  ) +
  theme_dsan(base_size=22) +
  theme(
    plot.title = element_text(size=22),
    plot.subtitle = element_markdown(size=20)
  ) +
  coord_equal() +
  labs(
    title = paste0(
      "Unstratified Slope = ",cprox_slope
    ),
    subtitle=cprox_a_texlabel,
    x = "X", y = "Y", color = "A"
  )
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'

⚠️Colliders⚠️ \(X \rightarrow Z \leftarrow Y\): Conditioning = Opening

Code
library(rethinking)
library(dagitty)
library(ggdag)
coll_dag <-dagitty("dag{
X[exposure]
Y[outcome]
X -> Y
X -> Z
Y -> Z
}")
coordinates(coll_dag) <- list(
    x=c(X=0, Z=0.5, Y=1),
    y=c(X=1, Z=0.5, Y=1)
)
drawdag(coll_dag, cex=4, lwd=5, radius=10)
drawopenpaths(coll_dag, lwd=5)
adj_sets_coll <- adjustmentSets(
    coll_dag, effect="direct"
)
writeLines("Adjustment sets (direct effect):")
adj_sets_coll
set.seed(5650)
ccoll_df <- tibble(
    X = rnorm(n_c),
    Y = rnorm(n_c),
    Z = rbern(n_c, plogis(2 * (X + Y - 1)))
)
ccoll_lm <- lm(Y ~ X, data=ccoll_df)
ccoll_slope <- round(ccoll_lm$coef['X'], 3)
ccoll_z0_lm <- lm(Y ~ X, data=ccoll_df |> filter(Z == 0))
ccoll_z0_slope <- round(ccoll_z0_lm$coef['X'], 2)
ccoll_z0_label <- paste0("<span style='color: #e69f00;'>Slope<sub>Z=0</sub> = ",ccoll_z0_slope,"</span>")
ccoll_z1_lm <- lm(Y ~ X, data=ccoll_df |> filter(Z == 1))
ccoll_z1_slope <- round(ccoll_z1_lm$coef['X'], 2)
ccoll_z1_label <- paste0("<span style='color: #56b4e9;'>Slope<sub>Z=1</sub> = ",ccoll_z1_slope,"</span>")
ccoll_z_texlabel <- paste0(ccoll_z0_label, " | ", ccoll_z1_label)
ccoll_xmin <- min(ccoll_df$X)
ccoll_xmax <- max(ccoll_df$X)
ggplot() +
  # Points
  geom_point(
    data=ccoll_df |> filter(Y > -3),
    aes(x=X, y=Y, color=factor(Z)),
    size=0.4*g_pointsize,
    alpha=0.8
  ) +
  # Overall lm
  geom_smooth(
    data=ccoll_df, aes(x=X, y=Y),
    method = lm, se = FALSE,
    linewidth = 3, color='white'
  ) +
  geom_smooth(
    data=ccoll_df, aes(x=X, y=Y),
    method = lm, se = FALSE,
    linewidth = 2.5, color='black'
  ) +
  theme_dsan(base_size=18) +
  theme(
    plot.title = element_markdown(size=18),
    plot.subtitle = element_markdown(size=16)
  ) +
  coord_equal() +
  labs(
    title=ccoll_z_texlabel,
    x = "X", y = "Y", color = "Z"
  )

Adjustment sets (direct effect):
{}
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'

Code
library(rethinking)
library(dagitty)
library(ggdag)
fork_dag_closed <-dagitty("dag{
X[exposure]
Y[outcome]
Z[adjustedNode]
X -> Y
X -> Z
Y -> Z
}")
coordinates(fork_dag_closed) <- list(
    x=c(X=0, Z=0.5, Y=1),
    y=c(X=1, Z=0.5, Y=1)
)
fork_dag_closed <- setVariableStatus(fork_dag_closed, "adjustedNode", "Z")
drawdag(fork_dag_closed, cex=4, lwd=5, radius=10)
drawopenpaths(fork_dag_closed, Z="Z", lwd=5)
set.seed(5650)
ccoll_df <- tibble(
    X = rnorm(n_c),
    Y = rnorm(n_c),
    Z = rbern(n_c, plogis(2 * (X + Y - 1)))
)
ccoll_lm <- lm(Y ~ X, data=ccoll_df)
ccoll_slope <- round(ccoll_lm$coef['X'], 3)
ccoll_z0_lm <- lm(Y ~ X, data=ccoll_df |> filter(Z == 0))
ccoll_z0_slope <- round(ccoll_z0_lm$coef['X'], 2)
ccoll_z0_label <- paste0("<span style='color: #e69f00;'>Slope<sub>Z=0</sub> = ",ccoll_z0_slope,"</span>")
ccoll_z1_lm <- lm(Y ~ X, data=ccoll_df |> filter(Z == 1))
ccoll_z1_slope <- round(ccoll_z1_lm$coef['X'], 2)
ccoll_z1_label <- paste0("<span style='color: #56b4e9;'>Slope<sub>Z=1</sub> = ",ccoll_z1_slope,"</span>")
ccoll_z_texlabel <- paste0(ccoll_z0_label, " | ", ccoll_z1_label)
ccoll_xmin <- min(ccoll_df$X)
ccoll_xmax <- max(ccoll_df$X)
ggplot() +
  # Points
  geom_point(
    data=ccoll_df |> filter(Y > -3),
    aes(x=X, y=Y, color=factor(Z)),
    size=0.4*g_pointsize,
    alpha=0.8
  ) +
  # Stratified lm
  # (slightly larger black lines)
  geom_smooth(
    data=ccoll_df,
    aes(x=X, y=Y, group=factor(Z)),
    method=lm, se=FALSE, fullrange=TRUE,
    linewidth=2.75, color='black'
  ) +
  # (Colored lines)
  geom_smooth(
    data=ccoll_df,
    aes(x=X, y=Y, color=factor(Z)),
    method=lm, se=FALSE, fullrange=TRUE,
    linewidth=2
  ) +
  theme_dsan(base_size=18) +
  theme(
    plot.title = element_markdown(size=18),
    plot.subtitle = element_markdown(size=16)
  ) +
  coord_equal() +
  labs(
    title=ccoll_z_texlabel,
    x = "X", y = "Y", color = "Z"
  )

`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'

References

Bezuidenhout, Dana, Sarah Forthal, Kara Rudolph, and Matthew R Lamb. 2024. “Single World Intervention Graphs (SWIGs): A Practical Guide.” American Journal of Epidemiology, September 11, kwae353. https://doi.org/10.1093/aje/kwae353.
Gleijeses, Piero. 2013. Visions of Freedom: Havana, Washington, Pretoria, and the Struggle for Southern Africa, 1976-1991. UNC Press. https://books.google.com?id=eY74AAAAQBAJ.
McElreath, Richard. 2020. Statistical Rethinking: A Bayesian Course with Examples in R and STAN. CRC Press. https://books.google.com?id=FuLWDwAAQBAJ.

Footnotes

  1. Helpful metaphor (Gleijeses 2013): Cuba \(\approx\) Forward-deployed “3rd World Outpost” for USSR (Soviet $ but Cuban training of PAIGC → MPLA), as Israel \(\approx\) Forward-deployed “3rd World Outpost” for US (US $ but Israeli training of SAVAK → SADF)↩︎