Week 8: Propensity Score Weighting

DSAN 5650: Causal Inference for Computational Social Science
Summer 2025, Georgetown University

Author: Jeff Jacobs

Published: Wednesday, July 9, 2025

Schedule

Today’s Planned Schedule:

| Start  | End    | Topic                                         |
|--------|--------|-----------------------------------------------|
| 6:30pm | 7:00pm | Final Project Pep Talk                        |
| 7:00pm | 7:50pm | Controlling For Things vs. Matching/Weighting |
| 7:50pm | 8:00pm | Break!                                        |
| 8:00pm | 9:00pm | Propensity Score Lab                          |

Roadmap for Part 2

  • \(\downarrow\) Focus on abstract concepts / terminology
  • \(\uparrow\) Focus on data analysis
  • (Should hopefully start to feel more like other DSAN classes!)
  • (\(\Rightarrow\) Need you to take specific examples I pick and analogize them to your field(s) of interest)

Final Projects

  • Notion! Then, choose a path (rough draft of paths):
    • Modeling a social phenomenon with PGMs
    • Taking an existing project/interest (e.g., from 5300) and pushing towards the causality asymptote
  • Choose a path by Tuesday, July 15, 6:30pm EDT

Propensity Score Weighting

  • HW1 matching: similarity was all-or-nothing, either 1 (applied to the same schools) or 0 (didn’t apply to the same schools)
  • Propensity scores: model the quality of a match
  • \(\Rightarrow\) Opens up (literally) infinite possibilities between 0 and 1!
  • \(\Rightarrow\) Use your unsupervised learning skills (matching via clustering)
  • \(\Rightarrow\) Use your supervised learning skills (propensity scores = predicted probabilities from a logistic regression!)

Figure 1: Should this be matched with an apple? Or an orange? Why not both?

  • \(\Rightarrow\) Use your diagnostic skills: e.g., methods for evaluating clusters / preventing overfitting

Working Example: Growth Mindset(!)

  • From Athey and Wager (2019)
  • Treatment \(T\), called intervention in the dataset: a seminar on growth mindset for high school students
  • Outcome \(Y\), called achievement_score in the dataset: performance on state’s standardized test
  • In a perfect world, we could just compute

\[ \mathbb{E}[Y \mid T = 1] - \mathbb{E}[Y \mid T = 0] \]

Code
library(tidyverse)
library(broom)
student_df <- read_csv("assets/learning_mindset.csv")
Rows: 10391 Columns: 13
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
dbl (13): schoolid, intervention, achievement_score, success_expect, ethnici...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Code
(mean_df <- student_df |> group_by(intervention) |> summarize(mean_score=mean(achievement_score)))
intervention mean_score
0 -0.1538030
1 0.3184686
Code
student_naive_lm <- lm(achievement_score ~ intervention, data=student_df)
# Note: with a binary treatment, the slope is exactly the difference in
# group means above: 0.3184686 - (-0.1538030) ≈ 0.4723
student_naive_lm |> broom::tidy() |> select(term, estimate)
term estimate
(Intercept) -0.1538030
intervention 0.4722717
Code
student_df |>
  ggplot(aes(x=intervention, y=achievement_score)) +
  geom_boxplot(
    aes(group=intervention),
    width=0.5
  ) +
  geom_smooth(
    method='lm',
    formula='y ~ x',
    se=TRUE
  ) +
  geom_point(
    data=mean_df,
    aes(x=intervention, y=mean_score),
    size=3
  ) +
  theme_dsan(base_size=16)

The Problem: Pesky Covariates

  • Here the “blob” \(\mathbf{X}\) forms a fork, as drawn…
  • But in reality the work of modeling is flying into the cloud and modeling the \(\mathbf{X}\)-\(T\) and \(\mathbf{X}\)-\(Y\) relationships (especially: figuring out which covariates \(X_j \in \mathbf{X}\) are confounders you need to adjust for, and which are colliders you need to leave alone), so you can close the backdoors:

  • This can be really difficult, for a bunch of reasons… What if there was an easier way? (A minimal controlling-for-things sketch follows below)
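To make “the work of modeling” concrete, here is a minimal controlling-for-things sketch on the mindset data. It adjusts for just two covariates, so it is illustrative rather than a defensible adjustment set, and it assumes the truncated entry in the column listing above is a column named ethnicity:

Code
# "Controlling for things": put (some of) the covariates on the RHS.
# NOTE: a sketch only -- success_expect appears in the column listing above;
# `ethnicity` is assumed from the truncated "ethnici..." entry
student_adj_lm <- lm(
  achievement_score ~ intervention + success_expect + factor(ethnicity),
  data=student_df
)
student_adj_lm |> broom::tidy() |>
  filter(term == "intervention") |> select(term, estimate)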

If We Had Control Over Everything (Experiments vs. Observational Data Analysis)

  • If we could intervene in the DGP, we could assign treatment randomly, thus removing the impact of Covariates on \(T\)!

  • Alas, we are data scientists, not (necessarily) experimenters, plus there are often ethical reasons not to perform experiments!
  • …There’s still another approach!

Closing Backdoors the Too-Good-To-Be-True Way

  • Key insight from causal thinking: transformation of the problem from “control for all covariates” to “close all backdoor paths”…
  • For the goal of just closing these paths, we have an alternative¹:
  • Rosenbaum and Rubin (1983): there exists a statistic \(\mathtt{e}(\mathbf{X}) = \Pr(T = 1 \mid \mathbf{X})\), the propensity score, which “captures” the info in \(\mathbf{X}\) relevant to \(T\), such that
  • Conditioning on \(\mathtt{e}(\mathbf{X})\) closes \(\mathbf{X} \Rightarrow \mathtt{e}(\mathbf{X}) \rightarrow T\) (\(\mathtt{e}(\mathbf{X})\) is a pipe)

  • This would close backdoor path \(T \leftarrow \mathtt{e}(\mathbf{X}) \Leftarrow \mathbf{X} \rightarrow Y\), leaving only direct effect \(T \rightarrow Y\)! There’s one remaining complication…
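The formal version of the “pipe” claim is the balancing property of the propensity score: conditional on \(\mathtt{e}(\mathbf{X})\), the covariates carry no further information about treatment,

\[ T \perp \mathbf{X} \mid \mathtt{e}(\mathbf{X}), \quad \text{i.e.,} \quad \Pr(T = 1 \mid \mathbf{X}, \mathtt{e}(\mathbf{X})) = \mathtt{e}(\mathbf{X}) \]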

Closing Backdoors via Propensity Score Estimation

  • Sadly, we don’t observe the true probability of being treated for every possible value of \(\mathbf{X}\)
  • But, we can derive an estimate \(\hat{\mathtt{e}}(\mathbf{X})\) using our machine learning skills 😎
  • \(\hat{\mathtt{e}}(\mathbf{X})\) then stands in for \(\mathtt{e}(\mathbf{X})\) in the pipe \(\mathbf{X} \Rightarrow \mathtt{e}(\mathbf{X}) \rightarrow T\): it blocks the pipe exactly to the extent that it captures the true probability \(\mathtt{e}(\mathbf{X}) = \Pr(T = 1 \mid \mathbf{X})\)

  • Backdoor Path: \(T \leftarrow \mathtt{e}(\mathbf{X}) \Leftarrow \mathbf{X} \rightarrow Y\)
  • Closed in proportion to \(\left[ \text{Cor}(\hat{\mathtt{e}}(\mathbf{X}), \mathtt{e}(\mathbf{X})) \right]^2 = ❓\)
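Concretely, the estimation step is just a supervised-learning problem: fit any classifier of \(T\) on \(\mathbf{X}\) and take its predicted probabilities. A minimal logistic-regression sketch on the mindset data (same covariate caveats as before: ethnicity is assumed from the truncated column listing, and any probabilistic classifier could stand in for the logistic regression):

Code
# e-hat(X) = estimated Pr(T = 1 | X), via logistic regression
# (sketch: two covariates only; `ethnicity` assumed from "ethnici..." above)
ehat_model <- glm(
  intervention ~ success_expect + factor(ethnicity),
  family='binomial', data=student_df
)
# One fitted propensity score per student, strictly between 0 and 1
student_df <- student_df |> mutate(ehat=predict(ehat_model, type="response"))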

Sometimes-Helpful Thought Experiment

  • Back in our basic confounding scenario:
  • If there were only one covariate (\(\mathbf{X} = X\)), and it was a constant (\(\Pr(X = c) = 1\)), then all the variation in \(Y\) would be due to variation in \(T\)
  • Less extreme: if person \(i\) has covariates \(\mathbf{X}_i\), person \(j\) has covariates \(\mathbf{X}_j\), and it happens that \(\mathbf{X}_i = \mathbf{X}_j\), then variation in their outcomes is due solely to \(T\)

  • Part of the logic of the propensity score is that, even if \(\mathbf{X}_i \neq \mathbf{X}_j\), as long as \(\mathtt{e}(\mathbf{X}_i) = \mathtt{e}(\mathbf{X}_j)\), \(i\) and \(j\) are still perfectly matched
  • \(\Rightarrow\) (by a fun math proof, sketched below) variation in their outcomes is due solely to \(T\)
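A sketch of that proof, using the balancing property from above: among units sharing a score value \(e\), treated and untreated units have the same distribution of \(\mathbf{X}\), so (assuming, as throughout, that \(\mathbf{X}\) contains all the confounders) the within-stratum contrast

\[ \mathbb{E}[Y \mid T = 1, \mathtt{e}(\mathbf{X}) = e] \; - \; \mathbb{E}[Y \mid T = 0, \mathtt{e}(\mathbf{X}) = e] \]

is free of confounding, and averaging these contrasts over the distribution of \(\mathtt{e}(\mathbf{X})\) recovers the overall treatment effect.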

How Exactly Do We Adjust For \(\hat{\mathtt{e}}(\mathbf{X})\)?

  • Simulation example: smoking reduction (by construction below, enrolling in the program reduces smoking by \(30 \times 0.2 = 6\) cigarettes on average)
Code
library(tidyverse)
library(Rlab)
set.seed(5650)
n <- 250
# Motivation to quit, uniform on [0, 1]: drives program enrollment
motiv_vals <- runif(n, 0, 1)
# Enrollment: deterministic at the extremes of motivation, random in between
enroll_vals <- ifelse(
  motiv_vals < 0.25,
  0,
  # We know motiv > 0.25
  ifelse(
    motiv_vals > 0.75,
    1,
    # We know 0.25 < motiv < 0.75
    rbern(n, prob=(motiv_vals - 0.125)*1.5)
  )
)
Warning in rbinom(n, size = 1, prob = prob): NAs produced

(This warning is harmless: ifelse() evaluates rbern() on the full motiv_vals vector, producing NAs wherever (motiv_vals - 0.125)*1.5 falls outside \([0, 1]\); those positions all lie in the deterministic regions, so every NA is replaced by the outer 0/1 branches.)
Code
# Outcome: cigarettes smoked, Binomial(30, 0.6 - 0.2*enroll),
# so the true means are 18 (unenrolled) vs. 12 (enrolled)
ncigs_vals <- rbinom(n, size=30, prob=0.6-0.2*enroll_vals)
smoke_df <- tibble(
  motiv=motiv_vals,
  enroll=enroll_vals,
  ncigs=ncigs_vals
)
(smoke_mean_df <- smoke_df |> group_by(enroll) |> summarize(mean_ncigs=mean(ncigs)))
enroll mean_ncigs
0 17.88060
1 12.26724
Code
naive_smoke_lm <- lm(ncigs ~ enroll, data=smoke_df)
summary(naive_smoke_lm) |> broom::tidy() |>
  select(term, estimate)
term estimate
(Intercept) 17.880597
enroll -5.613356

Code
smoke_df |> ggplot(aes(x=enroll, y=ncigs)) +
  geom_boxplot(
    aes(group=enroll),
    width=0.5
  ) +
  geom_smooth(
    method='lm',
    formula='y ~ x',
    se=TRUE
  ) +
  geom_point(
    data=smoke_mean_df,
    aes(x=enroll, y=mean_ncigs),
    size=3
  ) +
  theme_dsan(base_size=24)

Inverse Probability-of-Treatment Weighting

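The adjustment itself is the standard inverse-probability-weighting estimator: each treated unit counts \(1 / \hat{\mathtt{e}}(\mathbf{X}_i)\) times and each untreated unit \(1 / (1 - \hat{\mathtt{e}}(\mathbf{X}_i))\) times, so that each weighted group resembles the full population:

\[ \widehat{\text{ATE}}_{\text{IPW}} = \frac{1}{n} \sum_{i=1}^{n} \left[ \frac{T_i Y_i}{\hat{\mathtt{e}}(\mathbf{X}_i)} - \frac{(1 - T_i) Y_i}{1 - \hat{\mathtt{e}}(\mathbf{X}_i)} \right] \]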
Code
# Step 1: estimate the propensity score e-hat(motiv) via logistic regression
eprop_model <- glm(enroll ~ motiv, family='binomial', data=smoke_df)
eprop_preds <- predict(eprop_model, type="response")
smoke_df <- smoke_df |> mutate(pred=eprop_preds)
# Step 2: compute the inverse probability-of-treatment weights:
# 1/pred for the enrolled, 1/(1 - pred) for the unenrolled
smoke_df <- smoke_df |> mutate(
  ipw=ifelse(enroll == 1, 1/pred, 1/(1-pred))
) |> arrange(pred)
smoke_df |> mutate(enroll=factor(enroll)) |>
  ggplot(aes(x=motiv, y=ncigs, color=enroll)) +
  geom_point() +
  theme_dsan(base_size=24) +
  labs(title="Before Weighting")

Code
smoke_df |> mutate(enroll=factor(enroll)) |>
  ggplot(aes(
    x=motiv, y=ncigs, color=enroll, size=ipw,
    alpha=log(ipw-1)
  )) +
  geom_point() +
  guides(alpha="none") +
  theme_dsan(base_size=24) +
  labs(title="After Weighting")

Code
smoke_df |>
  ggplot(aes(x=motiv)) +
  # Observed enrollments (0 or 1)
  geom_point(
    aes(y=enroll, color=factor(enroll))
  ) +
  # Predicted propensity scores e-hat(motiv)
  geom_point(
    aes(y=pred, color=factor(enroll))
  ) +
  labs(color="enroll") +
  theme_dsan(base_size=24) +
  labs(title="Propensity to Enroll")
# Rescale the weights to [0, 1] so they fit on the same axes as enroll
ipw_min <- min(smoke_df$ipw)
ipw_max <- max(smoke_df$ipw)
smoke_df <- smoke_df |> mutate(
  ipw_scaled = (ipw - ipw_min) / (ipw_max - ipw_min)
)
smoke_df |>
  ggplot(aes(x=motiv)) +
  # Observed enrollments (0 or 1)
  geom_point(
    aes(y=enroll, color=factor(enroll))
  ) +
  # Rescaled IPW weights
  geom_point(
    aes(y=ipw_scaled, color=factor(enroll))
  ) +
  theme_dsan(base_size=24) +
  labs(
    title="Inverse Probability-of-Treatment Weights (IPTW)",
    color="enroll"
  )

The Final Result!

Code
# Weighted least squares, using the IPW weights computed above
lm_with_weights <- lm(ncigs ~ enroll,
  data=smoke_df, weights=smoke_df$ipw
)
summary(lm_with_weights) |> broom::tidy() |>
  select(term, estimate, std.error)
term estimate std.error
(Intercept) 18.10133 0.2466712
enroll -5.66453 0.3571696
Code
library(WeightIt)
# Hand weightit() our already-computed propensity scores (the pred column)
W <- weightit(
  enroll ~ motiv, data = smoke_df, ps="pred"
)
smoke_weighted_lm <- lm_weightit(
  ncigs ~ enroll, data = smoke_df, weightit = W
)
summary(smoke_weighted_lm, ci = FALSE)

Call:
lm_weightit(formula = ncigs ~ enroll, data = smoke_df, weightit = W)

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  18.1013     0.2479   73.02   <1e-06 ***
enroll       -5.6645     0.5463  -10.37   <1e-06 ***
Standard error: HC0 robust
Code
# Or let weightit() estimate the propensity model itself
# (the default method fits a logistic regression, as we did by hand above)
W_default <- weightit(enroll ~ motiv, data = smoke_df)
smoke_default_lm <- lm_weightit(
  ncigs ~ enroll, data = smoke_df,
  weightit = W_default
)
summary(smoke_default_lm, ci = FALSE)

Call:
lm_weightit(formula = ncigs ~ enroll, data = smoke_df, weightit = W_default)

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  18.1013     0.2441   74.16   <1e-06 ***
enroll       -5.6645     0.5444  -10.41   <1e-06 ***
Standard error: HC0 robust (adjusted for estimation of weights)
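Note that all three approaches agree on the point estimate (\(-5.66\), close to the true \(-6\)), but the plain lm() standard error (0.357) treats the weights as known constants; the HC0 robust standard errors from WeightIt (0.546, 0.544) are more honest, and the last one additionally accounts for the weights themselves being estimated.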

…What If We Have Many Covariates?

  • Curse of dimensionality: exact matching on many covariates quickly becomes hopeless, but \(\hat{\mathtt{e}}(\mathbf{X})\) collapses all of \(\mathbf{X}\) into a single number between 0 and 1 (see the sketch below)
  • Lab Time!
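As a preview of the lab, a hedged sketch of the many-covariate workflow on the mindset data, with weightit() handling the propensity model and the weights in one step. Only two covariate names are spelled out here (success_expect from the column listing above, plus an assumed ethnicity column from the truncated "ethnici..." entry); in the lab you would use the full adjustment set:

Code
library(WeightIt)
# Propensity model with multiple covariates on the right-hand side;
# weightit() estimates e-hat(X) and the IPW weights for us
W_student <- weightit(
  intervention ~ success_expect + factor(ethnicity),
  data = student_df
)
# Weighted outcome regression, as in the smoking example above
student_weighted_lm <- lm_weightit(
  achievement_score ~ intervention,
  data = student_df, weightit = W_student
)
summary(student_weighted_lm, ci = FALSE)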

References

Athey, Susan, and Stefan Wager. 2019. “Estimating Treatment Effects with Causal Forests: An Application.” arXiv. https://doi.org/10.48550/arXiv.1902.07409.
Rosenbaum, Paul R., and Donald B. Rubin. 1983. “The Central Role of the Propensity Score in Observational Studies for Causal Effects.” Biometrika 70 (1): 41–55. https://doi.org/10.2307/2335942.

Footnotes

  1. For reasons we’ll see later (double-robustness), you should ultimately try to achieve both goals!