Week 12: Causal Forests for Heterogeneous Treatment Effects

DSAN 5650: Causal Inference for Computational Social Science
Summer 2025, Georgetown University

Author: Jeff Jacobs

Published: Wednesday, August 6, 2025

Schedule

Today’s Planned Schedule:

Start    End      Topic
6:30pm   6:45pm   Final Projects
6:45pm   7:10pm   Included Variable Bias
7:10pm   8:00pm   Heterogeneous Treatment Effects
8:00pm   8:10pm   Break!
8:10pm   9:00pm   Causal Forests

Final Project / End of Term Things

  • Midterm grading will be finished TONIGHT
  • The submit buttons for HW3 / HW4 will be available TONIGHT

Sensitivity Analysis

  • So you’ve got an estimate of a causal effect! What now? Two possible next steps…
  • Sensitivity Analysis: Check robustness of your results
    • Modeling choices → one point in param space (garden of forking paths)
    • If results “truly” hold, shouldn’t disappear with small changes in parameters/choices
    • Last week: Can re-estimate with informative / weak / skeptical priors
    • This week: Simulate how “strong” omissions would have to be to “ruin” finding
  • Heterogeneous Treatment Effects: How does the effect vary between subgroups?
    • Example today: PROGRESA (Cash transfer poverty-alleviation program in Mexico)
    • Program has an overall causal effect, but also a greater causal effect on indigenous households, relative to non-indigenous
    • How can we find group-specific effects? Causal forests!

Sensitivity Analysis 2: How Sensitive Are My Results to Omitted/Included Variable Bias?

  • You probably know about omitted variable bias from previous classes / intuition… but, included variable bias?
  • That’s right folks… colliders can be agents of chaos, laying waste to our best, most meticulously-planned-out models

Does Aging Cause Sadness?

  • Each year, 20 people are born with uniformly-distributed happiness values
  • Each year, each person ages one year; happiness does not change
  • At age 18, individuals can become married; the odds of marriage each year are proportional to the individual's happiness; once married, they remain married
  • At age 70, individuals leave the sample (they move to Boca Raton, Florida)
import pandas as pd
import numpy as np
rng = np.random.default_rng(seed=5650)
import matplotlib.pyplot as plt
import seaborn as sns
cb_palette = ['#e69f00','#56b4e9','#009e73']
sns.set_palette(cb_palette)
import patchworklib as pw
from scipy.special import expit

# The original R code:
# sim_happiness <- function( seed=1977 , N_years=1000 , max_age=65 , N_births=20 , aom=18 ) {
#     set.seed(seed)
#     H <- M <- A <- c()
#     for ( t in 1:N_years ) {
#         A <- A + 1 # age existing individuals
#         A <- c( A , rep(1,N_births) ) # newborns
#         H <- c( H , seq(from=-2,to=2,length.out=N_births) ) # sim happiness trait - never changes
#         M <- c( M , rep(0,N_births) ) # not yet married
#         # for each person over 17, chance get married
#         for ( i in 1:length(A) ) {
#             if ( A[i] >= aom & M[i]==0 ) {
#                 M[i] <- rbern(1,inv_logit(H[i]-4))
#             }
#         }
#         # mortality
#         deaths <- which( A > max_age )
#         if ( length(deaths)>0 ) {
#             A <- A[ -deaths ]
#             H <- H[ -deaths ]
#             M <- M[ -deaths ]
#        }
#     }
#     d <- data.frame(age=A,married=M,happiness=H)
#     return(d)
# }

# DGP: happiness -> marriage <- age
num_births = 41
def update_m(row):
  # One year's marriage chance for a not-yet-married adult
  if row['m'] == 0:
    return int(rng.binomial(n=1, p=expit(row['h'] - 3.875)))
  return 1
def sim_cohort_to(max_age):
  # Simulate a single birth cohort from age 1 up to max_age
  sim_df = pd.DataFrame({
      'age': [1 for _ in range(num_births)],
      'h': np.linspace(-2, 2, num=num_births),  # happiness trait: never changes
      'm': [0 for _ in range(num_births)],      # not yet married
    }
  )
  for t in range(2, max_age + 1):
    sim_df['age'] = sim_df['age'] + 1
    if t >= 18:  # marriage becomes possible at age 18
      sim_df['m'] = sim_df.apply(update_m, axis=1)
  return sim_df
all_sim_dfs = [sim_cohort_to(cur_max_age) for cur_max_age in range(1, 71)]
full_sim_df = pd.concat(all_sim_dfs)

# full_sim_df.head()
cbg_palette = ['#c6c6c666'] + cb_palette
full_sim_df['m_label'] = full_sim_df['m'].apply(lambda x: "Unmarried" if x == 0 else "Married")
full_sim_df = full_sim_df.rename(columns={'age': 'Age', 'h': 'Happiness'})
ax = pw.Brick(figsize=(5.25,2.75));
sns.scatterplot(
  x='Age', y='Happiness', hue='m_label',
  data=full_sim_df,
  ax=ax,
  palette=cbg_palette,
  s=22,
  legend=True,
);
ax.move_legend("upper center", bbox_to_anchor=(0.5, 1.15), ncol=2);
ax.legend_.set_title("");
ax.axvline(x=17.5, color='black', ls='dashed', lw=1);
ax.savefig()
mean_hap_df = full_sim_df.groupby('Age')['Happiness'].mean().reset_index()
mean_hap_df['m_label'] = "All"
mean_hap_df['Happiness_mean'] = mean_hap_df['Happiness']  # exactly 0 at every age, by construction
group_hap_df = full_sim_df[full_sim_df['Age'] >= 18].groupby(['Age','m_label'])['Happiness'].mean().reset_index()
married_df = group_hap_df[group_hap_df['m_label'] == "Married"].copy()
unmarried_df = group_hap_df[group_hap_df['m_label'] == "Unmarried"].copy()
# Moving averages
win_size = 3
married_df['Happiness_mean'] = married_df['Happiness'].rolling(window=win_size).mean()
unmarried_df['Happiness_mean'] = unmarried_df['Happiness'].rolling(window=win_size).mean()
# Merge back together
combined_df = pd.concat([mean_hap_df, married_df, unmarried_df], ignore_index=True)
# display(combined_df.tail())

ax = pw.Brick(figsize=(3.75,2.4));
ax.set_xlim(0, 70);
# ax.set_ylim(-1.2, 1.2);
cbg_palette = ['black', cb_palette[0], '#b2b2b2ff']

# And plot
sns.lineplot(
  x='Age', y='Happiness_mean',
  hue='m_label',
  # marker=".",
  # s=20,
  data=combined_df,
  palette=cbg_palette,
  # color=cbg_palette[0],
  # order=3,
  ax=ax
);
ax.set_title("Mean Happiness by Status");
ax.legend_.set_title("");
ax.set_xlabel("Age");
ax.set_ylabel("Mean Happiness");
ax.savefig()
Fig 1: (a) Happiness by Age, 70 birth cohorts of size 41 each (DC minimum marriage age = 18); (b) Mean Happiness by Status

…What’s happening here?
\(\textsf{Happy} \rightarrow {}^{🤔}\textsf{Marriage}^{🤔} \leftarrow \textsf{Age}\)
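The DAG above can be checked numerically with a fresh toy simulation (separate from the cohort code above, with invented parameter values): age and happiness are generated independently, but marriage is a collider of both, so conditioning on it induces a spurious negative Age–Happiness slope.

```python
import numpy as np

rng = np.random.default_rng(seed=5650)

# DGP: Happiness -> Marriage <- Age; happiness is independent of age
n = 20_000
age = rng.integers(18, 66, size=n)        # adults aged 18-65
happiness = rng.uniform(-2, 2, size=n)    # fixed trait, unrelated to age
# Each year past 17 gives another chance to marry, with per-year
# probability increasing in happiness (logistic, as in the cohort sim)
years_at_risk = age - 17
p_marry_per_year = 1 / (1 + np.exp(-(happiness - 4)))
p_married = 1 - (1 - p_marry_per_year) ** years_at_risk
married = rng.binomial(1, p_married)

def slope(x, y):
    """OLS slope of y on x (with intercept)."""
    M = np.column_stack([np.ones_like(x, dtype=float), x])
    beta, *_ = np.linalg.lstsq(M, y, rcond=None)
    return beta[1]

print(slope(age, happiness))                              # ~0: truly unrelated
print(slope(age[married == 1], happiness[married == 1]))  # negative: collider bias
```

Intuition: at any age, the married are happier than average, but the young married are *very* happy (only the happiest marry quickly), so within the married subsample happiness appears to decline with age.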

Assessing the Impact of Omitted / Included Vars

  • Working example from Cinelli and Hazlett (2020): the War in Darfur
  • What is the causal impact of experiencing violence on support for a peace deal?
  • Researcher A models this scenario as:

\[ \textsf{PeaceIndex} = \hat{\tau}_{\text{res}} \textsf{DirectHarm} + \hat{\beta}_{\text{f},\text{res}}\textsf{Female} + \textsf{Village} \hat{\boldsymbol \beta}_{\text{v},\text{res}} + \mathbf{X} \hat{\boldsymbol \beta}_{\text{res}} + \hat{\varepsilon}_{\text{res}} \]

  • Researcher B instead prefers:

\[ \textsf{PeaceIndex} = \hat{\tau} \textsf{DirectHarm} + \hat{\beta}_{\text{f}}\textsf{Female} + \textsf{Village} \hat{\boldsymbol \beta}_{\text{v}} + \mathbf{X} \hat{\boldsymbol \beta} + \boxed{\hat{\gamma}\textsf{Center}} + \hat{\varepsilon}_{\text{full}} \]

Our earlier estimate \(\tau_{\text{res}}\) would differ from our target quantity \(\tau\): but how badly? […] How strong would the confounder(s) have to be to change the estimates in such a way to affect the main conclusions of a study?

The Classical OVB Equation

  • Remember that \(D\) is our treatment variable! (\(\mathbf{X}\) = covariates, \(Y\) = outcome)
  • In a perfect world, we would estimate \(Y = \hat{\tau}D + \mathbf{X} \hat{\boldsymbol\beta} + \hat{\gamma}Z + \hat{\varepsilon}_{\text{full}}\)
  • But \(Z\) is unobserved \(\leadsto\) “restricted” model \(Y = \hat{\tau}_{\text{res}}D + \mathbf{X} \hat{\boldsymbol\beta}_{\text{res}} + \hat{\varepsilon}_{\text{res}}\)
  • What is the “gap” between \(\hat{\tau}\) and \(\hat{\tau}_{\text{res}}\)? \(\text{OVB} = \hat{\tau}_{\text{res}} - \hat{\tau}\)

\[ \begin{align*} \hat{\tau}_{\text{res}} &= \frac{ \text{Cov}[D^{\perp \mathbf{X}}, Y^{\perp \mathbf{X}}] }{ \text{Var}[D^{\perp \mathbf{X}}] } \\ &= \frac{ \text{Cov}[D^{\perp \mathbf{X}}, \hat{\tau}D^{\perp \mathbf{X}} + \hat{\gamma}Z^{\perp \mathbf{X}}] }{ \text{Var}[D^{\perp \mathbf{X}}] } \\ &= \hat{\tau} + \hat{\gamma}\frac{ \text{Cov}[D^{\perp \mathbf{X}}, Z^{\perp \mathbf{X}}] }{ \text{Var}[D^{\perp \mathbf{X}}] } \\ &= \hat{\tau} + \hat{\gamma}\hat{\delta} \\ \implies \text{OVB} &= \hat{\tau}_{\text{res}} - \hat{\tau} = \overbrace{ \boxed{\hat{\gamma} \times \hat{\delta}} }^{\mathclap{\text{Impact} \, \times \, \text{Imbalance}}} \end{align*} \]

The ability to produce orthogonalized (\(\perp \mathbf{X}\)) versions of the variables in the model comes from the Frisch-Waugh-Lovell Theorem
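The Impact × Imbalance identity can be verified numerically on simulated data (all coefficients below are invented for illustration): the gap between the restricted and full OLS estimates of \(\tau\) equals \(\hat{\gamma}\) times the coefficient \(\hat{\delta}\) on \(D\) from the auxiliary regression of \(Z\) on \(D\) and \(\mathbf{X}\).

```python
import numpy as np

rng = np.random.default_rng(seed=5650)
n = 5_000

# Simulated data: covariate X, confounder Z, treatment D, outcome Y
X = rng.normal(size=n)
Z = 0.5 * X + rng.normal(size=n)
D = 0.7 * Z + 0.3 * X + rng.normal(size=n)
Y = 2.0 * D + 1.5 * Z + 1.0 * X + rng.normal(size=n)

def ols(y, *cols):
    """OLS coefficients of y on an intercept plus the given columns
    (intercept dropped from the returned vector)."""
    M = np.column_stack([np.ones_like(y)] + list(cols))
    beta, *_ = np.linalg.lstsq(M, y, rcond=None)
    return beta[1:]

tau_full, gamma_hat, _ = ols(Y, D, Z, X)  # "perfect world": Y ~ D + Z + X
tau_res, _ = ols(Y, D, X)                 # restricted model: Y ~ D + X
delta_hat, _ = ols(Z, D, X)               # imbalance: Z ~ D + X

print(tau_res - tau_full)     # observed bias
print(gamma_hat * delta_hat)  # Impact x Imbalance: identical, by FWL
```

The equality is exact (an algebraic identity of least squares), not just approximate in large samples.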

More Generally

  • What if we don’t have a specific omitted variable in mind? We just want to know the expected impact if there were omitted vars… Enter “Partial \(R^2\)”
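A minimal sketch of the partial-\(R^2\) machinery from Cinelli and Hazlett (2020): the treatment's \(t\)-statistic converts directly into its partial \(R^2\) with the outcome, and the "robustness value" gives the minimum confounder strength (a partial \(R^2\) shared with both treatment and outcome) needed to overturn the result. The numeric inputs below are invented for illustration, not the actual Darfur estimates.

```python
import numpy as np

def partial_r2(t_stat, dof):
    """Partial R^2 of a regressor with the outcome, recovered from its
    t-statistic and residual degrees of freedom."""
    return t_stat**2 / (t_stat**2 + dof)

def robustness_value(t_stat, dof, q=1.0):
    """Minimum partial R^2 that an omitted confounder would need with BOTH
    the treatment and the outcome to reduce the estimate by a fraction q
    (q=1: drive it all the way to zero). RV_q formula from Cinelli &
    Hazlett (2020)."""
    f = q * abs(t_stat) / np.sqrt(dof)  # partial Cohen's f of the treatment
    return 0.5 * (np.sqrt(f**4 + 4 * f**2) - f**2)

# Hypothetical regression output (NOT the published Darfur numbers)
t_stat, dof = 4.2, 783
print(partial_r2(t_stat, dof))       # treatment's partial R^2 with outcome
print(robustness_value(t_stat, dof)) # confounder strength needed to kill the effect
```

The `sensemakr` package (by Cinelli and Hazlett, in both R and Python) implements these and richer diagnostics, including benchmarking against observed covariates.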

Heterogeneous Treatment Effects

PROGRESA
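The actual PROGRESA data isn't loaded here, so as a warm-up, a simulated sketch (group sizes and effect sizes invented): because PROGRESA's cash transfers were randomized at the village level, a group-specific effect can be recovered by a simple difference-in-means within each pre-specified subgroup. Causal forests, up next, automate the *discovery* of such subgroups rather than requiring us to specify them in advance.

```python
import numpy as np

rng = np.random.default_rng(seed=5650)
n = 10_000

# Simulated PROGRESA-style data (NOT the real dataset): randomized cash
# transfer T, with a larger true effect for (simulated) indigenous households
indigenous = rng.binomial(1, 0.3, size=n)
T = rng.binomial(1, 0.5, size=n)             # randomized treatment assignment
tau_true = np.where(indigenous == 1, 2.0, 0.5)  # heterogeneous treatment effect
Y = 1.0 + tau_true * T + rng.normal(size=n)  # outcome, e.g. a schooling index

def group_ate(mask):
    """Difference-in-means ATE within a subgroup (unbiased under randomization)."""
    return Y[mask & (T == 1)].mean() - Y[mask & (T == 0)].mean()

print(group_ate(indigenous == 1))  # recovers ~2.0
print(group_ate(indigenous == 0))  # recovers ~0.5
```

This pre-specified-subgroup approach is the baseline that causal forests generalize: honest trees split on covariates to find the regions where the treatment effect differs most.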

References

Cinelli, Carlos, and Chad Hazlett. 2020. “Making Sense of Sensitivity: Extending Omitted Variable Bias.” Journal of the Royal Statistical Society Series B: Statistical Methodology 82 (1): 39–67. https://doi.org/10.1111/rssb.12348.