Week 9: Doubly-Robust Estimation and Instrumental Variables

DSAN 5650: Causal Inference for Computational Social Science
Summer 2025, Georgetown University

Class Sessions
Author
Affiliation

Jeff Jacobs

Published

Wednesday, July 16, 2025

Open slides in new window →

Schedule

Today’s Planned Schedule:

Start End Topic
Lecture 6:30pm 6:45pm Final Project Tings →
6:45pm 7:20pm Double Robustness →
7:20pm 8:00pm When Conditioning Won’t Cut It: IVs →
Break! 8:00pm 8:10pm
8:10pm 9:00pm Drawing Causal Inferences from Text Data →

Final Project Tings

  • MVP vs. non-MVP

Double Robustness

  • Propensity Score Weighting seems so much easier than all the hard work of modeling… why can’t we just propensity score all the things and be done with it!?
  • By using doubly-robust estimation methods, you can:
    • Carefully develop a covariate adjustment strategy (then use e.g. regression),
    • Carefully develop a propensity score strategy, and then
    • Be only as wrong as the least-wrong of and !!

With doubly-robust estimation, as long as the answer is either True or False you’re good!

Doubly-Robust Estimation

Code
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import patchworklib as pw;

import statsmodels.formula.api as smf
from causalml.match import create_table_one
from joblib import Parallel, delayed
from sklearn.linear_model import LogisticRegression

def generate_data(N=300, seed=1):
  np.random.seed(seed)
  
  # Control variables
  male = np.random.binomial(1, 0.45, N)
  age = np.rint(18 + np.random.beta(2, 2, N)*50)
  hours = np.minimum(np.round(np.random.lognormal(5, 1.3, N), 1), 2000)
  
  # Treatment
  pr = np.maximum(0, np.minimum(1, 0.8 + 0.3*male - np.sqrt(age-18)/10))
  dark_mode = np.random.binomial(1, pr, N)==1
  
  # Outcome
  read_time = np.round(np.random.normal(10 - 4*male + 2*np.log(hours) + 2*dark_mode, 4, N), 1)

  # Generate the dataframe
  df = pd.DataFrame({'read_time': read_time, 'dark_mode': dark_mode, 'male': male, 'age': age, 'hours': hours})

  return df

user_df = generate_data(N=300)
ols_model = smf.ols("read_time ~ dark_mode", data=user_df).fit()
ols_summary = ols_model.summary()
results_as_html = ols_summary.tables[1].as_html()
ols_summary_df = pd.read_html(results_as_html, header=0, index_col=0)[0]
ols_summary_df[['coef','std err']]
#type(ols_summary.tables[1])[['coef','std err']]
# .tables[1]
# ols_df[['coef','std err']]

Super cool example courtesy of Matteo Courthoud!

coef std err
Intercept 19.1748 0.402
dark_mode[T.True] -0.4446 0.571
<Figure size 96x96 with 0 Axes>
  • …So, is this a causal effect? Does dark theme cause users to spend less time reading?

Unit of Observation: (Article, Reader)

(…since I couldn’t figure out how to fit it on the last slide)

Code
user_df.head()
read_time dark_mode male age hours
0 14.4 False 0 43.0 65.6
1 15.4 False 1 55.0 125.4
2 20.9 True 0 23.0 642.6
3 20.0 False 0 41.0 129.1
4 21.5 True 0 29.0 190.2

What Does the Data Look Like?

Code
#ax = pw.Brick(figsize=(8,5))
sns.pairplot(
  data=user_df, aspect=0.8
)

Balance

Enter Uber’s causal inference library: causalml

Code
from IPython.display import display, HTML
X = ['male', 'age', 'hours']
table1 = create_table_one(user_df, 'dark_mode', X)
user_df.to_csv("assets/user_df.csv")
table1.to_csv("assets/table1.csv")
HTML(table1.to_html())
Control Treatment SMD
Variable
n 151 149
age 46.01 (9.79) 39.09 (11.53) -0.6469
hours 337.78 (464.00) 328.57 (442.12) -0.0203
male 0.34 (0.47) 0.66 (0.48) 0.6732

And then WeightIt to generate a “love plot”:

Augmented Inverse Propensity Weighting (AIPW)

\[ \begin{align*} \hat \tau_{AIPW} &= \frac{1}{n} \sum_{i=1}^{n} \left( \text{RegEst}(X_i) + \text{PropensityAdj}(X_i, Y_i) \right) \\ \text{RegEst}(X_i) &= \hat \mu^{(1)}(X_i) - \hat \mu^{(0)}(X_i) \\ \text{PropensityAdj}(X_i, Y_i) &= \frac{D_i}{\hat{\mathtt{e}}(X_i)} \left( Y_i - \hat \mu^{(1)}(X_i) \right) - \frac{(1-D_i) }{1-\hat{\mathtt{e}}(X_i)} \left( Y_i - \hat \mu^{(0)}(X_i) \right) \end{align*} \]

where \(\mu^{(d)}(x)\) is the response function, i.e. the expected value of the outcome, conditional on observable characteristics \(x\) and treatment status \(d\), and \(e(X)\) is the propensity score.

\[ \begin{align*} \mu^{(d)}(x) &= \mathbb E \left[ Y_i \ \big | \ X_i = x, D_i = d \right] \\ \mathtt{e}(x) &= \mathbb E \left[ D_i = 1 \ \big | \ X_i = x \right] \end{align*} \]

Model 1: Propensity Score

Code
def estimate_e(df, X, D, model_e):
    e = model_e.fit(df[X], df[D]).predict_proba(df[X])[:,1]
    return e
user_df['e'] = estimate_e(user_df, X, "dark_mode", LogisticRegression())
ax = pw.Brick(figsize=(7, 2.75));
sns.kdeplot(
  x='e', hue='dark_mode', data=user_df,
  # bins=30,
  #stat='density',
  common_norm=False,
  fill=True,
  ax=ax
);
ax.set_xlabel("$e(X)$");
ax.savefig()

Code
w = 1 / (user_df['e'] * user_df["dark_mode"] + (1-user_df['e']) * (1-user_df["dark_mode"]))
smf.wls("read_time ~ dark_mode", weights=w, data=user_df).fit().summary().tables[1]
coef std err t P>|t| [0.025 0.975]
Intercept 18.5871 0.412 45.158 0.000 17.777 19.397
dark_mode[T.True] 1.0486 0.581 1.805 0.072 -0.095 2.192

Model 2: Regression with Controls

  • First, with scikit-learn:
Code
def estimate_mu(df, X, D, y, model_mu):
    mu = model_mu.fit(df[X + [D]], df[y])
    mu0 = mu.predict(df[X + [D]].assign(dark_mode=0))
    mu1 = mu.predict(df[X + [D]].assign(dark_mode=1))
    return mu0, mu1
from sklearn.linear_model import LinearRegression

mu0, mu1 = estimate_mu(user_df, X, "dark_mode", "read_time", LinearRegression())
print(np.mean(mu0), np.mean(mu1))
print(np.mean(mu1-mu0))
18.265714409803312 19.651524322951012
1.3858099131476969
  • Enter EconML, Microsoft’s “Official” ML-based econometrics library 😎
Code
from econml.dr import LinearDRLearner

model = LinearDRLearner(
  model_propensity=LogisticRegression(),
  model_regression=LinearRegression(),
  random_state=5650
)
model.fit(Y=user_df["read_time"], T=user_df["dark_mode"], X=user_df[X]);
model.ate_inference(X=user_df[X].values, T0=0, T1=1).summary().tables[0]
Uncertainty of Mean Point Estimate
mean_point stderr_mean zstat pvalue ci_mean_lower ci_mean_upper
1.321 0.549 2.409 0.016 0.246 2.396

Double-Robustness to the Rescue!

Wrong regression model:

Code
def compare_estimators(X_e, X_mu, D, y, seed):
    df = generate_data(seed=seed)
    e = estimate_e(df, X_e, D, LogisticRegression())
    mu0, mu1 = estimate_mu(df, X_mu, D, y, LinearRegression())
    slearn = mu1 - mu0
    ipw = (df[D] / e - (1-df[D]) / (1-e)) * df[y]
    aipw = slearn + df[D] / e * (df[y] - mu1) - (1-df[D]) / (1-e) * (df[y] - mu0)
    return np.mean((slearn, ipw, aipw), axis=1)

def simulate_estimators(X_e, X_mu, D, y):
    r = Parallel(n_jobs=8)(delayed(compare_estimators)(X_e, X_mu, D, y, i) for i in range(100))
    df_tau = pd.DataFrame(r, columns=['S-learn', 'IPW', 'AIPW'])
    return df_tau
# The actual plots
ax = pw.Brick(figsize=(4, 3.5))
wrong_reg_df = simulate_estimators(
  X_e=['male', 'age'], X_mu=['hours'], D="dark_mode", y="read_time"
)
wrong_reg_plot = sns.boxplot(
  data=pd.melt(wrong_reg_df), x='variable', y='value', hue='variable',
  ax=ax,
  linewidth=2
);
wrong_reg_plot.set(
  title="Distribution of $\hat τ$", xlabel='', ylabel=''
);
ax.axhline(2, c='r', ls=':');
ax.savefig()

Wrong propensity score model:

Code
ax = pw.Brick(figsize=(4, 3.5))
wrong_ps_df = simulate_estimators(
  ['age'], ['male', 'hours'], D="dark_mode", y="read_time"
)
wrong_ps_plot = sns.boxplot(
  data=pd.melt(wrong_ps_df), x='variable', y='value', hue='variable',
  ax=ax,
  linewidth=2
);
ax.set_title("Distribution of $\hat τ$");
ax.axhline(2, c='r', ls=':');
ax.savefig()

Hard Mode: When Conditioning Isn’t an Option / Can’t Fix Things

  • Approaches we’ve discussed thus far depend on the ability to adjust for (e.g., by conditioning-on and/or purposefully-not-conditioning-on) confounders themselves (e.g., in regression), or on propensity scores
  • Enter Instrumental Variables! (this week), Specification of Mechanisms / Front-Door Pathways (next week)
  • \(\leadsto\) Find “natural experiments”, randomizations or pseudo-randomizations in society that allow us to replicate the “holy grail” of random assignment

I… always ask economists why it isn’t explained like this, and they always say “it’s more complicated than that”, and then they explain the complications and I think I understand them but still think that this is a good way to describe the gist!

Instrumental Variable Estimation

Lab: Causal Inference with Text

  • Lab Time!

References