Week 9: Doubly-Robust Estimation and Instrumental Variables

DSAN 5650: Causal Inference for Computational Social Science
Summer 2025, Georgetown University

Jeff Jacobs

jj1088@georgetown.edu

Wednesday, July 16, 2025

Schedule

Today’s Planned Schedule:

         Start    End      Topic
Lecture  6:30pm   6:45pm   Final Project Tings →
         6:45pm   7:20pm   Double Robustness →
         7:20pm   8:00pm   When Conditioning Won’t Cut It: IVs →
Break!   8:00pm   8:10pm
         8:10pm   9:00pm   Drawing Causal Inferences from Text Data →

Final Project Tings

  • MVP vs. non-MVP

Double Robustness

  • Propensity Score Weighting seems so much easier than all the hard work of modeling… why can’t we just propensity score all the things and be done with it!?
  • By using doubly-robust estimation methods, you can:
    • Carefully develop a covariate adjustment strategy (then use e.g. regression),
    • Carefully develop a propensity score strategy, and then
    • Be only as wrong as the least-wrong of \(\hat{\mu}\) and \(\hat{\mathtt{e}}\)!!

With doubly-robust estimation, as long as at least one of the two models (the outcome regression or the propensity score) is right, you’re good!

Doubly-Robust Estimation

Code
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import patchworklib as pw

import statsmodels.formula.api as smf
from causalml.match import create_table_one
from joblib import Parallel, delayed
from sklearn.linear_model import LogisticRegression

def generate_data(N=300, seed=1):
  np.random.seed(seed)
  
  # Control variables
  male = np.random.binomial(1, 0.45, N)
  age = np.rint(18 + np.random.beta(2, 2, N)*50)
  hours = np.minimum(np.round(np.random.lognormal(5, 1.3, N), 1), 2000)
  
  # Treatment
  pr = np.maximum(0, np.minimum(1, 0.8 + 0.3*male - np.sqrt(age-18)/10))
  dark_mode = np.random.binomial(1, pr, N)==1
  
  # Outcome
  read_time = np.round(np.random.normal(10 - 4*male + 2*np.log(hours) + 2*dark_mode, 4, N), 1)

  # Generate the dataframe
  df = pd.DataFrame({'read_time': read_time, 'dark_mode': dark_mode, 'male': male, 'age': age, 'hours': hours})

  return df

user_df = generate_data(N=300)
ols_model = smf.ols("read_time ~ dark_mode", data=user_df).fit()
# Extract the coefficient table from the OLS summary as a DataFrame
ols_summary = ols_model.summary()
results_as_html = ols_summary.tables[1].as_html()
ols_summary_df = pd.read_html(results_as_html, header=0, index_col=0)[0]
ols_summary_df[['coef','std err']]

Super cool example courtesy of Matteo Courthoud!

                   coef     std err
Intercept          19.1748  0.402
dark_mode[T.True]  -0.4446  0.571
  • …So, is this a causal effect? Does dark theme cause users to spend less time reading?

Unit of Observation: (Article, Reader)

(…since I couldn’t figure out how to fit it on the last slide)

Code
user_df.head()
read_time dark_mode male age hours
0 14.4 False 0 43.0 65.6
1 15.4 False 1 55.0 125.4
2 20.9 True 0 23.0 642.6
3 20.0 False 0 41.0 129.1
4 21.5 True 0 29.0 190.2

What Does the Data Look Like?

Code
sns.pairplot(
  data=user_df, aspect=0.8
)

Balance

Enter Uber’s causal inference library: causalml

Code
from IPython.display import display, HTML
X = ['male', 'age', 'hours']
table1 = create_table_one(user_df, 'dark_mode', X)
user_df.to_csv("assets/user_df.csv")
table1.to_csv("assets/table1.csv")
HTML(table1.to_html())
Variable  Control          Treatment        SMD
n         151              149
age       46.01 (9.79)     39.09 (11.53)    -0.6469
hours     337.78 (464.00)  328.57 (442.12)  -0.0203
male      0.34 (0.47)      0.66 (0.48)      0.6732

And then WeightIt to generate a “love plot”:
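(WeightIt lives on the R side; as a rough Python stand-in, one can plot the SMD column of the create_table_one output directly. The string-to-float conversion and the ±0.1 reference lines below are my own assumptions about the table format and the usual balance rule of thumb, so treat this as a sketch.)

Code
# Hypothetical "love plot" sketch: dot plot of standardized mean differences (SMDs)
# from the causalml balance table computed above (before any weighting)
smd = pd.to_numeric(table1['SMD'], errors='coerce').dropna()  # drops the 'n' row
fig, ax = plt.subplots(figsize=(6, 3))
ax.scatter(smd.values, smd.index)
ax.axvline(0, color='gray')
ax.axvline(-0.1, color='r', ls=':')  # common |SMD| < 0.1 balance threshold
ax.axvline(0.1, color='r', ls=':')
ax.set_xlabel("Standardized Mean Difference")
ax.set_title("Covariate Balance (unweighted)")
plt.show()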

Augmented Inverse Propensity Weighting (AIPW)

\[ \begin{align*} \hat \tau_{AIPW} &= \frac{1}{n} \sum_{i=1}^{n} \left( \text{RegEst}(X_i) + \text{PropensityAdj}(X_i, Y_i) \right) \\ \text{RegEst}(X_i) &= \hat \mu^{(1)}(X_i) - \hat \mu^{(0)}(X_i) \\ \text{PropensityAdj}(X_i, Y_i) &= \frac{D_i}{\hat{\mathtt{e}}(X_i)} \left( Y_i - \hat \mu^{(1)}(X_i) \right) - \frac{(1-D_i) }{1-\hat{\mathtt{e}}(X_i)} \left( Y_i - \hat \mu^{(0)}(X_i) \right) \end{align*} \]

where \(\mu^{(d)}(x)\) is the response function, i.e. the expected value of the outcome conditional on observable characteristics \(x\) and treatment status \(d\), and \(\mathtt{e}(x)\) is the propensity score:

\[ \begin{align*} \mu^{(d)}(x) &= \mathbb E \left[ Y_i \ \big | \ X_i = x, D_i = d \right] \\ \mathtt{e}(x) &= \Pr \left[ D_i = 1 \ \big | \ X_i = x \right] \end{align*} \]
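Why “doubly” robust? A quick sketch for the treated-arm piece, treating \(\hat\mu^{(1)}\) and \(\hat{\mathtt{e}}\) as fixed functions (not a formal proof, but it shows where the protection comes from):

\[ \begin{align*} \mathbb E \left[ \hat\mu^{(1)}(X_i) + \frac{D_i}{\hat{\mathtt{e}}(X_i)}\left( Y_i - \hat\mu^{(1)}(X_i) \right) \right] = \mathbb E \left[ \mu^{(1)}(X_i) \right] + \mathbb E \left[ \frac{\mathtt{e}(X_i) - \hat{\mathtt{e}}(X_i)}{\hat{\mathtt{e}}(X_i)} \left( \mu^{(1)}(X_i) - \hat\mu^{(1)}(X_i) \right) \right] \end{align*} \]

The leftover term is a product of the propensity-model error and the outcome-model error, so it vanishes whenever either \(\hat{\mathtt{e}} = \mathtt{e}\) or \(\hat\mu^{(1)} = \mu^{(1)}\) (and symmetrically for the control arm): you really are only as wrong as the least-wrong of your two models.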

Model 1: Propensity Score

Code
def estimate_e(df, X, D, model_e):
    e = model_e.fit(df[X], df[D]).predict_proba(df[X])[:,1]
    return e
user_df['e'] = estimate_e(user_df, X, "dark_mode", LogisticRegression())
ax = pw.Brick(figsize=(7, 2.75));
sns.kdeplot(
  x='e', hue='dark_mode', data=user_df,
  common_norm=False,
  fill=True,
  ax=ax
);
ax.set_xlabel("$e(X)$");
ax.savefig()

Code
# Inverse-propensity weights: 1/e(X) for treated units, 1/(1-e(X)) for controls
w = 1 / (user_df['e'] * user_df["dark_mode"] + (1-user_df['e']) * (1-user_df["dark_mode"]))
smf.wls("read_time ~ dark_mode", weights=w, data=user_df).fit().summary().tables[1]
coef std err t P>|t| [0.025 0.975]
Intercept 18.5871 0.412 45.158 0.000 17.777 19.397
dark_mode[T.True] 1.0486 0.581 1.805 0.072 -0.095 2.192

Model 2: Regression with Controls

  • First, with scikit-learn:
Code
def estimate_mu(df, X, D, y, model_mu):
    # Fit one outcome model on covariates + treatment, then predict potential
    # outcomes for every unit under D=0 and D=1
    mu = model_mu.fit(df[X + [D]], df[y])
    mu0 = mu.predict(df[X + [D]].assign(**{D: 0}))
    mu1 = mu.predict(df[X + [D]].assign(**{D: 1}))
    return mu0, mu1
from sklearn.linear_model import LinearRegression

mu0, mu1 = estimate_mu(user_df, X, "dark_mode", "read_time", LinearRegression())
print(np.mean(mu0), np.mean(mu1))
print(np.mean(mu1-mu0))
18.265714409803312 19.651524322951012
1.3858099131476969
  • Enter EconML, Microsoft’s “Official” ML-based econometrics library 😎
Code
from econml.dr import LinearDRLearner

model = LinearDRLearner(
  model_propensity=LogisticRegression(),
  model_regression=LinearRegression(),
  random_state=5650
)
model.fit(Y=user_df["read_time"], T=user_df["dark_mode"], X=user_df[X]);
model.ate_inference(X=user_df[X].values, T0=0, T1=1).summary().tables[0]
Uncertainty of Mean Point Estimate
mean_point stderr_mean zstat pvalue ci_mean_lower ci_mean_upper
1.321 0.549 2.409 0.016 0.246 2.396
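Beyond the ATE, the fitted learner also exposes per-unit (CATE) estimates through EconML’s standard effect interface; a quick usage sketch (the variable name cate is just illustrative):

Code
# Per-unit treatment-effect estimates from the fitted DR learner
cate = model.effect(user_df[X].values, T0=0, T1=1)
print(cate[:5])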

Double-Robustness to the Rescue!

Wrong regression model:

Code
def compare_estimators(X_e, X_mu, D, y, seed):
    # Fit a (possibly misspecified) propensity model on X_e and outcome model on X_mu,
    # then return the S-learner, IPW, and AIPW estimates of the ATE
    df = generate_data(seed=seed)
    e = estimate_e(df, X_e, D, LogisticRegression())
    mu0, mu1 = estimate_mu(df, X_mu, D, y, LinearRegression())
    slearn = mu1 - mu0
    ipw = (df[D] / e - (1-df[D]) / (1-e)) * df[y]
    aipw = slearn + df[D] / e * (df[y] - mu1) - (1-df[D]) / (1-e) * (df[y] - mu0)
    return np.mean((slearn, ipw, aipw), axis=1)

def simulate_estimators(X_e, X_mu, D, y):
    r = Parallel(n_jobs=8)(delayed(compare_estimators)(X_e, X_mu, D, y, i) for i in range(100))
    df_tau = pd.DataFrame(r, columns=['S-learn', 'IPW', 'AIPW'])
    return df_tau
# The actual plots
ax = pw.Brick(figsize=(4, 3.5))
wrong_reg_df = simulate_estimators(
  X_e=['male', 'age'], X_mu=['hours'], D="dark_mode", y="read_time"
)
wrong_reg_plot = sns.boxplot(
  data=pd.melt(wrong_reg_df), x='variable', y='value', hue='variable',
  ax=ax,
  linewidth=2
);
wrong_reg_plot.set(
  title=r"Distribution of $\hat{\tau}$", xlabel='', ylabel=''
);
ax.axhline(2, c='r', ls=':');
ax.savefig()

Wrong propensity score model:

Code
ax = pw.Brick(figsize=(4, 3.5))
wrong_ps_df = simulate_estimators(
  ['age'], ['male', 'hours'], D="dark_mode", y="read_time"
)
wrong_ps_plot = sns.boxplot(
  data=pd.melt(wrong_ps_df), x='variable', y='value', hue='variable',
  ax=ax,
  linewidth=2
);
ax.set_title(r"Distribution of $\hat{\tau}$");
ax.axhline(2, c='r', ls=':');
ax.savefig()

Hard Mode: When Conditioning Isn’t an Option / Can’t Fix Things

  • The approaches we’ve discussed thus far all depend on being able to adjust for confounders: either by conditioning on them (or purposefully not conditioning on them) directly, e.g. in a regression, or via propensity scores
  • Enter Instrumental Variables! (this week, with a minimal sketch below) and Specification of Mechanisms / Front-Door Pathways (next week)
  • \(\leadsto\) Find “natural experiments”, randomizations or pseudo-randomizations in society that allow us to replicate the “holy grail” of random assignment

Instrumental Variable Estimation
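A minimal, self-contained 2SLS sketch on simulated data; the instrument z, the data-generating process, and the use of linearmodels’ IV2SLS are illustrative assumptions, not the lecture’s own example:

Code
# Minimal 2SLS sketch on simulated data (illustrative assumptions throughout):
# z is a valid instrument here by construction: it shifts treatment d,
# but affects the outcome y only through d
import numpy as np
import pandas as pd
from linearmodels.iv import IV2SLS

rng = np.random.default_rng(5650)
n = 1000
u = rng.normal(size=n)                    # unobserved confounder
z = rng.binomial(1, 0.5, size=n)          # instrument (e.g., random encouragement)
d = (0.5*z + 0.8*u + rng.normal(size=n) > 0.5).astype(float)  # endogenous treatment
y = 2.0*d + 1.5*u + rng.normal(size=n)    # true causal effect of d is 2.0
iv_df = pd.DataFrame({'y': y, 'd': d, 'z': z})

# Naive OLS is biased by u; 2SLS uses only the z-driven variation in d
ols_fit  = IV2SLS.from_formula('y ~ 1 + d', data=iv_df).fit()
tsls_fit = IV2SLS.from_formula('y ~ 1 + [d ~ z]', data=iv_df).fit()
print(ols_fit.params['d'], tsls_fit.params['d'])

Under the usual IV assumptions (relevance and exclusion, plus homogeneous effects in this toy setup), the 2SLS coefficient on d should land near the true 2.0, while the naive OLS estimate is pulled upward by the confounder u.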

Lab: Causal Inference with Text

  • Lab Time!

References