Week 11: Sensitivity Analysis

DSAN 5650: Causal Inference for Computational Social Science
Summer 2025, Georgetown University

Author: Jeff Jacobs

Published: Wednesday, July 30, 2025


Final Project Drafts: Friday, 5:59pm EDT

  • On Canvas!
  • Think of it as a skeleton of the final project
  • Final submission = draft + missing pieces filled in

Topic Models Over Space and Time

Topic Models

  • Intuition is just: let’s model latent topics “underlying” observed words
Section       Keywords
U.S. News     state, court, federal, republican
World News    government, country, officials, minister
Arts          music, show, art, dance
Sports        game, league, team, coach
Real Estate   home, bedrooms, bathrooms, building
  • Already, by just classifying articles based on these keyword counts (a sketch follows the table), we get:

            Arts    Real Estate   Sports   U.S. News   World News
Correct     3020    690           4860     1330        1730
Incorrect   750     60            370      1100        590
Accuracy    0.801   0.920         0.929    0.547       0.746
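A minimal sketch of such a keyword-count classifier, using the keyword lists above; the tokenizer and the example article are illustrative assumptions, not the original pipeline:

Code
keywords = {
  'U.S. News':   ['state', 'court', 'federal', 'republican'],
  'World News':  ['government', 'country', 'officials', 'minister'],
  'Arts':        ['music', 'show', 'art', 'dance'],
  'Sports':      ['game', 'league', 'team', 'coach'],
  'Real Estate': ['home', 'bedrooms', 'bathrooms', 'building'],
}
def classify(article_text):
  # Score each section by how often its keywords appear in the article
  tokens = article_text.lower().split()
  scores = {
    section: sum(tokens.count(kw) for kw in kws)
    for section, kws in keywords.items()
  }
  return max(scores, key=scores.get)
classify("the home team won the league game")  # 'Sports'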

Topic Models as PGMs

From Blei (2012)

…Unlocks a world of social modeling through text!
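The plate diagram is a compact picture of LDA's generative story, which we can write out directly. A minimal simulation of that story (topic count, vocabulary size, and Dirichlet hyperparameters here are illustrative assumptions, not fit to any corpus):

Code
import numpy as np
rng = np.random.default_rng(seed=5650)
n_topics, vocab_size, doc_len = 3, 50, 20
# Each topic is a distribution over the vocabulary
topics = rng.dirichlet(alpha=np.full(vocab_size, 0.1), size=n_topics)
def generate_doc():
  # Each document draws its own mixture over topics...
  theta = rng.dirichlet(alpha=np.full(n_topics, 0.5))
  doc = []
  for _ in range(doc_len):
    z = rng.choice(n_topics, p=theta)                # latent topic assignment
    doc.append(rng.choice(vocab_size, p=topics[z]))  # observed word (as an id)
  return doc
generate_doc()

Fitting LDA inverts this story: given only the observed words, infer the latent topics and per-document mixtures.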

Cross-Sectional Analysis

Blaydes, Grimmer, and McQueen (2018)

Influence Over Time

From Barron et al. (2018)

Textual Influence Over Time

Sensitivity Analysis 1: Sensitivity to Priors

Code
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
cb_palette = ['#e69f00','#56b4e9','#009e73']
sns.set_palette(cb_palette)
import patchworklib as pw
import pymc as pm
import arviz as az
# Draw from the prior predictive and return the draws as a tidy DataFrame
def draw_prior_sample(model, return_idata=False):
  with model:
    prior_idata = pm.sample_prior_predictive(draws=10_000, random_seed=5650)
  prior_df = prior_idata.prior.to_dataframe().reset_index().drop(columns='chain')
  if return_idata:
    return prior_idata, prior_df
  return prior_df
# Plotting function
def gen_dist_plot(dist_df, plot_title, size='third', custom_ylim=None):
  plot_width = 2
  if size == 'quarter':
    plot_width = 1.5
  plot_height = 0.625 * plot_width
  plot_figsize = (plot_width, plot_height)
  ax = pw.Brick(figsize=plot_figsize)
  sns.histplot(
    x="p_heads", data=dist_df, ax=ax,
    bins=25, alpha=0.8, stat='probability'
  );
  ax.set_xlim(0, 1);
  if custom_ylim is not None:
    ax.set_ylim(custom_ylim)
  ax.set_title(plot_title)
  return ax

The Beta Distribution \(\text{Beta}(\alpha, \beta)\): The “Workhorse” Prior!

Biased coin framing: \(\alpha\) is the “pseudo-count” of heads, \(\beta\) the “pseudo-count” of tails
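The framing is exact because Beta is the conjugate prior for the Bernoulli likelihood: observing \(h\) heads and \(t\) tails updates a \(\text{Beta}(\alpha, \beta)\) prior to a \(\text{Beta}(\alpha + h, \beta + t)\) posterior, so \(\alpha\) and \(\beta\) act exactly like flips we’ve already seen.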

\(p \sim \text{Beta}(0, 0)\)

Code
# Beta(0, 0) is improper (its density can't be normalized), so there is
# nothing to sample; we just draw empty axes as a placeholder
ax = pw.Brick(figsize=(2, 1.25))
sns.histplot(
  x=[], ax=ax,
);
ax.set_xlabel('p_heads');
ax.set_ylabel('Probability')
ax.set_xticks([0, 0.25, 0.5, 0.75, 1]);
ax.set_ylim(0, 1);
ax.savefig()

\(p \sim \text{Beta}(1, 0)\)

Code
b10_df = pd.DataFrame({
  'draw': [0],
  'p_heads': [1],  # one pseudo-head, zero tails: all mass at p = 1
})
ax = pw.Brick(figsize=(2, 1.25))
sns.histplot(
  x='p_heads', stat='probability', ax=ax,
  bins=25,
  data=b10_df
);
ax.set_xlim(0, 1);
ax.set_ylim(0, 1);
ax.savefig()

\(p \sim \text{Beta}(0, 1)\)

Code
b01_df = pd.DataFrame({
  'draw': [0],
  'p_heads': [0],  # zero heads, one pseudo-tail: all mass at p = 0
})
ax = pw.Brick(figsize=(2, 1.25))
sns.histplot(
  x='p_heads', stat='probability', ax=ax,
  bins=25,
  data=b01_df
);
ax.set_xlim(0, 1);
ax.set_ylim(0, 1);
ax.savefig()

Informative Prior
Code
# Informative Prior
with pm.Model() as informative_model:
  p_heads = pm.Beta("p_heads", alpha=2, beta=2)
  result = pm.Bernoulli("result", p=p_heads)
informative_df = draw_prior_sample(informative_model)
informative_plot = gen_dist_plot(informative_df, "Beta(2, 2) Prior on p");
informative_plot.savefig()
Fig 1. “I’ll start by assuming a fair coin”
Weak Prior
Code
# Weakly Informative
with pm.Model() as weak_model:
  p_heads = pm.Beta("p_heads", alpha=1, beta=1)
  result = pm.Bernoulli("result", p=p_heads)
weak_df = draw_prior_sample(weak_model)
weak_plot = gen_dist_plot(weak_df, "Beta(1, 1) Prior on p");
weak_plot.savefig()
Fig 2. “I have no idea, I’ll assume all biases equally likely”
Skeptical (Jeffreys) Prior
Code
# Skeptical / Jeffreys
with pm.Model() as skeptical_model:
  p_heads = pm.Beta("p_heads", alpha=0.5, beta=0.5)
  result = pm.Bernoulli("result", p=p_heads)
skeptical_df = draw_prior_sample(skeptical_model)
skeptical_plot = gen_dist_plot(skeptical_df, "Beta(0.5, 0.5) Prior on p");
skeptical_plot.savefig()
Fig 3. “I don’t trust this coin, it’ll take lots of flips with H/T balance to convince me it’s fair!”
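This is the “sensitivity to priors” check in miniature: rerun the analysis under each prior and see how much the posterior moves. Since Beta is conjugate to Bernoulli, we can compare posteriors analytically; a minimal sketch, with hypothetical data of 7 heads in 10 flips:

Code
from scipy.stats import beta
heads, tails = 7, 3  # hypothetical data: 7 heads in 10 flips
priors = {
  'Informative Beta(2, 2)':  (2, 2),
  'Weak Beta(1, 1)':         (1, 1),
  'Jeffreys Beta(0.5, 0.5)': (0.5, 0.5),
}
for label, (a, b) in priors.items():
  post = beta(a + heads, b + tails)  # conjugate update
  print(f"{label}: posterior mean = {post.mean():.3f}")

With this much data the three posterior means already cluster (roughly 0.64 to 0.68); if they disagreed enough to change your conclusion, the prior, not the data, would be doing the work.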

Modeling Domain Knowledge with Priors

Population proportion framing: \(\frac{\alpha}{\alpha + \beta}\) = mean, \(\alpha + \beta\) = “precision”
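For example, a prior mean of \(0.4\) held with precision \(\alpha + \beta = 100\) gives \(\alpha = 40\) and \(\beta = 60\), which is exactly the \(\text{Beta}(40, 60)\) prior in the last panel below.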

Code
with pm.Model() as heads_model:
  p_heads = pm.Beta("p_heads", alpha=2, beta=1)
  result = pm.Bernoulli("result", p=p_heads)
heads_df = draw_prior_sample(heads_model)
heads_plot = gen_dist_plot(heads_df, "Beta(2, 1) Prior on p");
heads_plot.savefig()

Code
with pm.Model() as tails_model:
  p_heads = pm.Beta("p_heads", alpha=1, beta=2)
  result = pm.Bernoulli("result", p=p_heads)
tails_df = draw_prior_sample(tails_model)
tails_plot = gen_dist_plot(tails_df, "Beta(1, 2) Prior on p");
tails_plot.savefig()

Code
with pm.Model() as beta23_model:
  p_heads = pm.Beta("p_heads", alpha=2, beta=3)
  result = pm.Bernoulli("result", p=p_heads)
beta23_df = draw_prior_sample(beta23_model)
beta23_plot = gen_dist_plot(beta23_df, "Beta(2, 3) Prior on p");
beta23_plot.savefig()

Code
with pm.Model() as beta100_model:
  p_heads = pm.Beta("p_heads", alpha=40, beta=60)
  result = pm.Bernoulli("result", p=p_heads)
beta100_df = draw_prior_sample(beta100_model)
beta100_plot = gen_dist_plot(beta100_df, "Beta(40, 60) Prior on p");
beta100_plot.savefig()

Sensitivity Analysis 2: How Sensitive Are My Results to Omitted/Included Variable Bias?

  • You probably know about omitted variable bias from previous classes / intuition… but, included variable bias?
  • That’s right folks… colliders can be agents of chaos, laying waste to our best, most meticulously-planned-out models (the quick simulation below makes the point before we turn to a full example)
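A minimal numerical sketch of collider bias, with made-up variables: x and y are independent by construction, but “controlling for” their common effect z (here, by slicing on it) manufactures an association between them.

Code
import numpy as np
rng = np.random.default_rng(seed=5650)
n = 100_000
x = rng.normal(size=n)
y = rng.normal(size=n)            # independent of x by construction
z = x + y + rng.normal(size=n)    # collider: x -> z <- y
print(np.corrcoef(x, y)[0, 1])    # ~0: no marginal association
sel = z > 1                       # condition on the collider
print(np.corrcoef(x[sel], y[sel])[0, 1])  # negative: spurious association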

Does Aging Cause Sadness?

  • Each year, 41 people are born with uniformly-distributed happiness values
  • Each year, each person ages one year; happiness never changes
  • Starting at age 18, individuals can get married; each year, an unmarried person's chance of marrying increases with their happiness; once married, they stay married
  • At age 70, individuals leave the sample (they move to Boca Raton, Florida)
Code
import pandas as pd
import numpy as np
rng = np.random.default_rng(seed=5650)
import matplotlib.pyplot as plt
import seaborn as sns
cb_palette = ['#e69f00','#56b4e9','#009e73']
sns.set_palette(cb_palette)
import patchworklib as pw
from scipy.special import expit

# The original R code:
# sim_happiness <- function( seed=1977 , N_years=1000 , max_age=65 , N_births=20 , aom=18 ) {
#     set.seed(seed)
#     H <- M <- A <- c()
#     for ( t in 1:N_years ) {
#         A <- A + 1 # age existing individuals
#         A <- c( A , rep(1,N_births) ) # newborns
#         H <- c( H , seq(from=-2,to=2,length.out=N_births) ) # sim happiness trait - never changes
#         M <- c( M , rep(0,N_births) ) # not yet married
#         # for each person over 17, chance get married
#         for ( i in 1:length(A) ) {
#             if ( A[i] >= aom & M[i]==0 ) {
#                 M[i] <- rbern(1,inv_logit(H[i]-4))
#             }
#         }
#         # mortality
#         deaths <- which( A > max_age )
#         if ( length(deaths)>0 ) {
#             A <- A[ -deaths ]
#             H <- H[ -deaths ]
#             M <- M[ -deaths ]
#        }
#     }
#     d <- data.frame(age=A,married=M,happiness=H)
#     return(d)
# }

# DGP: happiness -> marriage <- age
num_births = 41  # cohort size each year (the original R used N_births=20)
# From age 18 on, an unmarried person's yearly chance of marrying is logistic in happiness
def update_m(row):
  if row['m'] == 0:
    return int(rng.binomial(
      n=1,
      p=expit(row['h'] - 3.875),
      size=1,
    ))
  return 1
def sim_cohort_to(max_age):
  sim_df = pd.DataFrame({
      'age': [1 for _ in range(num_births)],
      'h': np.linspace(-2, 2, num=num_births),
      'm': [0 for _ in range(num_births)],
    }
  )
  for t in range(2, max_age + 1):
    sim_df['age'] = sim_df['age'] + 1
    if t >= 18:
      sim_df['m'] = sim_df.apply(update_m, axis=1)
  return sim_df
all_sim_dfs = []
for cur_max_age in range(1, 71):
  cur_sim_df = sim_cohort_to(cur_max_age)
  all_sim_dfs.append(cur_sim_df)
full_sim_df = pd.concat(all_sim_dfs)

# full_sim_df.head()
cbg_palette = ['#c6c6c666'] + cb_palette
full_sim_df['m_label'] = full_sim_df['m'].apply(lambda x: "Unmarried" if x == 0 else "Married")
full_sim_df = full_sim_df.rename(columns={'age': 'Age', 'h': 'Happiness'})
ax = pw.Brick(figsize=(5.25,2.75));
sns.scatterplot(
  x='Age', y='Happiness', hue='m_label',
  data=full_sim_df,
  ax=ax,
  palette=cbg_palette,
  s=22,
  legend=True,
);
ax.move_legend("upper center", bbox_to_anchor=(0.5, 1.15), ncol=2);
ax.legend_.set_title("");
ax.axvline(x=17.5, color='black', ls='dashed', lw=1);
ax.savefig()
mean_hap_df = full_sim_df.groupby('Age')['Happiness'].mean().reset_index()
mean_hap_df['m_label'] = "All"
# The cohort mean is exactly 0 at every age (happiness draws are symmetric and fixed)
mean_hap_df['Happiness_mean'] = 0
group_hap_df = full_sim_df[full_sim_df['Age'] >= 18].groupby(['Age','m_label'])['Happiness'].mean().reset_index()
married_df = group_hap_df[group_hap_df['m_label'] == "Married"].copy()
unmarried_df = group_hap_df[group_hap_df['m_label'] == "Unmarried"].copy()
# Moving averages
win_size = 3
married_df['Happiness_mean'] = married_df['Happiness'].rolling(window=win_size).mean()
unmarried_df['Happiness_mean'] = unmarried_df['Happiness'].rolling(window=win_size).mean()
# Merge back together
combined_df = pd.concat([mean_hap_df, married_df, unmarried_df], ignore_index=True)
# display(combined_df.tail())

ax = pw.Brick(figsize=(3.75,2.4));
ax.set_xlim(0, 70);
# ax.set_ylim(-1.2, 1.2);
cbg_palette = ['black', cb_palette[0], '#b2b2b2ff']

# And plot
sns.lineplot(
  x='Age', y='Happiness_mean',
  hue='m_label',
  # marker=".",
  # s=20,
  data=combined_df,
  palette=cbg_palette,
  # color=cbg_palette[0],
  # order=3,
  ax=ax
);
ax.set_title("Mean Happiness by Status");
ax.legend_.set_title("");
ax.set_xlabel("Age");
ax.set_ylabel("Mean Happiness");
ax.savefig()
Fig 4. Happiness by Age, 70 birth cohorts of size 41 each
(DC minimum marriage age = 18)

…What’s happening here?
\(\textsf{Happy} \rightarrow {}^{🤔}\textsf{Marriage}^{🤔} \leftarrow \textsf{Age}\)
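We can verify the collider story on the simulated data: happiness never changes with age in the DGP, yet stratifying on marriage makes age look like it causes sadness. A minimal sketch using statsmodels (not part of the original code), applied to full_sim_df from above:

Code
import statsmodels.formula.api as smf
adults_df = full_sim_df[full_sim_df['Age'] >= 18].copy()
# Without the collider: the Age coefficient is ~0, matching the DGP
print(smf.ols('Happiness ~ Age', data=adults_df).fit().params['Age'])
# Conditioning on marriage opens Happiness -> Marriage <- Age:
# the Age coefficient turns spuriously negative
print(smf.ols('Happiness ~ Age + C(m)', data=adults_df).fit().params['Age'])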

References

Barron, Alexander T. J., Jenny Huang, Rebecca L. Spang, and Simon DeDeo. 2018. “Individuals, Institutions, and Innovation in the Debates of the French Revolution.” Proceedings of the National Academy of Sciences 115 (18): 4607–12. https://doi.org/10.1073/pnas.1717729115.
Blaydes, Lisa, Justin Grimmer, and Alison McQueen. 2018. “Mirrors for Princes and Sultans: Advice on the Art of Governance in the Medieval Christian and Islamic Worlds.” The Journal of Politics 80 (4): 1150–67. https://doi.org/10.1086/699246.
Blei, David M. 2012. “Probabilistic Topic Models.” Commun. ACM 55 (4): 77–84. https://doi.org/10.1145/2133806.2133826.