Logistics / Table-Setting

Midterm will be 52.5-Hour Take-Home Exam
Midterm folder will magically appear on guhub.io at 7:30pm EDT on Weds July 1 (immediately after 1hr intro-lecture ends), due 11:59pm EDT on Friday July 3
\(\leadsto\) 1hr intro-lecture recording will be a video resource for you, providing necessary background for midterm questions (like the intro-to-Afrobarometer this week!)

Bayesian Workflow Seed-Planting

(Terms in HW2 we haven’t looked at yet!)

Prior: Distribution over parameter values before seeing data
Prior Predictive: Distribution over outcomes, before seeing data
Posterior: Distribution over parameters, after seeing data
Posterior Predictive: Distribution over outcomes, after seeing data

Adaptive Pooling in Multilevel Models

Pooling: None, Full, and Adaptive

\[ \hat{\alpha}_j^{\text {multilevel }} = \frac{\frac{n_j}{\sigma_y^2} \bar{y}_j+\frac{1}{\sigma_\alpha^2} \bar{y}_{\text {all }}}{\frac{n_j}{\sigma_y^2}+\frac{1}{\sigma_\alpha^2}} \]

Why Adaptive >> None or Full?

\[ \alpha_j \sim \mathcal{N}(\mu_\alpha, \sigma_\alpha), \text{ for }j = 1, \ldots, J \]

In the limit of \(\sigma_\alpha \rightarrow \infty\), the soft constraints do nothing, and there is no pooling;
As \(\sigma_\alpha \rightarrow 0\), they pull the estimates all the way to zero, yielding the complete-pooling estimate

Bayesian Workflow

Prior: Distribution over parameter values before seeing data
Prior Predictive: Distribution over outcomes, before seeing data
Posterior: Distribution over parameters, after seeing data
Posterior Predictive: Distribution over outcomes, after seeing data

Modeling How Trees Become Forests

Your model, if it’s generative (which… nearly all in 5650 are), relates observed data \(\mathbf{D}\) to a set of underlying parameters \(\boldsymbol{\theta}\) that are hypothesized as “giving rise” to \(\mathbf{D}\)
When you specify how exactly this “giving rise” works, you’re specifying a DGP!
PGMs: human-brain-friendly (bc graphical) language for writing DGPs; then move to PyAgrum \(\rightarrow\) Stan \(\rightarrow\) PyMC to “encode” PGMs (in 0100101) so computer can…

Super-charge your EDA/modeling	Estimate \(\boldsymbol{\theta}\) from data
\(\leadsto\) Prior distributions	\(\leadsto\) Posterior distributions

Code

library(tidyverse)
library(ggExtra)
gen_walk_plot <- function(walk_data, a=0.0075) {
  # print(end_df)
  grid_color <- rgb(0, 0, 0, 0.1)
  # And plot!
  walkplot <- ggplot() +
    geom_line(
      data = walk_data$long_df,
      aes(x = t, y = pos, group = pid),
      linewidth = g_linewidth,
      alpha = a,
      #color = cb_palette[2]
      #color = "#cf8f00"
      color = "black"
    ) +
    geom_point(
      data = walk_data$end_df,
      aes(x = t, y = endpos),
      alpha = 0
    ) +
    scale_x_continuous(
      breaks = seq(
        0,
        walk_data$num_steps,
        walk_data$num_steps / 4
      )
    ) +
    scale_y_continuous(
      breaks = seq(-20, 20, 10)
    ) +
    theme_dsan(base_size=24) +
    theme(
      legend.position = "none",
      # title = element_text(size = 16)
    ) +
    theme(
      panel.grid.major.y = element_line(
        color = grid_color,
        linewidth = 1,
        linetype = 1
      )
    ) +
    labs(
      title = paste0(
        walk_data$num_people, " Random Walks, ",
        walk_data$num_steps, " Steps"
      ),
      x = "Number of Steps",
      y = "Position"
    )
}
walk_data <- readRDS("assets/walk_data.rds")
# 16 steps
# wp1 <- gen_walkplot(500, 16, 0.05)
# ggMarginal(wp1, margins = "y", type = "histogram", yparams = list(binwidth = 1))
wp <- gen_walk_plot(walk_data) + ylim(-30,30)
ggMarginal(
  wp, margins = "y",
  type = "histogram",
  yparams = list(binwidth = 1)
)

Prior “Stage” Distributions

RVs \(\boldsymbol\theta\) = params of your model, RVs \(\mathbf{X}\) = data \(\leadsto\) Generative model \(\pedge{\boldsymbol\theta}{\mathbf{X}}\)
Enter Prior World (Before observing any data): Since we have priors over \(\boldsymbol\theta\) \(\leadsto\) we can sample values of \(\boldsymbol\theta\), then generate synthetic data on the basis of these values

Prior Distribution: \(\Pr(\boldsymbol{\theta}^{❓})\)

What can I guess about values of my parameters from background knowledge of the world? e.g.:

Human heights can’t be negative
Data collected at a bar \(\Rightarrow\) Age \(\geq\) 18, \(\Pr(\text{Age} = x)\) decreases as \(x\) goes above 30

Prior Predictive Distribution: \(\Pr(\mathbf{X}^{❓} \mid \boldsymbol{\theta}^{❓})\)

What could the outcomes look like if I ran my guesses through the DGP?

100 simulated heights, none are negative

1K sim bar-goers; 80% have this haircut \(\rightarrow\)

Posterior “Stage” Distributions

RVs \(\boldsymbol\theta\) = params of your model, RVs \(\mathbf{X}\) = data \(\leadsto\) Generative model \(\pedge{\boldsymbol\theta}{\mathbf{X}}\)

Posterior Distribution: \(\Pr\left(\boldsymbol{\theta} \; \middle| \; \mathbf{X} = \ponode{\mathbf{X}}\right)\)

Now we observe data: \(\pnode{\mathbf{X}} \leadsto \ponode{\mathbf{X}}\), which means we can use Bayes’ Rule to infer distribution over \(\boldsymbol{\theta}\): what values are most likely to produce \(\ponode{\mathbf{X}}\)?

60% of observed bar-goers have that haircut \(\Rightarrow\) \(\boldsymbol{\theta}_{\text{post}} \approx \frac{0.6 + 0.8}{2} = 0.7\)
Coin = \(\textsf{H}\) 4/5 times \(\Rightarrow\) \(\boldsymbol{\theta} = p = \frac{0.5 + 0.8}{2} = 0.65\)
Coin = \(\textsf{H}\) 5/5 times \(\Rightarrow\) \(\boldsymbol{\theta} = p = \frac{0.5 + 1.0}{2} = 0.75\)

Posterior Predictive Distribution: \(\Pr(\mathbf{X} \mid \boldsymbol{\theta})\)

Now that we’ve fit \(\Pr(\boldsymbol{\theta})\) to data, can generate as much new data as we want, e.g. to evaluate how well model predicts outcomes for test data

Why Do We Need “Subjective” Priors?

Under frequentism… (tldr) literally no method for dealing with shades of uncertainty
Frequentist assumption: For a given coin, \(\Pr(\textsf{Heads})\) is not a Random Variable! It’s some number, like 0.5 or 0.8. It’s the asymptote of \(\frac{\#[\textsf{Heads}]}{\#[\textsf{Heads}] + \#[\textsf{Tails}]}\) as \(n \rightarrow \infty\)
So, if we flip coin 1 time, and get \(\textsf{Heads}\), no basis for inferring \(\Pr(\textsf{Coin is Fair})\) vs. \(\Pr(\textsf{Coin is Biased})\): Both are undefined. We’ve… “discovered” that \(\Pr(\textsf{Heads}) = 1\)
By using Bayesian inference, we can bring prior knowledge into our studies, which we’ll need to do especially for complex emergent systems: societies, economies, etc.!
In fact, as a guy named Laplace discovered in the nineteenth century, frequentism = Bayesian inference with “flat priors”…

Flat vs. Informative Priors

Code

library(tidyverse)
flat_df <- tibble(x=seq(0, 1, 0.1), y=0)
flat_df |> ggplot(aes(x=x, y=y)) +
  geom_line(
    color=cb_palette[1],
    linewidth=g_linewidth
  ) +
  ylim(0, 1) +
  labs(
    title="Flat Prior on Pr(Heads)",
    y="Density"
  ) +
  theme_dsan(base_size=28) +
  theme(title=element_text(size=20))
library(tidyverse)
data_df <- tibble(x=1, y=1)
data_df |> ggplot(aes(x=x, y=y)) +
  geom_point(size=5) +
  geom_segment(
    x=1, y=0, yend=1, linewidth=g_linewidth
  ) +
  xlim(0, 1) +
  ylim(0, 1) +
  labs(
    title="Observed Data",
    y="Density"
  ) +
  theme_dsan(base_size=28) +
  theme(title=element_text(size=20))
library(tidyverse)
library(latex2exp)
w_label <- TeX("Width = $1/n$")
h_label <- TeX("Height = $n$")
data_df <- tibble(x=1, y=1)
data_df |> ggplot(aes(x=x, y=y)) +
  geom_segment(
    x=1, y=0, yend=1, linewidth=g_linewidth,
    color=cb_palette[1], arrow=arrow()
  ) +
  geom_segment(
    x=0, y=0, xend=1, linewidth=g_linewidth,
    color=cb_palette[1]
  ) +
  xlim(0, 1) +
  ylim(0, 1) +
  labs(
    title="Posterior of Pr(Heads)",
    y = "Density"
  ) +
  theme_dsan(base_size=28) +
  theme(title=element_text(size=20)) +
  annotate(
    geom = "text", x = 0.5, y = 0.8, 
    label = w_label, hjust = 0, vjust = 1, size = 8
  ) +
  annotate(
    geom = "text", x = 0.5, y = 0.7, 
    label = h_label, hjust = 0, vjust = 1, size = 8
  )

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.2.1     ✔ readr     2.2.0
✔ forcats   1.0.1     ✔ stringr   1.6.0
✔ lubridate 1.9.5     ✔ tibble    3.3.1
✔ purrr     1.2.1     ✔ tidyr     1.3.2
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

\[ \otimes \]

\[ \leadsto \]

Code

library(tidyverse)
unif_df <- tibble(x=seq(0, 1, 0.1), y=1)
unif_df |> ggplot(aes(x=x, y=y)) +
  geom_line(
    color="#e69f00", linewidth=g_linewidth
  ) +
  annotate('rect', xmin=0, xmax=1, ymin=0, ymax=1, fill='#e69f00', alpha=0.3) +
  xlim(0, 1) + ylim(0, 1) +
  labs(title="Uniform Prior on Pr(Heads)") +
  theme_dsan(base_size=28) +
  theme(title=element_text(size=20))
library(tidyverse)
data_df <- tibble(x=1, y=1)
data_df |> ggplot(aes(x=x, y=y)) +
  geom_point(size=5) +
  geom_segment(
    x=1, y=0, yend=1, linewidth=g_linewidth
  ) +
  xlim(0, 1) +
  ylim(0, 1) +
  labs(title="Observed Data") +
  theme_dsan(base_size=28) +
  theme(title=element_text(size=20))
library(tidyverse)
data_df <- tibble(x=1, y=1)
x_vals <- seq(0, 1, 0.01)
my_exp <- function(x) exp(1-1/(x^2))
y_vals <- sapply(x_vals, my_exp)
data_df <- tibble(x=x_vals, y=y_vals)
rib_df <- tibble(x=x_vals, ymax=y_vals, ymin=0)
ggplot() +
  # stat_function(fun=my_exp, linewidth=g_linewidth, color=cb_palette[1]) +
  geom_line(
    data=data_df,
    aes(x=x, y=y),
    linewidth=g_linewidth, color=cb_palette[1]
  ) +
  geom_ribbon(
    data=rib_df,
    aes(x=x, ymin=ymin, ymax=ymax),
    fill=cb_palette[1], alpha=0.3
  ) +
  # geom_segment(
  #   x=1, y=0, yend=1, linewidth=g_linewidth,
  #   color=cb_palette[1], arrow=arrow()
  # ) +
  # geom_segment(
  #   x=0, y=0, xend=1, linewidth=g_linewidth,
  #   color=cb_palette[1]
  # ) +
  xlim(0, 1) +
  ylim(0, 1) +
  labs(title="Posterior of Pr(Heads)") +
  theme_dsan(base_size=28) +
  theme(title=element_text(size=20))

\[ \otimes \]

\[ \leadsto \]

Demos!

HW2B Part 4: Reducing Uncertainty \(\leftrightarrow\) “Sharpening” Knowledge of the State of the World
Lab 1: Skeptical, Weakly-Informative, and Strongly-Informative Priors
Lab 2: Weakly and Strongly-Informative NBA Priors

References

Gelman, Andrew, and Jennifer Hill. 2007. Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge University Press. https://www.dropbox.com/scl/fi/asbumi3g0gqa4xl9va7wp/Andrew-Gelman-Jennifer-Hill-Data-Analysis-Using-Regression-and-Multilevel_Hierarchical-Models.pdf?rlkey=zf8icjhm7rswvxrpm7d10m65o&dl=1.