Week 10: Evaluating Spatial Hypotheses II: Areal Data

PPOL 6805 / DSAN 6750: GIS for Spatial Data Science
Fall 2025

Class Sessions

Author: Jeff Jacobs

Published: Wednesday, October 29, 2025


Roadmap to the Midterm!

  • Last Week (Oct 22): Evaluating Hypotheses for Point Data
  • Today (Oct 29): Evaluating Hypotheses for Areal Data
  • In-Class Midterm (Nov 5): Basically a “mini-homework” on the Spatial Data Science unit (same topics as on HW3)!

The Point of All This: Null Models via Base Rates!

Null Models

  • Given an intensity function, we can generate a bunch of simulated point patterns (draws from an inhomogeneous Poisson Point Process)…
  • How does an observed point pattern differ from the simulated point patterns? (Quick sketch below)
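  • For intuition, here is a minimal sketch (not from the original slides; the intensity function, window, and seed are all made up) of drawing several such simulated patterns with spatstat:
Code
library(spatstat.geom)   |> suppressPackageStartupMessages()
library(spatstat.random) |> suppressPackageStartupMessages()
set.seed(6805)
# Made-up intensity function: expected point density grows from left to right
lambda_fn <- function(x, y) { 200 * x }
# Draw several point patterns from this inhomogeneous Poisson Point Process
sim_ppps <- replicate(
  4,
  spatstat.random::rpoispp(lambda = lambda_fn, lmax = 200, win = owin(c(0, 1), c(0, 1))),
  simplify = FALSE
)
# Each draw is one realization of the null model; an observed pattern can then
# be compared against the spread of these simulations
sapply(sim_ppps, function(p) p$n)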

Constant Risk Hypothesis: Motivation

  • Here is a (fictional) map of flu cases in the US “Midwest”
  • Are people in Chicago and Detroit equally “at risk”?
Code
library(mapview) |> suppressPackageStartupMessages()
city_df <- tibble::tribble(
  ~city, ~lat, ~lon, ~pop,
  "Chicago", 41.950567516553896, -87.93011127491978, 2746388,
  "Detroit", 42.45123004999075, -83.18402319217698, 631524
)
city_sf <- sf::st_as_sf(
  city_df,
  coords = c("lon", "lat"),
  crs = 4326
)
city_buf_sf <- city_sf |> sf::st_buffer(20000)
city_cases_sf <- city_buf_sf |> sf::st_sample(size=rep(10,2)) |> sf::st_as_sf()
city_cases_sf$city <- "Detroit (10 Cases)"
city_cases_sf[1:10, 'city'] <- "Chicago (10 Cases)"
city_cases_sf$sample <- "Flu"
mapview(city_cases_sf, zcol="city")

Constant Risk Hypothesis: Base Rates

             Chicago      Detroit
Population   2,746,388    631,524
Code
library(dplyr) |> suppressPackageStartupMessages()
library(leaflet) |> suppressPackageStartupMessages()
library(leaflet.extras2) |> suppressPackageStartupMessages()
city_pop_sf <- city_buf_sf |> sf::st_sample(size=c(16, 4)) |> sf::st_as_sf()
city_pop_sf$city <- "Detroit"
city_pop_sf[1:16, 'city'] <- "Chicago"
city_pop_sf$sample <- "People"
city_combined_sf <- bind_rows(city_cases_sf, city_pop_sf)
# mapview(city_combined_sf, zcol="city", marker="sample")
Flu = makeIcon(
    "https://upload.wikimedia.org/wikipedia/commons/thumb/c/c3/Maki2-danger-24.svg/240px-Maki2-danger-24.svg.png",
    "https://upload.wikimedia.org/wikipedia/commons/thumb/c/c3/Maki2-danger-24.svg/24px-Maki2-danger-24.svg.png",
    20,
    20
)
People = makeIcon(
  "https://upload.wikimedia.org/wikipedia/commons/thumb/e/ef/Maki2-pitch-24.svg/24px-Maki2-pitch-24.svg.png",
  "https://upload.wikimedia.org/wikipedia/commons/thumb/e/ef/Maki2-pitch-24.svg/24px-Maki2-pitch-24.svg.png",
  20,
  20
)

city_combined_sf$r <- ifelse(city_combined_sf$sample == "Flu", 0, 4)
city_flu_sf <- city_combined_sf |> filter(sample == "Flu")
city_ppl_sf <- city_combined_sf |> filter(sample == "People")
leaflet(city_flu_sf) |>
  addProviderTiles("CartoDB.Positron") |>
  addMarkers(data=city_flu_sf, icon = Flu) |>
  addMarkers(data=city_ppl_sf, icon = People)

Meaningful Results \(\leftrightarrow\) Comparisons with Null Model(s)

  • Cases of disease form an intensity function \(\lambda(\mathbf{s})\)

  • Controls form an “ambient” intensity function \(\lambda_0(\mathbf{s})\)

  • \(\implies\) The following model allows us to separate the quantities we really care about from the base rate information:

    \[ \lambda(\mathbf{s}) = \alpha \overbrace{\lambda_0(\mathbf{s})}^{\mathclap{\text{Base Rate}}} \underbrace{\rho(\mathbf{s})}_{\mathclap{\text{Relative Risk}}}, \]

    where

    \[ \alpha = \frac{\# \text{Cases}}{\# \text{Controls}} \]

  • (Remember: the unit of \(\lambda(\mathbf{s})\) is expected number of cases per unit area; it is an intensity, not a probability density!)
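  • Rearranging the model isolates the relative-risk surface, which is exactly what the code below estimates pointwise:

    \[ \rho(\mathbf{s}) = \frac{\lambda(\mathbf{s})}{\alpha \, \lambda_0(\mathbf{s})} \]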

Example from Moraga (2024)

Code
library(ggplot2) |> suppressPackageStartupMessages()
library(sparr) |> suppressPackageStartupMessages()
data(pbc)
controls <- unmark(pbc[which(pbc$marks == "control"), ])
controls |> sf::st_as_sf() |> ggplot() +
  geom_sf() +
  theme_classic() +
  labs(title="Controls (Population Sample)")

Code
cases <- unmark(pbc[which(pbc$marks == "case"), ])
cases |> sf::st_as_sf() |> ggplot() +
  geom_sf() +
  theme_classic() +
  labs(title="PBC Cases")

Estimating Comparable Intensity Surfaces

  • To ensure the two intensity surfaces are comparable, we need a common bandwidth value
  • density() from spatstat picks a “smart” bandwidth separately for each pattern, so for now we just average the bandwidths it estimates for the cases and the controls:
Code
library(sparr)
data(pbc)
cases <- unmark(pbc[which(pbc$marks == "case"), ])
controls <- unmark(pbc[which(pbc$marks == "control"), ])
bwcases <- attr(density(cases), "sigma")
bwcontr <- attr(density(controls), "sigma")
(bw <- (bwcases + bwcontr)/2)
[1] 11.45837
Code
int_controls <- density(controls, sigma = bw, eps=0.25)
plot(int_controls, main = "Control Intensity")

Code
int_cases <- density(cases, sigma = bw, eps=0.25)
plot(int_cases, main = "Case Intensity")

Visualizing Relative Risk Surface

  • All that’s left is \(\alpha = \# \text{Cases} / \# \text{Controls}\)!
Code
library(fields)
(alpha_hat <- cases$n/controls$n)
[1] 0.2519868
Code
x <- int_cases$xcol
y <- int_cases$yrow
rr <- t(int_cases$v)/t(alpha_hat * int_controls$v)
image.plot(x, y, rr, asp = 1)
title(xlab = "Easting", ylab = "Northing")

From Exploratory to Confirmatory Data Analysis

  • How do we know whether the ↑ risk around North Newcastle is due to “chance”? Maybe relative risk is actually constant, and people in Newcastle are just unlucky
  • Thought experiment: Bro hands you a coin. You flip it 10 times: HHHHHTHHHH
  • Coin may be biased-towards-H, or could be fair (lots of heads by chance!)
  • Stats: Compare with null model! For a fair coin, how likely is [9 Heads / 10 Flips]? (Quick check below)
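  • Quick check (base R, not in the original slides): the binomial distribution gives the null-model answer directly.
Code
# Null model: a fair coin, 10 independent flips
dbinom(9, size = 10, prob = 0.5)                      # P(exactly 9 heads)
pbinom(8, size = 10, prob = 0.5, lower.tail = FALSE)  # P(9 or more heads)
# Or as a formal test of H0: p = 0.5 vs. "biased towards heads"
binom.test(9, 10, p = 0.5, alternative = "greater")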

Exploratory Data Analysis: Collecting evidence

Confirmatory Data Analysis: Making judgements

Null Model: CSR

Figure from Gimond (2024)

The Random Labeling Hypothesis

  • Randomly apply 761 “sick” labels to the 3781 points!
Code
# window_to_sf(), points_to_sf(), and cb_palette are helpers defined earlier
# in the course materials (not shown in this section)
plot_rlabel <- function(rl_ppp) {
  # Split the labeled ppp into its window polygon and its marked points
  rl_win_sf <- window_to_sf(rl_ppp)
  rl_points_sf <- points_to_sf(rl_ppp) |>
    rename(
      type = spatstat.geom..marks.x.
    )
  rl_plot <- ggplot() +
    geom_sf(
      data=rl_win_sf,
      alpha=0.3
    ) +
    geom_sf(
      mapping=aes(fill=type, alpha=type),
      data=rl_points_sf,
      size=2,
      # alpha=0.75,
      shape=21,
      stroke=0.1,
      # color='white',
    ) +
    # scale_color_manual(
    #   "Point Type",
    #   values=c("control"="gray","case"=cb_palette[1])
    # ) +
    scale_fill_manual(
      "Type",
      values=c("control"="gray","case"=cb_palette[1])
    ) +
    scale_alpha_manual(
      "Type",
      values=c("control"=0.5, "case"=1.0)
    ) +
    theme_classic(base_size=16) +
    labs(
      title="NE Britain Disease Cases"
    )
  return(rl_plot)
}
set.seed(6810)
rl1_ppp <- rlabel(pbc)
plot_rlabel(rl1_ppp)

Code
set.seed(6811)
pbc_2 <- rlabel(pbc)
plot_rlabel(pbc_2)

Code
set.seed(6812)
pbc_3 <- rlabel(pbc)
plot_rlabel(pbc_3)

The Constant Risk Hypothesis

  • Everyone equally at risk of contracting disease, regardless of location: \(\rho(\mathbf{s}) = 1\)
  • In this case: \(\lambda_{CR}(\mathbf{s}) = \alpha \lambda_0(\mathbf{s})\)
Code
pbc_win_sf <- pbc |> sf::st_as_sf() |> filter(label == "window")
plot_pbc_sim <- function() {
    pbc_sim <- spatstat.random::rpoispp(
        lambda = alpha_hat * int_controls
    )
    # Separate window and points
    pbc_sf <- pbc_sim |> sf::st_as_sf()
    pbc_point_sf <- pbc_sf |> filter(label == "point")
    pbc_win_sf <- pbc_sf |> filter(label == "window")
    pbc_plot <- ggplot() +
      geom_sf(data=pbc_win_sf) +
      geom_sf(data=pbc_point_sf, size=0.5, alpha=0.5) +
      theme_classic()
    return(pbc_plot)
}
plot_pbc_sim()

Code
plot_pbc_sim()

Code
plot_pbc_sim()

Point Processes \(\rightarrow\) Areal Processes

  • The “Complete” Part of Complete Spatial Randomness (CSR)
  • Bringing Neighbors Back In
  • Spatial Regression

Points \(\rightarrow\) Areas

Aggregating Point Processes

  • Recall the two “stages” of our Poisson-based point processes
    • A Poisson-distributed number of points, then
    • Uniformly-distributed coordinates for each point
  • One way to see areal data modeling: we keep the Poisson part, but we no longer observe the coordinates for the points! (Sketched in code after this list)
  • So… why is it still helpful for you to have sat through those lectures on Point Processes? Three reasons…
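  • Here is a minimal sketch of that aggregation view (the region, point count, and 5×5 grid are all made up): generate a point pattern, then throw away the coordinates and keep only per-area counts.
Code
library(sf) |> suppressPackageStartupMessages()
set.seed(6805)
# Hypothetical study region (unit square) and a point pattern inside it
region_sfc <- sf::st_as_sfc(sf::st_bbox(c(xmin = 0, ymin = 0, xmax = 1, ymax = 1)))
points_sfc <- sf::st_sample(region_sfc, size = 500)
# "Aggregate away" the coordinates: overlay a grid of areal units, keep only counts
grid_sfc <- sf::st_make_grid(region_sfc, n = c(5, 5))
areal_sf <- sf::st_sf(
  num_cases = lengths(sf::st_intersects(grid_sfc, points_sfc)),
  geometry = grid_sfc
)
# Each row is now an areal unit with a (Poisson) count but no point locations
head(areal_sf)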

(1) Areal Patterns = Aggregated Point Patterns

  • One application, mentioned last week: areal-weighted interpolation using actual models of how the points are distributed within the area! (The uniform-spread baseline is sketched after this list)
  • Regularly-spaced points are rarely a good “default” model!
  • Humans, for example, rarely live at perfectly even spacing… they form households, villages, cities
  • Regularly-spaced points (we now know) \(\implies\) negative autocorrelation \(\implies\) typically due to an inhibition process (competition)
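  • As that baseline, here is a minimal sketch of plain area-weighted interpolation with sf::st_interpolate_aw() (the polygons and populations are invented); it implicitly assumes people are spread evenly across each source polygon, which is exactly the assumption that point-process-informed (dasymetric) approaches try to improve on.
Code
library(sf) |> suppressPackageStartupMessages()
# Made-up source polygons (e.g., tracts) with known populations
tracts_sf <- sf::st_sf(
  pop = c(1000, 200, 50, 4000),
  geometry = sf::st_make_grid(
    sf::st_as_sfc(sf::st_bbox(c(xmin = 0, ymin = 0, xmax = 2, ymax = 2))),
    n = c(2, 2)
  )
)
# A target polygon (e.g., a school district) that cuts across all four tracts
district_sf <- sf::st_sf(
  geometry = sf::st_as_sfc(sf::st_bbox(c(xmin = 0.5, ymin = 0.5, xmax = 1.5, ymax = 1.5)))
)
# Area-weighted interpolation of an *extensive* variable (a count => extensive = TRUE):
# each tract contributes pop * (share of its area falling inside the district)
sf::st_interpolate_aw(tracts_sf["pop"], district_sf, extensive = TRUE)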

(2) Spatial Scan Statistics

  • Areal regions often the result of artificial / “arbitrary” human divisions
    • (Particulate matter doesn’t pass through customs)
  • \(\implies\) If we care about processes which don’t “adhere to” borders (like disease spread), we want to “scan” buffers around points regardless of areal borders (toy version sketched below)
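  • A toy version of the scanning idea (everything here is made up, and real scan statistics like Kulldorff’s use likelihood ratios rather than raw proportions): draw a buffer around each case and compare the mix of cases vs. controls inside it, ignoring any areal borders.
Code
library(sf) |> suppressPackageStartupMessages()
set.seed(6805)
# Hypothetical case/control locations over a unit-square study region
region_sfc <- sf::st_as_sfc(sf::st_bbox(c(xmin = 0, ymin = 0, xmax = 1, ymax = 1)))
pts_sfc <- sf::st_sample(region_sfc, size = 600)
pts_sf <- sf::st_sf(
  status = sample(c("case", "control"), length(pts_sfc), replace = TRUE, prob = c(0.2, 0.8)),
  geometry = pts_sfc
)
# "Scan": a circular buffer around each case, regardless of where borders fall
scan_radius <- 0.1
case_sf <- pts_sf[pts_sf$status == "case", ]
buffers <- sf::st_buffer(case_sf, dist = scan_radius)
inside <- sf::st_intersects(buffers, pts_sf)
# Share of cases among all points falling inside each scanning circle
case_share <- sapply(inside, function(idx) mean(pts_sf$status[idx] == "case"))
summary(case_share)  # circles with unusually high shares are candidate clusters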

(3) Hierarchical Bayesian Smoothing

  • The issue might just be that we don’t have enough data for some areas, while we have an abundance of data in nearby areas… (a simple empirical-Bayes version of the fix is sketched below)

From Kramer (2023), Spatial Epidemiology
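For intuition, here is a sketch of the simpler (non-hierarchical) empirical-Bayes version of this idea using spdep::EBest(), with invented counts and populations; fully hierarchical Bayesian smoothing, as in the figure, would typically be fit with MCMC or INLA instead.
Code
library(spdep) |> suppressPackageStartupMessages()
# Invented areal data: observed case counts and populations at risk;
# the small-population areas produce the noisiest raw rates
cases <- c(0, 3, 1, 50, 120)
pop   <- c(50, 200, 900, 40000, 100000)
cases / pop  # raw rates: wildly variable where populations are tiny
# Empirical-Bayes smoothing: shrink each raw rate toward the overall mean,
# shrinking more where the population (and thus the information) is smaller
spdep::EBest(n = cases, x = pop)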

Autocorrelation Reminder: Weight Matrix Defines “Neighbors”

Choices Available in spdep

  • \(w_{ij} = 1\) if \(i\) and \(j\) “touch”
    • Rook: Overlap is 1 or 2-dimensional
    • Queen: Overlap is 0, 1, or 2-dimensional
  • \(w_{ij} = \frac{1}{\text{dist}(i, j)}\)
    • \(w(\textsf{Seoul}, \textsf{Incheon})\) > \(w(\textsf{Seoul}, \textsf{Busan})\) > \(w(\textsf{Seoul}, \textsf{Jeju})\)
  • \(w_{ij} = 1\) if \(\text{dist}(i, j) < \overline{D}\)
  • \(w_{ij} = 1\) for \(K\) nearest neighbors (all four options sketched in code below)
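  • A minimal sketch of constructing each of these in spdep, on a toy 3×3 grid of polygons (the distance cutoff and \(K\) are arbitrary):
Code
library(sf)    |> suppressPackageStartupMessages()
library(spdep) |> suppressPackageStartupMessages()
# Toy 3x3 grid of polygons standing in for (say) counties
grid_sf <- sf::st_sf(
  id = 1:9,
  geometry = sf::st_make_grid(
    sf::st_as_sfc(sf::st_bbox(c(xmin = 0, ymin = 0, xmax = 3, ymax = 3))),
    n = c(3, 3)
  )
)
coords <- sf::st_coordinates(sf::st_centroid(sf::st_geometry(grid_sf)))
# Contiguity: queen (shared edge OR corner) vs. rook (shared edge only)
nb_queen <- spdep::poly2nb(grid_sf, queen = TRUE)
nb_rook  <- spdep::poly2nb(grid_sf, queen = FALSE)
# Distance band: neighbors within a cutoff distance D-bar (here 1.5 units)
nb_dist <- spdep::dnearneigh(coords, d1 = 0, d2 = 1.5)
# K nearest neighbors (here K = 4)
nb_knn <- spdep::knn2nb(spdep::knearneigh(coords, k = 4))
# Inverse-distance weights attached to the queen contiguity structure
inv_d <- lapply(spdep::nbdists(nb_queen, coords), function(d) 1 / d)
lw_inv <- spdep::nb2listw(nb_queen, glist = inv_d, style = "W")
summary(nb_queen)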

Moran’s \(I\) One More Time

  • Round 1 (Week 6): Moran’s \(I\) as “thermometer”
  • Round 2 (Today): Could an \(I\) value this extreme occur due to random noise? (Permutation-test sketch below)
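  • One standard tool for that question is spdep’s Monte Carlo (permutation) test; here is a minimal sketch on a toy grid with made-up values:
Code
library(sf)    |> suppressPackageStartupMessages()
library(spdep) |> suppressPackageStartupMessages()
set.seed(6805)
# Toy areal dataset: 4x4 grid of polygons with a made-up numeric attribute
grid_sf <- sf::st_sf(
  geometry = sf::st_make_grid(
    sf::st_as_sfc(sf::st_bbox(c(xmin = 0, ymin = 0, xmax = 4, ymax = 4))),
    n = c(4, 4)
  )
)
grid_sf$value <- rnorm(nrow(grid_sf))
# Row-standardized queen-contiguity weights
lw <- spdep::nb2listw(spdep::poly2nb(grid_sf, queen = TRUE), style = "W")
# Permutation test: how often does randomly shuffling the values across areas
# produce a Moran's I at least as large as the observed one?
spdep::moran.mc(grid_sf$value, listw = lw, nsim = 999)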

References

Gimond, Manuel. 2024. Introduction to GIS and Spatial Analysis. Colby College (ES214).