Week 12: Tools for Final Projects
PPOL 6805 / DSAN 6750: GIS for Spatial Data Science
Fall 2025
Real MF Project Hours!!!
- The “MF” is for “Map Fanatic”!
Final Project Details
- Project Showcase: 6:30-9pm, Wed, December 3, 2025
- Come eat falafel and show off your cool visualizations and regression coefficients!
- Reports: Due 5:59pm, Fri, December 12, 2025
- If you’re in DSAN 6000 and you’re worried about that final project… See next slide!
Due Date Details
Presentation Setup
- Each of your desks becomes a “table” at a GIS conference!
- Everyone can go around and ask others about projects 😻
- But, FOOD \(\implies\) you can also stay!
Report Setup
- GitHub repository (for your portfolio!)
- GH Pages site: your_username.github.io/gis-project
- …What do you put in that GitHub repository?
- Quarto Manuscript
- What do you put in the Quarto manuscript?
- Writeup + Visualizations + Code, interspersed!
- “Literate Programming” \(\Rightarrow\) Reproducible Results!
Spatial Regression
- (We made it! This is the last Spatial Data Science topic!)
- (Nearly all extremely-fancy applied GIS models can be implemented via Spatial Regression!)
Motivation: When Does Non-Spatial Regression “Work”?
\[ Y_i = \beta_0 + \beta_1X_{i,1} + \beta_2X_{i,2} + \cdots + \beta_MX_{i,M} + \varepsilon_i \]
Why OLS regression matters: it can give us the Best Linear Unbiased Estimator (BLUE)
This is only true if the Gauss-Markov assumptions hold; one of these is that the error terms are uncorrelated:
\[ \text{Cov}[\varepsilon_i, \varepsilon_j] = 0 \; \forall i \neq j \]
Spatial Autocorrelation in Residuals
- We have now seen several models / datasets where the effect of some variable \(X\) (say, population) on another variable \(Y\) (say, disease count) is spatial! (Kind of the whole point of the class 😜)
- So, to see when OLS will “work”, vs. when you need to incorporate GIS, the key step is plotting the spatial distribution of the regression residuals!
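Alongside the residual map, a single-number check is Moran's \(I\) of the residuals. The sketch below is a minimal pure-Python version, assuming a small hand-written binary weight matrix; the function name `morans_i` and the toy residuals are made up for illustration and are not from the course code.

```python
def morans_i(values, weights):
    """Global Moran's I for values observed on n regions.

    weights[i][j] is the spatial weight between regions i and j
    (0 on the diagonal).
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    total_weight = sum(sum(row) for row in weights)
    # Weighted cross-products of deviations between every pair of regions
    cross = sum(
        weights[i][j] * dev[i] * dev[j]
        for i in range(n)
        for j in range(n)
    )
    variance_sum = sum(d * d for d in dev)
    return (n / total_weight) * (cross / variance_sum)


# Hypothetical toy map: four regions in a row, neighbors share a border.
W_line = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
clustered_resid = [2.0, 1.0, -1.0, -2.0]    # similar residuals sit together
alternating_resid = [1.0, -1.0, 1.0, -1.0]  # neighbors have opposite signs

i_clustered = morans_i(clustered_resid, W_line)      # positive
i_alternating = morans_i(alternating_resid, W_line)  # negative
```

A clearly positive value for the residuals suggests the uncorrelated-errors assumption is violated, so plain OLS is suspect.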
Example: Italian Elections
Will OLS “Work” Here?
- Can we use OLS to derive the BLUE of the effect of GDP on voting?
| \(N = 477\) | \(\widehat{\beta}\) | SE | \(t\) |
|---|---|---|---|
| Intercept | 35.30 | 2.21 | 15.96 |
| Log GDP per cap | 13.46 | 0.65 | 20.84 |
Moran’s \(I\) for residuals = 0.47(!)
| \(N = 477\) | \(\widehat{\beta}\) | SE | \(t\) |
|---|---|---|---|
| Intercept | 4.70 | 1.66 | 2.80 |
| Log GDP per cap | 1.77 | 0.48 | 3.66 |
| \(\rho\) | 0.87 | 0.02 | 36.7 |
What Does it Mean that “Spatial Effect” is Significant?

Case 1: No Residual Spatial Autocorrelation
- Example: Residuals have Moran’s \(I\) near 0…
- …Don’t need GIS at all!
Case 2: Conditional Autoregressive Model (CAR)
- GIS, but… only for “fixing” your regression
\[ Y_i = \underbrace{\mu_i}_{\text{Non-spatial model}} + \underbrace{\frac{1}{w_{i,\cdot}}\sum_{j \neq i}(Y_j - \mu_j)}_{\text{Spatial Autocorrelation}} + \varepsilon_i \]
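To make the spatial term concrete, here is a minimal sketch of the CAR conditional mean for one region. It assumes binary contiguity weights \(w_{i,j}\), so that the sum effectively runs over the neighbors of \(i\) and \(w_{i,\cdot}\) is the neighbor count; the function name and the numbers are hypothetical.

```python
def car_conditional_mean(i, y, mu, W):
    """mu_i plus the average neighbor deviation (Y_j - mu_j).

    Assumes W[i][j] is 1 for neighbors and 0 otherwise.
    """
    n = len(y)
    row_sum = sum(W[i][j] for j in range(n) if j != i)  # w_{i,.}
    if row_sum == 0:
        return mu[i]  # isolated region: no spatial adjustment
    spatial = sum(W[i][j] * (y[j] - mu[j]) for j in range(n) if j != i)
    return mu[i] + spatial / row_sum


# Hypothetical example: three regions in a row; the middle one has two neighbors.
W_three = [
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
]
mu = [10.0, 10.0, 10.0]  # non-spatial model predictions
y = [12.0, 0.0, 14.0]    # observed values (middle value is ignored for i = 1)

middle_mean = car_conditional_mean(1, y, mu, W_three)
```

Here both neighbors sit above their non-spatial predictions (by 2 and 4), so the middle region's expected value is pulled up by their average.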
Case 3: Simultaneous Autoregressive Model (SAR)
- The main event! Here we are explicitly modeling “spatially lagged” versions of our dependent variable \(Y\)!
\[ Y = \mathbf{X}\beta + \rho \underbrace{\mathbf{W}Y}_{\mathclap{\text{Spatially-lagged }Y}} + \boldsymbol\varepsilon \]
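Because \(Y\) appears on both sides, the SAR equation has to be solved rather than just evaluated: rearranging gives \(Y = (\mathbf{I} - \rho\mathbf{W})^{-1}(\mathbf{X}\beta + \boldsymbol\varepsilon)\). The sketch below solves it by fixed-point iteration, assuming a row-standardized \(\mathbf{W}\) and \(|\rho| < 1\) (which makes the iteration converge); the function name, \(\rho = 0.5\), and the inputs are hypothetical.

```python
def solve_sar(W, b, rho, iters=200):
    """Solve Y = rho * W Y + b, where b stands for X beta + epsilon.

    Repeatedly applies the right-hand side; with row-standardized W and
    |rho| < 1 each pass shrinks the remaining error by a factor of |rho|.
    """
    n = len(b)
    y = list(b)
    for _ in range(iters):
        y = [rho * sum(W[i][j] * y[j] for j in range(n)) + b[i] for i in range(n)]
    return y


# Hypothetical example: three regions in a row, row-standardized weights.
W_std = [
    [0.0, 1.0, 0.0],
    [0.5, 0.0, 0.5],
    [0.0, 1.0, 0.0],
]
b = [1.0, 0.0, 0.0]  # only region 0 has a nonzero non-spatial component
y_sar = solve_sar(W_std, b, rho=0.5)
```

Even though only region 0 has a nonzero input, all three regions end up with nonzero \(Y\): the lag term spreads the effect through the neighbor structure.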
Immediately-Relevant Tools for Final Projects! (The Coming Weeks)
- Research Methodology (Hypotheses)
- Visualizing Spatial Data
- Weighted Connection Matrices \(\mathbf{W}\)
- Remote-Sensed / Raster Data
- Data Anonymization / Synthetic Datasets
Research Methodology: Hypotheses
(Secretly My Opportunity to do More Spatial Regression!)
Social Science (McCourt):
- What explains previously-observed instances of separatist insurgencies?
Machine Learning (DSAN):
- Can we predict separatist insurgencies?
- One (spatial) idea: how far away are [centers of power] from [regions of countervailing power]?
- \(X\) = Distance from capital, \(Y\) = Insurgency
- Unit of observation: Regions? Insurgent uprisings? Countries?
Operationalizing
- The variables in previous slide are conceptual
- Operationalizing = “Turning into measurable quantities”
- \(X\) = MeanDistance(Capital, Insurgent Region), \(Y = \mathbf{1}[\text{Insurgency}]\)
- Alternative:
\[ Y = \begin{cases} 2 &\text{if Successful Insurgency} \\ 1 &\text{if Failed Insurgency} \\ 0 &\text{if No Insurgency} \end{cases} \]
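As one possible way to turn these definitions into numbers, the sketch below computes \(X\) as the mean great-circle distance from a capital to a set of points in a region, and codes \(Y\) both ways. Everything here is an assumption for illustration: the haversine formula with a 6371 km Earth radius, the outcome labels, and the coordinates.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


# Ordinal coding from the slide: 0 = none, 1 = failed, 2 = successful
OUTCOME_CODES = {"none": 0, "failed": 1, "successful": 2}


def operationalize(capital, region_points, outcome):
    """Return (X, binary Y, ordinal Y) for one region."""
    dists = [haversine_km(capital[0], capital[1], p[0], p[1]) for p in region_points]
    x = sum(dists) / len(dists)          # MeanDistance(Capital, Region)
    y_ordinal = OUTCOME_CODES[outcome]
    y_binary = 1 if y_ordinal > 0 else 0  # indicator: any insurgency
    return x, y_binary, y_ordinal


# Hypothetical capital on the equator and two points in a region east of it
x_dist, y_bin, y_ord = operationalize((0.0, 0.0), [(0.0, 1.0), (0.0, 3.0)], "failed")
```

Note how the choice of unit of observation shows up immediately: "points in a region" could be a centroid, city locations, or event locations, and each gives a different \(X\).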
Connection Matrices
- spdep
- Neighbors if Centroids are close
- Neighbors if Capitals are close
- 🤔
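The two neighbor definitions above can give different \(\mathbf{W}\) matrices for the same countries. The sketch below builds distance-band weights in plain Python, assuming planar coordinates and an arbitrary threshold; the function names and all coordinates are hypothetical (in R, spdep provides distance-based neighbor tools for real data).

```python
import math


def distance_band_weights(points, threshold):
    """Binary W: regions i and j are neighbors if their points are within threshold."""
    n = len(points)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j and math.dist(points[i], points[j]) <= threshold:
                W[i][j] = 1.0
    return W


def row_standardize(W):
    """Divide each row by its sum so a region's weights add to 1 (isolated rows stay 0)."""
    out = []
    for row in W:
        s = sum(row)
        out.append([w / s if s > 0 else 0.0 for w in row])
    return out


# Hypothetical coordinates for three countries (arbitrary planar units)
centroids = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0)]
capitals = [(0.0, 0.0), (4.0, 0.0), (5.0, 0.0)]

W_centroid = distance_band_weights(centroids, threshold=1.5)
W_capital = distance_band_weights(capitals, threshold=1.5)
W_capital_std = row_standardize(W_capital)
```

With these made-up coordinates, countries 0 and 1 are neighbors by centroid but not by capital, while countries 1 and 2 are neighbors by capital but not by centroid, so the choice of definition is itself a modeling decision.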
Remote-Sensed / Raster Data
- You’ve seen terra (to some extent)
- stars: Same group behind sf
- Google solar panel data
- .tif files: Dynamically loaded
Anonymization / Synthetic Datasets
- Differential privacy
- Used by the US Census(!) since 2020
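For intuition, here is a minimal sketch of the textbook Laplace mechanism applied to a count (for example, the number of people in a tract). This is an illustration of the general idea only, not the Census Bureau's production algorithm; the function name, \(\varepsilon = 1\), and the true count of 100 are assumptions.

```python
import random


def private_count(true_count, epsilon, rng):
    """Release a count with Laplace noise of scale 1/epsilon.

    Adding or removing one person changes a count by at most 1
    (sensitivity 1), so Laplace noise with scale 1/epsilon gives
    epsilon-differential privacy for this single query.
    """
    scale = 1.0 / epsilon
    # The difference of two independent exponentials with mean `scale`
    # is Laplace-distributed with that scale.
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return true_count + noise


rng = random.Random(0)  # fixed seed so the sketch is reproducible
releases = [private_count(100, epsilon=1.0, rng=rng) for _ in range(20000)]
mean_release = sum(releases) / len(releases)
mean_abs_error = sum(abs(r - 100) for r in releases) / len(releases)
```

Smaller \(\varepsilon\) means more noise and more privacy; the noise averages out over many releases, which is also why the privacy budget has to be tracked across queries.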
Example: Bolshevik Revolution \(\rightarrow\) “Bipolar” World

- Null hypothesis: no spatial effect of democratization/de-democratization on neighboring countries
- If the null hypothesis is true, and countries democratize/de-democratize independently… could this pattern still arise?
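One way to answer that question is a permutation test: shuffle the regime labels across countries many times and see how often the shuffled maps look at least as clustered as the observed one. The sketch below is an illustration under assumptions: a hypothetical chain of eight countries, a "same-label neighbor pairs" statistic, and a null that holds the number of each regime type fixed.

```python
import random


def same_label_joins(labels, edges):
    """Count neighboring pairs of countries that share a regime label."""
    return sum(1 for i, j in edges if labels[i] == labels[j])


def permutation_p_value(labels, edges, n_perm=2000, seed=0):
    """Share of label shuffles with at least as many same-label joins as observed."""
    rng = random.Random(seed)
    observed = same_label_joins(labels, edges)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if same_label_joins(shuffled, edges) >= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly 0
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical "bipolar" map: 8 countries in a chain, split into two blocs
labels = ["D", "D", "D", "D", "A", "A", "A", "A"]
edges = [(i, i + 1) for i in range(7)]

observed_joins, p_value = permutation_p_value(labels, edges)
```

In this toy chain only 2 of the 70 possible arrangements of four "D" and four "A" labels are as clustered as the observed one, so the estimated p-value should land near 0.03: the pattern could arise under independence, but rarely.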