Week 5: Spatial Joins

PPOL 6805 / DSAN 6750: GIS for Spatial Data Science
Fall 2025

Class Sessions
Author
Affiliation

Jeff Jacobs

Published

Wednesday, September 25, 2024

Open slides in new tab →

Doing Things with DE-9IM (Back to Binary Operations)

From Last Week: Almost a Spatial Join

Code
library(rnaturalearth)
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ lubridate 1.9.4     ✔ tibble    3.3.0
✔ purrr     1.0.4     ✔ tidyr     1.3.1
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
Code
library(sf)
Linking to GEOS 3.11.0, GDAL 3.5.3, PROJ 9.1.0; sf_use_s2() is TRUE
Code
library(mapview)
library(leaflet.extras2)
Loading required package: leaflet
Code
africa_sf <- ne_countries(continent = "Africa", scale = 50) |> select(iso_a3, geounit)
africa_map <- mapview(africa_sf, label="geounit", legend=FALSE)
N <- 10
africa_union_sf <- sf::st_union(africa_sf)
sampled_points_sf <- sf::st_sample(africa_union_sf, N) |> sf::st_sf() |> mutate(temp = runif(N, 0, 100))
sampled_points_map <- mapview(sampled_points_sf, label="Random Point", col.regions=cbPalette[1], legend=FALSE)
countries_points_sf <- africa_sf[sampled_points_sf,]
filtered_map <- mapview(countries_points_sf, label="geounit", legend=FALSE) + sampled_points_map
(africa_map + sampled_points_map) | filtered_map

Spatial Filter \(\neq\) Spatial Join

  • The issue: Data attributes of POINTs are not merged into data attributes of POLYGONs
POINT Attributes
Code
st_geometry(sampled_points_sf) <- c("geom")
sampled_points_sf |> head()
geom temp
POINT (39.34379 2.926981) 39.40338
POINT (18.62264 6.513931) 35.18329
POINT (14.08976 -3.716445) 22.12226
POINT (20.2687 -19.02181) 84.48096
POINT (-13.04405 25.79444) 99.64294
POINT (13.1074 14.69842) 34.71885
POLYGON Attributes
Code
countries_points_sf |> head(4)
iso_a3 geounit geometry
1 ZWE Zimbabwe MULTIPOLYGON (((31.28789 -2…
52 SSD South Sudan MULTIPOLYGON (((33.97607 4….
92 NER Niger MULTIPOLYGON (((13.60635 13…
102 NAM Namibia MULTIPOLYGON (((23.38066 -1…

Our First Real Spatial Join: st_join()

Code
joined_sf <- countries_points_sf |> st_join(sampled_points_sf)
joined_sf |> head()
iso_a3 geounit temp geometry
1 ZWE Zimbabwe 10.75655 MULTIPOLYGON (((31.28789 -2…
52 SSD South Sudan 50.89440 MULTIPOLYGON (((33.97607 4….
92 NER Niger 34.71885 MULTIPOLYGON (((13.60635 13…
102 NAM Namibia 84.48096 MULTIPOLYGON (((23.38066 -1…
104 MAR Morocco 99.64294 MULTIPOLYGON (((-2.219629 3…
133 KEN Kenya 39.40338 MULTIPOLYGON (((40.99443 -2…

But… We Were Still in Easy Mode

  • Every point could be matched one-to-one with a country. But what if… 😱
Code
g <- st_make_grid(st_bbox(st_as_sfc("LINESTRING(0 0,1 1)")), n = c(2,2))
par(mar = rep(0,4))
plot(g)
plot(g[1] * diag(c(3/4, 1)) + c(0.25, 0.125), add = TRUE, lty = 2)
text(c(.2, .8, .2, .8), c(.2, .2, .8, .8), c(1,2,4,8), col = 'red')

Spatially Intensive vs. Spatially Extensive

  • Extensive attributes: associated with a physical size (length, area, volume, counts of items). Ex: population count.
    • Associated with an area \(\implies\) if that area is cut into smaller areas, the population count needs to be split too
    • (At minimum, the sum of the population counts for the smaller areas needs to equal the total for the larger area)
  • Intensive attributes: Not proportional to support: if the area is split, values may vary but on average remain the same. Ex: population density
    • If an area is split into smaller areas, population density is not split similarly!
    • The sum of population densities for the smaller areas is a meaningless measure
    • Instead, the mean will be more useful as ~similar to the density of the total

Handling the Extensive Case

  • Assume the extensive attribute \(Y\) is uniformly distributed over a space \(S_i\) (e.g., for population counts we assume everyone is evenly-spaced across the region)

  • We first compute \(Y_{ij}\), derived from \(Y_i\) for a sub-area of \(S_i\), \(A_{ij} = S_i \cap T_j\):

    \[ \hat{Y}_{ij}(A_{ij}) = \frac{|A_{ij}|}{|S_i|}Y_i(S_i) \]

    where \(|\cdot|\) denotes area.

  • Then we can compute \(Y_j(T_j)\) by summing all the elements over area \(T_j\):

\[ \hat{Y}_j(T_j) = \sum_{i=1}^{p}\frac{|A_{ij}|}{|S_i|}Y_i(S_i) \]

Handling the Intensive Case

  • Assume the variable \(Y\) has constant value over a space \(S_i\) (e.g., population density in assumed to be the same across all sub-areas)
  • Then the estimate for a sub-area is the same as the estimate for the total area:

\[ \hat{Y}_{ij} = Y_i(S_i) \]

  • So that we can obtain estimates of \(Y\) for new spatial units \(T_j\) via area-weighted average of the source values:

\[ \hat{Y}_j(T_j) = \sum_{i=1}^{p}\frac{|A_{ij}|}{|T_j|}Y_j(S_i) \]

Let’s Go See It In Action!

Nuts and Bolts for Spatial Data Science

Who Are My Neighbors?

Introducing the spdep library!

Spatial Autocorrelation

References