Week 5: Spatial Joins

PPOL 6805 / DSAN 6750: GIS for Spatial Data Science
Fall 2025

Jeff Jacobs

jj1088@georgetown.edu

Wednesday, September 25, 2024

Doing Things with DE-9IM (Back to Binary Operations)

From Last Week: Almost a Spatial Join

Code
library(rnaturalearth)
library(tidyverse)
library(sf)
library(mapview)
library(leaflet.extras2)
africa_sf <- ne_countries(continent = "Africa", scale = 50) |> select(iso_a3, geounit)
africa_map <- mapview(africa_sf, label="geounit", legend=FALSE)
N <- 10
africa_union_sf <- sf::st_union(africa_sf)
sampled_points_sf <- sf::st_sample(africa_union_sf, N) |> sf::st_sf() |> mutate(temp = runif(N, 0, 100))
sampled_points_map <- mapview(sampled_points_sf, label="Random Point", col.regions=cbPalette[1], legend=FALSE)
countries_points_sf <- africa_sf[sampled_points_sf,]
filtered_map <- mapview(countries_points_sf, label="geounit", legend=FALSE) + sampled_points_map
(africa_map + sampled_points_map) | filtered_map

Spatial Filter \(\neq\) Spatial Join

  • The issue: Data attributes of POINTs are not merged into data attributes of POLYGONs
POINT Attributes
Code
st_geometry(sampled_points_sf) <- c("geom")
sampled_points_sf |> head()
geom temp
POINT (27.08501 8.151116) 6.7348056
POINT (44.94241 10.18856) 0.7728503
POINT (17.22098 -4.448239) 59.0227748
POINT (18.72111 30.36353) 18.1021065
POINT (-9.142789 16.71712) 55.9965106
POINT (25.34225 21.54918) 69.1987649
POLYGON Attributes
Code
countries_points_sf |> head(4)
iso_a3 geounit geometry
44 TZA Tanzania MULTIPOLYGON (((39.49648 -6…
52 SSD South Sudan MULTIPOLYGON (((33.97607 4….
53 SDN Sudan MULTIPOLYGON (((34.07812 9….
59 -99 Somaliland MULTIPOLYGON (((48.93857 11…

Our First Real Spatial Join: st_join()

Code
joined_sf <- countries_points_sf |> st_join(sampled_points_sf)
joined_sf |> head()
iso_a3 geounit temp geometry
44 TZA Tanzania 63.7779831 MULTIPOLYGON (((39.49648 -6…
52 SSD South Sudan 6.7348056 MULTIPOLYGON (((33.97607 4….
53 SDN Sudan 69.1987649 MULTIPOLYGON (((34.07812 9….
53.1 SDN Sudan 62.4328422 MULTIPOLYGON (((34.07812 9….
59 -99 Somaliland 0.7728503 MULTIPOLYGON (((48.93857 11…
112 MRT Mauritania 55.9965106 MULTIPOLYGON (((-16.37334 1…

But… We Were Still in Easy Mode

  • Every point could be matched one-to-one with a country. But what if… 😱
Code
g <- st_make_grid(st_bbox(st_as_sfc("LINESTRING(0 0,1 1)")), n = c(2,2))
par(mar = rep(0,4))
plot(g)
plot(g[1] * diag(c(3/4, 1)) + c(0.25, 0.125), add = TRUE, lty = 2)
text(c(.2, .8, .2, .8), c(.2, .2, .8, .8), c(1,2,4,8), col = 'red')

Spatially Intensive vs. Spatially Extensive

  • Extensive attributes: associated with a physical size (length, area, volume, counts of items). Ex: population count.
    • Associated with an area \(\implies\) if that area is cut into smaller areas, the population count needs to be split too
    • (At minimum, the sum of the population counts for the smaller areas needs to equal the total for the larger area)
  • Intensive attributes: Not proportional to support: if the area is split, values may vary but on average remain the same. Ex: population density
    • If an area is split into smaller areas, population density is not split similarly!
    • The sum of population densities for the smaller areas is a meaningless measure
    • Instead, the mean will be more useful as ~similar to the density of the total

Handling the Extensive Case

  • Assume the extensive attribute \(Y\) is uniformly distributed over a space \(S_i\) (e.g., for population counts we assume everyone is evenly-spaced across the region)

  • We first compute \(Y_{ij}\), derived from \(Y_i\) for a sub-area of \(S_i\), \(A_{ij} = S_i \cap T_j\):

    \[ \hat{Y}_{ij}(A_{ij}) = \frac{|A_{ij}|}{|S_i|}Y_i(S_i) \]

    where \(|\cdot|\) denotes area.

  • Then we can compute \(Y_j(T_j)\) by summing all the elements over area \(T_j\):

\[ \hat{Y}_j(T_j) = \sum_{i=1}^{p}\frac{|A_{ij}|}{|S_i|}Y_i(S_i) \]

Handling the Intensive Case

  • Assume the variable \(Y\) has constant value over a space \(S_i\) (e.g., population density in assumed to be the same across all sub-areas)
  • Then the estimate for a sub-area is the same as the estimate for the total area:

\[ \hat{Y}_{ij} = Y_i(S_i) \]

  • So that we can obtain estimates of \(Y\) for new spatial units \(T_j\) via area-weighted average of the source values:

\[ \hat{Y}_j(T_j) = \sum_{i=1}^{p}\frac{|A_{ij}|}{|T_j|}Y_j(S_i) \]

Let’s Go See It In Action!

Nuts and Bolts for Spatial Data Science

Who Are My Neighbors?

Introducing the spdep library!

Spatial Autocorrelation

References