Week 5: Spatial Data Science!

PPOL 6805 / DSAN 6750: GIS for Spatial Data Science
Fall 2024

Jeff Jacobs

jj1088@georgetown.edu

Wednesday, September 25, 2024

Doing Things with DE-9IM (Back to Binary Operations)

From Last Week: Almost a Spatial Join

Code
library(rnaturalearth)
library(tidyverse)
library(sf)
library(mapview)
library(leaflet.extras2)
africa_sf <- ne_countries(continent = "Africa", scale = 50) |> select(iso_a3, geounit)
africa_map <- mapview(africa_sf, label="geounit", legend=FALSE)
N <- 10
africa_union_sf <- sf::st_union(africa_sf)
sampled_points_sf <- sf::st_sample(africa_union_sf, N) |> sf::st_sf() |> mutate(temp = runif(N, 0, 100))
sampled_points_map <- mapview(sampled_points_sf, label="Random Point", col.regions=cbPalette[1], legend=FALSE)
countries_points_sf <- africa_sf[sampled_points_sf,]
filtered_map <- mapview(countries_points_sf, label="geounit", legend=FALSE) + sampled_points_map
(africa_map + sampled_points_map) | filtered_map

Spatial Filter \(\neq\) Spatial Join

  • The issue: Data attributes of POINTs are not merged into data attributes of POLYGONs
POINT Attributes
Code
st_geometry(sampled_points_sf) <- c("geom")
sampled_points_sf |> head()
geom temp
POINT (36.75732 2.445738) 26.046663
POINT (21.35785 -1.863695) 71.238264
POINT (27.16849 25.25473) 92.820181
POINT (-3.069629 8.798227) 2.254003
POINT (3.248394 24.06513) 97.812956
POINT (8.131082 26.01343) 3.260112
POLYGON Attributes
Code
countries_points_sf |> head(4)
iso_a3 geounit geometry
38 TUN Tunisia MULTIPOLYGON (((11.50459 33…
91 NGA Nigeria MULTIPOLYGON (((7.300781 4….
102 NAM Namibia MULTIPOLYGON (((23.38066 -1…
118 MDG Madagascar MULTIPOLYGON (((49.53828 -1…

Our First Real Spatial Join: st_join()

Code
joined_sf <- countries_points_sf |> st_join(sampled_points_sf)
joined_sf |> head()
iso_a3 geounit temp geometry
38 TUN Tunisia 36.33485 MULTIPOLYGON (((11.50459 33…
91 NGA Nigeria 42.19338 MULTIPOLYGON (((7.300781 4….
102 NAM Namibia 14.98781 MULTIPOLYGON (((23.38066 -1…
118 MDG Madagascar 13.59941 MULTIPOLYGON (((49.53828 -1…
133 KEN Kenya 26.04666 MULTIPOLYGON (((40.99443 -2…
177 EGY Egypt 92.82018 MULTIPOLYGON (((36.87139 21…

But… We Were Still in Easy Mode

  • Every point could be matched one-to-one with a country. But what if… 😱
Code
g <- st_make_grid(st_bbox(st_as_sfc("LINESTRING(0 0,1 1)")), n = c(2,2))
par(mar = rep(0,4))
plot(g)
plot(g[1] * diag(c(3/4, 1)) + c(0.25, 0.125), add = TRUE, lty = 2)
text(c(.2, .8, .2, .8), c(.2, .2, .8, .8), c(1,2,4,8), col = 'red')

Spatially Intensive vs. Spatially Extensive

  • Extensive attributes: associated with a physical size (length, area, volume, counts of items). Ex: population count.
    • Associated with an area \(\implies\) if that area is cut into smaller areas, the population count needs to be split too
    • (At minimum, the sum of the population counts for the smaller areas needs to equal the total for the larger area)
  • Intensive attributes: Not proportional to support: if the area is split, values may vary but on average remain the same. Ex: population density
    • If an area is split into smaller areas, population density is not split similarly!
    • The sum of population densities for the smaller areas is a meaningless measure
    • Instead, the mean will be more useful as ~similar to the density of the total

Handling the Extensive Case

  • Assume the extensive attribute \(Y\) is uniformly distributed over a space \(S_i\) (e.g., for population counts we assume everyone is evenly-spaced across the region)

  • We first compute \(Y_{ij}\), derived from \(Y_i\) for a sub-area of \(S_i\), \(A_{ij} = S_i \cap T_j\):

    \[ \hat{Y}_{ij}(A_{ij}) = \frac{|A_{ij}|}{|S_i|}Y_i(S_i) \]

    where \(|\cdot|\) denotes area.

  • Then we can compute \(Y_j(T_j)\) by summing all the elements over area \(T_j\):

\[ \hat{Y}_j(T_j) = \sum_{i=1}^{p}\frac{|A_{ij}|}{|S_i|}Y_i(S_i) \]

Handling the Intensive Case

  • Assume the variable \(Y\) has constant value over a space \(S_i\) (e.g., population density in assumed to be the same across all sub-areas)
  • Then the estimate for a sub-area is the same as the estimate for the total area:

\[ \hat{Y}_{ij} = Y_i(S_i) \]

  • So that we can obtain estimates of \(Y\) for new spatial units \(T_j\) via area-weighted average of the source values:

\[ \hat{Y}_j(T_j) = \sum_{i=1}^{p}\frac{|A_{ij}|}{|T_j|}Y_j(S_i) \]

Let’s Go See It In Action!

Nuts and Bolts for Spatial Data Science

Who Are My Neighbors?

Introducing the spdep library!

Spatial Autocorrelation

References