Week 5: Spatial Data Science!

PPOL 6805 / DSAN 6750: GIS for Spatial Data Science
Fall 2024

Class Sessions

Author

Affiliation

Jeff Jacobs

jj1088@georgetown.edu

Published

Wednesday, September 25, 2024

Open slides in new tab →

Doing Things with DE-9IM (Back to Binary Operations)

From Last Week: Almost a Spatial Join

Code

library(rnaturalearth)
library(tidyverse)

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ lubridate 1.9.3     ✔ tibble    3.2.1
✔ purrr     1.0.2     ✔ tidyr     1.3.1
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

Code

library(sf)

Linking to GEOS 3.11.0, GDAL 3.5.3, PROJ 9.1.0; sf_use_s2() is TRUE

Code

library(mapview)
library(leaflet.extras2)

Loading required package: leaflet

Code

africa_sf <- ne_countries(continent = "Africa", scale = 50) |> select(iso_a3, geounit)
africa_map <- mapview(africa_sf, label="geounit", legend=FALSE)
N <- 10
africa_union_sf <- sf::st_union(africa_sf)
sampled_points_sf <- sf::st_sample(africa_union_sf, N) |> sf::st_sf() |> mutate(temp = runif(N, 0, 100))
sampled_points_map <- mapview(sampled_points_sf, label="Random Point", col.regions=cbPalette[1], legend=FALSE)
countries_points_sf <- africa_sf[sampled_points_sf,]
filtered_map <- mapview(countries_points_sf, label="geounit", legend=FALSE) + sampled_points_map
(africa_map + sampled_points_map) | filtered_map

Spatial Filter \(\neq\) Spatial Join

The issue: Data attributes of POINTs are not merged into data attributes of POLYGONs

POINT Attributes

Code

st_geometry(sampled_points_sf) <- c("geom")
sampled_points_sf |> head()

geom	temp
POINT (-14.78459 21.8602)	9.566285
POINT (19.22184 -7.860099)	76.046094
POINT (17.56247 19.14144)	70.000416
POINT (3.869355 27.05118)	11.345776
POINT (34.59731 7.619234)	7.186874
POINT (23.00621 8.547459)	69.730564

POLYGON Attributes

Code

countries_points_sf |> head(4)

	iso_a3	geounit	geometry
92	NER	Niger	MULTIPOLYGON (((13.60635 13…
103	MOZ	Mozambique	MULTIPOLYGON (((31.28789 -2…
104	MAR	Morocco	MULTIPOLYGON (((-2.219629 3…
112	MRT	Mauritania	MULTIPOLYGON (((-16.37334 1…

Our First Real Spatial Join: `st_join()`

Code

joined_sf <- countries_points_sf |> st_join(sampled_points_sf)
joined_sf |> head()

	iso_a3	geounit	temp	geometry
92	NER	Niger	79.456693	MULTIPOLYGON (((13.60635 13…
103	MOZ	Mozambique	52.658794	MULTIPOLYGON (((31.28789 -2…
104	MAR	Morocco	9.566285	MULTIPOLYGON (((-2.219629 3…
112	MRT	Mauritania	43.462255	MULTIPOLYGON (((-16.37334 1…
172	ETH	Ethiopia	7.186874	MULTIPOLYGON (((35.26836 5….
192	COD	Democratic Republic of the Congo	76.046094	MULTIPOLYGON (((30.75117 -8…

But… We Were Still in Easy Mode

Every point could be matched one-to-one with a country. But what if… 😱

Code

g <- st_make_grid(st_bbox(st_as_sfc("LINESTRING(0 0,1 1)")), n = c(2,2))
par(mar = rep(0,4))
plot(g)
plot(g[1] * diag(c(3/4, 1)) + c(0.25, 0.125), add = TRUE, lty = 2)
text(c(.2, .8, .2, .8), c(.2, .2, .8, .8), c(1,2,4,8), col = 'red')

Spatially Intensive vs. Spatially Extensive

Extensive attributes: associated with a physical size (length, area, volume, counts of items). Ex: population count.
- Associated with an area \(\implies\) if that area is cut into smaller areas, the population count needs to be split too
- (At minimum, the sum of the population counts for the smaller areas needs to equal the total for the larger area)
Intensive attributes: Not proportional to support: if the area is split, values may vary but on average remain the same. Ex: population density
- If an area is split into smaller areas, population density is not split similarly!
- The sum of population densities for the smaller areas is a meaningless measure
- Instead, the mean will be more useful as ~similar to the density of the total

Handling the Extensive Case

Assume the extensive attribute \(Y\) is uniformly distributed over a space \(S_i\) (e.g., for population counts we assume everyone is evenly-spaced across the region)
We first compute \(Y_{ij}\), derived from \(Y_i\) for a sub-area of \(S_i\), \(A_{ij} = S_i \cap T_j\):

\[ \hat{Y}_{ij}(A_{ij}) = \frac{|A_{ij}|}{|S_i|}Y_i(S_i) \]

where \(|\cdot|\) denotes area.
Then we can compute \(Y_j(T_j)\) by summing all the elements over area \(T_j\):

\[ \hat{Y}_j(T_j) = \sum_{i=1}^{p}\frac{|A_{ij}|}{|S_i|}Y_i(S_i) \]

Handling the Intensive Case

Assume the variable \(Y\) has constant value over a space \(S_i\) (e.g., population density in assumed to be the same across all sub-areas)
Then the estimate for a sub-area is the same as the estimate for the total area:

\[ \hat{Y}_{ij} = Y_i(S_i) \]

So that we can obtain estimates of \(Y\) for new spatial units \(T_j\) via area-weighted average of the source values:

\[ \hat{Y}_j(T_j) = \sum_{i=1}^{p}\frac{|A_{ij}|}{|T_j|}Y_j(S_i) \]

Let’s Go See It In Action!

Nuts and Bolts for Spatial Data Science

Who Are My Neighbors?

Introducing the spdep library!

Week 5: Spatial Data Science!

Doing Things with DE-9IM (Back to Binary Operations)

From Last Week: Almost a Spatial Join

Spatial Filter \(\neq\) Spatial Join

Our First Real Spatial Join: `st_join()`

But… We Were Still in Easy Mode

Spatially Intensive vs. Spatially Extensive

Handling the Extensive Case

Handling the Intensive Case

Let’s Go See It In Action!

Nuts and Bolts for Spatial Data Science

Who Are My Neighbors?

Spatial Autocorrelation

References

Doing Things with DE-9IM (Back to Binary Operations)

From Last Week: Almost a Spatial Join

Spatial Filter \(\neq\) Spatial Join

Our First Real Spatial Join: st_join()

But… We Were Still in Easy Mode

Spatially Intensive vs. Spatially Extensive

Handling the Extensive Case

Handling the Intensive Case

Let’s Go See It In Action!

Nuts and Bolts for Spatial Data Science

Who Are My Neighbors?

Spatial Autocorrelation

References

Our First Real Spatial Join: `st_join()`