Week 4: Binary Operations

PPOL 6805 / DSAN 6750: GIS for Spatial Data Science
Fall 2024

Class Sessions
Author
Affiliation

Jeff Jacobs

Published

Wednesday, September 18, 2024

Open slides in new tab →

One More Unary Operation

st_buffer(): An Unary Operation I Forgot to Introduce!

  • Think about how you might answer questions like:
    • “How far is your house (POINT) from Manhattan (POLYGON)?”
    • “Are there any chemical plants within a mile of this building (POLYGON) / stretch of road (LINESTRING)?”
  • Lazy mode (my favorite mode): Compute distances from the centroid
  • GIS master mode: Construct the buffer!

On POLYGONs

Key line: manhattan_buffer_sf <- manhattan_union_sf |> st_buffer(dist = 1609.34) (1 mile \(\approx\) 1609.34 meters)

Code
library(tidyverse)
library(sf)
library(tidycensus)
library(tigris)
library(mapview)
options(tigris_use_cache = TRUE)
manhattan_sf <- get_acs(
  geography = "tract",
  variables = "B19013_001",
  state = "NY",
  county = "New York",
  year = 2020,
  geometry = TRUE,
  cb = FALSE
)
# Erase the island tracts real quick
island_tracts <- c(
  "Census Tract 1, New York County, New York",
  "Census Tract 2.02, New York County, New York"
)
manhattan_sf <- manhattan_sf |> filter(
  !(NAME %in% island_tracts)
)
# Union of all census tracts within the county
manhattan_union_sf <- st_union(manhattan_sf)
manhattan_union_map <- mapview(manhattan_union_sf, label="New York County")
# Construct buffer (1 mile ~= 1609.34 meters)
manhattan_buffer_sf <- manhattan_union_sf |> st_buffer(dist = 1609.34)
manhattan_buffer_map <- mapview(manhattan_buffer_sf, label="Buffer (1 Mile)", col.regions = cbPalette[1], legend = TRUE)
manhattan_buffer_map + manhattan_union_map

On LINESTRINGs

Code
library(tidyverse)
library(sf)
## st_buffer, style options (taken from rgeos gBuffer)
l1 = st_as_sfc("LINESTRING(0 0,1 5,4 5,5 2,8 2,9 4,4 6.5)")
op = par(mfrow=c(2,3))
plot(st_buffer(l1, dist = 1, endCapStyle="ROUND"), reset = FALSE, main = "endCapStyle: ROUND")
plot(l1,col='blue',add=TRUE)
plot(st_buffer(l1, dist = 1, endCapStyle="FLAT"), reset = FALSE, main = "endCapStyle: FLAT")
plot(l1,col='blue',add=TRUE)
plot(st_buffer(l1, dist = 1, endCapStyle="SQUARE"), reset = FALSE, main = "endCapStyle: SQUARE")
plot(l1,col='blue',add=TRUE)
plot(st_buffer(l1, dist = 1, nQuadSegs=1), reset = FALSE, main = "nQuadSegs: 1")
plot(l1,col='blue',add=TRUE)
plot(st_buffer(l1, dist = 1, nQuadSegs=2), reset = FALSE, main = "nQuadSegs: 2")
plot(l1,col='blue',add=TRUE)
plot(st_buffer(l1, dist = 1, nQuadSegs= 5), reset = FALSE, main = "nQuadSegs: 5")
plot(l1,col='blue',add=TRUE)

From the sf Documentation

From Unary to Binary Operations

What Makes Binary Operations “Fancier”?

Unary
  • st_centroid()
    • POLYGON \(\mapsto\) POINT
    • MULTIPOLYGON \(\mapsto\) POINT
  • st_convex_hull()
    • POLYGON \(\mapsto\) POLYGON
    • MULTIPOINT \(\mapsto\) POLYGON
Binary
  • st_intersection()
    • (POINT, POINT) \(\mapsto\) POINT | POINT EMPTY
    • (POLYGON, POLYGON) \(\mapsto\) POLYGON | LINESTRING | POINT | POLYGON EMPTY
  • st_is_empty() and st_dimension() become your new best friends 😉
  • st_is_empty(): Distinguishes between, e.g., POINT EMPTY and POINT(0 0)
  • st_dimension(): NA for empty versions, otherwise
    • 2 for surfaces (POLYGON, MULTIPOLYGON)
    • 1 for lines (LINESTRING, MULTILINESTRING)
    • 0 for points (POINT, MULTIPOINT)

The Bad Kind of Overthinking: Will My Life Just Get Harder and Harder?

Unary Operations

Binary Operations

Ternary Operations

Quaternary Operations

Good News and Bad News

  • The good news: No!
  • The bad news: You’ll have to read the 465-page Volume I and then the 451-page Volume II and then to page 15 of Volume III of Cohn (1965) to know why:

(i spent 4 yrs of undergrad studying abstract algebra and now it all sits gathering dust somewhere deep within my brain plz just let me have this moment i’ll never mention it again i promise)

The Good Kind of Overthinking…

  • For fancier geospatial operations, we’ll need to start overthinking, about the possible relationships between two (or more) geometries! \(\leadsto\) Relational Predicates:

Figure 4.2 in Lovelace, Nowosad, and Muenchow (2024)

Ad Hoc / Ambiguous \(\rightarrow\) Precise: Enter DE-9IM!

DE-9IM Strings

Each cell here visualizes one component of the DE-9IM string 1020F1102, which describes the relationship between the following two geometries:

  • Boxey McBoxface: POLYGON(0 0, 1 0, 1 1, 0 1, 0 0)
  • Liney McLineface: LINESTRING(0.5 -0.5, 0.5 0.5)
Code
library(sf)
polygon <- po <- st_polygon(list(rbind(c(0,0), c(1,0), c(1,1), c(0,1), c(0,0))))
p0 <- st_polygon(list(rbind(c(-1,-1), c(2,-1), c(2,2), c(-1,2), c(-1,-1))))
line <- li <- st_linestring(rbind(c(.5, -.5), c(.5, 0.5)))
s <- st_sfc(po, li)

par(mfrow = c(3,3))
par(mar = c(1,1,1,1))

# "1020F1102"
# 1: 1
plot(s, col = c(NA, 'darkgreen'), border = 'blue', main = expression(paste("I(pol)",intersect(),"I(line) = 1")))
lines(rbind(c(.5,0), c(.5,.495)), col = 'red', lwd = 2)
points(0.5, 0.5, pch = 1)

# 2: 0
plot(s, col = c(NA, 'darkgreen'), border = 'blue', main = expression(paste("I(pol)",intersect(),"B(line) = 0")))
points(0.5, 0.5, col = 'red', pch = 16)

# 3: 2
plot(s, col = c(NA, 'darkgreen'), border = 'blue', main = expression(paste("I(pol)",intersect(),"E(line) = 2")))
plot(po, col = '#ff8888', add = TRUE)
plot(s, col = c(NA, 'darkgreen'), border = 'blue', add = TRUE)

# 4: 0
plot(s, col = c(NA, 'darkgreen'), border = 'blue', main = expression(paste("B(pol)",intersect(),"I(line) = 0")))
points(.5, 0, col = 'red', pch = 16)

# 5: F
plot(s, col = c(NA, 'darkgreen'), border = 'blue', main = expression(paste("B(pol)",intersect(),"B(line) = F")))

# 6: 1
plot(s, col = c(NA, 'darkgreen'), border = 'blue', main = expression(paste("B(pol)",intersect(),"E(line) = 1")))
plot(po, border = 'red', col = NA, add = TRUE, lwd = 2)

# 7: 1
plot(s, col = c(NA, 'darkgreen'), border = 'blue', main = expression(paste("E(pol)",intersect(),"I(line) = 1")))
lines(rbind(c(.5, -.5), c(.5, 0)), col = 'red', lwd = 2)

# 8: 0
plot(s, col = c(NA, 'darkgreen'), border = 'blue', main = expression(paste("E(pol)",intersect(),"B(line) = 0")))
points(.5, -.5, col = 'red', pch = 16)

# 9: 2
plot(s, col = c(NA, 'darkgreen'), border = 'blue', main = expression(paste("E(pol)",intersect(),"E(line) = 2")))
plot(p0 / po, col = '#ff8888', add = TRUE)
plot(s, col = c(NA, 'darkgreen'), border = 'blue', add = TRUE)

Code from Pebesma and Bivand (2023)

Slowing Down: 9IM (no DE yet!)

Table 1: From OSGeo Project
9IM Interior Boundary Exterior
Interior
 

 

 
Boundary
 

 

 
Exterior
 

 

 

Dimensionally Extended (DE) 9IM

Table 2: From OSGeo Project
9IM Interior Boundary Exterior
Interior
2

1

2
Boundary
1

0

1
Exterior
2

1

2

Crunching it Down into a Tiny Box

DE-9IM Interior Boundary Exterior
Interior 2 1 2
Boundary 1 0 1
Exterior 2 1 2

And Then into a Tiny String

212101212

And Then into an Infinitesimally-Small Point

DE-9IM Masks

  • Now terms can be given unambiguous, precise meaning!
st_overlaps() Interior Boundary Exterior
Interior T * T
Boundary * * *
Exterior T * *
  • Special Values (besides 0, 1, 2):
    • T: “True” (non-empty, st_dimension() >= 0)
    • F: “False” (empty, st_dimension() == NA)
    • *: “Wildcard” (Don’t care what the value is)
  • st_overlaps(): T*T***T**, st_equals(): T*F**FFF*

DE-9IM vs. Everyday Language

  • DE-9IM can (in theory) represent \(6^9 = 10~077~696\) possible geometric relationships!
  • The English language has like 10, and they’re ambiguous ☠️ (Compromise employed by GIS systems: allow multiple masks for same English word:)
English Mask 212101212 Result
“Disjoint” FF*FF**** FALSE x not disjoint from y
“Touches” FT******* FALSE x doesn’t touch y
“Touches” F***T**** FALSE x doesn’t touch y
“Crosses” T*T***T** TRUE x crosses y
“Within” TF*F***** FALSE x is not within y
“Overlaps” T*T***T** TRUE x overlaps y

st_relate(): The Ultimate Predicate

Code
library(tidyverse)
library(rnaturalearth)
us <- ne_states(country="United States of America")
dc <- us |> filter(iso_3166_2 == "US-DC")
us <- us |>
  mutate(
    de9im = st_relate(us, dc),
    touch = st_touches(us, dc, sparse = F)
  ) |>
  select(iso_3166_2, name, de9im, touch) |>
  arrange(name)
although coordinates are longitude/latitude, st_relate assumes that they are
planar
Code
us
iso_3166_2 name de9im touch geometry
US-AL Alabama FF2FF1212 FALSE MULTIPOLYGON (((-87.41958 3…
US-AK Alaska FF2FF1212 FALSE MULTIPOLYGON (((-141.0056 6…
US-AZ Arizona FF2FF1212 FALSE MULTIPOLYGON (((-111.0063 3…
US-AR Arkansas FF2FF1212 FALSE MULTIPOLYGON (((-90.30422 3…
US-CA California FF2FF1212 FALSE MULTIPOLYGON (((-114.7243 3…
US-CO Colorado FF2FF1212 FALSE MULTIPOLYGON (((-109.0463 4…
US-CT Connecticut FF2FF1212 FALSE MULTIPOLYGON (((-73.6417 41…
US-DE Delaware FF2FF1212 FALSE MULTIPOLYGON (((-75.05809 3…
US-DC District of Columbia 2FFF1FFF2 FALSE MULTIPOLYGON (((-77.02293 3…
US-FL Florida FF2FF1212 FALSE MULTIPOLYGON (((-87.44734 3…
US-GA Georgia FF2FF1212 FALSE MULTIPOLYGON (((-80.89029 3…
US-HI Hawaii FF2FF1212 FALSE MULTIPOLYGON (((-154.8996 1…
US-ID Idaho FF2FF1212 FALSE MULTIPOLYGON (((-117.0382 4…
US-IL Illinois FF2FF1212 FALSE MULTIPOLYGON (((-89.1237 36…
US-IN Indiana FF2FF1212 FALSE MULTIPOLYGON (((-84.80608 4…
US-IA Iowa FF2FF1212 FALSE MULTIPOLYGON (((-96.48266 4…
US-KS Kansas FF2FF1212 FALSE MULTIPOLYGON (((-102.0396 3…
US-KY Kentucky FF2FF1212 FALSE MULTIPOLYGON (((-89.42446 3…
US-LA Louisiana FF2FF1212 FALSE MULTIPOLYGON (((-89.52599 3…
US-ME Maine FF2FF1212 FALSE MULTIPOLYGON (((-71.08495 4…
US-MD Maryland FF2F11212 TRUE MULTIPOLYGON (((-75.64786 3…
US-MA Massachusetts FF2FF1212 FALSE MULTIPOLYGON (((-71.19396 4…
US-MI Michigan FF2FF1212 FALSE MULTIPOLYGON (((-84.4913 46…
US-MN Minnesota FF2FF1212 FALSE MULTIPOLYGON (((-97.22609 4…
US-MS Mississippi FF2FF1212 FALSE MULTIPOLYGON (((-88.40221 3…
US-MO Missouri FF2FF1212 FALSE MULTIPOLYGON (((-95.31725 4…
US-MT Montana FF2FF1212 FALSE MULTIPOLYGON (((-116.0482 4…
US-NE Nebraska FF2FF1212 FALSE MULTIPOLYGON (((-104.0537 4…
US-NV Nevada FF2FF1212 FALSE MULTIPOLYGON (((-114.0425 4…
US-NH New Hampshire FF2FF1212 FALSE MULTIPOLYGON (((-71.50585 4…
US-NJ New Jersey FF2FF1212 FALSE MULTIPOLYGON (((-75.54133 3…
US-NM New Mexico FF2FF1212 FALSE MULTIPOLYGON (((-108.1375 3…
US-NY New York FF2FF1212 FALSE MULTIPOLYGON (((-79.06523 4…
US-NC North Carolina FF2FF1212 FALSE MULTIPOLYGON (((-76.03173 3…
US-ND North Dakota FF2FF1212 FALSE MULTIPOLYGON (((-104.0476 4…
US-OH Ohio FF2FF1212 FALSE MULTIPOLYGON (((-80.52023 4…
US-OK Oklahoma FF2FF1212 FALSE MULTIPOLYGON (((-103.0002 3…
US-OR Oregon FF2FF1212 FALSE MULTIPOLYGON (((-124.4924 4…
US-PA Pennsylvania FF2FF1212 FALSE MULTIPOLYGON (((-79.76301 4…
US-RI Rhode Island FF2FF1212 FALSE MULTIPOLYGON (((-71.23686 4…
US-SC South Carolina FF2FF1212 FALSE MULTIPOLYGON (((-78.57316 3…
US-SD South Dakota FF2FF1212 FALSE MULTIPOLYGON (((-104.0567 4…
US-TN Tennessee FF2FF1212 FALSE MULTIPOLYGON (((-90.30422 3…
US-TX Texas FF2FF1212 FALSE MULTIPOLYGON (((-103.3115 2…
US-UT Utah FF2FF1212 FALSE MULTIPOLYGON (((-111.0502 4…
US-VT Vermont FF2FF1212 FALSE MULTIPOLYGON (((-73.35134 4…
US-VA Virginia FF2F11212 TRUE MULTIPOLYGON (((-76.01325 3…
US-WA Washington FF2FF1212 FALSE MULTIPOLYGON (((-122.753 48…
US-WV West Virginia FF2FF1212 FALSE MULTIPOLYGON (((-82.58945 3…
US-WI Wisconsin FF2FF1212 FALSE MULTIPOLYGON (((-87.80425 4…
US-WY Wyoming FF2FF1212 FALSE MULTIPOLYGON (((-109.0463 4…

(If You Don’t Want to Scroll)

Code
us |> filter(touch)
Warning: Using one column matrices in `filter()` was deprecated in dplyr 1.1.0.
ℹ Please use one dimensional logical vectors instead.
iso_3166_2 name de9im touch geometry
US-MD Maryland FF2F11212 TRUE MULTIPOLYGON (((-75.64786 3…
US-VA Virginia FF2F11212 TRUE MULTIPOLYGON (((-76.01325 3…

Doing Things with DE-9IM (Back to Binary Operations)

From Last Week: Almost a Spatial Join

Code
library(mapview)
library(leaflet.extras2)
africa_sf <- ne_countries(continent = "Africa", scale = 50) |> select(iso_a3, geounit)
africa_map <- mapview(africa_sf, label="geounit", legend=FALSE)
N <- 10
africa_union_sf <- sf::st_union(africa_sf)
sampled_points_sf <- sf::st_sample(africa_union_sf, N) |> sf::st_sf() |> mutate(temp = runif(N, 0, 100))
sampled_points_map <- mapview(sampled_points_sf, label="Random Point", col.regions=cbPalette[1], legend=FALSE)
countries_points_sf <- africa_sf[sampled_points_sf,]
filtered_map <- mapview(countries_points_sf, label="geounit", legend=FALSE) + sampled_points_map
(africa_map + sampled_points_map) | filtered_map

Spatial Filter \(\neq\) Spatial Join

  • The issue: Data attributes of POINTs are not merged into data attributes of POLYGONs
POINT Attributes
Code
st_geometry(sampled_points_sf) <- c("geom")
sampled_points_sf |> head()
geom temp
POINT (23.5386 22.01628) 94.385575
POINT (14.64665 -13.35285) 26.332960
POINT (32.29586 -27.78115) 95.603092
POINT (16.11575 -9.174237) 11.086251
POINT (14.39308 -1.016639) 6.604077
POINT (47.91478 7.723278) 40.497326
POLYGON Attributes
Code
countries_points_sf |> head(4)
iso_a3 geounit geometry
57 ZAF South Africa MULTIPOLYGON (((29.36484 -2…
58 SOM Somalia MULTIPOLYGON (((41.53271 -1…
104 MAR Morocco MULTIPOLYGON (((-2.219629 3…
118 MDG Madagascar MULTIPOLYGON (((49.53828 -1…

Our First Real Spatial Join: st_join()

Code
joined_sf <- countries_points_sf |> st_join(sampled_points_sf)
joined_sf |> head()
iso_a3 geounit temp geometry
57 ZAF South Africa 95.603092 MULTIPOLYGON (((29.36484 -2…
58 SOM Somalia 40.497326 MULTIPOLYGON (((41.53271 -1…
104 MAR Morocco 94.671352 MULTIPOLYGON (((-2.219629 3…
118 MDG Madagascar 1.921007 MULTIPOLYGON (((49.53828 -1…
123 LBY Libya 94.385575 MULTIPOLYGON (((9.51875 30….
160 GAB Gabon 6.604077 MULTIPOLYGON (((13.29355 2….

But… We Were Still in Easy Mode

  • Every point could be matched one-to-one with a country. But what if… 😱
Code
g <- st_make_grid(st_bbox(st_as_sfc("LINESTRING(0 0,1 1)")), n = c(2,2))
par(mar = rep(0,4))
plot(g)
plot(g[1] * diag(c(3/4, 1)) + c(0.25, 0.125), add = TRUE, lty = 2)
text(c(.2, .8, .2, .8), c(.2, .2, .8, .8), c(1,2,4,8), col = 'red')

Spatially Intensive vs. Spatially Extensive

  • Extensive attributes: associated with a physical size (length, area, volume, counts of items). Ex: population count.
    • Associated with an area \(\implies\) if that area is cut into smaller areas, the population count needs to be split too
    • (At minimum, the sum of the population counts for the smaller areas needs to equal the total for the larger area)
  • Intensive attributes: Not proportional to support: if the area is split, values may vary but on average remain the same. Ex: population density
    • If an area is split into smaller areas, population density is not split similarly!
    • The sum of population densities for the smaller areas is a meaningless measure
    • Instead, the mean will be more useful as ~similar to the density of the total

Handling the Extensive Case

  • Assume the extensive attribute \(Y\) is uniformly distributed over a space \(S_i\) (e.g., for population counts we assume everyone is evenly-spaced across the region)

  • We first compute \(Y_{ij}\), derived from \(Y_i\) for a sub-area of \(S_i\), \(A_{ij} = S_i \cap T_j\):

    \[ \hat{Y}_{ij}(A_{ij}) = \frac{|A_{ij}|}{|S_i|}Y_i(S_i) \]

    where \(|\cdot|\) denotes area.

  • Then we can compute \(Y_j(T_j)\) by summing all the elements over area \(T_j\):

\[ \hat{Y}_j(T_j) = \sum_{i=1}^{p}\frac{|A_{ij}|}{|S_i|}Y_i(S_i) \]

Handling the Intensive Case

  • Assume the variable \(Y\) has constant value over a space \(S_i\) (e.g., population density in assumed to be the same across all sub-areas)
  • Then the estimate for a sub-area is the same as the estimate for the total area:

\[ \hat{Y}_{ij} = Y_i(S_i) \]

  • So that we can obtain estimates of \(Y\) for new spatial units \(T_j\) via area-weighted average of the source values:

\[ \hat{Y}_j(T_j) = \sum_{i=1}^{p}\frac{|A_{ij}|}{|T_j|}Y_j(S_i) \]

References

Cohn, P. M. 1965. Universal Algebra. Springer Science & Business Media.
Lovelace, Robin, Jakub Nowosad, and Jannes Muenchow. 2024. Geocomputation with R. CRC Press. https://r.geocompx.org/.
Pebesma, Edzer, and Roger Bivand. 2023. Spatial Data Science: With Applications in R. CRC Press. https://r-spatial.org/book/.