Week 5: Binary Operations and DE-9IM

PPOL 6805 / DSAN 6750: GIS for Spatial Data Science
Fall 2025

Class Sessions

Author

Affiliation

Jeff Jacobs

jj1088@georgetown.edu

Published

Wednesday, September 25, 2024

Open slides in new tab →

Geospatial Operations 2: Binary Operations

Code

source("../dsan-globals/_globals.r")
library(tidyverse) |> suppressPackageStartupMessages()
library(sf) |> suppressPackageStartupMessages()
library(rnaturalearth) |> suppressPackageStartupMessages()
library(mapview) |> suppressPackageStartupMessages()
library(leaflet.extras2) |> suppressPackageStartupMessages()

Spatial Joins

Code

africa_sf <- ne_countries(continent = "Africa", scale = 50)
africa_union_sf <- sf::st_union(africa_sf)
africa_map <- mapview(africa_sf, label="geounit", legend=FALSE)
nc <- system.file("shape/nc.shp", package="sf") |>
  read_sf() |>
  st_transform('EPSG:2264')
gr <- st_sf(
         label = apply(expand.grid(1:10, LETTERS[10:1])[,2:1], 1, paste0, collapse = ""),
         geom = st_make_grid(nc))
gr$col <- sf.colors(10, categorical = TRUE, alpha = .3)
# cut, to verify that NA's work out:
gr <- gr[-(1:30),]
suppressWarnings(nc_j <- st_join(nc, gr, largest = TRUE))
par(mfrow = c(2,1), mar = rep(0,4))
plot(st_geometry(nc_j), border = 'grey')
plot(st_geometry(gr), add = TRUE, col = gr$col)
text(st_coordinates(st_centroid(st_geometry(gr))), labels = gr$label, cex = .85)
# the joined dataset:
plot(st_geometry(nc_j), border = 'grey', col = nc_j$col)
text(st_coordinates(st_centroid(st_geometry(nc_j))), labels = nc_j$label, cex = .7)
plot(st_geometry(gr), border = '#88ff88aa', add = TRUE)

Spatial Sampling

Code

# Sample random points
africa_points_list <- sf::st_sample(africa_union_sf, 10)
africa_points_sf <- sf::st_sf(africa_points_list)
africa_points_map <- mapview(africa_points_sf, label="Random Point", col.regions=cb_palette[1], legend=FALSE)

Warning in cbind(`Feature ID` = fid, mat): number of rows of result is not a
multiple of vector length (arg 1)

Code

africa_map + africa_points_map

The “Default” Predicate: `st_intersects`

Code

countries_w_points <- africa_sf[africa_points_sf,]
mapview(countries_w_points, label="geounit", legend=FALSE) + africa_points_map

Counting with `lengths()`

Code

country_inter <- sf::st_intersects(africa_sf, africa_points_sf)
# Computes point counts for each polygon
(num_intersections <- lengths(country_inter))

 [1] 0 1 0 0 0 0 0 3 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0
[39] 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0

Code

africa_sf <- africa_sf |> mutate(
  num_points = num_intersections
) |> arrange(geounit)
africa_sf |> select(geounit, num_points) |> head()

geounit	num_points	geometry
Algeria	0	MULTIPOLYGON (((8.576563 36…
Angola	1	MULTIPOLYGON (((13.07275 -4…
Benin	0	MULTIPOLYGON (((1.622656 6….
Botswana	0	MULTIPOLYGON (((25.25879 -1…
Burkina Faso	0	MULTIPOLYGON (((0.9004883 1…
Burundi	0	MULTIPOLYGON (((30.55361 -2…

Plotting with `mapview`

Code

mapview(africa_sf, zcol="num_points")

Plotting with `ggplot2`

Since we’re starting to get into data attributes rather than geometric features, switching to ggplot2 is recommended!

Code

africa_sf |> ggplot(aes(fill=num_points)) +
  geom_sf() +
  theme_classic()

One More Unary Operation: `st_buffer()`

Think about how you might answer questions like:
- “How far is your house (POINT) from Manhattan (POLYGON)?”
- “Are there any chemical plants within a mile of this building (POLYGON) / stretch of road (LINESTRING)?”
Lazy mode (my favorite mode): Compute distances from the centroid
GIS master mode: Construct the buffer!

On `POLYGON`s

Key line: manhattan_buffer_sf <- manhattan_union_sf |> st_buffer(dist = 1609.34) (1 mile \(\approx\) 1609.34 meters)

Code

library(tidyverse)
library(sf)
library(tidycensus)
library(tigris)
library(mapview)
options(tigris_use_cache = TRUE)
manhattan_sf <- get_acs(
  geography = "tract",
  variables = "B19013_001",
  state = "NY",
  county = "New York",
  year = 2020,
  geometry = TRUE,
  cb = FALSE
)
# Erase the island tracts real quick
island_tracts <- c(
  "Census Tract 1, New York County, New York",
  "Census Tract 2.02, New York County, New York"
)
manhattan_sf <- manhattan_sf |> filter(
  !(NAME %in% island_tracts)
)
# Union of all census tracts within the county
manhattan_union_sf <- st_union(manhattan_sf)
manhattan_union_map <- mapview(manhattan_union_sf, label="New York County")
# Construct buffer (1 mile ~= 1609.34 meters)
manhattan_buffer_sf <- manhattan_union_sf |> st_buffer(dist = 1609.34)
manhattan_buffer_map <- mapview(manhattan_buffer_sf, label="Buffer (1 Mile)", col.regions = cbPalette[1], legend = TRUE)
manhattan_buffer_map + manhattan_union_map

On `LINESTRING`s

Code

library(tidyverse)
library(sf)
## st_buffer, style options (taken from rgeos gBuffer)
l1 = st_as_sfc("LINESTRING(0 0,1 5,4 5,5 2,8 2,9 4,4 6.5)")
op = par(mfrow=c(2,3))
plot(st_buffer(l1, dist = 1, endCapStyle="ROUND"), reset = FALSE, main = "endCapStyle: ROUND")
plot(l1,col='blue',add=TRUE)
plot(st_buffer(l1, dist = 1, endCapStyle="FLAT"), reset = FALSE, main = "endCapStyle: FLAT")
plot(l1,col='blue',add=TRUE)
plot(st_buffer(l1, dist = 1, endCapStyle="SQUARE"), reset = FALSE, main = "endCapStyle: SQUARE")
plot(l1,col='blue',add=TRUE)
plot(st_buffer(l1, dist = 1, nQuadSegs=1), reset = FALSE, main = "nQuadSegs: 1")
plot(l1,col='blue',add=TRUE)
plot(st_buffer(l1, dist = 1, nQuadSegs=2), reset = FALSE, main = "nQuadSegs: 2")
plot(l1,col='blue',add=TRUE)
plot(st_buffer(l1, dist = 1, nQuadSegs= 5), reset = FALSE, main = "nQuadSegs: 5")
plot(l1,col='blue',add=TRUE)

What Makes Binary Operations “Fancier”?

Unary

st_centroid()
- POLYGON \(\mapsto\) POINT
- MULTIPOLYGON \(\mapsto\) POINT
st_convex_hull()
- POLYGON \(\mapsto\) POLYGON
- MULTIPOINT \(\mapsto\) POLYGON

Binary

st_intersection()
- (POINT, POINT) \(\mapsto\) POINT | POINT EMPTY
- (POLYGON, POLYGON) \(\mapsto\) POLYGON | LINESTRING | POINT | POLYGON EMPTY

st_is_empty() and st_dimension() become your new best friends 😉
st_is_empty(): Distinguishes between, e.g., POINT EMPTY and POINT(0 0)
st_dimension(): NA for empty versions, otherwise
- 2 for surfaces (POLYGON, MULTIPOLYGON)
- 1 for lines (LINESTRING, MULTILINESTRING)
- 0 for points (POINT, MULTIPOINT)

The Bad Kind of Overthinking: Will My Life Just Get Harder and Harder?

Unary Operations

Binary Operations

Ternary Operations

Quaternary Operations

Good News and Bad News

The good news: No!
The bad news: You’ll have to read the 465-page Volume I and then the 451-page Volume II and then to page 15 of Volume III of Cohn (1965) to know why:

(i spent 4 yrs of undergrad studying abstract algebra and now it all sits gathering dust somewhere deep within my brain plz just let me have this moment i’ll never mention it again i promise)

The Good Kind of Overthinking…

For fancier geospatial operations, we’ll need to start overthinking, about the possible relationships between two (or more) geometries! \(\leadsto\) Relational Predicates:

Figure 4.2 in Lovelace, Nowosad, and Muenchow (2024)

Ad Hoc / Ambiguous \(\rightarrow\) Precise: Enter DE-9IM!

DE-9IM Strings

Each cell here visualizes one component of the DE-9IM string 1020F1102, which describes the relationship between the following two geometries:

Boxey McBoxface: POLYGON(0 0, 1 0, 1 1, 0 1, 0 0)
Liney McLineface: LINESTRING(0.5 -0.5, 0.5 0.5)

Code

library(sf)
polygon <- po <- st_polygon(list(rbind(c(0,0), c(1,0), c(1,1), c(0,1), c(0,0))))
p0 <- st_polygon(list(rbind(c(-1,-1), c(2,-1), c(2,2), c(-1,2), c(-1,-1))))
line <- li <- st_linestring(rbind(c(.5, -.5), c(.5, 0.5)))
s <- st_sfc(po, li)

par(mfrow = c(3,3))
par(mar = c(1,1,1,1))

# "1020F1102"
# 1: 1
plot(s, col = c(NA, 'darkgreen'), border = 'blue', main = expression(paste("I(pol)",intersect(),"I(line) = 1")))
lines(rbind(c(.5,0), c(.5,.495)), col = 'red', lwd = 2)
points(0.5, 0.5, pch = 1)

# 2: 0
plot(s, col = c(NA, 'darkgreen'), border = 'blue', main = expression(paste("I(pol)",intersect(),"B(line) = 0")))
points(0.5, 0.5, col = 'red', pch = 16)

# 3: 2
plot(s, col = c(NA, 'darkgreen'), border = 'blue', main = expression(paste("I(pol)",intersect(),"E(line) = 2")))
plot(po, col = '#ff8888', add = TRUE)
plot(s, col = c(NA, 'darkgreen'), border = 'blue', add = TRUE)

# 4: 0
plot(s, col = c(NA, 'darkgreen'), border = 'blue', main = expression(paste("B(pol)",intersect(),"I(line) = 0")))
points(.5, 0, col = 'red', pch = 16)

# 5: F
plot(s, col = c(NA, 'darkgreen'), border = 'blue', main = expression(paste("B(pol)",intersect(),"B(line) = F")))

# 6: 1
plot(s, col = c(NA, 'darkgreen'), border = 'blue', main = expression(paste("B(pol)",intersect(),"E(line) = 1")))
plot(po, border = 'red', col = NA, add = TRUE, lwd = 2)

# 7: 1
plot(s, col = c(NA, 'darkgreen'), border = 'blue', main = expression(paste("E(pol)",intersect(),"I(line) = 1")))
lines(rbind(c(.5, -.5), c(.5, 0)), col = 'red', lwd = 2)

# 8: 0
plot(s, col = c(NA, 'darkgreen'), border = 'blue', main = expression(paste("E(pol)",intersect(),"B(line) = 0")))
points(.5, -.5, col = 'red', pch = 16)

# 9: 2
plot(s, col = c(NA, 'darkgreen'), border = 'blue', main = expression(paste("E(pol)",intersect(),"E(line) = 2")))
plot(p0 / po, col = '#ff8888', add = TRUE)
plot(s, col = c(NA, 'darkgreen'), border = 'blue', add = TRUE)

The predicate equals corresponds to the DE-9IM string "T*F**FFF*". If any two geometries obey this relationship, they are (topologically) equal!

Slowing Down: 9IM (no DE yet!)

Dimensionally Extended (DE) 9IM

Crunching it Down into a Tiny Box

DE-9IM	Interior	Boundary	Exterior
Interior	2	1	2
Boundary	1	0	1
Exterior	2	1	2

And Then into a Tiny String

212101212

And Then into an Infinitesimally-Small Point

DE-9IM Masks

Now terms can be given unambiguous, precise meaning!

`st_overlaps()`	Interior	Boundary	Exterior
Interior	`T`	`*`	`T`
Boundary	`*`	`*`	`*`
Exterior	`T`	`*`	`*`

Special Values (besides 0, 1, 2):
- T: “True” (non-empty, st_dimension() >= 0)
- F: “False” (empty, st_dimension() == NA)
- *: “Wildcard” (Don’t care what the value is)
st_overlaps(): T*T***T**, st_equals(): T*F**FFF*

DE-9IM vs. Everyday Language

DE-9IM can (in theory) represent \(6^9 = 10~077~696\) possible geometric relationships!
The English language has like 10, and they’re ambiguous ☠️ (Compromise employed by GIS systems: allow multiple masks for same English word:)

English	Mask	`212101212`	Result
“Disjoint”	`FFFF***`	`FALSE`	`x` not disjoint from `y`
“Touches”	`FT*******`	`FALSE`	`x` doesn’t touch `y`
“Touches”	`F*T**`	`FALSE`	`x` doesn’t touch `y`
“Crosses”	`TTT`	`TRUE`	`x` crosses `y`
“Within”	`TFF****`	`FALSE`	`x` is not within `y`
“Overlaps”	`TTT`	`TRUE`	`x` overlaps `y`

`st_relate()`: The Ultimate Predicate

Code

library(tidyverse)
library(rnaturalearth)
us <- ne_states(country="United States of America")
dc <- us |> filter(iso_3166_2 == "US-DC")
us <- us |>
  mutate(
    de9im = st_relate(us, dc),
    touch = st_touches(us, dc, sparse = F)
  ) |>
  select(iso_3166_2, name, de9im, touch) |>
  arrange(name)

although coordinates are longitude/latitude, st_relate assumes that they are
planar

Code

us

iso_3166_2	name	de9im	touch	geometry
US-AL	Alabama	FF2FF1212	FALSE	MULTIPOLYGON (((-87.41958 3…
US-AK	Alaska	FF2FF1212	FALSE	MULTIPOLYGON (((-141.0056 6…
US-AZ	Arizona	FF2FF1212	FALSE	MULTIPOLYGON (((-111.0063 3…
US-AR	Arkansas	FF2FF1212	FALSE	MULTIPOLYGON (((-90.30422 3…
US-CA	California	FF2FF1212	FALSE	MULTIPOLYGON (((-114.7243 3…
US-CO	Colorado	FF2FF1212	FALSE	MULTIPOLYGON (((-109.0463 4…
US-CT	Connecticut	FF2FF1212	FALSE	MULTIPOLYGON (((-73.6417 41…
US-DE	Delaware	FF2FF1212	FALSE	MULTIPOLYGON (((-75.05809 3…
US-DC	District of Columbia	2FFF1FFF2	FALSE	MULTIPOLYGON (((-77.02293 3…
US-FL	Florida	FF2FF1212	FALSE	MULTIPOLYGON (((-87.44734 3…
US-GA	Georgia	FF2FF1212	FALSE	MULTIPOLYGON (((-80.89029 3…
US-HI	Hawaii	FF2FF1212	FALSE	MULTIPOLYGON (((-154.8996 1…
US-ID	Idaho	FF2FF1212	FALSE	MULTIPOLYGON (((-117.0382 4…
US-IL	Illinois	FF2FF1212	FALSE	MULTIPOLYGON (((-89.1237 36…
US-IN	Indiana	FF2FF1212	FALSE	MULTIPOLYGON (((-84.80608 4…
US-IA	Iowa	FF2FF1212	FALSE	MULTIPOLYGON (((-96.48266 4…
US-KS	Kansas	FF2FF1212	FALSE	MULTIPOLYGON (((-102.0396 3…
US-KY	Kentucky	FF2FF1212	FALSE	MULTIPOLYGON (((-89.42446 3…
US-LA	Louisiana	FF2FF1212	FALSE	MULTIPOLYGON (((-89.52599 3…
US-ME	Maine	FF2FF1212	FALSE	MULTIPOLYGON (((-71.08495 4…
US-MD	Maryland	FF2F11212	TRUE	MULTIPOLYGON (((-75.64786 3…
US-MA	Massachusetts	FF2FF1212	FALSE	MULTIPOLYGON (((-71.19396 4…
US-MI	Michigan	FF2FF1212	FALSE	MULTIPOLYGON (((-84.4913 46…
US-MN	Minnesota	FF2FF1212	FALSE	MULTIPOLYGON (((-97.22609 4…
US-MS	Mississippi	FF2FF1212	FALSE	MULTIPOLYGON (((-88.40221 3…
US-MO	Missouri	FF2FF1212	FALSE	MULTIPOLYGON (((-95.31725 4…
US-MT	Montana	FF2FF1212	FALSE	MULTIPOLYGON (((-116.0482 4…
US-NE	Nebraska	FF2FF1212	FALSE	MULTIPOLYGON (((-104.0537 4…
US-NV	Nevada	FF2FF1212	FALSE	MULTIPOLYGON (((-114.0425 4…
US-NH	New Hampshire	FF2FF1212	FALSE	MULTIPOLYGON (((-71.50585 4…
US-NJ	New Jersey	FF2FF1212	FALSE	MULTIPOLYGON (((-75.54133 3…
US-NM	New Mexico	FF2FF1212	FALSE	MULTIPOLYGON (((-108.1375 3…
US-NY	New York	FF2FF1212	FALSE	MULTIPOLYGON (((-79.06523 4…
US-NC	North Carolina	FF2FF1212	FALSE	MULTIPOLYGON (((-76.03173 3…
US-ND	North Dakota	FF2FF1212	FALSE	MULTIPOLYGON (((-104.0476 4…
US-OH	Ohio	FF2FF1212	FALSE	MULTIPOLYGON (((-80.52023 4…
US-OK	Oklahoma	FF2FF1212	FALSE	MULTIPOLYGON (((-103.0002 3…
US-OR	Oregon	FF2FF1212	FALSE	MULTIPOLYGON (((-124.4924 4…
US-PA	Pennsylvania	FF2FF1212	FALSE	MULTIPOLYGON (((-79.76301 4…
US-RI	Rhode Island	FF2FF1212	FALSE	MULTIPOLYGON (((-71.23686 4…
US-SC	South Carolina	FF2FF1212	FALSE	MULTIPOLYGON (((-78.57316 3…
US-SD	South Dakota	FF2FF1212	FALSE	MULTIPOLYGON (((-104.0567 4…
US-TN	Tennessee	FF2FF1212	FALSE	MULTIPOLYGON (((-90.30422 3…
US-TX	Texas	FF2FF1212	FALSE	MULTIPOLYGON (((-103.3115 2…
US-UT	Utah	FF2FF1212	FALSE	MULTIPOLYGON (((-111.0502 4…
US-VT	Vermont	FF2FF1212	FALSE	MULTIPOLYGON (((-73.35134 4…
US-VA	Virginia	FF2F11212	TRUE	MULTIPOLYGON (((-76.01325 3…
US-WA	Washington	FF2FF1212	FALSE	MULTIPOLYGON (((-122.753 48…
US-WV	West Virginia	FF2FF1212	FALSE	MULTIPOLYGON (((-82.58945 3…
US-WI	Wisconsin	FF2FF1212	FALSE	MULTIPOLYGON (((-87.80425 4…
US-WY	Wyoming	FF2FF1212	FALSE	MULTIPOLYGON (((-109.0463 4…

(If You Don’t Want to Scroll)

Code

us |> filter(touch)

Warning: Using one column matrices in `filter()` was deprecated in dplyr 1.1.0.
ℹ Please use one dimensional logical vectors instead.

iso_3166_2	name	de9im	touch	geometry
US-MD	Maryland	FF2F11212	TRUE	MULTIPOLYGON (((-75.64786 3…
US-VA	Virginia	FF2F11212	TRUE	MULTIPOLYGON (((-76.01325 3…

References

Cohn, P. M. 1965. Universal Algebra. Springer Science & Business Media.

Lovelace, Robin, Jakub Nowosad, and Jannes Muenchow. 2024. Geocomputation with R. CRC Press. https://r.geocompx.org/.

Pebesma, Edzer, and Roger Bivand. 2023. Spatial Data Science: With Applications in R. CRC Press. https://r-spatial.org/book/.

Geospatial Operations 2: Binary Operations

Spatial Joins

Spatial Sampling

The “Default” Predicate: st_intersects

Counting with lengths()

Plotting with mapview

Plotting with ggplot2

One More Unary Operation: st_buffer()

On POLYGONs

On LINESTRINGs

What Makes Binary Operations “Fancier”?

The Bad Kind of Overthinking: Will My Life Just Get Harder and Harder?

Good News and Bad News

The Good Kind of Overthinking…

Ad Hoc / Ambiguous \(\rightarrow\) Precise: Enter DE-9IM!

DE-9IM Strings

Slowing Down: 9IM (no DE yet!)

Dimensionally Extended (DE) 9IM

Crunching it Down into a Tiny Box

And Then into a Tiny String

And Then into an Infinitesimally-Small Point

DE-9IM Masks

DE-9IM vs. Everyday Language

st_relate(): The Ultimate Predicate

(If You Don’t Want to Scroll)

References

The “Default” Predicate: `st_intersects`

Counting with `lengths()`

Plotting with `mapview`

Plotting with `ggplot2`

One More Unary Operation: `st_buffer()`

On `POLYGON`s

On `LINESTRING`s

`st_relate()`: The Ultimate Predicate