PPOL 6805 Week 6 Lab: Interpolating Kurdistan

Labs
Author

Jeff Jacobs

Published

October 1, 2025

Part 1: Overview and Data Loading

Historical Kurdistan Geoms from ESRI

First, check out this amazing ESRI StoryMap, which we’ll cite as Hanson and Hinsdale (2023). If you’re interested, you can read a bunch more about the modern history of the Kurds in McDowall (2021).

For the sake of this assignment, though, we’ll focus in on the following 1946 map, which Jeff will give the fuller details of (condensing McDowall (2021) into less than 5 minutes, hopefully!) in class:

From ESRI StoryMaps to sf Objects

Digging into the code using Developer Tools reveals the magical API endpoint: https://services.arcgis.com/3xOwF6p0r7IHIjfn/arcgis/rest/services/Kurdistan1947Boundaries

We could wrestle with ESRI’s API, but, there’s an even more fun way, which I learned about via this blog post. Reading that will lead you on to the pyesridump library, which I use within Python to download the layers of that StoryMap as .geojson files.

The details of how pyesridump converts ESRI StoryMap Features into “standard” WKT Geometries can be fun to learn but are a bit outside of the scope of this writeup. But, from the following output you can see that ESRI Features can contain the WKT geometry types we’ve seen (like POINT, LINESTRING, or POLYGON), and in this case the feature we download does indeed correspond to a collection of POLYGONs (here we print just its first coordinate):

ESRI Type: Feature
ESRI Properties: {'OBJECTID': 1, 'area': 41360557.05027756, 'Shape__Area': 413605942350.6803, 'Shape__Length': 4362383.882545892}
Geometry Type: Polygon
First Coordinate: [37.6952778, 39.2168218]
Source: krd_layers.ipynb

The remainder of the lab uses these files, along with rnaturalearth!

Now Modern Border Geoms

library(tidyverse) |> suppressPackageStartupMessages()
library(sf) |> suppressPackageStartupMessages()
library(rnaturalearth) |> suppressPackageStartupMessages()
library(mapview) |> suppressPackageStartupMessages()
country_list <- c(
  "Turkey",
  "Iran",
  "Iraq",
  "Syria"
)
countries_sf <- rnaturalearth::ne_countries(
  scale=50,
  type="countries",
  country=country_list
)

From among allll the variables in countries_sf, we’re only going to need geounit and (later) pop_est!

countries_sf <- countries_sf |>
  select(geounit, pop_est)
countries_map <- mapview(
  countries_sf,
  zcol='geounit'
)
countries_map

And now the 1946 .geojson file, from the ESRI StoryMap, as discussed above!

print(getwd())
[1] "/Users/jpj/gtown-local/ppol6805/writeups/interpolation"
if (endsWith(getwd(), "ppol6805")) {
  setwd("./writeups/interpolation/")
}
krd_sf <- sf::st_read("data/krd_46.geojson") |>
  mutate(label="Kurdistan 1946")
Reading layer `krd_46' from data source 
  `/Users/jpj/gtown-local/ppol6805/writeups/interpolation/data/krd_46.geojson' 
  using driver `GeoJSON'
Simple feature collection with 1 feature and 4 fields
Geometry type: POLYGON
Dimension:     XY
Bounding box:  xmin: 36.25249 ymin: 31.58789 xmax: 50.9462 ymax: 40.05815
Geodetic CRS:  WGS 84
krd_map <- mapview(krd_sf, zcol='label')
krd_map
countries_map + krd_map

Part 2: Areal-Weighted Interpolation for “Division”

First: Splitting KRD

Intersects as the default predicate… not very helpful here!

default_sf <- countries_sf[krd_sf,]
mapview(default_sf)

But, intersection, via st_intersection(), does give us what we want!

# inter_sf <- st_as_sf(st_union(st_intersection(countries_sf, krd_sf)))
inter_sf <- st_intersection(countries_sf, krd_sf)
Warning: attribute variables are assumed to be spatially constant throughout
all geometries
mapview(inter_sf)

New Name Column

inter_sf <- inter_sf |> mutate(
  krd_name = paste0(geounit, " Kurdistan")
)

Now let’s view the four rows of the sf object individually:

mapview(inter_sf[1,], zcol='krd_name')
mapview(inter_sf[2,], zcol='krd_name')
mapview(inter_sf[3,], zcol='krd_name')
mapview(inter_sf[4,], zcol='krd_name')

…And All at Once

mapview(inter_sf, zcol='krd_name')

Very cool, but, we haven’t taken population into account at all, which was the whole point of this exercise… spatial join / interpolation time!

Spatial Join / Interpolation

Again drawing on McDowall (2021) (this table is on page 4), we can estimate that the overall population across the four modern countries is approximately 32 million:

Country % of Country Count
Turkey 18 15 million
Iran 10 8 million
Iraq 18 7.2 million
Syria 9 1.8 million
Diaspora and Caucasus - 2 million
Total - 34 million

So, given this estimate, we can now “start over”, this time with the goal of keeping track of the population as we do our operations:

fmt_pop <- function(x) formatC(x,big.mark=",",format='d')
krd_sf <- krd_sf |> mutate(
  pop = 32000000,
  pop_str = fmt_pop(pop)
)
mapview(krd_sf, zcol='pop_str')

Now, if we just use st_intersection()… it doesn’t know that it should pay attention to pop (even if it did know that, it wouldn’t know exactly how you want to split it up… we’ll get there)

pop_intersected_sf <- st_intersection(countries_sf, krd_sf) |>
  mutate(
    pop_str = fmt_pop(pop),
    map_label = paste0(geounit, ' ', pop_str)
  )
Warning: attribute variables are assumed to be spatially constant throughout
all geometries
mapview(pop_intersected_sf, zcol='pop', label='map_label')

So… what do we do? The binary operations we’ve learned so far won’t cut it, sadly.

So, we need to cap off the binary operations portion of the class with the fanciest operation we’ll need: spatial joins, though more specifically we’ll want an interpolation. The function in sf is called st_interpolate_aw().

Let’s add that to our toolkit and get to spatial statistics/data science!

pop_interpolated_sf <- st_interpolate_aw(
  krd_sf['pop'],
  countries_sf,
  extensive=TRUE
)
Warning in st_interpolate_aw.sf(krd_sf["pop"], countries_sf, extensive = TRUE):
st_interpolate_aw assumes attributes are constant or uniform over areas of x
pop_interpolated_sf
pop geometry
13967353 MULTIPOLYGON (((25.97002 40…
749878 MULTIPOLYGON (((35.89268 35…
5987342 MULTIPOLYGON (((42.35898 37…
11295426 MULTIPOLYGON (((56.18799 26…

And let’s visualize it!

pop_interpolated_sf <- pop_interpolated_sf |> mutate(
  pop_rounded = round(pop, 0),
  pop_str = fmt_pop(pop_rounded)
)
mapview(pop_interpolated_sf, zcol='pop', label='pop_str')

Cool, this tells us, if we didn’t create a Kurdistan, and just drew the borders of the four states (hypothetically…….), how many Kurds would be in each country (again, assuming uniform distribution across 1947 Kurdistan!)

pop_interpolated_sf
pop geometry pop_rounded pop_str
13967353 MULTIPOLYGON (((25.97002 40… 13967353 13,967,353
749878 MULTIPOLYGON (((35.89268 35… 749878 749,878
5987342 MULTIPOLYGON (((42.35898 37… 5987342 5,987,342
11295426 MULTIPOLYGON (((56.18799 26… 11295426 11,295,426
pop_interpolated_sf$pop |> sum() |> fmt_pop()
[1] "32,000,000"

Now Interpolating Over the Intersection

I think of the above as kind of a… “partially-spatially-aware” interpolation. This isn’t a great description, but, what I’m getting at is the fact that:

  • It does successfully compute how our population (of 1947 Kurdistan) proportionally distributes across the four national borders, but
  • It then represents the output, geometrically (as in, the geometry column), as just the original four countries, now with an additional non-spatial property: the Kurdish population in that country

So, for what I’ll call a “fully-spatially-aware” interpolation, we’ll combine the two things we figured out above:

  • Spatial intersection between 1947 Kurdistan and the four countries (producing Turkish, Iranian, Syrian, and Iraqi Kurdistan), plus
  • The areal-weighted interpolation we just did above

And it turns out, all that we really need to change is which sf object we’re spatially interpolating across! Meaning, this is the above code, copied-and-pasted, with countries_sf now replaced by inter_sf:

fully_joined_sf <- st_interpolate_aw(
  krd_sf['pop'],
  inter_sf,
  extensive=TRUE
)
Warning in st_interpolate_aw.sf(krd_sf["pop"], inter_sf, extensive = TRUE):
st_interpolate_aw assumes attributes are constant or uniform over areas of x
mapview(fully_joined_sf)

Also take note of something here: how did mapview “know” to color the four regions by population (using the viridis scheme, the default in mapview!), when we didn’t specify anything for the zcol argument?

This happens because, st_interpolate_aw() is solely dedicated to computing a single piece of information: the population that results from distributing the first argument (numeric) across the second argument (an sf, with a geometry column)! It’s technically not intended to carry out a full-on join.

So, if you find yourself needing to keep all the other information along with the new areal-weighted population (which will usually be the case), just make sure you add the new areal-weighted population column in fully_joined_sf back into the sf object you passed as the second argument to st_interpolate_aw(), as a new column:

inter_sf <- inter_sf |> mutate(
  interpolated_pop = fully_joined_sf$pop
)
inter_sf
geounit pop_est OBJECTID area Shape__Area Shape__Length label geometry krd_name interpolated_pop
37 Turkey 83429615 1 41360557 413605942351 4362384 Kurdistan 1946 MULTIPOLYGON (((37.746 39.1… Turkey Kurdistan 13967353
47 Syria 17070135 1 41360557 413605942351 4362384 Kurdistan 1946 MULTIPOLYGON (((41.81781 36… Syria Kurdistan 749878
142 Iraq 39309783 1 41360557 413605942351 4362384 Kurdistan 1946 MULTIPOLYGON (((46.56862 32… Iraq Kurdistan 5987342
143 Iran 82913906 1 41360557 413605942351 4362384 Kurdistan 1946 MULTIPOLYGON (((44.54024 39… Iran Kurdistan 11295426

And now we can use all the other info, like the name, for the rest of our analysis! For example, when making an interactive map!

inter_sf <- inter_sf |> mutate(
  krd_name = paste0(geounit, ' Kurdistan'),
  pop_str = fmt_pop(interpolated_pop),
  krd_label = paste0(krd_name, ': ', pop_str)
)
mapview(inter_sf, zcol='interpolated_pop', label='krd_label')

Part 3: Areal-Weighted Interpolation as “Aggregation”

Part 3.1: Estimating Kurdistan Population from Country Populations

Here, we pretend we don’t know the population of Kurdistan, and try to estimate it from the four country populations, based on how much of each country POLYGONs overlap with the Kurdistan POLYGON:

countries_sf
geounit pop_est geometry
37 Turkey 83429615 MULTIPOLYGON (((25.97002 40…
47 Syria 17070135 MULTIPOLYGON (((35.89268 35…
142 Iraq 39309783 MULTIPOLYGON (((42.35898 37…
143 Iran 82913906 MULTIPOLYGON (((56.18799 26…

Visualized:

countries_sf <- countries_sf |> mutate(
  pop_str = fmt_pop(pop_est),
  pop_label = paste0(geounit, ": ", pop_str)
)
mapview(countries_sf, zcol="pop_est", label="pop_label") + krd_map
krd_pop_est_sf <- st_interpolate_aw(
  countries_sf['pop_est'],
  krd_sf,
  extensive = TRUE
)
Warning in st_interpolate_aw.sf(countries_sf["pop_est"], krd_sf, extensive =
TRUE): st_interpolate_aw assumes attributes are constant or uniform over areas
of x
krd_sf <- krd_sf |> mutate(
  pop_est = krd_pop_est_sf$pop_est,
  pop_str = fmt_pop(pop_est),
  geounit = "Kurdistan",
  pop_label = paste0("Kurdistan Est: ", pop_str)
)
krd_sf
OBJECTID area Shape__Area Shape__Length geometry label pop pop_str pop_est geounit pop_label
1 41360557 413605942351 4362384 POLYGON ((37.69528 39.21682… Kurdistan 1946 3.2e+07 34,607,914 34607915 Kurdistan Kurdistan Est: 34,607,914

And, visualized!

mapview(countries_sf, zcol="pop_est", label="pop_label") + mapview(krd_sf, zcol="pop_est", label="pop_label")

We can also just combine our estimated Kurdistan data with the country data into one, big sf object with 5 rows, where we just have to keep in mind that we’ve technically changed the unit of observation in each row from “UN-recognized country” to essentially “Geopolitical entity of some sort”

krd_sub_sf <- krd_sf |> select(
  geounit, pop_est, pop_str, pop_label
)
pol_ent_sf <- rbind(countries_sf, krd_sub_sf)
pol_ent_sf
geounit pop_est pop_str pop_label geometry
37 Turkey 83429615 83,429,615 Turkey: 83,429,615 MULTIPOLYGON (((25.97002 40…
47 Syria 17070135 17,070,135 Syria: 17,070,135 MULTIPOLYGON (((35.89268 35…
142 Iraq 39309783 39,309,783 Iraq: 39,309,783 MULTIPOLYGON (((42.35898 37…
143 Iran 82913906 82,913,906 Iran: 82,913,906 MULTIPOLYGON (((56.18799 26…
1 Kurdistan 34607915 34,607,914 Kurdistan Est: 34,607,914 POLYGON ((37.69528 39.21682…

And now, with a single mapview call:

mapview(pol_ent_sf, zcol='pop_est', label='pop_label')

Part 4: Calibrating Two Grids

There is a “Northern” Kurdish language, Kurmanji, as well as a “Central” Kurdish language called Sorani1 While it’s a bit difficult (though, very possible!) to get POLYGONs for language groups, there are standardized datasets like Glottolog and WALS that we can use to get at least the centroid of where the language is spoken, as a POINT:

kl_df <- tibble::tribble(
  ~name, ~lat, ~lon,
  # "Northern" Kurdish language: Kurmanji
  "Kurmanji Centroid", 37.00, 43.00,
  # "Central" Kurdish language: Sorani
  "Sorani Centroid", 35.65, 45.81
)
# sf for language centroids
kl_cent_sf <- st_as_sf(
  kl_df,
  coords=c('lon','lat'),
  crs=4326
)
(cent_map <- mapview(kl_cent_sf))

So, let’s use Voronoi tessellation to split Kurdistan into two sections, based on whether each point is closer to the Kurmanji or Sorani centroid!

krd_voronoi <- st_voronoi(
  st_union(kl_cent_sf),
  envelope=st_as_sfc(st_bbox(krd_sf))
)
Warning in st_voronoi.sfc(st_union(kl_cent_sf), envelope =
st_as_sfc(st_bbox(krd_sf))): st_voronoi does not correctly triangulate
longitude/latitude data
krd_extracted <- st_collection_extract(krd_voronoi, "POLYGON")
#mapview(krd_extracted)

krd_inter_sf <- st_sf(geometry = krd_extracted) |>
  st_intersection(krd_sf)
Warning: attribute variables are assumed to be spatially constant throughout
all geometries
krd_inter_sf$label <- c("Kurmanji Speakers", "Sorani Speakers")
mapview(krd_inter_sf, zcol='label') + cent_map
#northern_sf <- krd_voronoi[1,]
#southern_sf <- krd_voronoi[2,]
# mapview(northern_sf) + mapview(southern_sf)
#mapview(st_intersection(st_cast(krd_voronoi),krd_sf))

So now our job becomes: can we re-do the above estimate of Kurdistan’s population, but this time derive two separate estimates, where the original estimate is “chopped” into two pieces on the basis of the distribution of area between Kurmanji and Sorani…

The answer is: yes! That’s also the job of st_interpolate_aw() 😎

lang_pop_sf <- st_interpolate_aw(
  countries_sf['pop_est'],
  krd_inter_sf,
  extensive=TRUE
)
Warning in st_interpolate_aw.sf(countries_sf["pop_est"], krd_inter_sf,
extensive = TRUE): st_interpolate_aw assumes attributes are constant or uniform
over areas of x
mapview(lang_pop_sf)

And, adding back into our krd_inter_sf object so we have all the data “carried over”:

krd_inter_sf <- krd_inter_sf |> mutate(
  pop_est = lang_pop_sf$pop_est,
  pop_str = fmt_pop(pop_est),
  pop_label = paste0(label, ": ", pop_str)
)
mapview(krd_inter_sf, zcol='pop_est', label='pop_label')

Wohoo!

References

Hanson, Gianna, and Lanah Hinsdale. 2023. Putting Kurdistan on the Map.” ArcGIS StoryMaps.
McDowall, David. 2021. A Modern History of the Kurds. Bloomsbury Academic.

Footnotes

  1. There is also a “Southern” Kurdish language called Xwarin↩︎