PPOL 6805 Week 6 Lab: Interpolating Kurdistan

Labs

Author

Jeff Jacobs

Published

October 1, 2025

Part 1: Overview and Data Loading

Historical Kurdistan Geoms from ESRI

First, check out this amazing ESRI StoryMap, which we’ll cite as Hanson and Hinsdale (2023). If you’re interested, you can read a bunch more about the modern history of the Kurds in McDowall (2021).

For the sake of this assignment, though, we’ll focus in on the following 1946 map, which Jeff will give the fuller details of (condensing McDowall (2021) into less than 5 minutes, hopefully!) in class:

From ESRI StoryMaps to `sf` Objects

Digging into the code using Developer Tools reveals the magical API endpoint: https://services.arcgis.com/3xOwF6p0r7IHIjfn/arcgis/rest/services/Kurdistan1947Boundaries

We could wrestle with ESRI’s API, but, there’s an even more fun way, which I learned about via this blog post. Reading that will lead you on to the pyesridump library, which I use within Python to download the layers of that StoryMap as .geojson files.

The details of how pyesridump converts ESRI StoryMap Features into “standard” WKT Geometries can be fun to learn but are a bit outside of the scope of this writeup. But, from the following output you can see that ESRI Features can contain the WKT geometry types we’ve seen (like POINT, LINESTRING, or POLYGON), and in this case the feature we download does indeed correspond to a collection of POLYGONs (here we print just its first coordinate):

ESRI Type: Feature
ESRI Properties: {'OBJECTID': 1, 'area': 41360557.05027756, 'Shape__Area': 413605942350.6803, 'Shape__Length': 4362383.882545892}
Geometry Type: Polygon
First Coordinate: [37.6952778, 39.2168218]

Source: Scraping Data from ArcGIS StoryMaps

The remainder of the lab uses these files, along with rnaturalearth!

Now Modern Border Geoms

library(tidyverse) |> suppressPackageStartupMessages()
library(sf) |> suppressPackageStartupMessages()
library(rnaturalearth) |> suppressPackageStartupMessages()
library(mapview) |> suppressPackageStartupMessages()

country_list <- c(
  "Turkey",
  "Iran",
  "Iraq",
  "Syria"
)
countries_sf <- rnaturalearth::ne_countries(
  scale=50,
  type="countries",
  country=country_list
)

From among allll the variables in countries_sf, we’re only going to need geounit and (later) pop_est!

countries_sf <- countries_sf |>
  select(geounit, pop_est)

countries_map <- mapview(
  countries_sf,
  zcol='geounit'
)
countries_map

And now the 1946 .geojson file, from the ESRI StoryMap, as discussed above!

print(getwd())

[1] "/Users/jpj/gtown-local/ppol6805/writeups/interpolation"

if (endsWith(getwd(), "ppol6805")) {
  setwd("./writeups/interpolation/")
}
krd_sf <- sf::st_read("data/krd_46.geojson") |>
  mutate(label="Kurdistan 1946")

Reading layer `krd_46' from data source 
  `/Users/jpj/gtown-local/ppol6805/writeups/interpolation/data/krd_46.geojson' 
  using driver `GeoJSON'
Simple feature collection with 1 feature and 4 fields
Geometry type: POLYGON
Dimension:     XY
Bounding box:  xmin: 36.25249 ymin: 31.58789 xmax: 50.9462 ymax: 40.05815
Geodetic CRS:  WGS 84

krd_map <- mapview(krd_sf, zcol='label')
krd_map

countries_map + krd_map

Part 2: Areal-Weighted Interpolation for “Division”

First: Splitting KRD

Intersects as the default predicate… not very helpful here!

default_sf <- countries_sf[krd_sf,]
mapview(default_sf)

But, intersection, via st_intersection(), does give us what we want!

# inter_sf <- st_as_sf(st_union(st_intersection(countries_sf, krd_sf)))
inter_sf <- st_intersection(countries_sf, krd_sf)

Warning: attribute variables are assumed to be spatially constant throughout
all geometries

mapview(inter_sf)

New Name Column

inter_sf <- inter_sf |> mutate(
  krd_name = paste0(geounit, " Kurdistan")
)

Now let’s view the four rows of the sf object individually:

mapview(inter_sf[1,], zcol='krd_name')

mapview(inter_sf[2,], zcol='krd_name')

mapview(inter_sf[3,], zcol='krd_name')

mapview(inter_sf[4,], zcol='krd_name')

…And All at Once

mapview(inter_sf, zcol='krd_name')

Very cool, but, we haven’t taken population into account at all, which was the whole point of this exercise… spatial join / interpolation time!

Spatial Join / Interpolation

Again drawing on McDowall (2021) (this table is on page 4), we can estimate that the overall population across the four modern countries is approximately 32 million:

Country	% of Country	Count
Turkey	18	15 million
Iran	10	8 million
Iraq	18	7.2 million
Syria	9	1.8 million
Diaspora and Caucasus	-	2 million
Total	-	34 million

So, given this estimate, we can now “start over”, this time with the goal of keeping track of the population as we do our operations:

fmt_pop <- function(x) formatC(x,big.mark=",",format='d')
krd_sf <- krd_sf |> mutate(
  pop = 32000000,
  pop_str = fmt_pop(pop)
)
mapview(krd_sf, zcol='pop_str')

Now, if we just use st_intersection()… it doesn’t know that it should pay attention to pop (even if it did know that, it wouldn’t know exactly how you want to split it up… we’ll get there)

pop_intersected_sf <- st_intersection(countries_sf, krd_sf) |>
  mutate(
    pop_str = fmt_pop(pop),
    map_label = paste0(geounit, ' ', pop_str)
  )

Warning: attribute variables are assumed to be spatially constant throughout
all geometries

mapview(pop_intersected_sf, zcol='pop', label='map_label')

So… what do we do? The binary operations we’ve learned so far won’t cut it, sadly.

So, we need to cap off the binary operations portion of the class with the fanciest operation we’ll need: spatial joins, though more specifically we’ll want an interpolation. The function in sf is called st_interpolate_aw().

Let’s add that to our toolkit and get to spatial statistics/data science!

pop_interpolated_sf <- st_interpolate_aw(
  krd_sf['pop'],
  countries_sf,
  extensive=TRUE
)

Warning in st_interpolate_aw.sf(krd_sf["pop"], countries_sf, extensive = TRUE):
st_interpolate_aw assumes attributes are constant or uniform over areas of x

pop_interpolated_sf

pop	geometry
13967353	MULTIPOLYGON (((25.97002 40…
749878	MULTIPOLYGON (((35.89268 35…
5987342	MULTIPOLYGON (((42.35898 37…
11295426	MULTIPOLYGON (((56.18799 26…

And let’s visualize it!

pop_interpolated_sf <- pop_interpolated_sf |> mutate(
  pop_rounded = round(pop, 0),
  pop_str = fmt_pop(pop_rounded)
)
mapview(pop_interpolated_sf, zcol='pop', label='pop_str')

Cool, this tells us, if we didn’t create a Kurdistan, and just drew the borders of the four states (hypothetically…….), how many Kurds would be in each country (again, assuming uniform distribution across 1947 Kurdistan!)

pop_interpolated_sf

pop	geometry	pop_rounded	pop_str
13967353	MULTIPOLYGON (((25.97002 40…	13967353	13,967,353
749878	MULTIPOLYGON (((35.89268 35…	749878	749,878
5987342	MULTIPOLYGON (((42.35898 37…	5987342	5,987,342
11295426	MULTIPOLYGON (((56.18799 26…	11295426	11,295,426

pop_interpolated_sf$pop |> sum() |> fmt_pop()

[1] "32,000,000"

Now Interpolating Over the Intersection

I think of the above as kind of a… “partially-spatially-aware” interpolation. This isn’t a great description, but, what I’m getting at is the fact that:

It does successfully compute how our population (of 1947 Kurdistan) proportionally distributes across the four national borders, but
It then represents the output, geometrically (as in, the geometry column), as just the original four countries, now with an additional non-spatial property: the Kurdish population in that country

So, for what I’ll call a “fully-spatially-aware” interpolation, we’ll combine the two things we figured out above:

Spatial intersection between 1947 Kurdistan and the four countries (producing Turkish, Iranian, Syrian, and Iraqi Kurdistan), plus
The areal-weighted interpolation we just did above

And it turns out, all that we really need to change is which sf object we’re spatially interpolating across! Meaning, this is the above code, copied-and-pasted, with countries_sf now replaced by inter_sf:

fully_joined_sf <- st_interpolate_aw(
  krd_sf['pop'],
  inter_sf,
  extensive=TRUE
)

Warning in st_interpolate_aw.sf(krd_sf["pop"], inter_sf, extensive = TRUE):
st_interpolate_aw assumes attributes are constant or uniform over areas of x

mapview(fully_joined_sf)

Also take note of something here: how did mapview “know” to color the four regions by population (using the viridis scheme, the default in mapview!), when we didn’t specify anything for the zcol argument?

This happens because, st_interpolate_aw() is solely dedicated to computing a single piece of information: the population that results from distributing the first argument (numeric) across the second argument (an sf, with a geometry column)! It’s technically not intended to carry out a full-on join.

So, if you find yourself needing to keep all the other information along with the new areal-weighted population (which will usually be the case), just make sure you add the new areal-weighted population column in fully_joined_sf back into the sf object you passed as the second argument to st_interpolate_aw(), as a new column:

inter_sf <- inter_sf |> mutate(
  interpolated_pop = fully_joined_sf$pop
)
inter_sf

	geounit	pop_est	OBJECTID	area	Shape__Area	Shape__Length	label	geometry	krd_name	interpolated_pop
37	Turkey	83429615	1	41360557	413605942351	4362384	Kurdistan 1946	MULTIPOLYGON (((37.746 39.1…	Turkey Kurdistan	13967353
47	Syria	17070135	1	41360557	413605942351	4362384	Kurdistan 1946	MULTIPOLYGON (((41.81781 36…	Syria Kurdistan	749878
142	Iraq	39309783	1	41360557	413605942351	4362384	Kurdistan 1946	MULTIPOLYGON (((46.56862 32…	Iraq Kurdistan	5987342
143	Iran	82913906	1	41360557	413605942351	4362384	Kurdistan 1946	MULTIPOLYGON (((44.54024 39…	Iran Kurdistan	11295426

And now we can use all the other info, like the name, for the rest of our analysis! For example, when making an interactive map!

inter_sf <- inter_sf |> mutate(
  krd_name = paste0(geounit, ' Kurdistan'),
  pop_str = fmt_pop(interpolated_pop),
  krd_label = paste0(krd_name, ': ', pop_str)
)
mapview(inter_sf, zcol='interpolated_pop', label='krd_label')

Part 3: Areal-Weighted Interpolation as “Aggregation”

Part 3.1: Estimating Kurdistan Population from Country Populations

Here, we pretend we don’t know the population of Kurdistan, and try to estimate it from the four country populations, based on how much of each country POLYGONs overlap with the Kurdistan POLYGON:

countries_sf

	geounit	pop_est	geometry
37	Turkey	83429615	MULTIPOLYGON (((25.97002 40…
47	Syria	17070135	MULTIPOLYGON (((35.89268 35…
142	Iraq	39309783	MULTIPOLYGON (((42.35898 37…
143	Iran	82913906	MULTIPOLYGON (((56.18799 26…

Visualized:

countries_sf <- countries_sf |> mutate(
  pop_str = fmt_pop(pop_est),
  pop_label = paste0(geounit, ": ", pop_str)
)
mapview(countries_sf, zcol="pop_est", label="pop_label") + krd_map

krd_pop_est_sf <- st_interpolate_aw(
  countries_sf['pop_est'],
  krd_sf,
  extensive = TRUE
)

Warning in st_interpolate_aw.sf(countries_sf["pop_est"], krd_sf, extensive =
TRUE): st_interpolate_aw assumes attributes are constant or uniform over areas
of x

krd_sf <- krd_sf |> mutate(
  pop_est = krd_pop_est_sf$pop_est,
  pop_str = fmt_pop(pop_est),
  geounit = "Kurdistan",
  pop_label = paste0("Kurdistan Est: ", pop_str)
)
krd_sf

OBJECTID	area	Shape__Area	Shape__Length	geometry	label	pop	pop_str	pop_est	geounit	pop_label
1	41360557	413605942351	4362384	POLYGON ((37.69528 39.21682…	Kurdistan 1946	3.2e+07	34,607,914	34607915	Kurdistan	Kurdistan Est: 34,607,914

And, visualized!

mapview(countries_sf, zcol="pop_est", label="pop_label") + mapview(krd_sf, zcol="pop_est", label="pop_label")

We can also just combine our estimated Kurdistan data with the country data into one, big sf object with 5 rows, where we just have to keep in mind that we’ve technically changed the unit of observation in each row from “UN-recognized country” to essentially “Geopolitical entity of some sort”

krd_sub_sf <- krd_sf |> select(
  geounit, pop_est, pop_str, pop_label
)
pol_ent_sf <- rbind(countries_sf, krd_sub_sf)
pol_ent_sf

	geounit	pop_est	pop_str	pop_label	geometry
37	Turkey	83429615	83,429,615	Turkey: 83,429,615	MULTIPOLYGON (((25.97002 40…
47	Syria	17070135	17,070,135	Syria: 17,070,135	MULTIPOLYGON (((35.89268 35…
142	Iraq	39309783	39,309,783	Iraq: 39,309,783	MULTIPOLYGON (((42.35898 37…
143	Iran	82913906	82,913,906	Iran: 82,913,906	MULTIPOLYGON (((56.18799 26…
1	Kurdistan	34607915	34,607,914	Kurdistan Est: 34,607,914	POLYGON ((37.69528 39.21682…

And now, with a single mapview call:

mapview(pol_ent_sf, zcol='pop_est', label='pop_label')

Part 4: Calibrating Two Grids

There is a “Northern” Kurdish language, Kurmanji, as well as a “Central” Kurdish language called Sorani¹ While it’s a bit difficult (though, very possible!) to get POLYGONs for language groups, there are standardized datasets like Glottolog and WALS that we can use to get at least the centroid of where the language is spoken, as a POINT:

kl_df <- tibble::tribble(
  ~name, ~lat, ~lon,
  # "Northern" Kurdish language: Kurmanji
  "Kurmanji Centroid", 37.00, 43.00,
  # "Central" Kurdish language: Sorani
  "Sorani Centroid", 35.65, 45.81
)
# sf for language centroids
kl_cent_sf <- st_as_sf(
  kl_df,
  coords=c('lon','lat'),
  crs=4326
)
(cent_map <- mapview(kl_cent_sf))

So, let’s use Voronoi tessellation to split Kurdistan into two sections, based on whether each point is closer to the Kurmanji or Sorani centroid!

krd_voronoi <- st_voronoi(
  st_union(kl_cent_sf),
  envelope=st_as_sfc(st_bbox(krd_sf))
)

Warning in st_voronoi.sfc(st_union(kl_cent_sf), envelope =
st_as_sfc(st_bbox(krd_sf))): st_voronoi does not correctly triangulate
longitude/latitude data

krd_extracted <- st_collection_extract(krd_voronoi, "POLYGON")
#mapview(krd_extracted)

krd_inter_sf <- st_sf(geometry = krd_extracted) |>
  st_intersection(krd_sf)

Warning: attribute variables are assumed to be spatially constant throughout
all geometries

krd_inter_sf$label <- c("Kurmanji Speakers", "Sorani Speakers")
mapview(krd_inter_sf, zcol='label') + cent_map

#northern_sf <- krd_voronoi[1,]
#southern_sf <- krd_voronoi[2,]
# mapview(northern_sf) + mapview(southern_sf)
#mapview(st_intersection(st_cast(krd_voronoi),krd_sf))

So now our job becomes: can we re-do the above estimate of Kurdistan’s population, but this time derive two separate estimates, where the original estimate is “chopped” into two pieces on the basis of the distribution of area between Kurmanji and Sorani…

The answer is: yes! That’s also the job of st_interpolate_aw() 😎

lang_pop_sf <- st_interpolate_aw(
  countries_sf['pop_est'],
  krd_inter_sf,
  extensive=TRUE
)

Warning in st_interpolate_aw.sf(countries_sf["pop_est"], krd_inter_sf,
extensive = TRUE): st_interpolate_aw assumes attributes are constant or uniform
over areas of x

mapview(lang_pop_sf)

And, adding back into our krd_inter_sf object so we have all the data “carried over”:

krd_inter_sf <- krd_inter_sf |> mutate(
  pop_est = lang_pop_sf$pop_est,
  pop_str = fmt_pop(pop_est),
  pop_label = paste0(label, ": ", pop_str)
)
mapview(krd_inter_sf, zcol='pop_est', label='pop_label')

Wohoo!

References

Hanson, Gianna, and Lanah Hinsdale. 2023. “Putting Kurdistan on the Map.” ArcGIS StoryMaps.

McDowall, David. 2021. A Modern History of the Kurds. Bloomsbury Academic.

Footnotes

There is also a “Southern” Kurdish language called Xwarin ↩︎