ESRI Type: Feature
ESRI Properties: {'OBJECTID': 1, 'area': 41360557.05027756, 'Shape__Area': 413605942350.6803, 'Shape__Length': 4362383.882545892}
Geometry Type: Polygon
First Coordinate: [37.6952778, 39.2168218]
PPOL 6805 Week 6 Lab: Interpolating Kurdistan
Part 1: Overview and Data Loading
Historical Kurdistan Geoms from ESRI
First, check out this amazing ESRI StoryMap, which we’ll cite as Hanson and Hinsdale (2023). If you’re interested, you can read a bunch more about the modern history of the Kurds in McDowall (2021).
For the sake of this assignment, though, we’ll focus in on the following 1946 map, which Jeff will give the fuller details of (condensing McDowall (2021) into less than 5 minutes, hopefully!) in class:
From ESRI StoryMaps to sf
Objects
Digging into the code using Developer Tools reveals the magical API endpoint: https://services.arcgis.com/3xOwF6p0r7IHIjfn/arcgis/rest/services/Kurdistan1947Boundaries
We could wrestle with ESRI’s API, but, there’s an even more fun way, which I learned about via this blog post. Reading that will lead you on to the pyesridump
library, which I use within Python to download the layers of that StoryMap as .geojson
files.
The details of how pyesridump
converts ESRI StoryMap Features into “standard” WKT Geometries can be fun to learn but are a bit outside of the scope of this writeup. But, from the following output you can see that ESRI Features can contain the WKT geometry types we’ve seen (like POINT
, LINESTRING
, or POLYGON
), and in this case the feature we download does indeed correspond to a collection of POLYGON
s (here we print just its first coordinate):
The remainder of the lab uses these files, along with rnaturalearth
!
Now Modern Border Geoms
library(tidyverse) |> suppressPackageStartupMessages()
library(sf) |> suppressPackageStartupMessages()
library(rnaturalearth) |> suppressPackageStartupMessages()
library(mapview) |> suppressPackageStartupMessages()
<- c(
country_list "Turkey",
"Iran",
"Iraq",
"Syria"
)<- rnaturalearth::ne_countries(
countries_sf scale=50,
type="countries",
country=country_list
)
From among allll the variables in countries_sf
, we’re only going to need geounit
and (later) pop_est
!
<- countries_sf |>
countries_sf select(geounit, pop_est)
<- mapview(
countries_map
countries_sf,zcol='geounit'
) countries_map
And now the 1946 .geojson
file, from the ESRI StoryMap, as discussed above!
print(getwd())
[1] "/Users/jpj/gtown-local/ppol6805/writeups/interpolation"
if (endsWith(getwd(), "ppol6805")) {
setwd("./writeups/interpolation/")
}<- sf::st_read("data/krd_46.geojson") |>
krd_sf mutate(label="Kurdistan 1946")
Reading layer `krd_46' from data source
`/Users/jpj/gtown-local/ppol6805/writeups/interpolation/data/krd_46.geojson'
using driver `GeoJSON'
Simple feature collection with 1 feature and 4 fields
Geometry type: POLYGON
Dimension: XY
Bounding box: xmin: 36.25249 ymin: 31.58789 xmax: 50.9462 ymax: 40.05815
Geodetic CRS: WGS 84
<- mapview(krd_sf, zcol='label')
krd_map krd_map
+ krd_map countries_map
Part 2: Areal-Weighted Interpolation for “Division”
First: Splitting KRD
Intersects as the default predicate… not very helpful here!
<- countries_sf[krd_sf,]
default_sf mapview(default_sf)
But, intersection, via st_intersection()
, does give us what we want!
# inter_sf <- st_as_sf(st_union(st_intersection(countries_sf, krd_sf)))
<- st_intersection(countries_sf, krd_sf) inter_sf
Warning: attribute variables are assumed to be spatially constant throughout
all geometries
mapview(inter_sf)
New Name Column
<- inter_sf |> mutate(
inter_sf krd_name = paste0(geounit, " Kurdistan")
)
Now let’s view the four rows of the sf
object individually:
mapview(inter_sf[1,], zcol='krd_name')
mapview(inter_sf[2,], zcol='krd_name')
mapview(inter_sf[3,], zcol='krd_name')
mapview(inter_sf[4,], zcol='krd_name')
…And All at Once
mapview(inter_sf, zcol='krd_name')
Very cool, but, we haven’t taken population into account at all, which was the whole point of this exercise… spatial join / interpolation time!
Spatial Join / Interpolation
Again drawing on McDowall (2021) (this table is on page 4), we can estimate that the overall population across the four modern countries is approximately 32 million:
Country | % of Country | Count |
---|---|---|
Turkey | 18 | 15 million |
Iran | 10 | 8 million |
Iraq | 18 | 7.2 million |
Syria | 9 | 1.8 million |
Diaspora and Caucasus | - | 2 million |
Total | - | 34 million |
So, given this estimate, we can now “start over”, this time with the goal of keeping track of the population as we do our operations:
<- function(x) formatC(x,big.mark=",",format='d')
fmt_pop <- krd_sf |> mutate(
krd_sf pop = 32000000,
pop_str = fmt_pop(pop)
)mapview(krd_sf, zcol='pop_str')
Now, if we just use st_intersection()
… it doesn’t know that it should pay attention to pop
(even if it did know that, it wouldn’t know exactly how you want to split it up… we’ll get there)
<- st_intersection(countries_sf, krd_sf) |>
pop_intersected_sf mutate(
pop_str = fmt_pop(pop),
map_label = paste0(geounit, ' ', pop_str)
)
Warning: attribute variables are assumed to be spatially constant throughout
all geometries
mapview(pop_intersected_sf, zcol='pop', label='map_label')
So… what do we do? The binary operations we’ve learned so far won’t cut it, sadly.
So, we need to cap off the binary operations portion of the class with the fanciest operation we’ll need: spatial joins, though more specifically we’ll want an interpolation. The function in sf
is called st_interpolate_aw()
.
Let’s add that to our toolkit and get to spatial statistics/data science!
<- st_interpolate_aw(
pop_interpolated_sf 'pop'],
krd_sf[
countries_sf,extensive=TRUE
)
Warning in st_interpolate_aw.sf(krd_sf["pop"], countries_sf, extensive = TRUE):
st_interpolate_aw assumes attributes are constant or uniform over areas of x
pop_interpolated_sf
pop | geometry |
---|---|
13967353 | MULTIPOLYGON (((25.97002 40… |
749878 | MULTIPOLYGON (((35.89268 35… |
5987342 | MULTIPOLYGON (((42.35898 37… |
11295426 | MULTIPOLYGON (((56.18799 26… |
And let’s visualize it!
<- pop_interpolated_sf |> mutate(
pop_interpolated_sf pop_rounded = round(pop, 0),
pop_str = fmt_pop(pop_rounded)
)mapview(pop_interpolated_sf, zcol='pop', label='pop_str')
Cool, this tells us, if we didn’t create a Kurdistan, and just drew the borders of the four states (hypothetically…….), how many Kurds would be in each country (again, assuming uniform distribution across 1947 Kurdistan!)
pop_interpolated_sf
pop | geometry | pop_rounded | pop_str |
---|---|---|---|
13967353 | MULTIPOLYGON (((25.97002 40… | 13967353 | 13,967,353 |
749878 | MULTIPOLYGON (((35.89268 35… | 749878 | 749,878 |
5987342 | MULTIPOLYGON (((42.35898 37… | 5987342 | 5,987,342 |
11295426 | MULTIPOLYGON (((56.18799 26… | 11295426 | 11,295,426 |
$pop |> sum() |> fmt_pop() pop_interpolated_sf
[1] "32,000,000"
Now Interpolating Over the Intersection
I think of the above as kind of a… “partially-spatially-aware” interpolation. This isn’t a great description, but, what I’m getting at is the fact that:
- It does successfully compute how our population (of 1947 Kurdistan) proportionally distributes across the four national borders, but
- It then represents the output, geometrically (as in, the
geometry
column), as just the original four countries, now with an additional non-spatial property: the Kurdish population in that country
So, for what I’ll call a “fully-spatially-aware” interpolation, we’ll combine the two things we figured out above:
- Spatial intersection between 1947 Kurdistan and the four countries (producing Turkish, Iranian, Syrian, and Iraqi Kurdistan), plus
- The areal-weighted interpolation we just did above
And it turns out, all that we really need to change is which sf
object we’re spatially interpolating across! Meaning, this is the above code, copied-and-pasted, with countries_sf
now replaced by inter_sf
:
<- st_interpolate_aw(
fully_joined_sf 'pop'],
krd_sf[
inter_sf,extensive=TRUE
)
Warning in st_interpolate_aw.sf(krd_sf["pop"], inter_sf, extensive = TRUE):
st_interpolate_aw assumes attributes are constant or uniform over areas of x
mapview(fully_joined_sf)
Also take note of something here: how did mapview
“know” to color the four regions by population (using the viridis scheme, the default in mapview
!), when we didn’t specify anything for the zcol
argument?
This happens because, st_interpolate_aw()
is solely dedicated to computing a single piece of information: the population that results from distributing the first argument (numeric) across the second argument (an sf
, with a geometry
column)! It’s technically not intended to carry out a full-on join.
So, if you find yourself needing to keep all the other information along with the new areal-weighted population (which will usually be the case), just make sure you add the new areal-weighted population column in fully_joined_sf
back into the sf
object you passed as the second argument to st_interpolate_aw()
, as a new column:
<- inter_sf |> mutate(
inter_sf interpolated_pop = fully_joined_sf$pop
) inter_sf
geounit | pop_est | OBJECTID | area | Shape__Area | Shape__Length | label | geometry | krd_name | interpolated_pop | |
---|---|---|---|---|---|---|---|---|---|---|
37 | Turkey | 83429615 | 1 | 41360557 | 413605942351 | 4362384 | Kurdistan 1946 | MULTIPOLYGON (((37.746 39.1… | Turkey Kurdistan | 13967353 |
47 | Syria | 17070135 | 1 | 41360557 | 413605942351 | 4362384 | Kurdistan 1946 | MULTIPOLYGON (((41.81781 36… | Syria Kurdistan | 749878 |
142 | Iraq | 39309783 | 1 | 41360557 | 413605942351 | 4362384 | Kurdistan 1946 | MULTIPOLYGON (((46.56862 32… | Iraq Kurdistan | 5987342 |
143 | Iran | 82913906 | 1 | 41360557 | 413605942351 | 4362384 | Kurdistan 1946 | MULTIPOLYGON (((44.54024 39… | Iran Kurdistan | 11295426 |
And now we can use all the other info, like the name, for the rest of our analysis! For example, when making an interactive map!
<- inter_sf |> mutate(
inter_sf krd_name = paste0(geounit, ' Kurdistan'),
pop_str = fmt_pop(interpolated_pop),
krd_label = paste0(krd_name, ': ', pop_str)
)mapview(inter_sf, zcol='interpolated_pop', label='krd_label')
Part 3: Areal-Weighted Interpolation as “Aggregation”
Part 3.1: Estimating Kurdistan Population from Country Populations
Here, we pretend we don’t know the population of Kurdistan, and try to estimate it from the four country populations, based on how much of each country POLYGON
s overlap with the Kurdistan POLYGON
:
countries_sf
geounit | pop_est | geometry | |
---|---|---|---|
37 | Turkey | 83429615 | MULTIPOLYGON (((25.97002 40… |
47 | Syria | 17070135 | MULTIPOLYGON (((35.89268 35… |
142 | Iraq | 39309783 | MULTIPOLYGON (((42.35898 37… |
143 | Iran | 82913906 | MULTIPOLYGON (((56.18799 26… |
Visualized:
<- countries_sf |> mutate(
countries_sf pop_str = fmt_pop(pop_est),
pop_label = paste0(geounit, ": ", pop_str)
)mapview(countries_sf, zcol="pop_est", label="pop_label") + krd_map
<- st_interpolate_aw(
krd_pop_est_sf 'pop_est'],
countries_sf[
krd_sf,extensive = TRUE
)
Warning in st_interpolate_aw.sf(countries_sf["pop_est"], krd_sf, extensive =
TRUE): st_interpolate_aw assumes attributes are constant or uniform over areas
of x
<- krd_sf |> mutate(
krd_sf pop_est = krd_pop_est_sf$pop_est,
pop_str = fmt_pop(pop_est),
geounit = "Kurdistan",
pop_label = paste0("Kurdistan Est: ", pop_str)
) krd_sf
OBJECTID | area | Shape__Area | Shape__Length | geometry | label | pop | pop_str | pop_est | geounit | pop_label |
---|---|---|---|---|---|---|---|---|---|---|
1 | 41360557 | 413605942351 | 4362384 | POLYGON ((37.69528 39.21682… | Kurdistan 1946 | 3.2e+07 | 34,607,914 | 34607915 | Kurdistan | Kurdistan Est: 34,607,914 |
And, visualized!
mapview(countries_sf, zcol="pop_est", label="pop_label") + mapview(krd_sf, zcol="pop_est", label="pop_label")
We can also just combine our estimated Kurdistan data with the country data into one, big sf
object with 5 rows, where we just have to keep in mind that we’ve technically changed the unit of observation in each row from “UN-recognized country” to essentially “Geopolitical entity of some sort”
<- krd_sf |> select(
krd_sub_sf
geounit, pop_est, pop_str, pop_label
)<- rbind(countries_sf, krd_sub_sf)
pol_ent_sf pol_ent_sf
geounit | pop_est | pop_str | pop_label | geometry | |
---|---|---|---|---|---|
37 | Turkey | 83429615 | 83,429,615 | Turkey: 83,429,615 | MULTIPOLYGON (((25.97002 40… |
47 | Syria | 17070135 | 17,070,135 | Syria: 17,070,135 | MULTIPOLYGON (((35.89268 35… |
142 | Iraq | 39309783 | 39,309,783 | Iraq: 39,309,783 | MULTIPOLYGON (((42.35898 37… |
143 | Iran | 82913906 | 82,913,906 | Iran: 82,913,906 | MULTIPOLYGON (((56.18799 26… |
1 | Kurdistan | 34607915 | 34,607,914 | Kurdistan Est: 34,607,914 | POLYGON ((37.69528 39.21682… |
And now, with a single mapview
call:
mapview(pol_ent_sf, zcol='pop_est', label='pop_label')
Part 4: Calibrating Two Grids
There is a “Northern” Kurdish language, Kurmanji, as well as a “Central” Kurdish language called Sorani1 While it’s a bit difficult (though, very possible!) to get POLYGON
s for language groups, there are standardized datasets like Glottolog and WALS that we can use to get at least the centroid of where the language is spoken, as a POINT
:
<- tibble::tribble(
kl_df ~name, ~lat, ~lon,
# "Northern" Kurdish language: Kurmanji
"Kurmanji Centroid", 37.00, 43.00,
# "Central" Kurdish language: Sorani
"Sorani Centroid", 35.65, 45.81
)# sf for language centroids
<- st_as_sf(
kl_cent_sf
kl_df,coords=c('lon','lat'),
crs=4326
)<- mapview(kl_cent_sf)) (cent_map
So, let’s use Voronoi tessellation to split Kurdistan into two sections, based on whether each point is closer to the Kurmanji or Sorani centroid!
<- st_voronoi(
krd_voronoi st_union(kl_cent_sf),
envelope=st_as_sfc(st_bbox(krd_sf))
)
Warning in st_voronoi.sfc(st_union(kl_cent_sf), envelope =
st_as_sfc(st_bbox(krd_sf))): st_voronoi does not correctly triangulate
longitude/latitude data
<- st_collection_extract(krd_voronoi, "POLYGON")
krd_extracted #mapview(krd_extracted)
<- st_sf(geometry = krd_extracted) |>
krd_inter_sf st_intersection(krd_sf)
Warning: attribute variables are assumed to be spatially constant throughout
all geometries
$label <- c("Kurmanji Speakers", "Sorani Speakers")
krd_inter_sfmapview(krd_inter_sf, zcol='label') + cent_map
#northern_sf <- krd_voronoi[1,]
#southern_sf <- krd_voronoi[2,]
# mapview(northern_sf) + mapview(southern_sf)
#mapview(st_intersection(st_cast(krd_voronoi),krd_sf))
So now our job becomes: can we re-do the above estimate of Kurdistan’s population, but this time derive two separate estimates, where the original estimate is “chopped” into two pieces on the basis of the distribution of area between Kurmanji and Sorani…
The answer is: yes! That’s also the job of st_interpolate_aw()
😎
<- st_interpolate_aw(
lang_pop_sf 'pop_est'],
countries_sf[
krd_inter_sf,extensive=TRUE
)
Warning in st_interpolate_aw.sf(countries_sf["pop_est"], krd_inter_sf,
extensive = TRUE): st_interpolate_aw assumes attributes are constant or uniform
over areas of x
mapview(lang_pop_sf)
And, adding back into our krd_inter_sf
object so we have all the data “carried over”:
<- krd_inter_sf |> mutate(
krd_inter_sf pop_est = lang_pop_sf$pop_est,
pop_str = fmt_pop(pop_est),
pop_label = paste0(label, ": ", pop_str)
)mapview(krd_inter_sf, zcol='pop_est', label='pop_label')
Wohoo!