library(sf)
library(svglite)
svglite("images/st_polygon.svg", width = 6, height = 4.5)
<- st_polygon(
poly_blob list(
rbind(c(2,1), c(3,1), c(5,2), c(6,3), c(5,3), c(4,4), c(3,4), c(1,3), c(2,1)),
rbind(c(2,2), c(3,3), c(4,3), c(4,2), c(2,2))
)
)plot(poly_blob,
border = 'black', col = '#ff8888', lwd = 4
)dev.off()
Week 1: Introduction to GIS
PPOL 6805 / DSAN 6750: GIS for Spatial Data Science
Fall 2024
Welcome to The Wonderful World of GIS!
Your Final Project
Unit 1: Maps
- Your least favorite part of the course (per survey 😜)
- My favorite part of the course (because I love overthinking things)
- My goal given survey results: Let’s think of this unit like learning languages for expressing spatial information:
Temporal Information | Spatial Information |
---|---|
\(\Rightarrow\) 22.5 seconds |
\(\Rightarrow\) POLYGON ((2 1, 3 1, 5 2, 6 3, 5 3, 4 4, 3 4, 1 3, 2 1),(2 2, 3 3, 4 3, 4 2, 2 2)) |
- I think you’ll be surprised at how, complexity of geospatial/spatio-temporal data \(\implies\) need for programming-language-independent representations
Unit 2: Using Code to Make Maps
- (More on this in Prereqs section below!)
- Given representations from Part 1, the task of coding becomes task of finding “best” library for loading/manipulating/plotting them
- Where “best” = best for you!
- In R:
sf
and friends (tidyverse
) - In Python:
geopandas
Unit 3: Spatial Data Science
- Drawing inferences about spatial phenomena
- The meat of the course
- How can we write code (Unit 2) to analyze a map (Unit 1) so as to…
- Discover patterns (EDA: Exploratory Data Analysis) or
- Test hypotheses (CDA: Confirmatory Data Analysis)
Unit 4: Applications / Final Project
- Take everything you’ve learned in Units 1-3 and Kamehameha them onto something you care about in the world!
- Public Policy: Which counties are most in need of more transportation infrastructure?
- Urban Planning: Which neighborhoods are most in need of a new bus stop?
- Epidemiology: What properties of a region make it more/less susceptible to infectious diseases? Where should we intervene to “cut the chain” of a disease vector?
Who Am I? Why Am I Teaching You?
- Started out as PhD student in Computer Science
- UCLA: Algorithmic Game Theory
- Stanford (MS): Economic Network Analysis
- Ended up with PhD in Political Economy
- Columbia: “Computational Political Theory”
My GIS Adventures
- High school project: mine defusal in Indochina
- As a Telecommunications Engineer for Huawei (HKUST)
- As an Urban Economist at UC Berkeley
- Used, e.g., Google Maps API to evaluate effects of Suburbanization of Poverty
My GIS 🤯 Moment
- Horrors of “Vietnam War” did not end in 1975… Casualties from unexploded ordnance (cluster bombs) continue to devastate the region, over 220,000 victims:
Huawei: Optimizing Cell Tower Placement
The Suburbanization of Poverty
- Since 2008, a person living in poverty in the US is more likely to be in a suburb than an “inner city”
- What does this mean for…
- Access to Food / Public Services?
- Finding a job \(\leadsto\) Commuting?
- My job: computing “suburban accessibility indices”
- Does commuting = straight line distance?
“Distance” vs. Distance!
- You’ve just been hired as a fine art curator at The Whitney… Congratulations!
Commuting 1 mile to the Whitney | |
Also commuting 1 mile to the Whitney |
Why Should You Care About GIS?
- As a Human
- As a Data Scientist
- As a Public Policy Expert
As Humans
- To understand the world around you!
- \(\implies\) Crucial landmark in the genesis of social science
As Data Scientists
- All data scientists are expected to know how to analyze “standard” types of data: tabular, numeric data (think spreadsheets)
- However, you can differentiate yourself in the scary scary job market by developing a particular focus on some “non-standard” type:
Hello Mr. Google Musk, yes, indeed, I have a wealth of experience working with [text data / temporal data / signal processing / geospatial data]. This job will be no problem for me.
As Public Policy Experts
- Oftentimes, all it takes is one map to see why a policy has failed 😱
http://www.radicalcartography.net/index.html?chicagodots, then adapted to DC: “[Eric Fisher] was astounded by Bill Rankin’s map of Chicago’s racial and ethnic divides and wanted to see what other cities looked like mapped the same way. To match his map, Red is White, Blue is Black, Green is Asian, Orange is Hispanic, Gray is Other, and each dot is 25 people. Data from Census 2000. Base map © OpenStreetMap, CC-BY-SA” https://commons.wikimedia.org/wiki/File:Race_and_ethnicity_map_of_Washington,_D.C..png
So… What Is GIS?
It’s Completely Made Up
Like, even more made up than other made-up technical terms… 😵💫
What I Mean By “Made Up”
- The libraries and tools we’ll use are specific systems/methods for analyzing geospatial data
- GIS is an “umbrella term”, which just vaguely refers to this entire universe of libraries/tools/techniques/approaches
Umbrella Term | Concepts | Specific Skills |
---|---|---|
Coding |
|
|
GIS |
|
|
ArcGIS
- For info on Georgetown’s provision of ArcGIS (Online, Pro, and Desktop), see the Library Guide
Then… Why Can’t We Just Use ArcGIS?
Analogy from non-geospatial data science:
Text Drawn Map |
→ | Speadsheet Digital Map |
→ | Equations Maps w/ArcGIS |
→ | Code This Class |
||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Start writing
info.txt
|
Start entering info in rows
|
Start using equations
|
Write code plot_balance.py
Profit 💲💰🤑💰💲 |
The Spatial Data Science Universe
Course Policy Things
- How To Not Be Scared of Prerequisites
- ChatGPT
- Learning How To Learn
Pedagogical Principles
- There’s literally no such thing as “intelligence”
- Anyone is capable of learning anything (neural plasticity)
- Growth mindset: “I can’t do this” \(\leadsto\) “I can’t do this yet!”
- The point of a class is learning: understanding something about the world, either (a) For its own sake (end in itself) or (b) Because it’s relevant to something you care about (means to an end)
Our teaching should be governed, not by a desire to make students learn things, but by the endeavor to keep burning within them that light which is called curiosity. (Montessori 1916)
ChatGPT and Whatnot
- If you feel like ChatGPT will help you learn something in the course, then use it!
- If you feel like you’re using it as a “crutch”, try to hold yourself accountable for not using it!
Take the time/energy you're using to worry about... | Use it instead to worry about... |
---|---|
|
Learning GIS |
I Am The Opposite of a Prereq-Stickler
- I genuinely believe that I can make the course accessible to you, meeting you wherever you’re at, no matter what!
- Everyone learns at their own pace (who says 14 weeks is “correct” amount of time to learn GIS?), and I structure my courses as best as I possibly can to adapt to your pace
- \(\Rightarrow\) Assessments (HW, Midterm) valuable in two ways:
- [Valuable for you] As an accountability mechanism to make sure you’re learn the material (how do we know when we’ve learned something? When we can answer questions about it / use it to accomplish things!)
- [Valuable for me] For assessing and updating pace
R and/or Python and/or JS
- My Geometry vs. Algebra Rant… Euclid’s Elements, Book VI, Proposition 28.
- The problem: Divide a given straight line so that the rectangle contained by its segments may be equal to a given area, not exceeding the square of half the line.
Geometers solved w/geometry (300 BC)…
…Algebraists solved w/algebra (2000 BC)…
\[ \begin{align*} &ax^2 + bx + c = 0 \\ \Rightarrow \; & x_+ = \frac{-b + \sqrt{b^2 - 4ac}}{2a} \end{align*} \]
…From 1637 onwards, whichever is easier! 🤯🤯🤯 (Isomorphism)
Learning How To Learn
He’s Literally Extremely Correct!
Let’s Make Some Dang Maps!
Our First Map: Polygons!
(Quick demo adapted from Sherry Xie’s R Consortium Workshop: Analyzing Geospatial Data in R, using DC rather than Philadelphia open data.)
Code
library(sf)
# Load DC tracts data
<- "data/DC_Census_2020/Census_Tracts_in_2020.shp"
dc_sf_fpath <- st_read(dc_sf_fpath); dc_sf
Reading layer `Census_Tracts_in_2020' from data source
`/Users/jpj/gtown-local/ppol6805/w01/data/DC_Census_2020/Census_Tracts_in_2020.shp'
using driver `ESRI Shapefile'
Simple feature collection with 206 features and 315 fields
Geometry type: POLYGON
Dimension: XY
Bounding box: xmin: -8584933 ymin: 4691871 xmax: -8561515 ymax: 4721078
Projected CRS: WGS 84 / Pseudo-Mercator
Code
<- c("OBJECTID", "TRACT", "GEOID", "ALAND", "AWATER", "STUSAB", "SUMLEV", "GEOCODE", "STATE", "NAME", "POP100", "HU100", "geometry")
cols_to_keep <- dc_sf |> select(cols_to_keep) dc_sf
Warning: Using an external vector in selections was deprecated in tidyselect 1.1.0.
ℹ Please use `all_of()` or `any_of()` instead.
# Was:
data %>% select(cols_to_keep)
# Now:
data %>% select(all_of(cols_to_keep))
See <https://tidyselect.r-lib.org/reference/faq-external-vector.html>.
What is an sf
Object?
dc_sf
is an object of typesf
(short for “simple feature”), which extendsdata.frame
, and contains features which have typePOLYGON
class(dc_sf)
[1] "sf" "data.frame"
head(dc_sf)
Simple feature collection with 6 features and 12 fields
Geometry type: POLYGON
Dimension: XY
Bounding box: xmin: -8577962 ymin: 4708107 xmax: -8572564 ymax: 4716136
Projected CRS: WGS 84 / Pseudo-Mercator
OBJECTID TRACT GEOID ALAND AWATER STUSAB SUMLEV GEOCODE STATE
1 1 002002 11001002002 849376 0 DC 140 11001002002 11
2 2 002101 11001002101 600992 0 DC 140 11001002101 11
3 3 002102 11001002102 725975 0 DC 140 11001002102 11
4 4 002201 11001002201 415173 0 DC 140 11001002201 11
5 5 002202 11001002202 698895 566 DC 140 11001002202 11
6 6 000101 11001000101 199776 5261 DC 140 11001000101 11
NAME POP100 HU100 geometry
1 Census Tract 20.02 4072 1532 POLYGON ((-8575655 4714476,...
2 Census Tract 21.01 5687 2335 POLYGON ((-8574745 4715676,...
3 Census Tract 21.02 5099 2221 POLYGON ((-8573824 4715684,...
4 Census Tract 22.01 3485 1229 POLYGON ((-8574654 4714781,...
5 Census Tract 22.02 3339 1454 POLYGON ((-8573792 4714811,...
6 Census Tract 1.01 1406 999 POLYGON ((-8577962 4708867,...
Working With sf
Objects
- With some rare but important exceptions (which we’ll learn!), can be used just like a
data.frame
/tibble
:
Code
str(dc_sf) # view structure
Classes 'sf' and 'data.frame': 206 obs. of 13 variables:
$ OBJECTID: int 1 2 3 4 5 6 7 8 9 10 ...
$ TRACT : chr "002002" "002101" "002102" "002201" ...
$ GEOID : chr "11001002002" "11001002101" "11001002102" "11001002201" ...
$ ALAND : int 849376 600992 725975 415173 698895 199776 1706484 505004 776435 1042157 ...
$ AWATER : int 0 0 0 0 566 5261 516665 0 439661 2305 ...
$ STUSAB : chr "DC" "DC" "DC" "DC" ...
$ SUMLEV : int 140 140 140 140 140 140 140 140 140 140 ...
$ GEOCODE : chr "11001002002" "11001002101" "11001002102" "11001002201" ...
$ STATE : int 11 11 11 11 11 11 11 11 11 11 ...
$ NAME : chr "Census Tract 20.02" "Census Tract 21.01" "Census Tract 21.02" "Census Tract 22.01" ...
$ POP100 : int 4072 5687 5099 3485 3339 1406 3417 4108 4672 6161 ...
$ HU100 : int 1532 2335 2221 1229 1454 999 2053 11 2169 2845 ...
$ geometry:sfc_POLYGON of length 206; first list element: List of 1
..$ : num [1:155, 1:2] -8575655 -8575655 -8575655 -8575655 -8575655 ...
..- attr(*, "class")= chr [1:3] "XY" "POLYGON" "sfg"
- attr(*, "sf_column")= chr "geometry"
- attr(*, "agr")= Factor w/ 3 levels "constant","aggregate",..: NA NA NA NA NA NA NA NA NA NA ...
..- attr(*, "names")= chr [1:12] "OBJECTID" "TRACT" "GEOID" "ALAND" ...
Working With sf
Objects
Code
head(dc_sf) # view first several rows
Simple feature collection with 6 features and 12 fields
Geometry type: POLYGON
Dimension: XY
Bounding box: xmin: -8577962 ymin: 4708107 xmax: -8572564 ymax: 4716136
Projected CRS: WGS 84 / Pseudo-Mercator
OBJECTID TRACT GEOID ALAND AWATER STUSAB SUMLEV GEOCODE STATE
1 1 002002 11001002002 849376 0 DC 140 11001002002 11
2 2 002101 11001002101 600992 0 DC 140 11001002101 11
3 3 002102 11001002102 725975 0 DC 140 11001002102 11
4 4 002201 11001002201 415173 0 DC 140 11001002201 11
5 5 002202 11001002202 698895 566 DC 140 11001002202 11
6 6 000101 11001000101 199776 5261 DC 140 11001000101 11
NAME POP100 HU100 geometry
1 Census Tract 20.02 4072 1532 POLYGON ((-8575655 4714476,...
2 Census Tract 21.01 5687 2335 POLYGON ((-8574745 4715676,...
3 Census Tract 21.02 5099 2221 POLYGON ((-8573824 4715684,...
4 Census Tract 22.01 3485 1229 POLYGON ((-8574654 4714781,...
5 Census Tract 22.02 3339 1454 POLYGON ((-8573792 4714811,...
6 Census Tract 1.01 1406 999 POLYGON ((-8577962 4708867,...
Working With sf
Objects
Code
dim(dc_sf) # view dimensions
[1] 206 13
Code
1,] # select first row dc_sf[
Simple feature collection with 1 feature and 12 fields
Geometry type: POLYGON
Dimension: XY
Bounding box: xmin: -8575656 ymin: 4713958 xmax: -8574562 ymax: 4716136
Projected CRS: WGS 84 / Pseudo-Mercator
OBJECTID TRACT GEOID ALAND AWATER STUSAB SUMLEV GEOCODE STATE
1 1 002002 11001002002 849376 0 DC 140 11001002002 11
NAME POP100 HU100 geometry
1 Census Tract 20.02 4072 1532 POLYGON ((-8575655 4714476,...
Working With sf
Objects
Code
head(dc_sf$NAME) # select column by name
[1] "Census Tract 20.02" "Census Tract 21.01" "Census Tract 21.02"
[4] "Census Tract 22.01" "Census Tract 22.02" "Census Tract 1.01"
Code
head(dc_sf[,4]) # select column by number
Simple feature collection with 6 features and 1 field
Geometry type: POLYGON
Dimension: XY
Bounding box: xmin: -8577962 ymin: 4708107 xmax: -8572564 ymax: 4716136
Projected CRS: WGS 84 / Pseudo-Mercator
ALAND geometry
1 849376 POLYGON ((-8575655 4714476,...
2 600992 POLYGON ((-8574745 4715676,...
3 725975 POLYGON ((-8573824 4715684,...
4 415173 POLYGON ((-8574654 4714781,...
5 698895 POLYGON ((-8573792 4714811,...
6 199776 POLYGON ((-8577962 4708867,...
And… Actually Displaying the Map!
Code
# We can extract the geometry with the st_geometry function
<- st_geometry(dc_sf)
dc_geo #pt_geo
# Plot the geometry with base R's plot() function
plot(dc_geo)