Week 6: Random Fields and Spatial Autocorrelation

PPOL 6805 / DSAN 6750: GIS for Spatial Data Science
Fall 2024

Class Sessions

Author

Affiliation

Jeff Jacobs

jj1088@georgetown.edu

Published

Wednesday, October 2, 2024

Open slides in new tab →

Random Fields

(Now on slides rather than scribbled on the whiteboard!)

\(D\) = Rectangle; \(Z(\mathbf{s})\) here: **height** at coordinate \(\mathbf{s} \in D\)

Key Notation / Definition

\(d\)-Dimensional “Spatial Process” (Schabenberger and Gotway 2004, 6)

\[ \text{Data} = \left\{Z(\mathbf{s}) \mid \mathbf{s} \in D \subset \mathbb{R}^d\right\} \]

\(d > 1\): Data forms a Random Field (this class: \(d = 2\)!)

	Geostatistical Data	Lattice/Region Data	Point Pattern
Criteria	Fixed \(D\), Continuous	Fixed \(D\), Discrete	Random subset \(D^* \subseteq D\)
Interest	Infer non-observed parts of \(D\)	Autocorrelation, clustering	Point-generating process
Example	\(N\) trees \(\mathbf{s}_1, \mathbf{s}_2, \ldots, \mathbf{s}_N\), observed within a sample window \(D \subset \mathbb{R}^2\), (\(D\) some finite plot of land) \(Z(\mathbf{s}_i)\): Attribute(s) at site \(\mathbf{s}_i\) Example: Height. \(Z(\mathbf{s}_1) = 500\text{m}\), \(Z(\mathbf{s}_2) = 850\text{m}\), \(\ldots\)	\(Z(\mathbf{s})\) observed over \(N \times N\) grid of plots \(\Rightarrow\) Contiguity, Neighbors (next section of slides!) \(\Rightarrow\) Autocorrelation: Are points around \(\mathbf{s}_i\) likely to have values similar to \(Z(\mathbf{s}_i)\)?	Unknown number of lightning strikes \(\mathbf{s}_1, \mathbf{s}_2, \ldots\) Contrast with geostatistical: all of \(D\) is observed, but what determines the subset \(D^\) where events* occur? “Unmarked”: Just locations “Marked”: Locations+info (e.g., intensity of strike)

Geostatistical Data: Dumping Sand onto a Table

Domain \(D\): tabletop; Random process: dumping sand onto the tabletop
\(\Omega\): All possible realizations of process. \(\omega_i \in \Omega\) Particular realization

Geostatistical Modeling: Terrains Are Not Spiky Chaotic Landscapes!

Geostatistical Data \(\rightarrow\) Point Pattern

Indicator function: \(\mathbb{1}\left[Z(\mathbf{s}) > 860\text{m}\right]\)

Code

set.seed(6805)
library(tidyverse)

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

Code

library(sf)

Linking to GEOS 3.11.0, GDAL 3.5.3, PROJ 9.1.0; sf_use_s2() is TRUE

Code

library(terra)

terra 1.7.78

Attaching package: 'terra'

The following object is masked from 'package:tidyr':

    extract

Code

library(latex2exp)
#### Define area
lat_range <- c(-5.0, -6.0)
lon_range <- c(15, 20)
#lon_center <- -5.4
#lat_center <- 36.75
# And the window around this centroid
lat_radius <- 0.05
lon_radius <- 0.1

gen_random_table <- function(world_num=1, return_data=FALSE, rand_seed=NULL) {
  if (!is.null(rand_seed)) {
    set.seed(rand_seed)
  } else {
    rand_seed <- sample(1:9999, size=1)
    set.seed(rand_seed)
  }
  lat_center <- runif(1, min=min(lat_range), max=max(lat_range))
  lon_center <- runif(1, min=min(lon_range), max=max(lon_range))
  lon_lower <- lon_center - lon_radius
  lon_upper <- lon_center + lon_radius
  lat_lower <- lat_center - lat_radius
  lat_upper <- lat_center + lat_radius
  coords <- data.frame(x = c(lon_lower, lon_lower, lon_upper, lon_upper),
                       y = c(lat_lower, lat_upper, lat_lower, lat_upper))
  coords_sf <- st_as_sf(coords, coords = c("x", "y"), crs = 4326)
  # Using geodata
  elev <- geodata::elevation_3s(lon = lon_center, lat = lat_center, res=2.5, path = getwd())
  elev <- terra::crop(elev, coords_sf)
  return(elev)
}

compute_hillshade <- function(elev) {
  # Calculate hillshade
  slopes <- terra::terrain(elev, "slope", unit = "radians")
  aspect <- terra::terrain(elev, "aspect", unit = "radians")
  hs <- terra::shade(slopes, aspect)
  return(hs)
}

plot_hillshade <- function(elev, hs, title=NULL) {
  ## Plot hillshading as basemap
  # (here using terra::plot, but could use tmap)
  # lat_upper_trimmed <- lat_upper - 0.001*lat_upper
  # lat_lower_trimmed <- lat_lower - 0.001*lat_lower
  base_plot <- terra::plot(
    hs, col = gray(0:100 / 100), legend = FALSE, axes = FALSE, mar=c(0,0,1,0), grid=TRUE
  )
  #    xlim = c(elev_xmin, elev_xmax), ylim = c(elev_ymin, elev_ymax))
  # overlay with elevation
  
  color_vec <- terrain.colors(25)
  plot(elev, col = color_vec, alpha = 0.5, legend = FALSE, axes = FALSE, add = TRUE)
  
  # add contour lines
  terra::contour(elev, col = "grey30", add = TRUE)
  if (!is.null(title)) {
    world_name <- TeX(sprintf(r"(World $\omega_{%d} \in \Omega$)", world_num))
    title(world_name)
  }
}
elev_r1 <- gen_random_table(rand_seed=6810)
hs_r1 <- compute_hillshade(elev_r1)
plot_hillshade(elev_r1, hs_r1)

Code

elev_indic_r1 <- terra::ifel(
  elev_r1 > 860, elev_r1, NA
)
plot(elev_indic_r1)

Code

elev_r2 <- gen_random_table(rand_seed=6815)
hs_r2 <- compute_hillshade(elev_r2)
plot_hillshade(elev_r2, hs_r2)

Code

my_quant <- function(x) quantile(x, 0.99)
elev_99 <- terra::global(elev_r2, my_quant)[1,1]
elev_indic_r2 <- terra::ifel(
  elev_r2 > elev_99, elev_r2, NA
)
plot(elev_indic_r2)

Geospatial Data \(\rightarrow\) Lattice/Region Data

Code

table_elev <- gen_random_table(rand_seed=6805)
table_hs <- compute_hillshade(table_elev)
#plot_hillshade(table_elev, table_hs)
#par(mfrow=c(1,2), mar=c(0,0,1,1))
plot(table_elev)

(Pretend this is infinite-resolution) Infinite number of neighbors around each point!

Code

elev_coarse <- terra::aggregate(table_elev, 12)
plot(elev_coarse)

Finite number of neighbors around each observation \(\implies\) Can study correlations between *neighboring* cells!

Doing Things with `nbdep`

“Neighbor” Definitions
Testing Hypotheses About Neighborhoods!

Who Are My Neighbors?

Contiguity-Based:
Distance-Based
\(K\)-Nearest Neighbors

Distance-Based Neighbors

Spatial Autocorrelation

Location \(i\) has high value \(\implies\) locations near \(i\) more likely to have high values

Moran’s \(I\)

\[ I = \underset{\text{Inverse of Variance}}{\boxed{\frac{n}{\sum_{i=1}^{n}(y_i - \overline{y})^2}}} \frac {\overbrace{\sum_{i=1}^{n}\sum_{j=1}^{n}w_{ij}(y_i - \overline{y})(y_j - \overline{y})}^{\text{Weighted Covariance}}} {\underbrace{\sum_{i=1}^{n}\sum_{j=1}^{n}w_{ij}}_{\text{Normalize Weights}}} \]

\(I\) is Large when:
- \(y_i\) and \(y_j\) are neighbors: \(w_{ij}\), and
- \(y_i\) and \(y_j\) large at the same time: \((y_i - \overline{y})(y_j - \overline{y})\)

Local Indicators of Spatial Autocorrelation (LISA)

tldr: See how much a given location \(\mathbf{s}\) contributes to overall Moran’s \(I\)
Local Moran’s \(I\):

\[ I_i = \frac{y_i - \overline{y}}{S_i^2}\sum_{j=1}^{n}w_{ij}(y_j - \overline{y}) \]

Our Null Hypothesis: Complete Spatial Randomness (CSR)

Leaving You With a Challenge

These point patterns can be classified as clustered or regular:

Leaving You With a Challenge

References

Schabenberger, Oliver, and Carol A. Gotway. 2004. Statistical Methods for Spatial Data Analysis. CRC Press.

Xie, Sherrie. 2023. “Analyzing Geospatial Data in R.”

Random Fields

Key Notation / Definition

Geostatistical Data: Dumping Sand onto a Table

Geostatistical Modeling: Terrains Are Not Spiky Chaotic Landscapes!

Geostatistical Data \(\rightarrow\) Point Pattern

Geospatial Data \(\rightarrow\) Lattice/Region Data

Doing Things with nbdep

Who Are My Neighbors?

Distance-Based Neighbors

Spatial Autocorrelation

Moran’s \(I\)

Local Indicators of Spatial Autocorrelation (LISA)

Our Null Hypothesis: Complete Spatial Randomness (CSR)

Leaving You With a Challenge

Leaving You With a Challenge

References

Doing Things with `nbdep`