Visualizing Segregation in DC

Urban Studies

GIS

Segregation

Inequality

Author

Affiliation

Jeff Jacobs

jj1088@georgetown.edu

Published

November 27, 2023

About ten years ago, Pew Research released an incredible set of maps visualizing how extreme segregation is in DC, along both racial and socioeconomic lines. A screenshot from this old set of visualizations shows what they used to look like:

Pew’s visualization of socioeconomic segregation in DC

Unfortunately, all of these visualizations used MapBox, and the hosted maps they relied on are long gone, so when you try to view them on Pew’s website nowadays, you just get a blank page.

So, in this document, I recreate the above maps, using open-source libraries in Python to (hopefully) allow interactive visualization of this important information that will last longer than the previous versions in MapBox’s proprietary format!

Data Overview

The data behind these maps is somewhat hard to find, but for the opposite of the usual reason: there are so many different data sources for income across the “DC Metro Area” (a region that different sources define differently) that I ran into the following tradeoff at the start:

If we want data for just the District of Columbia itself, we can obtain very easy-to-use data directly from the DC government’s data portal, which is ready for immediate use in the sense that we can plug it into a mapping app and see the data without any need to tweak any settings! Clicking that link, for example, will show a preview of the map directly within the GitHub page! While the GitHub preview won’t show the income data for each tract, this geojson.io link (with the URL just pointing to that GitHub page) will!
Similarly, if we want data for just Maryland or just Virginia, we could obtain fairly easy-to-use geoJSON files from these states’ data portals
But, if we want data for the DC Metro Area, allowing apples-to-apples comparisons between (for example) census tracts within DC and in the Maryland suburbs, then we run into a bit of an issue since the relevant US Census data is far less ready-for-use in its raw form.

IPUMS Data: Median Income by Census Tract

First we load the data, which contains median household income for all census tracts in the US:

import pandas as pd
ipums_df = pd.read_csv("assets/nhgis0001_ds254_20215_tract.csv", encoding_errors='ignore')
ipums_df.head()

	GISJOIN	YEAR	STUSAB	REGIONA	DIVISIONA	STATE	STATEA	COUNTY	COUNTYA	COUSUBA	...	PCI	PUMAA	GEO_ID	BTTRA	BTBGA	TL_GEO_ID	NAME_E	AOQIE001	NAME_M	AOQIM001
0	G0100010020100	2017-2021	AL	NaN	NaN	Alabama	1	Autauga County	1	NaN	...	NaN	NaN	1400000US01001020100	NaN	NaN	1001020100	Census Tract 201, Autauga County, Alabama	57399.0	Census Tract 201, Autauga County, Alabama	10706.0
1	G0100010020200	2017-2021	AL	NaN	NaN	Alabama	1	Autauga County	1	NaN	...	NaN	NaN	1400000US01001020200	NaN	NaN	1001020200	Census Tract 202, Autauga County, Alabama	52176.0	Census Tract 202, Autauga County, Alabama	5849.0
2	G0100010020300	2017-2021	AL	NaN	NaN	Alabama	1	Autauga County	1	NaN	...	NaN	NaN	1400000US01001020300	NaN	NaN	1001020300	Census Tract 203, Autauga County, Alabama	63704.0	Census Tract 203, Autauga County, Alabama	11304.0
3	G0100010020400	2017-2021	AL	NaN	NaN	Alabama	1	Autauga County	1	NaN	...	NaN	NaN	1400000US01001020400	NaN	NaN	1001020400	Census Tract 204, Autauga County, Alabama	70000.0	Census Tract 204, Autauga County, Alabama	12155.0
4	G0100010020501	2017-2021	AL	NaN	NaN	Alabama	1	Autauga County	1	NaN	...	NaN	NaN	1400000US01001020501	NaN	NaN	1001020501	Census Tract 205.01, Autauga County, Alabama	60917.0	Census Tract 205.01, Autauga County, Alabama	29232.0

5 rows × 45 columns

We can get a sense of how many Census Tracts there are across different states, before we restrict ourselves to just the DMV:

# Here you can uncomment the following to install itables,
# if it is not already installed in your environment!
# We just use this to display nice HTML tables with pagination,
# so it's optional and you don't need to worry if it
# fails to install for whatever reason.
#!pip install itables

from itables import show
tract_counts = ipums_df['STUSAB'].value_counts().to_frame().reset_index()
show(tract_counts)


But now we can restrict our analysis to just DC, Maryland, and Virginia:

states_to_include = [
    'District of Columbia',
    'Maryland',
    'Virginia'
]
dmv_df = ipums_df[ipums_df['STATE'].isin(states_to_include)].copy()

And we can look at the 153 unique values that are listed in the “County” field for these states, where you’ll see that these correspond not only to “counties” in the standard colloquial sense but also to county-equivalent areas that are not part of any county, such as Virginia’s independent cities (Alexandria city, for example):

county_counts = dmv_df['COUNTY'].value_counts(dropna=False)
show(county_counts)


Going through these unique values, I select the areas that seemed to be included in Pew’s “DC Metro Area” map:

counties = [
    'Fairfax County', # 274 tracts
    'Montgomery County', # 255 tracts
    "Prince George's County", # 214 tracts
    'District of Columbia', # 206 tracts
    'Arlington County', # 71 tracts
    'Alexandria city', # 48 tracts
    'Fairfax city', # 5 tracts
    'Falls Church city', # 3 tracts
]
dmv_df = dmv_df[dmv_df['COUNTY'].isin(counties)].copy()

And now, since we’re about to merge this data with the shapefiles for Maryland, DC, and Virginia, which have a GEOID field of type string, we’ll need to create a string version of the TL_GEO_ID variable from IPUMS, for merging:

# String version for merging
dmv_df['TL_GEO_ID_str'] = dmv_df['TL_GEO_ID'].apply(str)
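
One caveat worth flagging: state FIPS codes below 10 (Alabama is 01) lose their leading zero when the ID is read as an integer, as the `1001020100` in the preview above shows. DC (11), Maryland (24), and Virginia (51) are unaffected, so the plain `str` conversion works here. A minimal sketch of a more defensive conversion, assuming tract GEOIDs are always 11 digits (2 state + 3 county + 6 tract); the helper name `to_geoid` is hypothetical:

```python
def to_geoid(value, width=11):
    """Convert a numeric tract ID to a zero-padded GEOID string.

    Assumes tract GEOIDs are `width` digits long (2 state + 3 county
    + 6 tract); `to_geoid` is an illustrative helper, not part of pandas.
    """
    # int() also drops a trailing ".0" if the column was read as float
    return str(int(value)).zfill(width)


print(to_geoid(1001020100))     # Alabama tract: leading zero restored
print(to_geoid(11001000100.0))  # a DC-style ID that was read as a float
```

With pandas, this could be applied in place of `str`, as in `dmv_df['TL_GEO_ID'].apply(to_geoid)`.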

TIGER Shapefiles for DC, Maryland, and Virginia

Next we’ll load the TIGER shapefiles provided by the Census website, for DC (FIPS code 11), Maryland (FIPS code 24), and Virginia (FIPS code 51).

Here we use the amazing GeoPandas library, which essentially lets us keep using Pandas as we’ve been using it, but also maintains a GIS representation of the data “under the hood”, so that when we’re ready to plot our data we can plug the GeoDataFrame object into (for example) Plotly or any other data visualization library that supports mapping!

# Uncomment the following to install geopandas, if it is
# not already installed in your environment!
#!pip install geopandas

import geopandas as gpd
# Shapefiles
dc_shape_df = gpd.read_file("assets/tl_2021_11_tract/tl_2021_11_tract.shp")
md_shape_df = gpd.read_file("assets/tl_2021_24_tract/tl_2021_24_tract.shp")
va_shape_df = gpd.read_file("assets/tl_2021_51_tract/tl_2021_51_tract.shp")
dmv_shape_df = pd.concat([dc_shape_df,md_shape_df,va_shape_df], ignore_index=True)
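
Since the three paths differ only in the state FIPS code, the same load could be driven from a list of codes. A minimal sketch, assuming the `assets/tl_2021_<FIPS>_tract/` layout used above; `tiger_tract_path` is a hypothetical helper, not part of GeoPandas:

```python
from pathlib import PurePosixPath

# State FIPS codes used in this post
STATE_FIPS = {"DC": "11", "MD": "24", "VA": "51"}


def tiger_tract_path(fips, year=2021, root="assets"):
    """Build the path to one state's TIGER tract shapefile.

    Hypothetical helper; assumes the directory layout used in this post.
    """
    stem = f"tl_{year}_{fips}_tract"
    return str(PurePosixPath(root) / stem / f"{stem}.shp")


shape_paths = [tiger_tract_path(fips) for fips in STATE_FIPS.values()]
for path in shape_paths:
    print(path)
```

Each path can then be passed to `gpd.read_file` and the results combined with `pd.concat`, exactly as in the cell above.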

Now, since our original dmv_df and the GeoPandas-managed dmv_shape_df both have GEO_ID variables (with slightly different names), we can merge them into a single DataFrame and then tell GeoPandas to track all of this information!

geo_df_pd = pd.merge(dmv_df, dmv_shape_df, left_on='TL_GEO_ID_str', right_on='GEOID', how='left')
geo_df = gpd.GeoDataFrame(geo_df_pd)
geo_df.set_index('GEOID', inplace=True)
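
Because this is a left merge, any tract whose ID has no match in the shapefiles would silently end up with a missing geometry. A minimal sketch of a check for that, using plain Python sets so that it works on any iterable of IDs (including a pandas Series); the helper name and the sample IDs are made up for illustration:

```python
def unmatched_keys(left_keys, right_keys):
    """Return the sorted left-hand keys that have no match on the right."""
    return sorted(set(left_keys) - set(right_keys))


# Illustrative IDs only, not real query results
income_ids = ["11001000100", "11001000201", "24031700101"]
shape_ids = ["11001000100", "24031700101", "51013100100"]

missing = unmatched_keys(income_ids, shape_ids)
print(missing)  # IDs that would get an empty geometry after a left merge
```

In this post it could be called as `unmatched_keys(dmv_df['TL_GEO_ID_str'], dmv_shape_df['GEOID'])`; an empty list means every tract found its shape.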

Visualizing with Plotly

And now, finally, we can make use of Plotly’s choropleth_mapbox function to create a choropleth map of the different income levels:

# Uncomment the following to install Plotly, if it is not already
# installed on your machine!
#!pip install plotly

import plotly.express as px
import plotly.io as pio
pio.renderers.default = "notebook"
median_income_var = "AOQIE001"
# Capitol Building
#capitol_lat = 38.889805
#capitol_lon = -77.009056
# White House
#center_lat = 38.8977
#center_lon = -77.0365
# Scott Statue
center_lat = 38.907278946266466
center_lon = -77.03651807332851

income_fig = px.choropleth_mapbox(
    geo_df,
    geojson=geo_df.geometry,
    locations=geo_df.index,
    #z=geo_df[median_income_var],
    color=median_income_var,
    #autocolorscale=True,
    opacity=0.7,
    mapbox_style='carto-positron',
    zoom = 10.4,
    center = {
        "lat": center_lat,
        "lon": center_lon,
    },
    # width=800,
    # height=800
)
income_fig.update_layout(margin={"r":0,"t":0,"l":0,"b":0})
income_fig.show()

Notice anything? I feel like the raw median income distribution pretty much tells the whole story, but to fully recreate the Pew maps, we can collapse these income levels down into (low, medium, high) using the methodology from the report’s appendix, producing a map of categorical income levels. For 2021, the most recent year for which IPUMS had the 5-year ACS data, the median household income for the DC metro area was $110,355 (for comparison, the national median household income was $70,784). Letting $M$ represent this metro-area median:

The cutoff for low-income (using Pew’s methodology) is $\frac{2}{3}\cdot M$ = $73,570
The cutoff for high-income (again using Pew’s methodology) is $2M$ = $220,710
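
As a quick check of these two figures, here is the arithmetic in plain Python, using only the metro-area median quoted above:

```python
metro_median = 110_355  # DC metro-area median household income (2017-2021 ACS)

low_cutoff = 2 * metro_median / 3   # two-thirds of the median
high_cutoff = 2 * metro_median      # double the median

print(f"Low-income cutoff:  ${low_cutoff:,.0f}")
print(f"High-income cutoff: ${high_cutoff:,.0f}")
```

A tract counts as low-income below the first value, high-income above the second, and medium otherwise.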

median_income_var = "AOQIE001"
# Capitol Building
#capitol_lat = 38.889805
#capitol_lon = -77.009056
# White House
center_lat = 38.8977
center_lon = -77.0365

# Drop tracts whose median income is missing, since Plotly
# handles pd.NA in the category column less gracefully than NaN
geo_df_nona = geo_df[~pd.isna(geo_df[median_income_var])].copy()
# Cutpoints
#natl_median = 70000
metro_median = 110355
low_cutoff = (2/3) * metro_median
high_cutoff = 2 * metro_median
def get_income_level(income):
    # If NA, we want to keep its category as NA
    if pd.isna(income):
        return pd.NA
    if income < low_cutoff:
        return "Low"
    if income > high_cutoff:
        return "High"
    return "Medium"
geo_df_nona['income_level'] = geo_df_nona[median_income_var].apply(get_income_level)
level_fig = px.choropleth_mapbox(geo_df_nona,
  geojson=geo_df_nona.geometry,
  color="income_level",
  locations=geo_df_nona.index,
  #featureidkey="properties.district",
  center={"lat": center_lat, "lon": center_lon},
  mapbox_style="carto-positron",
  hover_data=[median_income_var],
  zoom=10,
  color_discrete_map={
    'High': 'green',
    'Medium': 'lightgrey',
    'Low': 'red'
  }
)
level_fig.update_layout(margin={"r":0,"t":0,"l":0,"b":0})
level_fig.show()

And voilà! The pattern looks… even more bleak in 2021 than it did in 2012 😔