Syllabus

Welcome to DSAN 5450: Data Ethics and Policy at Georgetown University!

The course meets on Wednesdays from 3:30-6pm in the Walsh Building, Room 498.

Course Staff

Prof. Jeff Jacobs, jj1088@georgetown.edu
- Office hours: Tuesdays, 3:30-6:30pm
TA Marie Vaughan, mev71@georgetown.edu
TA Sam Sofman, sbs106@georgetown.edu

Course Description

This graduate-level course will train students to navigate the landscape of ethical issues which arise at each step of the data science process, with an eye towards developing policy recommendations for governments and organizations seeking expert advice on how to tackle these issues from a regulatory perspective. Students will explore and critically evaluate a range of data-related issues in contemporary society, such as responsible data collection, algorithmic bias, privacy, transparency, accountability, democratic participation in data usage and data-driven decisions, and the ethical implications of emerging technologies like artificial intelligence and machine learning (self-driving cars, ChatGPT, crowd-sourced training data, etc.).

The course begins with a set of historical case studies: instances in which scientists, engineers, and policymakers have been forced to re-evaluate their ethical intuitions in light of technological developments such as nuclear power, the use of social media platforms to organize protests and influence political outcomes, and the deployment of facial recognition software and predictive AI by police and military forces. It then introduces a set of general ethical frameworks (consequentialism, deontological ethics, and virtue ethics), challenging students to consider their relative strengths and weaknesses for addressing modern technological-ethical dilemmas faced by businesses, healthcare organizations, governments, and academic institutions. After a final portion of the course linking these ethical frameworks with practical regulatory and policy considerations, students will write and present a policy whitepaper analyzing a data-ethical issue of particular interest to them, integrating ethical perspectives, regulatory principles, and domain knowledge into a recommendation of best practices for the relevant agency, firm, or institution.

The course will thus equip students with a robust ethical “toolbox” for conscientiously gathering, interpreting, and extracting meaning from data throughout their careers as data scientists, while respecting privacy, fairness, transparency, democratic accountability, and other social concerns. Prerequisites: None. 3 credits.

Course Overview

The course revolves around three “pillars”, which we’ll examine individually before bringing them together for your final projects at the end of the class: high-level ethical issues in data science, general ethical frameworks, and public policy applications.

Data Science

A portion of the course will focus on introductions to cutting-edge technologies like self-driving cars, ChatGPT, facial detection algorithms, and various applications of AI to police and military technologies. For this portion, we’ll draw fairly often from the contents of the following books:

Caroline Criado Perez (2019). Invisible Women: Data Bias in a World Designed for Men. New York, NY: Abrams.
Catherine D’Ignazio and Lauren F. Klein (2020). Data Feminism. Cambridge, MA: MIT Press. [Free, open-source!]
Cathy O’Neil (2016). Weapons of Math Destruction. New York, NY: Crown Books.

Since there are plenty of in-depth resources available to you (e.g., other Georgetown courses!) for learning the technical details of these technologies, our goal in this course will be to learn just the particular aspects of each technology which are most relevant to the ethical and policy issues they present.

For example, we will look at neural network-based machine learning algorithms, but we will focus specifically on how the performance of these algorithms on a given task depends crucially on the availability of effective training data for that task. The breakthroughs in artificial intelligence that have had an immense impact on society over the past few decades have come about less from fundamentally new algorithms (neural networks, for instance, have been around since the 1950s) than from the massive, exponential increase in the amount of data available to train these already-existing algorithms: data scraped from across the entire web, from millions of scanned books, or from Wikipedia’s massive collection of articles. As a result, these algorithms can end up encoding the pre-existing human biases contained in that data into algorithmically-derived “rules”, which motivates the next pillar of the course: Ethics!

Ethics

For the ethics-focused portion of the course, we’ll be reading selections from the following textbook:

Lewis Vaughn and Louis P. Pojman (2021). The Moral Life: An Introductory Reader in Ethics and Literature. Oxford, UK: Oxford University Press. [PDF]

From the vast array of readings in this collection, we’ll look at both “standard” ethical readings (e.g., from Jeremy Bentham and Immanuel Kant) and readings from literary sources such as Ursula Le Guin and Ambrose Bierce.

Public Policy

For the final piece of the course, we will take the technological developments discussed in the first portion, analyze them using the ethical frameworks discussed in the second portion, and draw conclusions about what lawmakers, governments, and civil society organizations (for example, NGOs and think tanks) can do in practice to address the ethical issues raised by these technologies. Accordingly, the recommended final project for the course is a Policy Whitepaper, in which you choose a particular institution and recommend how it can use its power (for example, the power to pass laws) to most effectively address an ethical issue that you believe is important.

For this portion of the class we’ll draw on a wide range of readings, depending on which subdomains of public policy are most interesting to you all. As a general textbook on ethics in data science with a substantial focus on policy, we will look at:

Anne L. Washington (2023). Ethical Data Science: Prediction in the Public Interest. New York, NY: Oxford University Press.

Now that you have an overview of the trajectory of the course, the following section contains the particulars of what we’ll be reading and working on each week!

Schedule

The following is a rough map of what we will work through together over the semester. Since everyone learns at a different pace, my aim is to leave us a good amount of flexibility in how much time we spend on each topic. If I find that it takes me longer than a week to convey a certain topic in sufficient depth, for example, I view it as a strength rather than a weakness of the course that we can rearrange the calendar below to add an extra week on that topic. Similarly, if it seems like I am spending too much time on a topic, to the point that students seem bored or impatient to move on, we can move a topic intended for the following week to the current week!

Unit	Week	Date	Topic
Unit 1: Ethical Frameworks	1	Jan 15	Introduction to the Course
	2	Jan 22	Machine Learning, Training Data, and Bias
Unit 2: Fairness in AI	3	Jan 29	Ethical Frameworks: Rights, Discrimination, and Fairness
		Jan 31 (Friday), 5:59pm EST	[Deliverable] HW1: Nuts and Bolts for Fairness in AI
	4	Feb 5	(Descriptive) Fairness in AI
	5	Feb 12	Context-Sensitive Fairness
	6	Feb 19	Causality in Ethics and Policy
		Feb 21 (Friday), 5:59pm EST	[Deliverable] HW2: Context-Sensitive Fairness
Midterm	7	Feb 26	In-Class Midterm: Data Ethics, Fairness, Privacy, Causality
		Mar 6	No Class (Spring Break)
Unit 3: Policy Frameworks	8	Mar 12	Privacy Policies, Incomplete Contracts, and Power
	9	Mar 19	From Data Ethics to Data Policy
	10	Mar 26	Welfare Economics and Policy Evaluation
	11	Apr 2	Fear and Loathing on the Pareto Frontier
Unit 4: Applications	12	Apr 9	Projects, Causality and Identity Formation
	13	Apr 16	Applications: Race, Class, Gender, Sexuality, and Disability (Data Feminism)
	14	Apr 23	Final Project In-Class Office Hours
		May 10 (Friday)	[Deliverable] Policy Whitepaper

Assignments and Grading

The main assignment in the course is your policy whitepaper, submitted at the end of the semester. There will also be a midterm exam and a series of homework assignments that let you explore each module of the course in turn.

Assignment	Due Date	% of Grade
HW1: Nuts and Bolts for Fairness in AI	Friday, January 31	10%
HW2: Context-Sensitive Fairness	Friday, February 21	10%
Midterm	Wednesday, February 26	30%
HW3: Privacy Policies as Incomplete Contracts	Friday, April 12	10%
HW4: Policy Evaluation	Friday, April 26	10%
Policy Whitepaper	Friday, May 10	30%

Homework Lateness Policy

For each homework assignment, you will have a 24-hour grace period after the due date during which you can submit without a lateness penalty. After this grace period, late penalties apply according to the following scale, measured in hours after the original due date (unless you obtain an excused lateness from one of the instructional staff!):

0 to 24 hours late: no penalty
24 to 30 hours late: 2.5% penalty
30 to 42 hours late: 5% penalty
42 to 54 hours late: 10% penalty
54 to 66 hours late: 20% penalty
More than 66 hours late: Assignment submissions no longer accepted (without instructor approval)
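The scale above amounts to a simple lookup from hours late to a penalty. The following is a minimal sketch of that lookup, assuming each tier's upper bound is inclusive (so a submission exactly 24.0 hours late still falls in the grace period) and returning only the penalty percentage rather than applying it to a score, since the syllabus does not specify how the deduction is computed. The function name `lateness_penalty` is hypothetical, for illustration only.

```python
from typing import Optional

# Each tier: (upper bound in hours late, penalty in percent).
# Assumption: upper bounds are inclusive, so exactly 24.0 hours late
# still falls inside the grace period.
LATENESS_TIERS = [
    (24.0, 0.0),   # grace period: no penalty
    (30.0, 2.5),
    (42.0, 5.0),
    (54.0, 10.0),
    (66.0, 20.0),
]


def lateness_penalty(hours_late: float) -> Optional[float]:
    """Hypothetical helper: return the penalty (in percent) for a submission
    made `hours_late` hours after the due date, or None if it is past the
    final 66-hour cutoff."""
    if hours_late <= 0:
        return 0.0  # submitted on time
    for upper_bound, penalty in LATENESS_TIERS:
        if hours_late <= upper_bound:
            return penalty
    return None  # more than 66 hours late: not accepted without approval


print(lateness_penalty(12))   # 0.0
print(lateness_penalty(36))   # 5.0
print(lateness_penalty(70))   # None
```

Returning `None` past the cutoff, rather than a 100% penalty, mirrors the policy that such submissions are simply not accepted without instructor approval.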

References

Criado Perez, Caroline. 2019. Invisible Women: Data Bias in a World Designed for Men. New York: Abrams.