Syllabus

Welcome to DSAN 5650: Causal Inference for Computational Social Science at Georgetown University!

The course meets on Wednesdays from 6:30-9pm online via Zoom.

Course Staff

Course Description

This course provides students with the opportunity to take the analytical skills, machine learning algorithms, and statistical methods learned throughout their first year in the program and explore how they can be employed towards carrying out rigorous, original research in the behavioral and social sciences. With a particular emphasis on tackling the additional challenges which arise when moving from associational to causal inference, particularly when only observational (as opposed to experimental) data is available, students will become proficient in cutting-edge causal Machine Learning techniques such as propensity score matching, synthetic controls, causal program evaluation, inverse social welfare function estimation from panel data, and Double-Debiased Machine Learning.

In-class examples will cover continuous, discrete-choice, and textual data from a wide swath of social and behavioral sciences: economics, political science, sociology, anthropology, quantitative history, and digital humanities. After gaining experience through in-class labs and homework assignments focused on reproducing key findings from recent journal articles in each of these disciplines, students will spend the final weeks of the course on a final project demonstrating their ability to develop, evaluate, and test the robustness of a causal hypothesis.

Prerequisites: DSAN 5000, DSAN 5100 (DSAN 5300 recommended but not required)

Course Overview

The course revolves around three “pillars”, which we’ll examine individually before bringing them together for your final projects at the end of the class: high-level ethical issues in data science, general ethical frameworks, and public policy applications.

Data Science

A portion of the course will focus on introductions to cutting-edge technologies like self-driving cars, ChatGPT, facial detection algorithms, and various applications of AI to police and military technologies. For this portion, we’ll draw fairly often from the contents of the following books:

Since there are plenty of in-depth resources available to you (e.g., other Georgetown courses!) for learning the technical details of these technologies, our goal in this course will be to learn just the particular aspects of each technology which are most relevant to the ethical and policy issues they present.

For example, we will look at Neural Network-based Machine Learning algorithms, but we will focus specifically on how the performance of these algorithms on a given task depends crucially on the existence of effective training data for that task. The breakthroughs in Artificial Intelligence which have had an immense impact on society over the past few decades have not come about because of new algorithms (neural networks have been around since the 1950s). Rather, they have come about because of the massive, exponential increase in the amount of data available to train these already-existing algorithms: data scraped from across the entire web, from millions of scanned books, or from Wikipedia’s massive collection of articles. This means that these algorithms simply encode pre-existing human biases into algorithmically-derived “rules”, thus motivating the next pillar of the course: Ethics!

Ethics

For the ethics-focused portion of the course, we’ll be reading selections from the following textbook:

From the vast array of readings contained in this collection, we’ll look at both “standard” ethical readings from authors like Jeremy Bentham and Immanuel Kant and readings from literary sources like Ursula Le Guin and Ambrose Bierce.

Public Policy

For the final piece of the course we will take the technological developments discussed in the first portion, analyze them using the ethical frameworks discussed in the second portion, and come to conclusions as to what lawmakers, governments, and civil society organizations (NGOs and think tanks, for example) can do in practice to address the ethical issues raised by these technologies. Specifically, the recommended final project for the course will be a Policy Whitepaper, where you will choose a particular institution and recommend how it can use its power (for example, the power to pass laws) to most effectively address an ethical issue that you believe is important.

For this portion of the class we’ll have to draw on a wide range of different readings, depending on what particular subdomains of public policy are most interesting to you all, but as a general textbook on ethics in data science which does focus a good amount on policy specifically, we will look at:

Now that you have an overview of the trajectory of the course, the following section contains the particulars of what we’ll be reading and working on each week!

Schedule

The following is a rough map of what we will work through together this semester. Since everyone learns at a different pace, my aim is to leave us a good amount of flexibility in how much time we spend on each topic. If I find that it takes me longer than a week to convey a certain topic in sufficient depth, for example, then I view it as a strength rather than a weakness of the course that we can rearrange the calendar below by adding an extra week on that topic! Similarly, if it seems like I am spending too much time on a topic, to the point that students seem bored or impatient to move on to the next one, we can pull a topic intended for the following week into the current week!

| Unit | Week | Date | Topic |
|---|---|---|---|
| Unit 1: Ethical Frameworks | 1 | Jan 15 | Introduction to the Course |
| | 2 | Jan 22 | Machine Learning, Training Data, and Bias |
| Unit 2: Fairness in AI | 3 | Jan 29 | Ethical Frameworks: Rights, Discrimination, and Fairness |
| | | Jan 31 (Friday), 5:59pm EST | [Deliverable] HW1: Nuts and Bolts for Fairness in AI |
| | 4 | Feb 5 | (Descriptive) Fairness in AI |
| | 5 | Feb 12 | Context-Sensitive Fairness |
| | 6 | Feb 19 | Causality in Ethics and Policy |
| | | Feb 21 (Friday), 5:59pm EST | [Deliverable] HW2: Context-Sensitive Fairness |
| Midterm | 7 | Feb 26 | In-Class Midterm: Data Ethics, Fairness, Privacy, Causality |
| | | Mar 6 | No Class (Spring Break) |
| Unit 3: Policy Frameworks | 8 | Mar 12 | Privacy Policies, Incomplete Contracts, and Power |
| | 9 | Mar 19 | From Data Ethics to Data Policy |
| | 10 | Mar 26 | Econometric Policy Evaluation and Inverse Fairness |
| | 11 | Apr 2 | Fairness vs. Social Welfare |
| Unit 4: Applications | 12 | Apr 9 | Project Talk, Causality and Identity Formation |
| | 13 | Apr 16 | Applications: Race, Class, Gender, Sexuality, and Disability (Data Feminism) |
| | 14 | Apr 23 | Republican Liberty and the Kindly Slavemaster |
| | | May 10 (Friday) | [Deliverable] Policy Whitepaper |

Assignments and Grading

The main assignment in the course will be your policy whitepaper, submitted at the end of the semester. However, there will also be a midterm exam and a series of assignments which exist to let you explore each of the modules of the course, in turn.

| Assignment | Due Date | % of Grade |
|---|---|---|
| HW1: Nuts and Bolts for Fairness in AI | Friday, January 31 | 10% |
| HW2: Context-Sensitive Fairness | Friday, February 21 | 10% |
| Midterm | Wednesday, February 26 | 30% |
| HW3: Privacy Policies as Incomplete Contracts | Friday, April 12 | 10% |
| HW4: Policy Evaluation | Friday, April 26 | 10% |
| Policy Whitepaper | Friday, May 10 | 30% |
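
As a rough illustration of how the weights above combine, here is a minimal Python sketch. The names `WEIGHTS` and `final_grade` are hypothetical, and the sketch assumes every assignment is scored on a 0-100 scale and that no lateness penalties or other adjustments apply; it is not an official grade calculator for the course.

```python
from typing import Dict

# Percent of the final grade for each assignment, copied from the table above.
WEIGHTS: Dict[str, float] = {
    "HW1": 10,
    "HW2": 10,
    "Midterm": 30,
    "HW3": 10,
    "HW4": 10,
    "Policy Whitepaper": 30,
}


def final_grade(scores: Dict[str, float]) -> float:
    """Weighted average of per-assignment scores (assumed to be on a 0-100 scale)."""
    # Sanity check: the weights in the table should add up to 100%.
    assert sum(WEIGHTS.values()) == 100
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


# Hypothetical scores, for illustration only.
example_scores = {
    "HW1": 100, "HW2": 100, "HW3": 100, "HW4": 100,
    "Midterm": 80, "Policy Whitepaper": 90,
}
print(final_grade(example_scores))
```

Because the midterm and the whitepaper each carry 30%, they together count for more than all four homework assignments combined.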

Homework Lateness Policy

For each homework assignment, you will have a 24-hour grace period after the due date to submit without a lateness penalty. After this grace period, late penalties will be applied on the following scale (unless you obtain an excused lateness from one of the instructional staff!):

  • 0 to 24 hours late: no penalty
  • 24 to 30 hours late: 2.5% penalty
  • 30 to 42 hours late: 5% penalty
  • 42 to 54 hours late: 10% penalty
  • 54 to 66 hours late: 20% penalty
  • More than 66 hours late: Assignment submissions no longer accepted (without instructor approval)
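
The scale above can be read as a simple lookup, sketched here in Python. The names `PENALTY_BRACKETS` and `late_penalty` are hypothetical. The sketch assumes that hours are counted from the original due date and that a submission landing exactly on a boundary (say, exactly 30 hours late) falls in the earlier, more lenient bracket; the policy text does not specify the boundary cases, so treat that choice as an assumption.

```python
from typing import Optional

# (upper bound in hours late, penalty in percent), taken from the scale above.
PENALTY_BRACKETS = [
    (24, 0.0),   # grace period
    (30, 2.5),
    (42, 5.0),
    (54, 10.0),
    (66, 20.0),
]


def late_penalty(hours_late: float) -> Optional[float]:
    """Return the penalty in percent, or None if the submission is past the 66-hour cutoff."""
    if hours_late <= 0:
        return 0.0  # submitted on time
    for upper_bound, penalty in PENALTY_BRACKETS:
        # Assumption: boundaries are inclusive on the upper end.
        if hours_late <= upper_bound:
            return penalty
    # More than 66 hours late: not accepted without instructor approval.
    return None


for hours in (12, 27, 36, 50, 60, 70):
    print(hours, late_penalty(hours))
```

Returning `None` past the cutoff, rather than a 100% penalty, reflects the fact that such submissions are not accepted at all unless an instructor approves them.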