`jpj251@nyu.edu`

- This slideshow: https://jjacobs.me/dsua111-sections/week-04
- All materials: https://github.com/jpowerj/dsua111-sections

- HW1 Feedback
- Causality Wrap-Up
- Python Sequences
- Python Libraries

*Where we left off:*

**Fundamental Problem of Causal Inference**: Forget Everything And Run?

- Find good **comparison** cases: **Treatment Group** and **Control Group**

- "Statistical Matching"
- Don't worry about the details, but the tl;dr is:
- Find the two **most similar** people, put one in the Treatment Group and the other in the Control Group, and compare their outcomes
- Bam. If we can measure and take into account all variables that may be related to our causal hypothesis, this is **as close as we can possibly get** to "solving" FPCI
- [Not on the midterm or final, but relevant in case you're despairing about FPCI]
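The matching idea can be sketched in a few lines. This is a toy illustration with made-up numbers (the names `treated`, `controls`, and the single `age` covariate are all hypothetical), not the actual statistical-matching procedure used in research:

```python
import numpy as np

# Toy sketch of statistical matching (hypothetical data):
# for each treated unit, find the most similar control unit on a
# measured variable (here just age), then compare their outcomes.
treated = [{"age": 25, "outcome": 10}, {"age": 40, "outcome": 14}]
controls = [{"age": 24, "outcome": 8},
            {"age": 41, "outcome": 13},
            {"age": 60, "outcome": 20}]

effects = []
for t in treated:
    # nearest neighbor on the measured variable
    match = min(controls, key=lambda c: abs(c["age"] - t["age"]))
    effects.append(t["outcome"] - match["outcome"])

print(np.mean(effects))  # average treated-vs-matched-control difference: 1.5
```

Real matching methods use many covariates at once (and fancier distance measures), but the "pair up the most similar units" logic is the same.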

- Random Assignment: Vietnam War/Second Indochina War Draft
- Key point: makes treatment and control groups similar, on average, without us having to do any work!
- (e.g., don't need to worry about "pairing up" similar treatment+control units via statistical matching)

- No more Selection Effects
- Omitted variables are in BOTH Treatment and Control groups

- Tl;dr: **Why** did this person (unit) end up in the **treatment** group? **Why** did this other person (unit) end up in the **control** group?
- Are there systematic differences?
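You can see the "balanced on average" claim in a quick simulation. Here `omitted` stands in for some variable we never measured; the point is that a coin-flip assignment splits it evenly between groups without us doing any work (all names here are made up for the sketch):

```python
import numpy as np

# Sketch: random assignment balances even unmeasured variables, on average.
rng = np.random.default_rng(0)
omitted = rng.normal(size=10_000)    # a variable we never measured
assign = rng.random(10_000) < 0.5    # coin-flip assignment to treatment

print(omitted[assign].mean())        # treatment group mean of the omitted variable
print(omitted[~assign].mean())       # control group mean: both close to 0
```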

- Vietnam/Indochina Draft: Why can't we just study **[men who join the military]** versus **[men who don't]**, and take the difference as a causal estimate?

- We ideally want people **assigned** to the treatment group to **take** the treatment, and people **assigned** to the control group to **take** the control.
- "Compliance" is the degree to which this is actually true in your experiment
- **High** compliance = most people actually took what they were assigned
- **Low** compliance = lots of people who were assigned to treatment actually took control, and vice-versa
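Measuring compliance is just comparing "assigned" to "took" unit by unit. A tiny hypothetical record (the `assigned`/`took` lists are invented for illustration):

```python
# Hypothetical experiment records: what each unit was assigned vs. what it took.
assigned = ["treat", "treat", "treat", "control", "control", "control"]
took     = ["treat", "treat", "control", "control", "control", "treat"]

# Fraction of units that actually took what they were assigned
compliers = sum(a == t for a, t in zip(assigned, took))
print(compliers / len(assigned))  # 4 out of 6 complied
```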

- What problems might there be with compliance in the Draft example?

- In observational studies, researchers have no control over assignment to treatment/control 😨
- On the one hand... Forget Everything And Run, if you can.
- On the other hand... statisticians over the last ~4 centuries have developed fancy causal inference tools/techniques to help us Face Everything And Rise!

from Sampson, Winship, and Knight (2013), "Translating Causal Claims: Principles and Strategies for Policy‐Relevant Criminology"

from Bradford (2020), "Observations on Police Shootings & Interracial Violence"

Where we left off:

In [113]:

```
my_string = "Jeff"
my_float = 5.5
my_string * my_float  # TypeError: can't multiply sequence by non-int of type 'float'
```
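For contrast, multiplying a string by an *integer* does work (it repeats the string); only the float version fails. A quick check of both cases:

```python
my_string = "Jeff"
print(my_string * 3)     # strings repeat with an integer: JeffJeffJeff

try:
    my_string * 5.5      # ...but not with a float
except TypeError as e:
    print(e)             # can't multiply sequence by non-int of type 'float'
```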

- Lists: Ordered sequences of... anything (including lists themselves 🤯)

In [114]:

```
my_basic_list = [1, "one", 1.0]
print(my_basic_list)
```

[1, 'one', 1.0]

In [115]:

```
my_meta_list = [my_basic_list, 2, "two", 2.0, [3, "three", 3.0]]
print(my_meta_list)
```

[[1, 'one', 1.0], 2, 'two', 2.0, [3, 'three', 3.0]]

- This is **not** the same as

In [116]:

```
my_non_meta_list = my_basic_list + [2, "two", 2.0] + [3, "three", 3.0]
print(my_non_meta_list)
```

[1, 'one', 1.0, 2, 'two', 2.0, 3, 'three', 3.0]

- ^(!!!)
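The difference shows up immediately if you check lengths and first elements: the "meta" list treats each inner list as a *single* element, while `+` flattens everything into one long list.

```python
my_basic_list = [1, "one", 1.0]
my_meta_list = [my_basic_list, 2, "two", 2.0, [3, "three", 3.0]]
my_non_meta_list = my_basic_list + [2, "two", 2.0] + [3, "three", 3.0]

print(len(my_meta_list))      # 5: two of its elements are whole lists
print(len(my_non_meta_list))  # 9: everything got flattened by +
print(my_meta_list[0])        # [1, 'one', 1.0] -- a list
print(my_non_meta_list[0])    # 1 -- just a number
```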

- We specify "ordered" only because there are also `set`s, which contain elements but have no notion of a "first", "second", "third" element

In [117]:

```
ordered_list = [4, 3, 2, 1]
set(ordered_list)
```

Out[117]:

{1, 2, 3, 4}
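Besides dropping the ordering, sets also keep only *one* copy of each element, which is worth seeing once:

```python
# Sets keep each distinct element exactly once
repeats = [1, 2, 2, 3, 3, 3]
print(set(repeats))                    # {1, 2, 3}
print(len(repeats), len(set(repeats)))  # 6 elements in the list, 3 in the set
```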

In [118]:

```
ordered_array = np.array(ordered_list)
```

- What happened?

In [119]:

```
import numpy as np
ordered_array = np.array(ordered_list)
print(ordered_array)
print(ordered_list)
```

[4 3 2 1]
[4, 3, 2, 1]

In [120]:

```
ordered_list.mean()  # AttributeError: 'list' object has no attribute 'mean'
```

In [121]:

```
ordered_array.mean()
```

Out[121]:

2.5

**Important difference**, though: NumPy arrays require that all elements be the **same type**!

...so what happens if they're different?

In [122]:

```
no_bueno = np.array([1,"two",3.0])
no_bueno
```

Out[122]:

array(['1', 'two', '3.0'], dtype='<U11')

In [123]:

```
print(type(no_bueno[0]))
print(type(no_bueno[1]))
print(type(no_bueno[2]))
```

<class 'numpy.str_'>
<class 'numpy.str_'>
<class 'numpy.str_'>
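If you genuinely need mixed types in an array, NumPy can store generic Python objects via `dtype=object`, though you lose the speed benefits that make arrays worth using in the first place:

```python
import numpy as np

# dtype=object keeps each element's original Python type
# (at the cost of NumPy's fast same-type math)
mixed = np.array([1, "two", 3.0], dtype=object)
print(type(mixed[0]))  # <class 'int'>
print(type(mixed[1]))  # <class 'str'>
print(type(mixed[2]))  # <class 'float'>
```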

- `[` `]` (**square** brackets): `list`
- `{` `}` (**curly** brackets): `set` (or `dict`)
- `(` `)` (**parentheses**): `tuple`

In [124]:

```
type([1, 2, 3])
```

Out[124]:

list

In [125]:

```
type({1, 2, 3})
```

Out[125]:

set

In [126]:

```
type({1: "one", 2: "two", 3: "three"})
```

Out[126]:

dict

In [127]:

```
type((1, 2, 3))
```

Out[127]:

tuple
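One practical difference worth knowing: a `tuple` looks like a `list`, but it's *immutable*, so you can't change its elements after creating it.

```python
my_list = [1, 2, 3]
my_tuple = (1, 2, 3)

my_list[0] = 99          # lists can be modified in place
print(my_list)           # [99, 2, 3]

try:
    my_tuple[0] = 99     # tuples cannot
except TypeError as e:
    print(e)             # 'tuple' object does not support item assignment
```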

(From most basic to most fancy)

**1. NumPy**: `import numpy as np`

$\rightarrow$ Math with **arrays**

**2. Pandas**: `import pandas as pd`

$\rightarrow$ Math with **Tables** (DataFrames)

**3. Matplotlib**: `import matplotlib.pyplot as plt`

$\rightarrow$ Visualizing NumPy/Pandas objects

**4. Statsmodels**: `import statsmodels.formula.api as smf`

$\rightarrow$ Statistical hypothesis testing

**5. Seaborn**: `import seaborn as sns`

$\rightarrow$ Visualizing statistical hypothesis tests

**6. Scikit-learn**: `import sklearn`

$\rightarrow$ Fancy machine learning things

In [142]:

```
import numpy as np
cool_array = np.array([1, 2, 3, 4, 5])
cool_array.std()
```

Out[142]:

1.4142135623730951
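A gotcha to keep in mind once you start using Pandas too: NumPy's `.std()` defaults to the *population* standard deviation (`ddof=0`), while Pandas' `.std()` defaults to the *sample* standard deviation (`ddof=1`), so the same numbers give different answers:

```python
import numpy as np
import pandas as pd

data = [1, 2, 3, 4, 5]
print(np.array(data).std())         # 1.4142... (population std, ddof=0)
print(pd.Series(data).std())        # 1.5811... (sample std, ddof=1)
print(np.array(data).std(ddof=1))   # matches Pandas once ddof=1
```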

In [143]:

```
import pandas as pd
cool_df = pd.DataFrame(
    {'x': [1, 2, 3, 4, 5, 6, 7, 8],
     'y': [2, 4, 5, 5, 7, 9, 10, 13],
     'z': [0, 0, 1, 0, 1, 1, 1, 1]}
)
cool_df.head()
```

Out[143]:

| | x | y | z |
|---|---|---|---|
| 0 | 1 | 2 | 0 |
| 1 | 2 | 4 | 0 |
| 2 | 3 | 5 | 1 |
| 3 | 4 | 5 | 0 |
| 4 | 5 | 7 | 1 |

In [144]:

```
import matplotlib.pyplot as plt
plt.scatter(cool_df['x'], cool_df['y'])
plt.show()
```

In [166]:

```
import statsmodels.formula.api as smf
result = smf.ols('y ~ x', data=cool_df).fit()
print(result.summary().tables[1])
```

In [146]:

```
import seaborn as sns
sns.regplot(x='x', y='y', data=cool_df);
```

In [147]:

```
import sklearn
```
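To make "fancy machine learning things" concrete, here's a minimal sketch using the `cool_df` data from above (rebuilt so the block runs on its own): fit a logistic-regression classifier predicting `z` from `x` and `y`. This is just an illustration of the scikit-learn fit/predict pattern, not something from the homework:

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Rebuild the toy DataFrame from earlier so this cell is self-contained
cool_df = pd.DataFrame(
    {'x': [1, 2, 3, 4, 5, 6, 7, 8],
     'y': [2, 4, 5, 5, 7, 9, 10, 13],
     'z': [0, 0, 1, 0, 1, 1, 1, 1]}
)

# The standard scikit-learn pattern: create a model, .fit() it, .predict() with it
model = LogisticRegression()
model.fit(cool_df[['x', 'y']], cool_df['z'])
print(model.predict(cool_df[['x', 'y']]))  # predicted 0/1 labels for each row
```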

In [181]:

```
import scikitplot as skplt
# (assumes `classes` and `predictions` were defined in an earlier cell)
skplt.metrics.plot_confusion_matrix(classes, predictions, normalize=True)
plt.show()
```