jpj251@nyu.edu
Where we left off:
Fundamental Problem of Causal Inference: Forget Everything And Run?
from Sampson, Winship, and Knight (2013), "Translating Causal Claims: Principles and Strategies for Policy‐Relevant Criminology"
from Bradford (2020), "Observations on Police Shootings & Interracial Violence"
Where we left off:
my_string = "Jeff"
my_float = 5.5
my_string * my_float
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-113-e6906dc4821b> in <module>
      1 my_string = "Jeff"
      2 my_float = 5.5
----> 3 my_string * my_float

TypeError: can't multiply sequence by non-int of type 'float'
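The error message says what Python *will* accept: a sequence times an `int`. Here's a small sketch of that, plus one way to make the original line run by converting the float first. The names `repeated`, `truncated`, and `my_number` are just illustrative.

```python
my_string = "Jeff"
my_float = 5.5

# A string times an int repeats the string
repeated = my_string * 3
print(repeated)  # JeffJeffJeff

# int() truncates 5.5 down to 5, so this no longer raises a TypeError
truncated = my_string * int(my_float)
print(truncated)  # JeffJeffJeffJeffJeff

# If numeric multiplication was intended, both operands need to be numbers
my_number = float("3")
print(my_number * my_float)  # 16.5
```

Note that `int()` truncates rather than rounds, so the "5" here is a choice, not something Python infers for you.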
my_basic_list = [1, "one", 1.0]
print(my_basic_list)
[1, 'one', 1.0]
my_meta_list = [my_basic_list, 2, "two", 2.0, [3, "three", 3.0]]
print(my_meta_list)
[[1, 'one', 1.0], 2, 'two', 2.0, [3, 'three', 3.0]]
my_non_meta_list = my_basic_list + [2, "two", 2.0] + [3, "three", 3.0]
print(my_non_meta_list)
[1, 'one', 1.0, 2, 'two', 2.0, 3, 'three', 3.0]
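The same nested-vs-flat distinction shows up in the two in-place list methods: `append()` adds its argument as a single element (like building `my_meta_list`), while `extend()` adds each element (like `+`). A minimal sketch, using `.copy()` so the original `my_basic_list` is left unchanged:

```python
my_basic_list = [1, "one", 1.0]

# append() adds its argument as ONE new element, so a list argument gets nested
appended = my_basic_list.copy()
appended.append([2, "two", 2.0])
print(appended)  # [1, 'one', 1.0, [2, 'two', 2.0]]

# extend() adds each element of its argument, like + but modifying in place
extended = my_basic_list.copy()
extended.extend([2, "two", 2.0])
print(extended)  # [1, 'one', 1.0, 2, 'two', 2.0]

print(len(appended), len(extended))  # 4 6
```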
Sets contain elements but have no notion of a "first", "second", or "third" element:
ordered_list = [4, 3, 2, 1]
set(ordered_list)
{1, 2, 3, 4}
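Since a set only records *whether* something is in it, a few things follow: duplicates disappear, membership checks work, and indexing doesn't. (The `{1, 2, 3, 4}` above happens to print sorted, but you shouldn't rely on a set's printed order.) A quick sketch:

```python
ordered_list = [4, 3, 2, 1]
my_set = set(ordered_list)

# Membership checks are what sets are built for
print(3 in my_set)  # True

# Duplicates collapse into a single element
print(set([1, 1, 2, 2, 2]))  # {1, 2}

# There is no "first" element, so indexing raises a TypeError
try:
    my_set[0]
    indexing_failed = False
except TypeError as err:
    indexing_failed = True
    print(err)  # 'set' object is not subscriptable

# sorted() hands back a list if you need a definite order again
print(sorted(my_set))  # [1, 2, 3, 4]
```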
import numpy as np
ordered_array = np.array(ordered_list)
print(ordered_array)
print(ordered_list)
[4 3 2 1]
[4, 3, 2, 1]
ordered_list.mean()
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-120-46a02c4663bc> in <module>
----> 1 ordered_list.mean()

AttributeError: 'list' object has no attribute 'mean'
ordered_array.mean()
2.5
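Lists can still be averaged, just not with a `.mean()` method. Three ways that should all give 2.5 here: plain arithmetic, the standard-library `statistics` module, and the *function* `np.mean()`, which converts the list to an array for you.

```python
import statistics

import numpy as np

ordered_list = [4, 3, 2, 1]

# Plain Python: sum divided by count
manual_mean = sum(ordered_list) / len(ordered_list)

# The standard-library statistics module works directly on lists
stdlib_mean = statistics.mean(ordered_list)

# np.mean() (the function, not the array method) accepts a list
numpy_mean = np.mean(ordered_list)

print(manual_mean, stdlib_mean, numpy_mean)
```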
Important difference, though: NumPy arrays require that all elements be the same type!
...so what happens if they're different?
no_bueno = np.array([1,"two",3.0])
no_bueno
array(['1', 'two', '3.0'], dtype='<U11')
print(type(no_bueno[0]))
print(type(no_bueno[1]))
print(type(no_bueno[2]))
<class 'numpy.str_'>
<class 'numpy.str_'>
<class 'numpy.str_'>
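So NumPy picked one type that could hold everything (strings) and converted every element to it. A short sketch of two related cases: ints mixed with floats get upcast to float instead, and `dtype=object` is the escape hatch that keeps each element's original Python type (at the cost of NumPy's fast numeric math).

```python
import numpy as np

# Mixing ints and floats upcasts everything to float, not to strings
mixed_numbers = np.array([1, 2.5, 3])
print(mixed_numbers.dtype)  # float64

# dtype=object stores arbitrary Python objects, so each keeps its own type
mixed_objects = np.array([1, "two", 3.0], dtype=object)
print([type(element) for element in mixed_objects])
# [<class 'int'>, <class 'str'>, <class 'float'>]

# Strings that hold numbers can be converted back explicitly with astype()
numeric_strings = np.array(["1", "3.0"])
print(numeric_strings.astype(float))  # [1. 3.]
```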
[ ] (square brackets): list
{ } (curly brackets): set (or dict)
( ) (parentheses): tuple
type([1, 2, 3])
list
type({1, 2, 3})
set
type({1: "one", 2: "two", 3: "three"})
dict
type((1, 2, 3))
tuple
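Two gotchas with these bracket rules, sketched below: empty curly brackets give you a `dict`, not a `set` (use `set()` for an empty set), and parentheses around a single value just group it; the trailing comma is what makes a one-element tuple.

```python
empty_braces = {}
empty_set = set()
just_parens = (1)
one_tuple = (1,)

print(type(empty_braces))  # <class 'dict'>
print(type(empty_set))     # <class 'set'>
print(type(just_parens))   # <class 'int'>
print(type(one_tuple))     # <class 'tuple'>
```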
(From most basic to most fancy)
1. NumPy: import numpy as np
$\rightarrow$ Math with arrays
2. Pandas: import pandas as pd
$\rightarrow$ Math with Tables (DataFrames)
3. Matplotlib: import matplotlib.pyplot as plt
$\rightarrow$ Visualizing NumPy/Pandas objects
4. Statsmodels: import statsmodels.formula.api as smf
$\rightarrow$ Statistical hypothesis testing
5. Seaborn: import seaborn as sns
$\rightarrow$ Visualizing statistical hypothesis tests
6. Scikit-learn: import sklearn
$\rightarrow$ Fancy machine learning things
import numpy as np
cool_array = np.array([1, 2, 3, 4, 5])
cool_array.std()
1.4142135623730951
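That 1.414... is $\sqrt{2}$, the *population* standard deviation: NumPy divides the squared deviations by $n$ by default (`ddof=0`). pandas defaults to dividing by $n - 1$ (`ddof=1`), so the same numbers can give a different answer depending on the library. A minimal comparison:

```python
import numpy as np
import pandas as pd

cool_array = np.array([1, 2, 3, 4, 5])

# NumPy default, ddof=0: sum of squared deviations (10) / n (5) = 2, sqrt -> about 1.4142
population_sd = cool_array.std()

# ddof=1: divide by n - 1 (4) instead, 10 / 4 = 2.5, sqrt -> about 1.5811
sample_sd = cool_array.std(ddof=1)

# pandas Series.std() defaults to ddof=1
pandas_sd = pd.Series([1, 2, 3, 4, 5]).std()

print(population_sd, sample_sd, pandas_sd)
```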
import pandas as pd
cool_df = pd.DataFrame(
{'x':[1,2,3,4,5,6,7,8],
'y':[2,4,5,5,7,9,10,13],
'z':[0,0,1,0,1,1,1,1]}
)
cool_df.head()
|   | x | y | z |
|---|---|---|---|
| 0 | 1 | 2 | 0 |
| 1 | 2 | 4 | 0 |
| 2 | 3 | 5 | 1 |
| 3 | 4 | 5 | 0 |
| 4 | 5 | 7 | 1 |
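`head()` only shows the first 5 rows by default; the DataFrame itself has 8. A few examples of what "math with tables" looks like in pandas: column-wise summaries, new columns computed element by element, and group-wise means. The column name `x_plus_y` is just an illustration.

```python
import pandas as pd

cool_df = pd.DataFrame(
    {'x': [1, 2, 3, 4, 5, 6, 7, 8],
     'y': [2, 4, 5, 5, 7, 9, 10, 13],
     'z': [0, 0, 1, 0, 1, 1, 1, 1]}
)

# (rows, columns): head() hid the last 3 rows
print(cool_df.shape)  # (8, 3)

# Column-wise math returns one value per column
print(cool_df.mean())

# New columns can be computed from existing ones, row by row
cool_df['x_plus_y'] = cool_df['x'] + cool_df['y']

# Mean of y separately for the rows where z == 0 and where z == 1
y_by_z = cool_df.groupby('z')['y'].mean()
print(y_by_z)
```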
import matplotlib.pyplot as plt
plt.scatter(cool_df['x'], cool_df['y'])
plt.show()
import statsmodels.formula.api as smf
result = smf.ols('y ~ x', data=cool_df).fit()
print(result.summary().tables[1])
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept      0.3929      0.614      0.640      0.546      -1.110       1.895
x              1.4405      0.122     11.846      0.000       1.143       1.738
==============================================================================
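To demystify the `coef` and `std err` columns for `x`, here's a sketch that recomputes them with plain NumPy using the textbook simple-regression formulas (slope = cross-deviations over squared $x$-deviations; residual variance with $n - 2$ degrees of freedom). `np.polyfit` is included as a cross-check on the coefficients.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5, 6, 7, 8])
y = np.array([2, 4, 5, 5, 7, 9, 10, 13])

# Slope = sum of cross-deviations / sum of squared x-deviations
x_dev = x - x.mean()
y_dev = y - y.mean()
sxx = (x_dev ** 2).sum()
slope = (x_dev * y_dev).sum() / sxx
intercept = y.mean() - slope * x.mean()

# Standard error of the slope: residual variance (n - 2 df) divided by Sxx
residuals = y - (intercept + slope * x)
n = len(x)
sigma2 = (residuals ** 2).sum() / (n - 2)
slope_se = np.sqrt(sigma2 / sxx)

# Cross-check: polyfit with degree 1 returns [slope, intercept]
coeffs = np.polyfit(x, y, 1)

# Compare with the table: Intercept 0.3929, x 1.4405, std err 0.122
print(round(intercept, 4), round(slope, 4), round(slope_se, 4))
print(coeffs)
```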
import seaborn as sns
sns.regplot(x='x', y='y', data=cool_df);
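from sklearn.linear_model import LogisticRegression
import scikitplot as skplt
# Hypothetical classifier: predict the binary column z from x and y
clf = LogisticRegression().fit(cool_df[['x', 'y']], cool_df['z'])
classes = cool_df['z']  # true labels
predictions = clf.predict(cool_df[['x', 'y']])  # predicted labels
skplt.metrics.plot_confusion_matrix(classes, predictions, normalize=True)
plt.show()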
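To see what the numbers in a normalized confusion matrix mean, here's a pandas-only sketch using `pd.crosstab`. The true labels are the `z` column from `cool_df`; the predicted labels are made up for illustration. With `normalize='index'`, each row (true class) sums to 1, which should correspond to scikit-plot's `normalize=True`, though that's worth confirming against its docs.

```python
import pandas as pd

# True labels: the z column from cool_df; these predictions are hypothetical
true_labels = pd.Series([0, 0, 1, 0, 1, 1, 1, 1], name='true')
predicted_labels = pd.Series([0, 0, 0, 0, 1, 1, 1, 1], name='predicted')

# Raw counts: rows are true classes, columns are predicted classes
counts = pd.crosstab(true_labels, predicted_labels)
print(counts)

# normalize='index' divides each row by its total, so each row sums to 1
confusion = pd.crosstab(true_labels, predicted_labels, normalize='index')
print(confusion)

# Overall accuracy: share of rows where prediction matches truth (7 of 8)
accuracy = (true_labels == predicted_labels).mean()
print(accuracy)  # 0.875
```

The diagonal of `confusion` is the share of each true class the classifier got right: here every true 0 and 4 of the 5 true 1s.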