source("../dsan-globals/_globals.r")
set.seed(5300)
DSAN 5300: Statistical Learning
Spring 2025, Georgetown University
Today’s Planned Schedule:
| | Start | End | Topic |
|---|---|---|---|
| Lecture | 6:30pm | 7:00pm | Single Layer Neural Networks → |
| | 7:00pm | 7:20pm | Max-Margin Classifiers → |
| | 7:20pm | 8:00pm | Support Vector Classifiers → |
| Break! | 8:00pm | 8:10pm | |
| | 8:10pm | 9:00pm | Fancier Neural Networks → |
\[ \DeclareMathOperator*{\argmax}{argmax} \DeclareMathOperator*{\argmin}{argmin} \newcommand{\bigexp}[1]{\exp\mkern-4mu\left[ #1 \right]} \newcommand{\bigexpect}[1]{\mathbb{E}\mkern-4mu \left[ #1 \right]} \newcommand{\definedas}{\overset{\small\text{def}}{=}} \newcommand{\definedalign}{\overset{\phantom{\text{defn}}}{=}} \newcommand{\eqeventual}{\overset{\text{eventually}}{=}} \newcommand{\Err}{\text{Err}} \newcommand{\expect}[1]{\mathbb{E}[#1]} \newcommand{\expectsq}[1]{\mathbb{E}^2[#1]} \newcommand{\fw}[1]{\texttt{#1}} \newcommand{\given}{\mid} \newcommand{\green}[1]{\color{green}{#1}} \newcommand{\heads}{\outcome{heads}} \newcommand{\iid}{\overset{\text{\small{iid}}}{\sim}} \newcommand{\lik}{\mathcal{L}} \newcommand{\loglik}{\ell} \DeclareMathOperator*{\maximize}{maximize} \DeclareMathOperator*{\minimize}{minimize} \newcommand{\mle}{\textsf{ML}} \newcommand{\nimplies}{\;\not\!\!\!\!\implies} \newcommand{\orange}[1]{\color{orange}{#1}} \newcommand{\outcome}[1]{\textsf{#1}} \newcommand{\param}[1]{{\color{purple} #1}} \newcommand{\pgsamplespace}{\{\green{1},\green{2},\green{3},\purp{4},\purp{5},\purp{6}\}} \newcommand{\prob}[1]{P\left( #1 \right)} \newcommand{\purp}[1]{\color{purple}{#1}} \newcommand{\sign}{\text{Sign}} \newcommand{\spacecap}{\; \cap \;} \newcommand{\spacewedge}{\; \wedge \;} \newcommand{\tails}{\outcome{tails}} \newcommand{\Var}[1]{\text{Var}[#1]} \newcommand{\bigVar}[1]{\text{Var}\mkern-4mu \left[ #1 \right]} \]
Last week: Examples of how NNs are capable of learning…
The types of features that let us learn fancy non-linear DGPs: \(Y = {\color{#e69f00} X_1 X_2 }\) ✅, \(Y = {\color{#56b4e9} X_1^2 + X_2^2 }\) ✅, \(Y = {\color{#009E73} X_1 \underset{\mathclap{\small \text{XOR}}}{\oplus} X_2}\) ✅
Multi-layer networks like CNNs for “pooling” low-level/fine-grained information into high-level/coarse-grained information
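As a quick sanity check on the XOR point above, here is a minimal sketch (my own illustration, not the slides' code) using hand-picked weights to show that one hidden layer with a non-linear (ReLU) activation is enough to represent \(X_1 \oplus X_2\):

```r
# Hand-picked weights (illustrative, not learned): a one-hidden-layer
# network computing XOR as ReLU(x1 + x2) - 2 * ReLU(x1 + x2 - 1)
relu <- function(z) pmax(z, 0)

xor_nn <- function(x1, x2) {
  h1 <- relu(x1 + x2)       # hidden unit 1
  h2 <- relu(x1 + x2 - 1)   # hidden unit 2
  1 * h1 - 2 * h2           # linear output layer
}

X <- expand.grid(x1 = c(0, 1), x2 = c(0, 1))
cbind(X, y_hat = xor_nn(X$x1, X$x2))  # reproduces 0, 1, 1, 0
```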
This week: How do we actually learn the weights/biases which enable these capabilities?
For each training observation \((\mathbf{x}_i, y_i)\)…
Predict \(\widehat{y}_i\) from \(\mathbf{x}_i\)
Evaluate loss \(\mathcal{L}(\widehat{y}_i, y_i)\): Cross-Entropy Loss
Update parameters (weights/biases): Backpropagation
Key to the success of NNs: activation functions that are non-linear but differentiable (see the sketch below)
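To make the loop above concrete, here is a minimal sketch (not the course code; the architecture, data, and learning rate are all assumed for illustration) of one predict → cross-entropy loss → backpropagation update step for a tiny one-hidden-layer network in base R:

```r
# One stochastic-gradient step for a 2-input, 3-hidden-unit, 1-output
# network with sigmoid activations and binary cross-entropy loss.
sigmoid <- function(z) 1 / (1 + exp(-z))

set.seed(5300)
x <- c(0.5, -1.2)                           # one training observation x_i
y <- 1                                      # its binary label y_i
W1 <- matrix(rnorm(6, sd = 0.1), nrow = 3)  # hidden-layer weights (3 x 2)
b1 <- rep(0, 3)                             # hidden-layer biases
w2 <- rnorm(3, sd = 0.1)                    # output-layer weights
b2 <- 0                                     # output-layer bias
lr <- 0.1                                   # learning rate (assumed)

# (1) Predict y_hat from x (forward pass)
z1 <- as.vector(W1 %*% x) + b1
a1 <- sigmoid(z1)
y_hat <- sigmoid(sum(w2 * a1) + b2)

# (2) Evaluate cross-entropy loss
loss <- -(y * log(y_hat) + (1 - y) * log(1 - y_hat))

# (3) Backpropagation: chain rule, layer by layer
#     (possible because every piece is differentiable)
d_z2 <- y_hat - y                     # dL/dz2
d_w2 <- d_z2 * a1                     # dL/dw2
d_b2 <- d_z2                          # dL/db2
d_z1 <- (d_z2 * w2) * a1 * (1 - a1)   # dL/dz1
d_W1 <- outer(d_z1, x)                # dL/dW1
d_b1 <- d_z1                          # dL/db1

# (4) Gradient-descent update of weights/biases
W1 <- W1 - lr * d_W1; b1 <- b1 - lr * d_b1
w2 <- w2 - lr * d_w2; b2 <- b2 - lr * d_b2
```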
Max entropy = max uncertainty
Less entropy = less uncertainty
Min entropy = no uncertainty
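For concreteness, a small illustration (an assumed helper, not from the slides) computing Shannon entropy in bits, reproducing the max / less / min pattern above:

```r
# Shannon entropy (in bits) of a discrete distribution p
entropy <- function(p) {
  p <- p[p > 0]          # treat 0 * log(0) as 0
  -sum(p * log2(p))
}

entropy(c(0.5, 0.5))   # max entropy = max uncertainty (1 bit)
entropy(c(0.9, 0.1))   # less entropy = less uncertainty (~0.47 bits)
entropy(c(1.0, 0.0))   # min entropy = no uncertainty (0 bits)
```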
(Full NN playlist here)