DSAN 5100: Probabilistic Modeling and Statistical Computing
Section 01
Tuesday, September 30, 2025
\[ \DeclareMathOperator*{\argmax}{argmax} \DeclareMathOperator*{\argmin}{argmin} \newcommand{\bigexp}[1]{\exp\mkern-4mu\left[ #1 \right]} \newcommand{\bigexpect}[1]{\mathbb{E}\mkern-4mu \left[ #1 \right]} \newcommand{\convergesAS}{\overset{\text{a.s.}}{\longrightarrow}} \newcommand{\definedas}{\overset{\text{def}}{=}} \newcommand{\definedalign}{\overset{\phantom{\text{def}}}{=}} \newcommand{\eqeventual}{\overset{\mathclap{\text{\small{eventually}}}}{=}} \newcommand{\Err}{\text{Err}} \newcommand{\expect}[1]{\mathbb{E}[#1]} \newcommand{\expectsq}[1]{\mathbb{E}^2[#1]} \newcommand{\fw}[1]{\texttt{#1}} \newcommand{\given}{\mid} \newcommand{\green}[1]{\color{green}{#1}} \newcommand{\heads}{\outcome{heads}} \newcommand{\iid}{\overset{\text{\small{iid}}}{\sim}} \newcommand{\lik}{\mathcal{L}} \newcommand{\loglik}{\ell} \newcommand{\mle}{\textsf{ML}} \newcommand{\nimplies}{\;\not\!\!\!\!\implies} \newcommand{\orange}[1]{\color{orange}{#1}} \newcommand{\outcome}[1]{\textsf{#1}} \newcommand{\param}[1]{{\color{purple} #1}} \newcommand{\pgsamplespace}{\{\green{1},\green{2},\green{3},\purp{4},\purp{5},\purp{6}\}} \newcommand{\prob}[1]{P\left( #1 \right)} \newcommand{\purp}[1]{\color{purple}{#1}} \newcommand{\sign}{\text{Sign}} \newcommand{\spacecap}{\; \cap \;} \newcommand{\spacewedge}{\; \wedge \;} \newcommand{\tails}{\outcome{tails}} \newcommand{\Var}[1]{\text{Var}[#1]} \newcommand{\bigVar}[1]{\text{Var}\mkern-4mu \left[ #1 \right]} \]
\[ \Pr(A) = \underset{\mathclap{\small \text{Probability }\textbf{mass}}}{\boxed{\frac{|\{A\}|}{|\Omega|}}} = \frac{1}{|\{A,B,C,D\}|} = \frac{1}{4} \]
\[ \Pr(A) = \underset{\mathclap{\small \text{Probability }\textbf{density}}}{\boxed{\frac{\text{Area}(\{A\})}{\text{Area}(\Omega)}}} = \frac{\pi r^2}{s^2} = \frac{\pi \left(\frac{1}{4}\right)^2}{4} = \frac{\pi}{64} \]
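The density calculation above can be sanity-checked by Monte Carlo. A minimal sketch, assuming the square sample space has side \(s = 2\) (so \(\text{Area}(\Omega) = s^2 = 4\)) and the target circle of radius \(r = \frac{1}{4}\) sits fully inside it; the circle's center at \((1,1)\) is an arbitrary illustrative choice and does not affect the probability:

```python
import numpy as np

# Monte Carlo check of the probability-density example: throw uniform
# "darts" at the square [0,2] x [0,2] and count how many land inside a
# circle of radius 1/4 (assumed centered at (1,1) for illustration).
rng = np.random.default_rng(5100)
n = 500_000
pts = rng.uniform(0.0, 2.0, size=(n, 2))
inside = np.sum((pts[:, 0] - 1) ** 2 + (pts[:, 1] - 1) ** 2 < 0.25 ** 2)
estimate = inside / n
print(estimate, np.pi / 64)   # both approximately 0.049
```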
| | Discrete World | Continuous World |
|---|---|---|
| How do we get probabilities? | Sums: \(\Pr(X = 1) + \Pr(X = 2) = \frac{2}{6}\) | Integrals: \(\displaystyle\int_{20}^{21}f_X(x)\,\mathrm{d}x = 0.4\) |
| What makes a valid distribution? | \(\displaystyle\sum_{x \in \mathcal{R}_X}\Pr(X = x) = 1\) | \(\displaystyle\int_{x \in \mathcal{R}_X}f_X(x)\,\mathrm{d}x = 1\) |
| How do we model uncertainty about multiple variables? | Frequency table | Joint pdf |
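Both validity conditions from the table can be checked numerically. A quick sketch using a fair die for the discrete case and, as an illustrative continuous choice not from the slides, a standard normal pdf:

```python
import numpy as np
from scipy import integrate, stats

# (Discrete) the pmf of a fair die sums to 1 over its support
die_pmf = {x: 1 / 6 for x in range(1, 7)}
print(sum(die_pmf.values()))                      # 1.0

# (Continuous) a pdf integrates to 1 over its support
# (standard normal here; support is all of R)
total, _err = integrate.quad(stats.norm.pdf, -np.inf, np.inf)
print(total)                                      # ~1.0
```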
\[ \frac{\text{Volume}(\{(X,Y) \mid (X,Y) \in C\})}{\text{Volume}(\text{Hershey Kiss})} = \Pr((X,Y) \in C) \]
Figure 3.11 from DeGroot and Schervish (2013)
Figure 3.20 from DeGroot and Schervish (2013)
\[ f_G(g) = \frac{1}{12 - 10} = \frac{1}{2} \]
\[ X \sim \mathcal{N}(\mu, \sigma) \leadsto [X \mid a < X < b] \sim \ddot{\mathcal{N}}(\mu, \sigma, a, b) \]
Adapted from the simstudy package documentation
\[ f_X(x) = \frac{1}{\sigma}\cdot \frac{\varphi\left(\frac{x - \mu}{\sigma}\right)}{\Phi\left(\frac{b-\mu}{\sigma}\right) - \Phi\left(\frac{a - \mu}{\sigma}\right)} \approx \frac{\Pr(X \approx x, a < X < b)}{\Pr(a < X < b)} \]
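The formula above can be verified against `scipy.stats.truncnorm` (note that scipy parameterizes the truncation points in *standardized* units). The parameter values below are illustrative choices, not from the slides:

```python
import numpy as np
from scipy import stats

# Check the truncated-normal pdf formula against scipy.stats.truncnorm.
# Illustrative parameters: mu = 0.5, sigma = 0.1, truncated to (0, 1).
mu, sigma, a, b = 0.5, 0.1, 0.0, 1.0

def trunc_pdf(x):
    """f_X(x) from the slide: phi(z)/sigma rescaled by the mass kept in (a, b)."""
    z = (x - mu) / sigma
    keep = stats.norm.cdf((b - mu) / sigma) - stats.norm.cdf((a - mu) / sigma)
    return stats.norm.pdf(z) / (sigma * keep)

# scipy wants the truncation points as standardized z-scores
tn = stats.truncnorm((a - mu) / sigma, (b - mu) / sigma, loc=mu, scale=sigma)
xs = np.linspace(0.2, 0.8, 7)
print(np.max(np.abs(trunc_pdf(xs) - tn.pdf(xs))))   # ~0
```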
\[ \begin{align*} f_{G,H}(g, h) &= f_G(g) \cdot f_H(h) \\ &= \frac{1}{2} \cdot \left[ \frac{1}{0.1} \cdot \frac { \varphi\left(\frac{h - 0.5}{0.1}\right) } { \Phi\left(\frac{1-0.5}{0.1}\right) - \Phi\left(\frac{0 - 0.5}{0.1}\right) } \right] \end{align*} \]
\[ \begin{align*} f_G(g) &= \int_{0}^{1}f_{G,H}(g,h)\,\mathrm{d}h = \frac{1}{2}, \\ f_H(h) &= \int_{10}^{12}f_{G,H}(g, h)\,\mathrm{d}g = \frac{1}{0.1} \cdot \frac { \varphi\left(\frac{h - 0.5}{0.1}\right) } { \Phi\left(\frac{1 - 0.5}{0.1}\right) - \Phi\left(\frac{0 - 0.5}{0.1}\right) } \end{align*} \]
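Marginalization can also be done numerically: integrating the joint density over \(g\) should recover \(f_H(h)\). A sketch assuming the setup implied by the integration limits, \(G \sim \text{Uniform}(10, 12)\) independent of \(H\) truncated-normal with \(\mu = 0.5\), \(\sigma = 0.1\) on \((0, 1)\):

```python
from scipy import integrate, stats

# H ~ truncated normal (mu=0.5, sigma=0.1) restricted to (0, 1);
# scipy takes the truncation points as standardized z-scores
H = stats.truncnorm((0 - 0.5) / 0.1, (1 - 0.5) / 0.1, loc=0.5, scale=0.1)

def joint(g, h):
    return (1 / 2) * H.pdf(h)        # independence: f_{G,H} = f_G * f_H

# integrate the joint over g in (10, 12) to recover the marginal f_H
h = 0.6
marginal, _err = integrate.quad(lambda g: joint(g, h), 10, 12)
print(marginal, H.pdf(h))            # the two values agree
```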
\[ f_{H \mid G}(h | g) = \frac{f_{G,H}(g, h)}{f_G(g)}. \]
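Since \(G\) and \(H\) are independent here, this ratio should equal \(f_H(h)\) no matter what \(g\) we condition on; learning the weight tells us nothing about the height. A quick check under the same assumed parameters (\(G \sim \text{Uniform}(10,12)\), \(H\) truncated-normal on \((0,1)\)):

```python
from scipy import stats

# f_{H|G}(h|g) = f_{G,H}(g,h) / f_G(g); with independent G and H,
# conditioning on any g leaves H's distribution unchanged
H = stats.truncnorm((0 - 0.5) / 0.1, (1 - 0.5) / 0.1, loc=0.5, scale=0.1)
f_G = 1 / 2                                  # Uniform(10, 12) density

def f_joint(g, h):
    return f_G * H.pdf(h)                    # independence

g, h = 11.3, 0.45
conditional = f_joint(g, h) / f_G
print(conditional, H.pdf(h))                 # identical
```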
It may seem like a weird, contrived distribution, but it's perfect for building intuition as your first \(N\)-dimensional distribution (\(N > 2\))
\(\mathbf{X}\) is a six-dimensional Vector-Valued RV, so that
\[ \mathbf{X} = (X_1, X_2, X_3, X_4, X_5, X_6), \]
where \(\mathcal{R}_{X_1} = \{0, 1\}, \mathcal{R}_{X_2} = \{0, 1\}, \ldots, \mathcal{R}_{X_6} = \{0, 1\}\)
But, \(X_1, X_2, \ldots, X_6\) are not independent! In fact, they are so dependent that if one has value \(1\), the rest must have value \(0\) \(\leadsto\) we can infer the support of \(\mathbf{X}\):
\[ \begin{align*} \mathcal{R}_{\mathbf{X}} = \{ &(1,0,0,0,0,0),(0,1,0,0,0,0),(0,0,1,0,0,0), \\ &(0,0,0,1,0,0),(0,0,0,0,1,0),(0,0,0,0,0,1)\} \end{align*} \]
Lastly, we need to define the probability that \(\mathbf{X}\) takes on any of these values. Let’s say \(\Pr(\mathbf{X} = \mathbf{v}) = \frac{1}{6}\) for all \(\mathbf{v} \in \mathcal{R}_{\mathbf{X}}\). Do we see the structure behind this contrived case?
(For the math majors among us, there is an isomorphism afoot… For everyone else: it’s an extremely inefficient way to model the outcome of rolling a fair die)
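The construction above can be simulated directly: each draw of \(\mathbf{X}\) is a standard-basis ("one-hot") vector, so the coordinates are maximally dependent even though each marginal is a fair Bernoulli(\(\frac{1}{6}\)). A minimal sketch:

```python
import numpy as np

# Sample the 6-dimensional one-hot RV X: each draw is a basis vector
# e_i with probability 1/6 -- an inefficient encoding of a fair die
rng = np.random.default_rng(5100)
n = 60_000
faces = rng.integers(0, 6, size=n)
X = np.eye(6, dtype=int)[faces]          # one draw per row, one-hot

# dependence: exactly one coordinate is 1 in every draw
assert np.all(X.sum(axis=1) == 1)
# marginals: each Pr(X_i = 1) is approximately 1/6
print(X.mean(axis=0))
```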
\[ f_\mathbf{X}(\mathbf{x}_{[k \times 1]}) = \underbrace{\left(\frac{1}{\sqrt{2\pi}}\right)^k \frac{1}{\sqrt{\det(\Sigma)}}}_{\text{Normalizing constants}} \exp\left(-\frac{1}{2}\underbrace{(\mathbf{x} - \boldsymbol\mu)^\top \Sigma^{-1} (\mathbf{x} - \boldsymbol\mu)}_{\text{Quadratic form}}\right) \]
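The general \(k\)-dimensional formula can be checked against `scipy.stats.multivariate_normal`; the \(k = 3\) parameters below are arbitrary illustrative choices:

```python
import numpy as np
from scipy import stats

# Check the k-dimensional normal pdf formula against scipy (k = 3 here)
rng = np.random.default_rng(5100)
A = rng.standard_normal((3, 3))
Sigma = A @ A.T + 3 * np.eye(3)          # a valid (positive-definite) covariance
mu = np.array([1.0, 0.0, -1.0])
x = np.array([0.5, 0.5, -0.5])

k = len(mu)
d = x - mu
quad = d @ np.linalg.inv(Sigma) @ d      # the quadratic form
f = (2 * np.pi) ** (-k / 2) / np.sqrt(np.linalg.det(Sigma)) * np.exp(-quad / 2)
print(f, stats.multivariate_normal(mean=mu, cov=Sigma).pdf(x))   # agree
```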
pdf of \(\mathcal{N}(\mu,\sigma)\)
\[ f_X(v) = \frac{1}{\sigma\sqrt{2\pi}}\bigexp{-\frac{1}{2}\left(\frac{v - \mu}{\sigma}\right)^2} \]
Structure of \(\Sigma\)
\[ \begin{align*} \mathbf{\Sigma} &= \begin{bmatrix}\sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_2\sigma_1 & \sigma_2^2\end{bmatrix} \\[0.1em] \implies \det(\Sigma) &= \sigma_1^2\sigma_2^2 - \rho^2\sigma_1^2\sigma_2^2 \\ &= \sigma_1^2\sigma_2^2(1-\rho^2) \end{align*} \]
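A numeric spot-check of the determinant identity, with arbitrary illustrative values for \(\sigma_1\), \(\sigma_2\), and \(\rho\):

```python
import numpy as np

# det(Sigma) for a 2x2 covariance matrix should equal
# sigma1^2 * sigma2^2 * (1 - rho^2)
s1, s2, rho = 2.0, 0.5, 0.3
Sigma = np.array([[s1**2, rho * s1 * s2],
                  [rho * s2 * s1, s2**2]])
print(np.linalg.det(Sigma), s1**2 * s2**2 * (1 - rho**2))   # equal
```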
Sanity check: in one dimension (\(k = 1\)), \(\Sigma = [\sigma^2]\), so the quadratic form collapses to the familiar squared \(z\)-score:
\[ (\mathbf{x} - \boldsymbol\mu)^\top \Sigma^{-1} (\mathbf{x} - \boldsymbol\mu) = (x_1 - \mu_1)\frac{1}{\sigma^2}(x_1 - \mu_1) = \left(\frac{x_1 - \mu_1}{\sigma}\right)^2 ~ 🤯 \]
\[ \Sigma^{-1} = \frac{1}{\det(\Sigma)}\left[ \begin{smallmatrix} \sigma_2^2 & -\rho \sigma_2\sigma_1 \\ -\rho \sigma_1\sigma_2 & \sigma_1^2\end{smallmatrix} \right] = \frac{1}{\sigma_1^2\sigma_2^2(1-\rho^2)}\left[ \begin{smallmatrix} \sigma_2^2 & -\rho \sigma_2\sigma_1 \\ -\rho \sigma_1\sigma_2 & \sigma_1^2\end{smallmatrix} \right] \]
Writing \(\widetilde{\rho} \definedas 1 - \rho^2\) for the factor from the determinant, we expand the quadratic form:
\[ \begin{align*} &(\mathbf{x} - \boldsymbol\mu)^\top \Sigma^{-1} (\mathbf{x} - \boldsymbol\mu) = \frac{1}{\sigma_1^2\sigma_2^2 \widetilde{\rho}} \begin{bmatrix}x_1 - \mu_1 & x_2 - \mu_2\end{bmatrix} \cdot \begin{bmatrix} \sigma_2^2 & -\rho \sigma_2\sigma_1 \\ -\rho \sigma_1\sigma_2 & \sigma_1^2\end{bmatrix} \cdot \begin{bmatrix}x_1 - \mu_1 \\ x_2 - \mu_2\end{bmatrix} \\ &= \frac{1}{\sigma_1^2\sigma_2^2 \widetilde{\rho}} \begin{bmatrix}(x_1-\mu_1)\sigma_2^2 - (x_2-\mu_2)\rho\sigma_1\sigma_2 & (x_2 - \mu_2)\sigma_1^2 - (x_1 - \mu_1)\rho\sigma_2\sigma_1 \end{bmatrix} \cdot \begin{bmatrix}x_1 - \mu_1 \\ x_2 - \mu_2\end{bmatrix} \\ &= \frac{1}{\sigma_1^2\sigma_2^2 \widetilde{\rho}} \left( (x_1-\mu_1)^2 \sigma_2^2 - (x_1 - \mu_1)(x_2 - \mu_2)\rho\sigma_1\sigma_2 + (x_2 - \mu_2)^2 \sigma_1^2 - (x_1 - \mu_1)(x_2 - \mu_2)\rho\sigma_2\sigma_1 \right) \\ &= \boxed{ \frac{1}{\widetilde{\rho}} \left( \left(\frac{x_1 - \mu_1}{\sigma_1}\right)^2 + \left(\frac{x_2 - \mu_2}{\sigma_2}\right)^2 - 2\rho\frac{(x_1 - \mu_1)(x_2 - \mu_2)}{\sigma_1\sigma_2} \right) } \end{align*} \]
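The boxed identity is easy to verify numerically: compute the quadratic form both ways and compare. Parameter values are arbitrary illustrative choices:

```python
import numpy as np

# Verify: (x - mu)^T Sigma^{-1} (x - mu) equals the boxed scalar expression
s1, s2, rho = 1.5, 0.7, -0.4
mu = np.array([1.0, -2.0])
x = np.array([2.2, -1.1])

Sigma = np.array([[s1**2, rho * s1 * s2],
                  [rho * s1 * s2, s2**2]])
d = x - mu
lhs = d @ np.linalg.inv(Sigma) @ d

# the boxed form, in standardized coordinates z_i = (x_i - mu_i)/sigma_i
z1, z2 = d[0] / s1, d[1] / s2
rhs = (z1**2 + z2**2 - 2 * rho * z1 * z2) / (1 - rho**2)
print(lhs, rhs)                              # agree to floating-point precision
```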
\[ f_{\mathbf{X}}(\mathbf{x}) = C \bigexp{ -\frac{1}{2(1-\rho^2)} \left( \left(\frac{x_1 - \mu_1}{\sigma_1}\right)^2 + \left(\frac{x_2 - \mu_2}{\sigma_2}\right)^2 - 2\rho\frac{(x_1 - \mu_1)(x_2 - \mu_2)}{\sigma_1\sigma_2} \right) } \]
where
\[ C = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-\rho^2}}. \]
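Putting the pieces together, the explicit bivariate pdf (with its normalizing constant \(C\)) should match `scipy.stats.multivariate_normal` evaluated at the same point; parameters below are illustrative:

```python
import numpy as np
from scipy import stats

# Check the explicit bivariate-normal pdf against scipy
mu1, mu2, s1, s2, rho = 0.0, 1.0, 1.2, 0.8, 0.5
Sigma = np.array([[s1**2, rho * s1 * s2],
                  [rho * s1 * s2, s2**2]])

def f(x1, x2):
    """Closed-form bivariate normal pdf with normalizing constant C."""
    C = 1 / (2 * np.pi * s1 * s2 * np.sqrt(1 - rho**2))
    z1, z2 = (x1 - mu1) / s1, (x2 - mu2) / s2
    return C * np.exp(-(z1**2 + z2**2 - 2 * rho * z1 * z2) / (2 * (1 - rho**2)))

mvn = stats.multivariate_normal(mean=[mu1, mu2], cov=Sigma)
print(f(0.3, 1.4), mvn.pdf([0.3, 1.4]))      # identical up to rounding
```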
DSAN 5100 W06: Continuous Multivariate Distributions