Week 6: Multivariate Distributions
DSAN 5100: Probabilistic Modeling and Statistical Computing
Section 01
Thinking Through Specific Multivariate Distributions
- From \(2\) to \(N\) variables!
The Multinoulli Distribution
It may seem like a weird/contrived distribution, but it's perfect for building intuition as your first \(N\)-dimensional distribution (\(N > 2\))
\(\mathbf{X}\) is a six-dimensional Vector-Valued RV, so that
\[ \mathbf{X} = (X_1, X_2, X_3, X_4, X_5, X_6), \]
where \(\mathcal{R}_{X_1} = \{0, 1\}, \mathcal{R}_{X_2} = \{0, 1\}, \ldots, \mathcal{R}_{X_6} = \{0, 1\}\)
But, \(X_1, X_2, \ldots, X_6\) are not independent! In fact, they are so dependent that if one has value \(1\), the rest must have value \(0\) \(\leadsto\) we can infer the support of \(\mathbf{X}\):
\[ \begin{align*} \mathcal{R}_{\mathbf{X}} = \{ &(1,0,0,0,0,0),(0,1,0,0,0,0),(0,0,1,0,0,0), \\ &(0,0,0,1,0,0),(0,0,0,0,1,0),(0,0,0,0,0,1)\} \end{align*} \]
Lastly, we need to define the probability that \(\mathbf{X}\) takes on each of these values. Let's say \(\Pr(\mathbf{X} = \mathbf{v}) = \frac{1}{6}\) for all \(\mathbf{v} \in \mathcal{R}_{\mathbf{X}}\). Do we see the structure behind this contrived case?
(For math major friends, there is an isomorphism afoot… For the rest, it’s an extremely inefficient way to model outcomes from rolling a fair die)
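To make the die-rolling interpretation concrete, here's a quick sketch (in Python with numpy, not part of the original slides): each Multinoulli draw is a length-6 one-hot vector, and the position of the \(1\) records which face came up.

```python
import numpy as np

rng = np.random.default_rng(5100)

# One draw from Multinoulli(1/6, ..., 1/6): a one-hot vector of length 6,
# i.e. exactly one entry is 1 and the rest are 0
draw = rng.multinomial(1, [1/6] * 6)
print(draw)  # e.g. [0 0 1 0 0 0]

# Many draws: each row is one "roll"; the column means estimate
# Pr(X_i = 1), which should all be close to 1/6
rolls = rng.multinomial(1, [1/6] * 6, size=10_000)
print(rolls.mean(axis=0))
```

Note how the support constraint from the previous slide shows up automatically: every row sums to exactly \(1\).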
The Multivariate Normal Distribution
- We’ve already seen the matrix notation for writing the parameters of this distribution: \(\mathbf{X}_{[k \times 1]} \sim \mathcal{N}_k\left(\boldsymbol\mu_{[k \times 1]}, \Sigma_{[k \times k]}\right)\)
- Now we get to crack open the matrix notation for writing its pdf:
\[ f_\mathbf{X}(\mathbf{x}_{[k \times 1]}) = \underbrace{\left(\frac{1}{\sqrt{2\pi}}\right)^k \frac{1}{\sqrt{\det(\Sigma)}}}_{\text{Normalizing constants}} \exp\left(-\frac{1}{2}\underbrace{(\mathbf{x} - \boldsymbol\mu)^\top \Sigma^{-1} (\mathbf{x} - \boldsymbol\mu)}_{\text{Quadratic form}}\right) \]
- Try to squint your eyes while looking at the above and compare with…
pdf of \(\mathcal{N}(\mu,\sigma)\)
\[ f_X(v) = \frac{1}{\sigma\sqrt{2\pi}}\bigexp{-\frac{1}{2}\left(\frac{v - \mu}{\sigma}\right)^2} \]
Structure of \(\Sigma\)
\[ \begin{align*} \mathbf{\Sigma} &= \begin{bmatrix}\sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_2\sigma_1 & \sigma_2^2\end{bmatrix} \\[0.1em] \implies \det(\Sigma) &= \sigma_1^2\sigma_2^2 - \rho^2\sigma_1^2\sigma_2^2 \\ &= \sigma_1^2\sigma_2^2(1-\rho^2) \end{align*} \]
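The determinant identity above is easy to sanity-check numerically. A minimal sketch, using illustrative parameter values (the specific numbers are assumptions, not from the slides):

```python
import numpy as np

# Illustrative parameter values
sigma1, sigma2, rho = 1.5, 0.8, 0.6

Sigma = np.array([
    [sigma1**2,             rho * sigma1 * sigma2],
    [rho * sigma2 * sigma1, sigma2**2            ],
])

# det(Sigma) should equal sigma1^2 * sigma2^2 * (1 - rho^2)
lhs = np.linalg.det(Sigma)
rhs = sigma1**2 * sigma2**2 * (1 - rho**2)
print(lhs, rhs)  # the two values agree
```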
Quadratic Forms
- Quadratic forms will seem scary until someone forces you to write out the matrix multiplication!
- Start with the 1D case: \(\mathbf{x} = [x_1]\), \(\boldsymbol\mu = [\mu_1]\), \(\Sigma = [\sigma^2]\). Then
\[ (\mathbf{x} - \boldsymbol\mu)^\top \Sigma^{-1} (\mathbf{x} - \boldsymbol\mu) = (x_1 - \mu_1)\frac{1}{\sigma^2}(x_1 - \mu_1) = \left(\frac{x_1 - \mu_1}{\sigma}\right)^2 ~ 🤯 \]
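We can check this collapse numerically: treating the 1D case as a \(1 \times 1\) matrix computation recovers the familiar standardized square. A sketch with illustrative values (the numbers are assumptions):

```python
import numpy as np

# Illustrative 1D values (any choices work)
x1, mu1, sigma = 2.0, 0.5, 1.3

x = np.array([x1])
mu = np.array([mu1])
Sigma = np.array([[sigma**2]])  # 1x1 covariance "matrix"

# Matrix quadratic form (x - mu)^T Sigma^{-1} (x - mu)
quad_form = (x - mu) @ np.linalg.inv(Sigma) @ (x - mu)

# Scalar version: the standardized square ((x1 - mu1) / sigma)^2
z_squared = ((x1 - mu1) / sigma) ** 2
print(quad_form, z_squared)  # both give the same value
```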
The 2D Case
- Let \(\mathbf{x} = \left[\begin{smallmatrix}x_1 \\ x_2\end{smallmatrix}\right]\), \(\boldsymbol\mu = \left[ \begin{smallmatrix}\mu_1 \\ \mu_2 \end{smallmatrix}\right]\), \(\Sigma\) as in previous slide. Then \(\mathbf{x} - \boldsymbol\mu = \left[ \begin{smallmatrix} x_1 - \mu_1 \\ x_2 - \mu_2 \end{smallmatrix} \right]\)
- Using what we know about \(2 \times 2\) matrix inversion,
\[ \Sigma^{-1} = \frac{1}{\det(\Sigma)}\left[ \begin{smallmatrix} \sigma_2^2 & -\rho \sigma_2\sigma_1 \\ -\rho \sigma_1\sigma_2 & \sigma_1^2\end{smallmatrix} \right] = \frac{1}{\sigma_1^2\sigma_2^2(1-\rho^2)}\left[ \begin{smallmatrix} \sigma_2^2 & -\rho \sigma_2\sigma_1 \\ -\rho \sigma_1\sigma_2 & \sigma_1^2\end{smallmatrix} \right] \]
- With \(\widetilde{\rho} \equiv 1 - \rho^2\), we can write everything as a bunch of matrix multiplications:
\[ \begin{align*} &(\mathbf{x} - \boldsymbol\mu)^\top \Sigma^{-1} (\mathbf{x} - \boldsymbol\mu) = \frac{1}{\sigma_1^2\sigma_2^2 \widetilde{\rho}} \begin{bmatrix}x_1 - \mu_1 & x_2 - \mu_2\end{bmatrix} \cdot \begin{bmatrix} \sigma_2^2 & -\rho \sigma_2\sigma_1 \\ -\rho \sigma_1\sigma_2 & \sigma_1^2\end{bmatrix} \cdot \begin{bmatrix}x_1 - \mu_1 \\ x_2 - \mu_2\end{bmatrix} \\ &= \frac{1}{\sigma_1^2\sigma_2^2 \widetilde{\rho}} \begin{bmatrix}(x_1-\mu_1)\sigma_2^2 - (x_2-\mu_2)\rho\sigma_1\sigma_2 & (x_2 - \mu_2)\sigma_1^2 - (x_1 - \mu_1)\rho\sigma_2\sigma_1 \end{bmatrix} \cdot \begin{bmatrix}x_1 - \mu_1 \\ x_2 - \mu_2\end{bmatrix} \\ &= \frac{1}{\sigma_1^2\sigma_2^2 \widetilde{\rho}} \left( (x_1-\mu_1)^2 \sigma_2^2 - (x_1 - \mu_1)(x_2 - \mu_2)\rho\sigma_1\sigma_2 + (x_2 - \mu_2)^2 \sigma_1^2 - (x_1 - \mu_1)(x_2 - \mu_2)\rho\sigma_2\sigma_1 \right) \\ &= \boxed{ \frac{1}{\widetilde{\rho}} \left( \left(\frac{x_1 - \mu_1}{\sigma_1}\right)^2 + \left(\frac{x_2 - \mu_2}{\sigma_2}\right)^2 - 2\rho\frac{(x_1 - \mu_1)(x_2 - \mu_2)}{\sigma_1\sigma_2} \right) } \end{align*} \]
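If you don't trust the algebra, the boxed scalar form and the matrix quadratic form can be compared numerically. A sketch, with illustrative parameters and evaluation point (all the numbers are assumptions):

```python
import numpy as np

# Illustrative parameters and evaluation point
mu1, mu2 = 0.0, 1.0
sigma1, sigma2, rho = 1.5, 0.8, 0.6
x1, x2 = 1.2, -0.4

x = np.array([x1, x2])
mu = np.array([mu1, mu2])
Sigma = np.array([
    [sigma1**2,             rho * sigma1 * sigma2],
    [rho * sigma2 * sigma1, sigma2**2            ],
])

# Matrix version: (x - mu)^T Sigma^{-1} (x - mu)
qf_matrix = (x - mu) @ np.linalg.inv(Sigma) @ (x - mu)

# Boxed scalar version from the derivation, using standardized deviations
z1 = (x1 - mu1) / sigma1
z2 = (x2 - mu2) / sigma2
qf_scalar = (z1**2 + z2**2 - 2 * rho * z1 * z2) / (1 - rho**2)

print(qf_matrix, qf_scalar)  # the two agree
```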
The 2D Case In Its FINAL FORM
\[ f_{\mathbf{X}}(\mathbf{x}) = C \bigexp{ -\frac{1}{2(1-\rho^2)} \left( \left(\frac{x_1 - \mu_1}{\sigma_1}\right)^2 + \left(\frac{x_2 - \mu_2}{\sigma_2}\right)^2 - 2\rho\frac{(x_1 - \mu_1)(x_2 - \mu_2)}{\sigma_1\sigma_2} \right) } \]
where
\[ C = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-\rho^2}}. \]
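As a final sanity check, the hand-derived 2D pdf can be compared against a library implementation. A sketch using `scipy.stats.multivariate_normal`, with illustrative parameters (the numbers are assumptions, not from the slides):

```python
import numpy as np
from scipy.stats import multivariate_normal

# Illustrative parameters and evaluation point
mu1, mu2 = 0.0, 1.0
sigma1, sigma2, rho = 1.5, 0.8, 0.6
x1, x2 = 1.2, -0.4

# Hand-coded 2D pdf: C * exp(-quadratic form / 2)
C = 1 / (2 * np.pi * sigma1 * sigma2 * np.sqrt(1 - rho**2))
z1 = (x1 - mu1) / sigma1
z2 = (x2 - mu2) / sigma2
f_hand = C * np.exp(-(z1**2 + z2**2 - 2 * rho * z1 * z2) / (2 * (1 - rho**2)))

# Library version for comparison
Sigma = np.array([
    [sigma1**2,             rho * sigma1 * sigma2],
    [rho * sigma2 * sigma1, sigma2**2            ],
])
f_scipy = multivariate_normal(mean=[mu1, mu2], cov=Sigma).pdf([x1, x2])

print(f_hand, f_scipy)  # the two densities agree
```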