Week 5B: Joint, Marginal, and Conditional Distributions
DSAN 5100: Probabilistic Modeling and Statistical Computing
Section 01
Frequency Tables \(\leftrightarrow\) Probabilities
\[ \DeclareMathOperator*{\argmax}{argmax} \DeclareMathOperator*{\argmin}{argmin} \newcommand{\bigexp}[1]{\exp\mkern-4mu\left[ #1 \right]} \newcommand{\bigexpect}[1]{\mathbb{E}\mkern-4mu \left[ #1 \right]} \newcommand{\convergesAS}{\overset{\text{a.s.}}{\longrightarrow}} \newcommand{\definedas}{\overset{\text{def}}{=}} \newcommand{\definedalign}{\overset{\phantom{\text{def}}}{=}} \newcommand{\eqeventual}{\overset{\mathclap{\text{\small{eventually}}}}{=}} \newcommand{\Err}{\text{Err}} \newcommand{\expect}[1]{\mathbb{E}[#1]} \newcommand{\expectsq}[1]{\mathbb{E}^2[#1]} \newcommand{\fw}[1]{\texttt{#1}} \newcommand{\given}{\mid} \newcommand{\green}[1]{\color{green}{#1}} \newcommand{\heads}{\outcome{heads}} \newcommand{\iid}{\overset{\text{\small{iid}}}{\sim}} \newcommand{\lik}{\mathcal{L}} \newcommand{\loglik}{\ell} \newcommand{\mle}{\textsf{ML}} \newcommand{\nimplies}{\;\not\!\!\!\!\implies} \newcommand{\orange}[1]{\color{orange}{#1}} \newcommand{\outcome}[1]{\textsf{#1}} \newcommand{\param}[1]{{\color{purple} #1}} \newcommand{\pgsamplespace}{\{\green{1},\green{2},\green{3},\purp{4},\purp{5},\purp{6}\}} \newcommand{\prob}[1]{P\left( #1 \right)} \newcommand{\purp}[1]{\color{purple}{#1}} \newcommand{\sign}{\text{Sign}} \newcommand{\spacecap}{\; \cap \;} \newcommand{\spacewedge}{\; \wedge \;} \newcommand{\tails}{\outcome{tails}} \newcommand{\Var}[1]{\text{Var}[#1]} \newcommand{\bigVar}[1]{\text{Var}\mkern-4mu \left[ #1 \right]} \]
Frequency Tables
- What does Table 1 tell us on its own (before computing proportions in our heads) that is useful for probability?
- Answer: Not much!
- But, once we find the overall total, it tells us a lot (everything we need to know!)
(\(G\) = grade, \(H\) = honors status)
The table tells us, e.g., that there are 5 honors students in grade 10
| | \(H = 0\) | \(H = 1\) |
|---|---|---|
| \(G = 10\) | 10 | 5 |
| \(G = 11\) | 6 | 4 |
| \(G = 12\) | 7 | 1 |
Table 1: A frequency table in which each row corresponds to a grade in a certain senior high school, each column corresponds to honors status (\(H = 1\) represents honors, \(H = 0\) represents non-honors), and each cell contains the number of students in that grade with that honors status
Why Do We Need The Total?
- Q1: Someone asks for the probability that a randomly selected student is an honors student in 11th grade
- Q2: Someone asks what proportion of all students are honors students
- Q3: Someone asks what percentage of 12th graders are honors students
Q1, for example, is asking us for \(\Pr(G = 11, H = 1)\), a question we can answer if we know the joint distribution \(f_{G,H}(g, h)\)
Back to the Naïve Definition
Using our naïve definition of probability, we can compute this probability using the frequencies in the table as
\[ \Pr(G = 11, H = 1) = \frac{\#[G = 11, H = 1]}{\#\text{ Students Total}} \]
Plugging in the values from Table 1, we obtain the answer:
\[ \Pr(G = 11, H = 1) = \frac{4}{33} \approx 0.121 \]
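As a minimal sketch of this calculation in Python (the dictionary name `counts` and its layout are illustrative choices, not part of the original slides), we can store the Table 1 counts and divide the matching cell by the overall total:

```python
from fractions import Fraction

# Counts from Table 1, keyed by (grade, honors status).
# The name `counts` and the dict layout are illustrative assumptions.
counts = {
    (10, 0): 10, (10, 1): 5,
    (11, 0): 6,  (11, 1): 4,
    (12, 0): 7,  (12, 1): 1,
}

# Overall total: the denominator in the naive definition
total = sum(counts.values())

# Naive definition: (# outcomes in the event) / (# outcomes total)
p_g11_h1 = Fraction(counts[(11, 1)], total)

print(total)                      # 33
print(p_g11_h1)                   # 4/33
print(round(float(p_g11_h1), 3))  # 0.121
```

Using `Fraction` keeps the result exact, so it can be compared directly against the hand-computed \(\frac{4}{33}\).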
Frequency Table → Probability Table
- When we divide by 33, we are normalizing the counts, producing probabilities (normalized counts)
- By normalizing all cells in the table, we convert our frequency table into a probability table
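The normalization step can be sketched in Python as follows. This is one possible illustration rather than the course's own code: the helper `normalize` and the `counts` dictionary are hypothetical names introduced here.

```python
from fractions import Fraction

# Counts from Table 1, keyed by (grade, honors status); layout is an assumption
counts = {
    (10, 0): 10, (10, 1): 5,
    (11, 0): 6,  (11, 1): 4,
    (12, 0): 7,  (12, 1): 1,
}

def normalize(freq):
    """Divide every count by the overall total, giving a probability table."""
    total = sum(freq.values())
    return {cell: Fraction(n, total) for cell, n in freq.items()}

probs = normalize(counts)
total = sum(counts.values())

# A valid probability table has non-negative entries that sum to exactly 1
assert all(p >= 0 for p in probs.values())
assert sum(probs.values()) == 1

for (g, h), p in sorted(probs.items()):
    # Fraction reduces automatically (e.g. 6/33 is stored as 2/11),
    # so print the unreduced count/total form next to the decimal value
    print(f"Pr(G = {g}, H = {h}) = {counts[(g, h)]}/{total} ≈ {float(p):.3f}")
```

The check that the entries sum to 1 is what distinguishes a probability table from a table of raw counts.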
Computing Overall Total by Column
We could compute the total by summing columns, then summing over our individual column totals to get 33:
| | \(H = 0\) | \(H = 1\) | Total |
|---|---|---|---|
| \(G = 10\) | 10 | 5 | |
| \(G = 11\) | 6 | 4 | |
| \(G = 12\) | 7 | 1 | |
| Total | 23 | 10 | 33 |
Computing Overall Total by Row
Or, we could compute the total by summing rows, then summing over our individual row totals to get 33:
| | \(H = 0\) | \(H = 1\) | Total |
|---|---|---|---|
| \(G = 10\) | 10 | 5 | 15 |
| \(G = 11\) | 6 | 4 | 10 |
| \(G = 12\) | 7 | 1 | 8 |
| Total | | | 33 |
Bringing Both Methods Together
| | \(H = 0\) | \(H = 1\) | Total |
|---|---|---|---|
| \(G = 10\) | 10 | 5 | 15 |
| \(G = 11\) | 6 | 4 | 10 |
| \(G = 12\) | 7 | 1 | 8 |
| Total | 23 | 10 | 33 |
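A short Python sketch can confirm that both routes to the overall total agree. The variable names (`counts`, `row_totals`, `col_totals`) are illustrative assumptions, not names from the slides:

```python
# Counts from Table 1, keyed by (grade, honors status); layout is an assumption
counts = {
    (10, 0): 10, (10, 1): 5,
    (11, 0): 6,  (11, 1): 4,
    (12, 0): 7,  (12, 1): 1,
}

grades = sorted({g for g, _ in counts})  # [10, 11, 12]
honors = sorted({h for _, h in counts})  # [0, 1]

# Row totals: for each grade, sum across honors status
row_totals = {g: sum(counts[(g, h)] for h in honors) for g in grades}

# Column totals: for each honors status, sum across grades
col_totals = {h: sum(counts[(g, h)] for g in grades) for h in honors}

print(row_totals)  # {10: 15, 11: 10, 12: 8}
print(col_totals)  # {0: 23, 1: 10}

# Summing either set of totals gives the same overall total
assert sum(row_totals.values()) == sum(col_totals.values()) == 33
```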
Frequencies to Probabilities
Now let’s use the overall total (33) to convert the counts into probabilities:
| | \(H = 0\) | \(H = 1\) | Total |
|---|---|---|---|
| \(G = 10\) | \(\frac{10}{33}\) | \(\frac{5}{33}\) | \(\frac{15}{33}\) |
| \(G = 11\) | \(\frac{6}{33}\) | \(\frac{4}{33}\) | \(\frac{10}{33}\) |
| \(G = 12\) | \(\frac{7}{33}\) | \(\frac{1}{33}\) | \(\frac{8}{33}\) |
| Total | \(\frac{23}{33}\) | \(\frac{10}{33}\) | \(\frac{33}{33}\) |
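As a worked check, each entry in the Total column or Total row is the sum of the joint probabilities in its row or column. For example, for the \(G = 10\) row:

\[ \Pr(G = 10) = \Pr(G = 10, H = 0) + \Pr(G = 10, H = 1) = \frac{10}{33} + \frac{5}{33} = \frac{15}{33} \]

Assuming Q2 is read as asking for the marginal probability \(\Pr(H = 1)\), the answer is the Total entry of the \(H = 1\) column:

\[ \Pr(H = 1) = \frac{5}{33} + \frac{4}{33} + \frac{1}{33} = \frac{10}{33} \approx 0.303 \]

Assuming Q3 is read as asking for the conditional probability \(\Pr(H = 1 \mid G = 12)\), we restrict attention to the \(G = 12\) row and divide by that row's total:

\[ \Pr(H = 1 \mid G = 12) = \frac{\Pr(G = 12, H = 1)}{\Pr(G = 12)} = \frac{1/33}{8/33} = \frac{1}{8} = 0.125 \]

The overall total of 33 cancels in this last ratio, so Q3 depends only on the counts in the \(G = 12\) row.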