Last modified — 24 Oct 2025
Suppose \(X_1\) and \(X_2\) are two random variables (defined on the same sample space \(\Omega\)).
We can study them separately (e.g. their CDFs \(F_{X_1}(a)\) and \(F_{X_2}(b)\), their expected values \(\mathbb{E}[X_1]\) and \(\mathbb{E}[X_2]\), etc.)
But studying them jointly lets us explore any relationships between them (e.g. we may be able to use one of them to predict the other).
The joint CDF of the random vector \((X_1, X_2)\) is \[F_{(X_1, X_2)} ( a, b ) = \mathbb{P}( X_1 \le a, \, \ X_2 \le b)\]
The event \(\left\{ X_1 \le x_1, \ X_2 \le x_2 \right\}\) is the intersection \[\left\{ X_1 \le x_1 \, , \ X_2 \le x_2 \right\} = \left\{ X_1 \le x_1 \right\} \cap \left\{ X_2 \le x_2 \right\}\]
\(F_{(X_1, X_2)} ( s, t)\) is a non-decreasing function of \(s\) and \(t\).
Moreover \[ \lim_{a \to -\infty} F_{(X_1, X_2)}(a, x_2) = 0 \quad \forall x_2 \in \mathbb{R}\] and \[ \lim_{b \to -\infty} F_{(X_1, X_2)}(x_1, b) = 0 \quad \forall x_1 \in \mathbb{R}\]
Also \[ \lim_{a \to \infty, \ b \to \infty} F_{(X_1, X_2)}(a, b) = 1 \]
Suppose \(X_1\) and \(X_2\) are both discrete, with ranges \({\mathcal R}_1\) and \({\mathcal R}_2\).
The joint range of \({\mathbf X} = (X_1, X_2)^\mathsf{T}\) is then (a subset of) the Cartesian product \({\mathcal R}_1 \times {\mathcal R}_2\), which is also finite or countable.
The joint PMF of \(X_1\) and \(X_2\) is \[f_{(X_1, X_2)} ( k_1, k_2) = \mathbb{P}( X_1 = k_1 \,, \ X_2 = k_2)\]
Recall that the CDF of \(X_1\) is \[F_{X_1}(x) = \mathbb{P}(X_1 \leq x)\] Since \(\mathbb{P}( X_2 < \infty) = 1\) we have \(\mathbb{P}(X_1 \leq x) = \mathbb{P}(X_1\leq x, X_2 < \infty)\) (prove it!) and thus \[F_{X_1}(x) = \mathbb{P}(X_1\leq x, X_2 < \infty) = \lim_{a \to \infty} F_{(X_1, X_2)}(x, a)\] Similarly, \(F_{X_2}(x) = \lim_{a \to \infty} F_{(X_1, X_2)}(a, x)\).
\(F_{X_1}\) and \(F_{X_2}\) are called the marginal CDFs of \(X_1\) and \(X_2\) respectively.
The PMF of \(X_1\) (or of \(X_2\)) can be derived from the joint PMF.
Note that \(\Omega = \{ X_2 \in {\cal R}_2 \}\) and thus for any \(k_1 \in {\cal R}_1\) we have
\[ f_{X_1}(k_1) = \mathbb{P}( X_1 = k_1 ) = \mathbb{P}\left( \{ X_1 = k_1 \} \cap \{ X_2 \in {\cal R}_2 \} \right) \] also
\[ \{ X_2 \in {\cal R}_2 \} \, = \, \bigcup_{b \in {\cal R}_2} \{ X_2 = b \} \] hence
\[ \Bigl\{ X_1 = k_1 \Bigr\} \cap \Bigl\{ X_2 \in {\cal R}_2 \Bigr\} = \Bigl\{ X_1 = k_1 \Bigr\} \cap \left( \bigcup_{b \in {\cal R}_2} \{ X_2 = b \} \right) = \bigcup_{b \in {\cal R}_2} \Bigl( \{ X_1 = k_1 \} \cap \{ X_2 = b \} \Bigr) \]
\[ \begin{aligned} f_{X_1}(k_1) &= \mathbb{P}( X_1 = k_1 ) = \mathbb{P}\left( \bigcup_{b \in {\cal R}_2} \left( \{ X_1 = k_1 \} \cap \{ X_2 = b \} \right) \right) \\ & \\ &= \sum_{b \in {\cal R}_2} \mathbb{P}( X_1 = k_1, \, X_2 = b ) = \sum_{b \in \mathcal{R}_{2}} f_{(X_1, X_2)} ( k_1, b ) \end{aligned} \]
Similarly, \[f_{X_2}(k_2) = \sum_{a \in \mathcal{R}_{1}} f_{(X_1, X_2)} ( a, k_2 )\]
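As an illustration of these marginalization formulas, here is a minimal Python sketch; the joint PMF used below is a small made-up example, not one from the notes.

```python
# Marginalize a joint PMF stored as {(k1, k2): probability}.
# The joint PMF below is a made-up illustration.
joint = {
    (0, 0): 0.125, (0, 1): 0.375,
    (1, 0): 0.250, (1, 1): 0.250,
}

def marginal(joint_pmf, coord):
    """Sum the joint PMF over the other coordinate (coord = 0 for X1, 1 for X2)."""
    pmf = {}
    for point, p in joint_pmf.items():
        k = point[coord]
        pmf[k] = pmf.get(k, 0.0) + p
    return pmf

print(marginal(joint, 0))   # marginal PMF of X1: {0: 0.5, 1: 0.5}
print(marginal(joint, 1))   # marginal PMF of X2: {0: 0.375, 1: 0.625}
```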
Recall: if \(X\) is a discrete random variable with PMF \(f_X\) and range \({\cal R}_X\), then for any subset \(B \subset {\cal R}_X\) we have
\[ \mathbb{P}\left( X \in B \right) \, = \, \sum_{b \in B} \, f_X(b) \]
Similarly, if \((X, Y)\) is a discrete random vector with joint PMF \(f_{(X,Y)}\), then for any set \(A\) of pairs of values \[ \mathbb{P}\left( (X, Y) \in A \right) \, = \, \sum_{(a, b) \in A} \, f_{(X,Y)}(a, b) \]
For example, roll two fair dice and let \(X\) be the smaller and \(Y\) the larger of the two results. Their marginal PMFs are:

| \(k\) | \(f_X(k)\) | \(f_Y(k)\) |
|---|---|---|
| 1 | 11/36 | 1/36 |
| 2 | 9/36 | 3/36 |
| 3 | 7/36 | 5/36 |
| 4 | 5/36 | 7/36 |
| 5 | 3/36 | 9/36 |
| 6 | 1/36 | 11/36 |
The joint PMF \(f(x, y) = \mathbb{P}(X = x, \, Y = y)\) of \(X\) and \(Y\) is (rows give \(x\), columns give \(y\)):

| \(x \backslash y\) | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| 1 | 1/36 | 2/36 | 2/36 | 2/36 | 2/36 | 2/36 |
| 2 | 0 | 1/36 | 2/36 | 2/36 | 2/36 | 2/36 |
| 3 | 0 | 0 | 1/36 | 2/36 | 2/36 | 2/36 |
| 4 | 0 | 0 | 0 | 1/36 | 2/36 | 2/36 |
| 5 | 0 | 0 | 0 | 0 | 1/36 | 2/36 |
| 6 | 0 | 0 | 0 | 0 | 0 | 1/36 |
Calculate \(f_X(3)\) and \(f_Y(5)\) using the joint PMF above and check that they coincide with the marginal table above.
What is \(\mathbb{P}( 2 X > Y)\)?
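For a numerical check of your answers, here is a minimal Python sketch (not part of the notes) that encodes the joint table above as a dictionary and computes the marginals and the probability of the event \(\{2X > Y\}\):

```python
from fractions import Fraction

# Joint PMF from the table above: X indexes the rows, Y the columns.
joint = {(x, y): Fraction(1, 36) if x == y else Fraction(2, 36)
         for x in range(1, 7) for y in range(1, 7) if x <= y}

def f_X(k):
    return sum(p for (x, y), p in joint.items() if x == k)

def f_Y(k):
    return sum(p for (x, y), p in joint.items() if y == k)

print(f_X(3), f_Y(5))                                        # 7/36 and 9/36, as in the marginal table
print(sum(p for (x, y), p in joint.items() if 2 * x > y))    # P(2X > Y)
```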
More generally, the joint PMF of \(n\) discrete random variables \(X_1, \ldots, X_n\) is \[f ( k_1, k_2, \ldots, k_n ) = \mathbb{P}( X_1 = k_1, X_2 = k_2, \ldots, X_n = k_n) \, , \quad ( k_1, k_2, \ldots, k_n ) \in {\cal R}_{(X_1, \ldots, X_n)}\] and their joint CDF is \[F ( k_1, k_2, \ldots, k_n) = \mathbb{P}( X_1 \le k_1, X_2 \le k_2, \ldots, X_n \le k_n) \, , \quad ( k_1, k_2, \ldots, k_n ) \in \mathbb{R}^n\]
The random variables \(X_1, \ldots, X_n\) are (mutually) independent if every sub-collection of them satisfies the product rule. In other words: for any \(2 \le k \le n\), any indices \(1 \le i_1 < i_2 < \ldots < i_k \le n\), and any sets \(A_{i_1}, \ldots, A_{i_k}\), we have
\[\mathbb{P}( X_{i_1} \in A_{i_1}, X_{i_2} \in A_{i_2}, \ldots, X_{i_k} \in A_{i_k}) = \mathbb{P}( X_{i_1} \in A_{i_1}) \mathbb{P}(X_{i_2} \in A_{i_2}) \ldots \mathbb{P}( X_{i_k} \in A_{i_k}) \]
A necessary and sufficient condition for random variables \(X_1, X_2, \ldots, X_n\) to be independent is that for ALL \(k_1, k_2, \ldots, k_n\), where each \(k_j \in \mathbb{R}\), their joint and marginal CDFs satisfy
\[F_{(X_1, \ldots, X_n)} ( k_1, k_2, \ldots, k_n ) = F_{X_1}(k_1) \times F_{X_2}(k_2) \times \ldots \times F_{X_n} (k_n)\]
When \(X_1, X_2, \dots, X_n\) are discrete, a necessary and sufficient condition is that for ALL \((k_1, \ldots, k_n) \in {\cal R}_{(X_1, \ldots, X_n)}\), their joint and marginal PMFs satisfy
\[f_{(X_1, \ldots, X_n)}( k_1, k_2, \ldots, k_n ) = f_{X_1}(k_1) \times f_{X_2}(k_2) \times \ldots \times f_{X_n} (k_n)\]
For example, suppose \(X_1\) and \(X_2\) have the joint PMF \[f( k_1, k_2) =
\begin{cases} \frac{1}{36} & k_1, k_2 \in \{1, 2, \dots, 6\}\\ 0 & \text{else.}
\end{cases}\]
Note that \[{\cal R}_{(X_1, X_2)} = \left\{ 1, 2, \ldots, 6 \right\} \times \left\{ 1, 2, \ldots, 6 \right\} = {\cal R}_{X_1} \times {\cal R}_{X_2}\]
Now compute the marginal PMFs \[\begin{aligned} f_1 ( k_1 ) & = \sum_{k_2=1}^6 f ( k_1, k_2 )\\ & = \sum_{k_2=1}^6 1/36 \\ & = \begin{cases} 1 / 6 & k_1\in\{1,\dots, 6\}\\ 0 & \text{else}.\end{cases} \end{aligned}\]
Similarly \[f_2 ( k_2 ) = \begin{cases} 1 / 6 & k_2\in\{1,\dots, 6\}\\ 0 & \text{else}.\end{cases}\]
Thus for any \(k_1, k_2 \in \{1,\dots,6\}\): \[f ( k_1, k_2 ) = \frac{1}{36} = \frac{1}{6} \times \frac{1}{6} = f_1 ( k_1 ) f_2 ( k_2 )\]
If instead \(k_1 \not\in \{1,\ldots,6\}\) or \(k_2 \not\in \{1,\ldots,6\}\), we have \[f ( k_1, k_2 ) = 0 = f_1 ( k_1 ) f_2 ( k_2 )\]
Hence \(X_1\) and \(X_2\) are independent.
In a bag of 5 transistors, 2 are defective.
Transistors will be tested one at a time until the 2 defective ones are identified.
Let \(X_{1}\) be the number of tests made until the first defective is identified.
Let \(X_{2}\) be the number of tests made until the second defective is identified.
Find the joint PMF of \(X_{1}\) and \(X_{2}\).
We have \(1 \le X_1 < X_2 \le 5\)
Let \(D\) denote “defective”, and \(N\) mean “not defective”. Then the events corresponding to each combination \((X_1, X_2)\) are
| \(x_1 \backslash x_2\) | 2 | 3 | 4 | 5 |
|---|---|---|---|---|
| 1 | DD | DND | DNND | DNNND |
| 2 | - | NDD | NDND | NDNND |
| 3 | - | - | NNDD | NNDND |
| 4 | - | - | - | NNNDD |
Thus, their joint PMF is given by \[f ( 1, 2 ) = \mathbb{P}( \{DD\} ) = \mathbb{P}( D_2 \ \vert\ D_1 ) \mathbb{P}( D_1 ) = \frac{1}{4} \times \frac{2}{5} = 0.10\] where \(D_1\) is the event that the first tested item was defective, etc.
Similarly: \[\begin{aligned} f (2, 4) &= \mathbb{P}( NDND )\\ &= \mathbb{P}( D_4 \ \vert\ N_1 D_2 N_3 ) \mathbb{P}( N_3 \ \vert\ N_1 D_2 ) \mathbb{P}( D_2 \ \vert\ N_1 ) \mathbb{P}( N_1 ) \\ &= \frac12 \times \frac23 \times \frac24 \times \frac35 = 0.10 \end{aligned}\]
Continuing in this way gives the full joint PMF \(\mathbb{P}(X_1 = x_1, X_2 = x_2)\):

| \(x_1 \backslash x_2\) | 2 | 3 | 4 | 5 |
|---|---|---|---|---|
| 1 | 0.1 | 0.1 | 0.1 | 0.1 |
| 2 | 0 | 0.1 | 0.1 | 0.1 |
| 3 | 0 | 0 | 0.1 | 0.1 |
| 4 | 0 | 0 | 0 | 0.1 |
The marginal PMFs:
| \(k\) | \(f_1(k)\) | \(f_2(k)\) |
|---|---|---|
| 1 | 0.4 | 0 |
| 2 | 0.3 | 0.1 |
| 3 | 0.2 | 0.2 |
| 4 | 0.1 | 0.3 |
| 5 | 0 | 0.4 |
Are \(X_1\) and \(X_2\) independent?
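One way to check the answer numerically is to compare the joint PMF with the product of the marginals at every point. A minimal Python sketch (the dictionaries simply re-encode the tables above):

```python
# Joint PMF of (X1, X2) for the transistor example: 0.1 on each pair with x1 < x2.
joint = {(x1, x2): 0.1 for x1 in range(1, 5) for x2 in range(2, 6) if x1 < x2}

f1 = {k: sum(p for (a, b), p in joint.items() if a == k) for k in range(1, 6)}
f2 = {k: sum(p for (a, b), p in joint.items() if b == k) for k in range(1, 6)}

independent = all(
    abs(joint.get((a, b), 0.0) - f1[a] * f2[b]) < 1e-12
    for a in range(1, 6) for b in range(1, 6)
)
print(independent)   # False: e.g. f(1, 2) = 0.1 while f1(1) * f2(2) = 0.4 * 0.1 = 0.04
```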
Let \(g : \mathbb{R}^2 \to \mathbb{R}\) be a bivariate function and \(X_1\) and \(X_2\) be two RVs with joint PMF \(f(k_1, k_2)\).
For example: \[\begin{aligned} g ( k_1, k_2 ) &= k_1 + k_2 \\ g ( k_1, k_2 ) &= k_1 \times k_2 \\ g ( k_1, k_2 ) &= \exp\{ 2 \, (k_1 + k_2) \} = e^{2 \, (k_1 + k_2)} \end{aligned}\]
Then, we can calculate the expectation of this function using the natural formula: \[\mathbb{E}[ g( X_1, X_2 ) ] = \sum_{k_{1}\in \mathcal{R}_{1}} \sum_{k_{2}\in \mathcal{R}_{2}} g( k_{1},k_{2}) f( k_{1}, k_{2}),\] where \(\mathcal{R}_{1}\) and \(\mathcal{R}_{2}\) are the ranges of \(X_1\) and \(X_2\).
For example, suppose \((X_1, X_2)\) has the joint PMF below (rows give \(x_2\), columns give \(x_1\)):

| \(x_2 \backslash x_1\) | 1 | 2 | 3 | 4 |
|---|---|---|---|---|
| 1 | 0.05 | 0.10 | 0.15 | 0.20 |
| 2 | 0.05 | 0.15 | 0.20 | 0.10 |
Let \(g(a, b) = a \, b\)
Compute \(\mathbb{E}[g ( X_1, X_2 )]\)
\[\begin{aligned} \mathbb{E}[ X_1 X_2 ] &= 1 \times 1 \times 0.05 + 1 \times 2 \times 0.05 \\ & \quad + 2 \times 1 \times 0.10 + 2 \times 2 \times 0.15 \\ & \quad + 3 \times 1 \times 0.15 + 3 \times 2 \times 0.20 \\ & \quad + 4 \times 1 \times 0.20 + 4 \times 2 \times 0.10 \\ & = 4.2. \end{aligned}\]
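The same double sum is easy to carry out in Python. A minimal sketch, where the dictionary simply re-encodes the joint table above with keys \((x_1, x_2)\):

```python
# Joint PMF from the table above, keyed as (x1, x2).
joint = {(1, 1): 0.05, (2, 1): 0.10, (3, 1): 0.15, (4, 1): 0.20,
         (1, 2): 0.05, (2, 2): 0.15, (3, 2): 0.20, (4, 2): 0.10}

def expect(g, joint_pmf):
    """E[g(X1, X2)] = sum over the joint range of g(k1, k2) * f(k1, k2)."""
    return sum(g(k1, k2) * p for (k1, k2), p in joint_pmf.items())

print(expect(lambda a, b: a * b, joint))   # E[X1 X2] = 4.2
```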
Recall the linearity properties \(\mathbb{E}[a + b \, X] = a + b \, \mathbb{E}[X]\) and \(\mathbb{E}[X+Y] = \mathbb{E}[X] + \mathbb{E}[Y]\); they are used repeatedly in the next example.
Recall the joint PMF of \((X_1, X_2)\) given above.
Calculate \[\mathbb{E}\left[ \left( 1 + 2 \, X_{1} + 3 \, X_{2} \right)^{2} \right]\]
There are many ways to calculate this value.
Note that \[(1+2X_{1}+3X_{2})^2= 1+4X_{1}+6X_{2}+4X_{1}^{2}+9X_{2}^{2}+12X_{1}X_{2}.\]
Making use of linearity, we may compute the expectation of each of these terms and add the results.
We already have \(\mathbb{E}[X_{1}X_{2}] =4.2\).
| \(k_{1}\) | \(f_{1}(k_{1})\) | \(k_{1}f_{1}(k_{1})\) | \(k_{1}^{2}f_{1}(k_{1})\) |
|---|---|---|---|
| 1 | 0.10 | 0.10 | 0.10 |
| 2 | 0.25 | 0.50 | 1.00 |
| 3 | 0.35 | 1.05 | 3.15 |
| 4 | 0.30 | 1.20 | 4.80 |
| Total | 1.00 | 2.85 | 9.05 |
Therefore, \[\begin{aligned} \mathbb{E}[X_{1}]&= 2.85 \\ \mathbb{E}[X_{1}^{2}] &= 9.05 \end{aligned}\]
| \(k_{2}\) | \(f_{2}(k_{2})\) | \(k_{2}f_{2}(k_{2})\) | \(k_{2}^{2}f_{2}(k_{2})\) |
|---|---|---|---|
| 1 | 0.50 | 0.50 | 0.50 |
| 2 | 0.50 | 1.00 | 2.00 |
| Total | 1.00 | 1.50 | 2.50 |
Therefore, \[\begin{aligned}\mathbb{E}[X_{2}]&=1.5 \\ \mathbb{E}[X_{2}^{2}] &= 2.5\end{aligned}\]
So far, we have obtained \[\begin{aligned} \mathbb{E}[X_{1}X_{2}] &=4.2\\ \mathbb{E}[X_{1}] &=2.85 & \mathbb{E}[X_{2}] &=1.5 \\ \mathbb{E}[X_{1}^{2}] &=9.05 & \mathbb{E}[X_{2}^{2}] &= 2.5. \end{aligned}\]
Therefore, \[\begin{aligned} \mathbb{E}[(1+2X_{1}+3X_{2})^{2}] &= 1 + 4\mathbb{E}[X_{1}] + 6\mathbb{E}[X_{2}] + 4\mathbb{E}[X_{1}^{2}] + 9 \mathbb{E}[X_{2}^{2}] + 12\mathbb{E}[X_{1}X_{2}]\\ &= 1+4 \times 2.85 + 6\times 1.5 + 4\times 9.05 + 9\times 2.5 + 12\times 4.2 \\ &=130.5. \end{aligned}\]
Given two random variables, we might ask how they are related: for instance, a tall person tends to have a heavier body.
Let \(X_1\) and \(X_2\) be two random variables with \[\mathbb{E}[ X_1] = \mu_1 \qquad \text{ and } \qquad \mathbb{E}[ X_2 ] = \mu_2\]
The covariance of \(X_1\) and \(X_2\) is defined as \[\operatorname{Cov}(X_1, X_2) = \mathbb{E}\left[ (X_1 - \mu_1)(X_2 - \mu_2) \right]\] If \(X_1\) and \(X_2\) are discrete, this becomes \[\operatorname{Cov}(X_1, X_2) = \sum_{k_1 \in {\cal R}_{X_1}} \sum_{k_2 \in {\cal R}_{X_2}} (k_1 - \mu_1)(k_2 - \mu_2)f( k_1, k_2)\]
Expanding the product gives the convenient shortcut formula \[\operatorname{Cov}(X_1, X_2) = \mathbb{E}[X_1 X_2] - \mathbb{E}[X_1]\mathbb{E}[X_2].\]
Suppose that \[\begin{aligned} X_{1} &=\text{Time to complete task 1 (in days)} \\ X_{2} &=\text{Time to complete task 2 (in days)} \end{aligned}\] have their joint PMF given by
| \(X_1\backslash X_2\) | 1 | 2 | 3 | 4 | Total |
|---|---|---|---|---|---|
| 1 | 0.20 | 0.05 | 0.05 | 0 | 0.30 |
| 2 | 0.05 | 0.15 | 0.10 | 0.05 | 0.35 |
| 3 | 0 | 0.05 | 0.10 | 0.20 | 0.35 |
| Total | 0.25 | 0.25 | 0.25 | 0.25 | 1 |
It is easy to calculate \[\begin{aligned} \mathbb{E}[X_1] &= 1\times 0.30+2\times 0.35+3\times 0.35=2.05, \\ \mathbb{E}[X_2] &= 1\times 0.25+2\times 0.25+3\times 0.25+4\times 0.25 = 2.50. \end{aligned}\]
We further have \[\begin{aligned} \mathbb{E}[X_{1}X_{2}] &= 1\times 1\times 0.20+1\times 2\times 0.05 +1\times 3\times 0.05\\ &\quad +1\times 4\times 0 + 2\times 1\times 0.05+2\times 2\times 0.15\\ &\quad +2\times 3\times 0.10+2\times 4\times 0.05 +3\times 1\times 0 \\ &\quad +3\times 2\times 0.05+3\times 3\times 0.10+3\times 4\times 0.20 \\ &= 5.75. \end{aligned}\]
So far, we have obtained \[\begin{aligned} \mathbb{E}[X_1] &= 2.05, \\ \mathbb{E}[X_2] &= 2.50, \\ \mathbb{E}[X_{1}X_{2}] &= 5.75. \end{aligned}\]
Therefore, \[\begin{aligned} \operatorname{Cov}(X_{1},X_{2}) &= \mathbb{E}[X_{1}X_{2}] - \mathbb{E}[X_{1}] \mathbb{E}[X_{2}] \\ &=5.75 - 2.05\times 2.50 \\ &= 0.625. \end{aligned}\]
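A quick Python check of this calculation, using the shortcut formula (the dictionary re-encodes the task-time table above, keyed as \((x_1, x_2)\)):

```python
# Joint PMF of the task completion times (X1, X2), keyed as (x1, x2).
joint = {(1, 1): 0.20, (1, 2): 0.05, (1, 3): 0.05, (1, 4): 0.00,
         (2, 1): 0.05, (2, 2): 0.15, (2, 3): 0.10, (2, 4): 0.05,
         (3, 1): 0.00, (3, 2): 0.05, (3, 3): 0.10, (3, 4): 0.20}

def expect(g):
    return sum(g(x1, x2) * p for (x1, x2), p in joint.items())

e1  = expect(lambda a, b: a)        # E[X1] = 2.05
e2  = expect(lambda a, b: b)        # E[X2] = 2.50
e12 = expect(lambda a, b: a * b)    # E[X1 X2] = 5.75
print(e12 - e1 * e2)                # Cov(X1, X2) = 0.625
```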
When \(\operatorname{Cov}( X_{1},X_{2})\) is large and positive, the variables tend to be both above or both below their respective means simultaneously.
In other words, the two variables tend to move in the same direction (when one increases, so does the other, and vice versa).
The covariance is a measure of the “linear” association between the variables.
When \(\operatorname{Cov}(X_{1},X_{2})\) is large and negative one of the variables tends to be above its mean when the other is below its mean.
The variables in the last example were measured in days, and we found \[ \operatorname{Cov}( X_{1},X_{2}) = 0.625.\]
If the variables were given in hours instead: \[ \operatorname{Cov}( 24 X_{1}, 24 X_{2}) = 24^2 \times 0.625 = 360.\]
The covariance can be artificially increased or decreased by changing the units of the variables.
So the covariance is not an ideal metric for the strength of the relationship.
The (linear) correlation coefficient of \(X_1\) and \(X_2\) is \[\operatorname{Corr}(X_1, X_2) \, = \, \frac{ \operatorname{Cov}(X_1, X_2) }{ \sqrt{ \operatorname{Var}[ X_1 ] } \, \sqrt{ \operatorname{Var}[ X_2 ] } }\] provided that \(\operatorname{Var}[ X_1 ] \times \operatorname{Var}[ X_2 ] > 0\).
The linear correlation coefficient is scale invariant
In other words, for any \(a, c \in \mathbb{R}\) and any \(b, d \neq 0\): \[\operatorname{Corr}( a + b \, X_1, \ c + d \, X_2 ) \, = \, \operatorname{sign}( b \, d ) \, \operatorname{Corr}( X_1, X_2 )\]
The strength (magnitude) of the relationship doesn’t change
The direction (sign) might
In addition to being scale invariant, we have \[-1 \le \operatorname{Corr}( X_1, X_2 ) \le 1.\]
Hence we have a common reference scale on which to judge whether a correlation value is “high” or “low”
Specific cutoffs / guidelines / thresholds often depend on the subject area
e.g. \(\rho=0.9\) may be low to a physicist, but extremely high to a sociologist
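To make the scale invariance concrete, here is a small Python sketch for the task-time example above; it computes the covariance and correlation of \((X_1, X_2)\) and of the rescaled pair \((24 X_1, 24 X_2)\), so the covariances should differ by a factor of \(24^2\) while the correlations agree:

```python
# Covariance and correlation for the task-time example, in days and then in hours.
joint = {(1, 1): 0.20, (1, 2): 0.05, (1, 3): 0.05, (1, 4): 0.00,
         (2, 1): 0.05, (2, 2): 0.15, (2, 3): 0.10, (2, 4): 0.05,
         (3, 1): 0.00, (3, 2): 0.05, (3, 3): 0.10, (3, 4): 0.20}

def expect(g):
    return sum(g(x1, x2) * p for (x1, x2), p in joint.items())

def cov_and_corr(scale=1.0):
    """Cov and Corr of (scale*X1, scale*X2), computed from the joint PMF."""
    e1, e2 = expect(lambda a, b: scale * a), expect(lambda a, b: scale * b)
    v1 = expect(lambda a, b: (scale * a) ** 2) - e1 ** 2
    v2 = expect(lambda a, b: (scale * b) ** 2) - e2 ** 2
    cov = expect(lambda a, b: (scale * a) * (scale * b)) - e1 * e2
    return cov, cov / (v1 * v2) ** 0.5

print(cov_and_corr(1.0))    # covariance 0.625, correlation about 0.69
print(cov_and_corr(24.0))   # covariance 360,   correlation unchanged
```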
To prove this, we first show:
\[\operatorname{Var}( X_{1} \pm X_{2}) = \operatorname{Var}( X_{1})+ \operatorname{Var}( X_{2}) \pm 2 \operatorname{Cov}( X_{1},X_{2}).\]
\[\begin{aligned} \operatorname{Var}[ X_{1}+X_{2} ] &= \mathbb{E}\left[ \left( \left( X_{1}+X_{2} \right)- \left(\mu _{1} + \mu _{2} \right) \right)^{2} \right] \\ & \\ &= \mathbb{E}\left[\left( \left( X_{1}-\mu _{1}\right) + \left( X_{2}-\mu_{2}\right) \right)^{2} \right] \\ & \\ &= \mathbb{E}\left[ \left( X_{1}-\mu _{1} \right)^{2} + \left( X_{2}-\mu _{2} \right)^{2} +2 \left( X_{1}-\mu _{1} \right) \left( X_{2}-\mu _{2} \right) \right] \\ & \\ &= \operatorname{Var}[ X_{1}] + \operatorname{Var}[X_{2}] + 2 \operatorname{Cov}( X_{1},X_{2} ) \end{aligned}\]
Similarly, you should prove that
\[ \operatorname{Var}[ X_{1} - X_{2} ] = \operatorname{Var}[ X_{1}] + \operatorname{Var}[X_{2}] - 2 \operatorname{Cov}( X_{1},X_{2} ) \]
Write \(\sigma_1^2 = \operatorname{Var}[X_1] > 0\) and \(\sigma_2^2 = \operatorname{Var}[X_2] > 0\). Let \(Y = X_2 - \beta \, X_1\) and choose \(\beta = \operatorname{Cov}(X_1, X_2) / \operatorname{Var}[X_1]\). Then
\[\begin{aligned} 0 \le \operatorname{Var}[Y] &= \sigma_2^2 + \beta^2 \sigma_1^2 - 2 \, \beta \, \operatorname{Cov}(X_1, X_2) \\ &= \sigma_2^2 + \frac{ \operatorname{Cov}(X_1, X_2)^2 }{ \sigma_1^2} - 2 \frac{ \operatorname{Cov}(X_1, X_2)^2}{ \sigma_1^2 } \\ &= \sigma_2^2 - \frac{ \operatorname{Cov}(X_1, X_2)^2}{ \sigma_1^2 }\\ &\Rightarrow \frac{\operatorname{Cov}(X_1, X_2)^2}{ \sigma_1^2 } \leq \sigma_2^2\\ &\Rightarrow \frac{\operatorname{Cov}(X_1, X_2)^2}{ \sigma_1^2\sigma_2^2 } \leq 1\\ &\Rightarrow \operatorname{Corr}(X_1, X_2)^2 \leq 1\\ & \\ &\Rightarrow |\operatorname{Corr}(X_1, X_2)| \leq 1 \end{aligned}\]
For example, let \(X\) and \(Y\) be the results of two independent rolls of a fair die, and define \[V = X + Y \qquad \text{ and } \qquad U = X - Y\]
We will show that \(U\) and \(V\) are uncorrelated but not independent. First we need \(\operatorname{Var}[V]\), \(\operatorname{Var}[U]\) and \(\operatorname{Cov}(U, V)\).
For \(\operatorname{Var}[V]\):
\[ \operatorname{Var}[V] = \operatorname{Var}[X + Y] = \operatorname{Var}[X] + \operatorname{Var}[Y] = 2 \times 35 / 12 = 35 / 6 \] using that \(X\) and \(Y\) are independent, so \(\operatorname{Cov}(X, Y) = 0\). Similarly
\[ \operatorname{Var}[U] = \operatorname{Var}[X - Y] = \operatorname{Var}[X] + \operatorname{Var}[Y] = 35 / 6 \]
For \(\operatorname{Cov}(V, U)\), note that \[\operatorname{Var}[V + U] = \operatorname{Var}[V] + \operatorname{Var}[U] + 2 \, \operatorname{Cov}(V, U) \]
while, on the other hand, \[\operatorname{Var}[V + U] = \operatorname{Var}[ 2 \, X ] = 4 \, \operatorname{Var}[X] = 35/3 \]
\[ \begin{aligned} \operatorname{Cov}(V, U) &= \frac{1}{2} \left( \operatorname{Var}[V + U] - \operatorname{Var}[V] - \operatorname{Var}[U] \right) \\ & \\ &= \frac{1}{2} \left( \frac{35}{3} - \frac{35}{6} - \frac{35}{6} \right) = 0 \end{aligned} \]
Hence \(U\) and \(V\) are uncorrelated. However, they are not independent: we will exhibit values \(u\) and \(v\) for which \[ \mathbb{P}\left( U = u \, , V = v \right) \ne \mathbb{P}\left( U = u \right) \, \mathbb{P}\left( V = v \right) \]
Note that \[ \left\{ V = 2 \right\} \, = \, \left\{ X + Y = 2 \right\} \, = \, \left\{ X = 1, \, Y = 1 \right\} \] so that \[ \left\{ V = 2 \right\} \, = \, \left\{ X = 1, \, Y = 1 \right\} \ \subset \ \left\{ U = 0 \right\} \] Therefore \[ \left\{ V = 2, \, U = 0 \right\} = \left\{ V = 2 \right\} \cap \left\{ U = 0 \right\} = \left\{ V = 2 \right\} \] and \[ \mathbb{P}(V = 2, U = 0) \, = \, \mathbb{P}( V = 2 ) \, = \, \frac{1}{36} \] But \[\mathbb{P}(V = 2) \, \mathbb{P}(U = 0) = \frac{1}{36}\times\frac{1}{6} \, \ne \, \mathbb{P}(V = 2, U = 0)\] so \(U\) and \(V\) are not independent.
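A brute-force enumeration of the 36 equally likely outcomes confirms both facts. A minimal Python sketch (assuming, as above, two independent fair dice):

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes of two fair dice.
outcomes = list(product(range(1, 7), repeat=2))
p = Fraction(1, 36)

def E(h):
    return sum(h(x, y) * p for x, y in outcomes)

V = lambda x, y: x + y
U = lambda x, y: x - y

cov_UV = E(lambda x, y: U(x, y) * V(x, y)) - E(U) * E(V)
print(cov_UV)                                            # 0: U and V are uncorrelated

p_v2    = sum(p for x, y in outcomes if V(x, y) == 2)    # P(V = 2) = 1/36
p_u0    = sum(p for x, y in outcomes if U(x, y) == 0)    # P(U = 0) = 6/36
p_joint = sum(p for x, y in outcomes if V(x, y) == 2 and U(x, y) == 0)
print(p_joint, p_v2 * p_u0)                              # 1/36 vs 1/216: not independent
```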
Let \(f_{(X_1, X_2)}(k_1, k_2)\) be the PMF of the random vector \((X_1, X_2)\).
For points \(k_1\) with \(f_{X_1}(k_1) > 0\), we define the conditional PMF of \(X_2\) given \(X_1 = k_1\) as
\[\begin{aligned} f_{2|1} ( k_{2}\ \vert\ k_1 ) &=\mathbb{P}(X_{2}=k_{2} \ \vert\ X_{1}= k_1 ) \\ & \\ &=\frac{\mathbb{P}( X_1 = k_1, \, X_{2}=k_{2}) }{\mathbb{P}( X_{1}= k_1 ) } \\ & \\ &=\frac{f_{(X_1, X_2)}( k_1 , k_{2}) }{f_{X_1}( k_1 ) }. \end{aligned}\]
For each fixed \(k_1\) with \(f_{X_1}(k_1) > 0\), the function \(f_{2 | 1}( k_{2} \ \vert\ k_{1})\) is a PMF in \(k_2 \in {\cal R}_{X_2}\):
You can check that \(f_{2|1}(k_{2} \ \vert\ k_{1}) \ge 0\)
and also: \[\begin{aligned} \sum_{k_{2} \in \mathcal{R}_{X_2} } f_{2 | 1}( k_{2} \ \vert\ k_{1}) &= \sum_{k_{2} \in \mathcal{R}_2 }\frac{f_{(X_1, X_2)}( k_{1}, k_{2}) }{f_{X_1}( k_{1}) } \\ & \\ &= \frac{1}{f_{X_1}( k_1 ) } \sum_{k_{2} \in \mathcal{R}_2 }f_{(X_1, X_2)}( k_{1}, k_{2}) \\ & \\ &= \frac{1}{f_{X_1}( k_1 ) } \, f_{X_1}( k_1 ) = 1 \end{aligned}\]
Similarly: when \(f_2(k_2) > 0\) we define the PMF of \(X_1\) given \(X_2 = k_2\) to be: \[f_{1 | 2} ( k_{1} \ \vert\ k_{2}) =\frac{f( k_{1}, k_{2}) }{f_{2}( k_{2}) }.\]
For each \(k_2\), \(f_{1 | 2}( k_{1} \ \vert\ k_{2})\) is a PMF in \(k_1\).
Example: roll two fair dice and let \(X\) and \(Y\) denote the two results. The joint PMF of \[V = X + Y \quad\quad \text{and} \quad\quad W = \max\{X, Y\}\] is:
| \(W\ \backslash\ V\) | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1/36 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | 0 | 2/36 | 1/36 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | 0 | 0 | 2/36 | 2/36 | 1/36 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | 0 | 0 | 0 | 2/36 | 2/36 | 2/36 | 1/36 | 0 | 0 | 0 | 0 |
| 5 | 0 | 0 | 0 | 0 | 2/36 | 2/36 | 2/36 | 2/36 | 1/36 | 0 | 0 |
| 6 | 0 | 0 | 0 | 0 | 0 | 2/36 | 2/36 | 2/36 | 2/36 | 2/36 | 1/36 |
The conditional PMFs of \(W\) given \(V = v\) are the columns of the table, each divided by its column sum \(f_V(v)\).
For example
\[ f_{W|V}(3|6) = \frac{ \mathbb{P}(W=3, V=6) }{ \mathbb{P}( V=6 )} = \frac{ 1/36 }{ 1/36 + 2/36 + 2/36} = 1/5 \]
\[f_{W|V}(w|6) = \begin{cases} 1/5 & w = 3 \\ & \\ 2/5 & w = 4, \, 5 \\ & \\ 0 & \text{else.}\end{cases}\]
The conditional expectation of \(W\) given \(V = v\) is \[\mathbb{E}[ W\ \vert\ V =v] = \sum_{w \in \mathcal{R}_W} w \, f_{W|V} ( w \ \vert\ v).\]
For example, \[\mathbb{E}[W \ \vert\ V = 4] = 2\times \frac{1}{3}+3\times \frac{2}{3}=8/3\]
\[\mathbb{E}[W \ \vert\ V=7] =4\times \frac{1}{3}+5\times \frac{1}{3}+6\times \frac{1}{3}=5\]
\[\mathbb{E}[V \ \vert\ W = 1] = 2 \times 1 = 2\]
Similarly, the conditional variance is \[\begin{aligned} \operatorname{Var}[ W \ \vert\ V=v] &= \mathbb{E}\Big[\big( W - \mathbb{E}[ W \ \vert\ V=v ]\big)^2 \ \vert\ V = v \Big]\\ &= \sum_{w \in \mathcal{R}_W} \big(w - \mathbb{E}[W\ \vert\ V = v]\big)^2 f_{W|V}(w \ \vert\ v)\\ &= \mathbb{E}[W^2 \ \vert\ V = v] - \left( \mathbb{E}[W \ \vert\ V = v] \right)^2 \end{aligned}\] Prove the last equality.
For each value \(v \in \mathcal{R}_V\), the conditional expectation \[ \mathbb{E}\left( W | V = v \right) \] is a number, so we can define a function \[ h( {\color{red} a} ) \, = \, \mathbb{E}\left( W | V = {\color{red} a} \right) \] Note that \[ h \, : \, \mathcal{R}_V \ \longrightarrow \ \mathbb{R} \]
| \(v\) | \(h(v) = \mathbb{E}\left( W | V = v \right)\) |
|---|---|
| 2 | 1.00 |
| 3 | 2.00 |
| 4 | 2.67 |
| 5 | 3.50 |
| 6 | 4.20 |
| 7 | 5.00 |
| 8 | 5.20 |
| 9 | 5.50 |
| 10 | 5.67 |
| 11 | 6.00 |
| 12 | 6.00 |
If we apply the function \(h\) to the random variable \(V\), we will get a new random variable: \(h(V)\)
Note that \[ h \left( V \right) \, = \, \mathbb{E}\bigl( \left. W \right| V \bigr) \]
In other words: \(\mathbb{E}\bigl( \left. W \right| V \bigr)\) is a random variable
which has its own expected value and variance. In particular:
\[ \mathbb{E}\Bigl[ \, \mathbb{E}\bigl( \left. Y \right| X \bigr) \, \Bigr] \ = \ \sum_{k_1 \in \mathcal{R}_X} \, \mathbb{E}\bigl( \left. Y \right| X = k_1 \bigr) \, f_X(k_1) \, = \, \mathbb{E}\bigl[ Y \bigr] \]
and also \[ \operatorname{Var} \Bigl( \mathbb{E}\bigl( \left. Y \right| X \bigr) \Bigr) \, + \, \mathbb{E}\Bigl( \operatorname{Var} \bigl( Y | X \bigr) \Bigr) \, = \, \operatorname{Var} \left( Y \right) \]
\[\begin{aligned} \mathbb{E}[ \, \mathbb{E}[Y \ \vert\ X] ] &= \mathbb{E}[h(X)] = \sum_{x \in \mathcal{R}_X} h(x) \mathbb{P}(X = x) \\ &= \sum_{x \in \mathcal{R}_X} \mathbb{E}[Y \ \vert\ X = x] f_X(x) \\ &= \sum_{x \in \mathcal{R}_X} \left(\sum_{y \in \mathcal{R}_Y} y f_{Y|X} (y\ \vert\ x) \right) f_X (x) \\ &= \sum_{x \in \mathcal{R}_X} \sum_{y \in \mathcal{R}_Y} \left(y f_{Y|X} (y\ \vert\ x) f_X (x)\right) = \sum_{x \in \mathcal{R}_X} \sum_{y \in \mathcal{R}_Y} y f_{X,Y}(x, y)\\ & = \sum_{y \in \mathcal{R}_Y} y \, \sum_{x \in \mathcal{R}_X} f_{X,Y}(x, y) = \sum_{y \in \mathcal{R}_Y} y \, f_{Y}(y) = \mathbb{E}[Y]. \end{aligned}\]
\[\operatorname{Var}[\mathbb{E}[Y \ \vert\ X ]] = \operatorname{Var}[h(X)]= \mathbb{E}[h(X)^2] - \mathbb{E}[h(X)]^2 = \mathbb{E}[h(X)^2] - \mathbb{E}[Y]^2\]
\[\mathbb{E}[\operatorname{Var}[ Y \ \vert\ X]] = \mathbb{E}\big[ \mathbb{E}[Y^2 \ \vert\ X] - \mathbb{E}[Y\ \vert\ X]^2\big]= \mathbb{E}[Y^2] - \mathbb{E}[h(X)^2]\]
\[\begin{aligned} \operatorname{Var}[\mathbb{E}[Y \ \vert\ X ]] + \mathbb{E}[\operatorname{Var}[ Y \ \vert\ X]] &= \mathbb{E}[h(X)^2] - \mathbb{E}[Y]^2 + \mathbb{E}[Y^2] - \mathbb{E}[h(X)^2] \\ & \\ &= \mathbb{E}[Y^2] - \mathbb{E}[Y]^2 = \operatorname{Var}[Y] \end{aligned}\]
Back to the dice example, the values of \(h(v) = \mathbb{E}\left( W | V = v \right)\) and of \(f_V(v)\) are:

| \(v\) | \(\mathbb{E}\left( W | V = v \right)\) | \(f_V(v)\) |
|---|---|---|
| 2 | 1.00 | 1/36 |
| 3 | 2.00 | 2/36 |
| 4 | 2.67 | 3/36 |
| 5 | 3.50 | 4/36 |
| 6 | 4.20 | 5/36 |
| 7 | 5.00 | 6/36 |
| 8 | 5.20 | 5/36 |
| 9 | 5.50 | 4/36 |
| 10 | 5.67 | 3/36 |
| 11 | 6.00 | 2/36 |
| 12 | 6.00 | 1/36 |
Let us verify that \(\mathbb{E}[ \, \mathbb{E}[ W | V] ] = \mathbb{E}[W]\) in this example:
\[ \begin{aligned} \mathbb{E}[ \, \mathbb{E}[ W | V] ] &= 1 \times 1/36 + 2 \times 2/36 \\ & \qquad + \tfrac{8}{3} \times 3 / 36 + \ldots + \\ & \qquad + 6 \times 2/36 + 6 \times 1/36 = \tfrac{161}{36} \approx 4.4722 \end{aligned} \]
(using the exact values \(8/3\) and \(17/3\) rather than the rounded table entries \(2.67\) and \(5.67\)), while computing \(\mathbb{E}[W]\) directly from the marginal PMF of \(W\) gives the same value:
\[ \begin{aligned} \mathbb{E}[ W ] &= 1 \times 1/36 + 2 \times 3/36 + \\ & \qquad 3 \times 5/36 + \ldots + 6 \times 11/36 = \tfrac{161}{36} \approx 4.4722 \end{aligned} \]
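The same enumeration idea verifies the table of \(h(v)\) and the identity \(\mathbb{E}[\mathbb{E}(W \mid V)] = \mathbb{E}[W]\) exactly. A minimal Python sketch (again assuming two independent fair dice):

```python
from fractions import Fraction
from itertools import product

p = Fraction(1, 36)
outcomes = list(product(range(1, 7), repeat=2))   # two independent fair dice (X, Y)

def h(v):
    """h(v) = E[W | V = v], with V = X + Y and W = max(X, Y).
    All outcomes are equally likely, so this is a simple average."""
    pts = [(x, y) for x, y in outcomes if x + y == v]
    return Fraction(sum(max(x, y) for x, y in pts), len(pts))

f_V = {v: sum(p for x, y in outcomes if x + y == v) for v in range(2, 13)}

lhs = sum(h(v) * f_V[v] for v in range(2, 13))    # E[ E[W | V] ]
rhs = sum(max(x, y) * p for x, y in outcomes)     # E[W]
print(lhs, rhs)                                   # both are 161/36 (about 4.4722)
```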
Suppose we observe \(X\) and want to predict \(Y\)
Does it sound familiar?
The prediction should be based on \(X\)
In other words, it should be a function of \(X\): \(g(X)\)
The question is: what is the best function \(g\) we can use?
Well… first: what do we mean by “best”?
A standard criterion is mean squared error: we look for \[ \arg \min_{g \in {\cal G}} \, \mathbb{E}\left[ \left( Y - g \left( X \right) \right)^2 \right] \] over a class of functions \({\cal G} = \left\{ g: \mathbb{R} \to \mathbb{R} \right\}\).
The answer is the conditional expectation \(\mathbb{E}(Y | X)\): for any \(g \in {\cal G}\), \[ \mathbb{E}\left[ \, \left( Y - \mathbb{E}\left( Y | X \right) \right)^{2} \, \right] \, \le \, \mathbb{E}\left[ \, \left( Y - g\left( X \right) \right) ^{2} \, \right] \] To see this, write
\[ \begin{aligned} \mathbb{E}\left[ \left( Y-g\left( X \right) \right) ^{2}\right] &= \mathbb{E}\left\{ \mathbb{E}\left. \left[ \left( Y-g\left( X \right) \right) ^{2} \right| X \right] \right\} \\ & \\ & = \mathbb{E}\left\{ \mathbb{E}\left. \Bigl[ \Bigl( Y - \mathbb{E}\left( Y | X \right) + \mathbb{E}\left( Y | X \right) - g\left( X \right) \Bigr) ^{2} \right| X \Bigr] \right\} \\ & \\ &= \mathbb{E}\left\{ \mathbb{E}\left[ \left. \Bigl( Y - \mathbb{E}\left( Y | X \right) \Bigr)^2 \right| X \right] \right\} \\ & \qquad + \mathbb{E}\left\{ \mathbb{E}\left[ \left. \Bigl( \mathbb{E}\left( Y | X \right) - g(X) \Bigr)^2 \right| X \right] \right\} \\ & \qquad + 2 \, \mathbb{E}\Bigl\{ \mathbb{E}\Bigl[ \Bigl. \bigl\{ Y - \mathbb{E}\left( Y | X \right) \bigr\} \, \bigl\{ \mathbb{E}\left( Y | X \right) - g(X) \bigr\} \Bigr| X \Bigr] \Bigr\} \end{aligned} \]
Since \[ \mathbb{E}\left\{ \Bigl( \mathbb{E}\left( Y | X \right) - g(X) \Bigr)^2 \right\} \ge 0, \] the expression above is \[ \ge \quad \mathbb{E}\left[ \Bigl( Y - \mathbb{E}\left( Y | X \right) \Bigr)^2 \right] + 2 \, \mathbb{E}\Bigl\{ \Bigl. \bigl\{ \mathbb{E}\left( Y | X \right) - g(X) \bigr\} \ \mathbb{E}\Bigl[ \bigl\{ Y - \mathbb{E}\left( Y | X \right) \bigr\} \, \Bigr| X \Bigr] \Bigr\} \] where the factor \(\mathbb{E}\left( Y | X \right) - g(X)\), being a function of \(X\), has been pulled out of the inner conditional expectation.
Moreover \[ \mathbb{E}\Bigl[ \bigl\{ Y - \mathbb{E}\left( Y | X \right) \bigr\} \, \Bigr| X \Bigr] = \mathbb{E}\left( Y | X \right) - \mathbb{E}\left( Y | X \right) \, = 0 \]
so
\[ \mathbb{E}\Bigl\{ \bigl\{ \mathbb{E}\left( Y | X \right) - g(X) \bigr\} \, \mathbb{E}\Bigl[ \Bigl. \bigl\{ Y - \mathbb{E}\left( Y | X \right) \bigr\} \, \Bigr| X \Bigr] \Bigr\} = 0 \] and therefore
\[ \mathbb{E}\left[ \left( Y -g\left( X \right) \right)^{2}\right] \ \ge \ \mathbb{E}\left[ \left( Y - \mathbb{E}\left( Y | X \right) \right)^2 \right] \]
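As a numerical illustration of this inequality, the sketch below (in Python, for the dice example with \(W = \max\{X, Y\}\) and \(V = X + Y\)) compares the mean squared error of the conditional mean \(h(V)\) with that of an arbitrary hypothetical competitor \(g(v) = v/2\):

```python
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))   # two independent fair dice (X, Y)
p = 1 / 36

def h(v):
    """Conditional mean E[W | V = v] (all outcomes are equally likely)."""
    pts = [(x, y) for x, y in outcomes if x + y == v]
    return sum(max(x, y) for x, y in pts) / len(pts)

def mse(predict):
    """Mean squared prediction error E[(W - predict(V))^2]."""
    return sum((max(x, y) - predict(x + y)) ** 2 * p for x, y in outcomes)

print(mse(h), mse(lambda v: v / 2))   # the conditional mean gives the smaller MSE
```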
\[\begin{aligned} X &=\text{\# of daily visits to a website},\\ Y &=\text{\# daily sales made on the website}. \end{aligned}\]
\[\begin{aligned} X &\sim {\mathrm{Poiss}}( \lambda ) & \lambda &> 0,\\ Y \ \vert\ X &\sim {\mathrm{Binom}}( X,\ p) &p &\in [0, 1]. \end{aligned}\]
\[\mathbb{E}[X] = \lambda, \qquad \operatorname{Var}[X] = \lambda\]
\[\mathbb{E}\left[ Y\ \vert\ X \right] = X \, p, \qquad \operatorname{Var}\left[ Y\ \vert\ X \right] = X \, p \, (1-p)\]
\[\mathbb{E}[Y] = \mathbb{E}[ \mathbb{E}[Y \ \vert\ X]] = \mathbb{E}[ X \, p] \, = \, p \, \mathbb{E}[X] = p \, \lambda\]
\[\begin{aligned} \operatorname{Var}[Y] &=\mathbb{E}\left[ \operatorname{Var}[Y\ \vert\ X] \right] + \operatorname{Var}\left[ \mathbb{E}[ Y \ \vert\ X] \right] \\ & \\ &=\mathbb{E}\left[ X \, p \, ( 1-p) \right] + \operatorname{Var}\left[ X\, p \right] \\ & \\ &=p(1-p) \mathbb{E}[X] +p^{2}\operatorname{Var}[X] \\ & \\ &=p(1-p) \lambda +p^{2}\lambda \\ & \\ &=\lambda \, \left[ p \, (1-p) + p^{2} \right] \\ & \\ &=\lambda \, p \end{aligned}\]
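A quick Monte Carlo check of these two formulas; the sketch below uses NumPy with arbitrary illustrative values \(\lambda = 4\) and \(p = 0.3\) (not from the notes):

```python
import numpy as np

rng = np.random.default_rng(0)
lam, p, n = 4.0, 0.3, 1_000_000       # illustrative parameter values

X = rng.poisson(lam, size=n)          # X ~ Poisson(lambda)
Y = rng.binomial(X, p)                # Y | X ~ Binomial(X, p)

print(Y.mean(), lam * p)              # both close to E[Y]   = lambda * p = 1.2
print(Y.var(), lam * p)               # both close to Var[Y] = lambda * p = 1.2
```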
Stat 302 - Winter 2025/26