Conditional Expectations and Inequalities
Last modified — 21 Jun 2026
By the end of this lecture, students are expected to be able to:
If \(X\) and \(Y\) are two random variables, then the conditional expectation of \(X\) given \(Y = y\) is, in the continuous and discrete cases respectively, \[\begin{aligned} \mathbb{E}\left[ X | Y= y \right] &= \int_{-\infty }^{\infty } x f_{X|Y}\left( x | y \right) \mathsf{d}x, & \mathbb{E}\left[ X | Y= y \right] &= \sum_{x} x p_{X|Y}(x|y). \end{aligned}\]
Let \(X\) and \(Y\) be random variables with joint density \(f(x, y) = 2\) for \(0 < x < 1, 0 < y < x\). What is the conditional expectation of \(Y\) given \(X\)?
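One way to work this example, following the standard approach: find the marginal density of \(X\), divide to obtain the conditional density of \(Y\) given \(X = x\), and then integrate. \[\begin{aligned} f_X(x) &= \int_0^x 2 \, \mathsf{d}y = 2x, \quad 0 < x < 1,\\ f_{Y|X}(y|x) &= \frac{f(x,y)}{f_X(x)} = \frac{2}{2x} = \frac{1}{x}, \quad 0 < y < x,\\ \mathbb{E}[Y | X = x] &= \int_0^x y \cdot \frac{1}{x} \, \mathsf{d}y = \frac{x}{2}. \end{aligned}\] So, given \(X = x\), \(Y\) is uniform on \((0, x)\), and \(\mathbb{E}[Y|X] = X/2\).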
If \(X\) and \(Y\) are two random variables, then the conditional variance of \(X\) given \(Y = y\) is, in the continuous and discrete cases respectively, \[\begin{aligned} \operatorname{Var}(X | Y = y ) &= \int_{-\infty }^{\infty } ( x - \mathbb{E}[X|Y=y] ) ^{2}f_{X|Y}\left( x|y \right) \mathsf{d}x,\\ \operatorname{Var}(X | Y = y ) &= \sum_{x} ( x - \mathbb{E}[X|Y=y] ) ^{2}p_{X|Y}(x|y). \end{aligned}\]
Let \(X\) and \(Y\) be random variables with joint density \(f(x, y) = 2\) for \(0 < x < 1, 0 < y < x\). What is the conditional variance of \(Y\) given \(X\)?
Try this on your own.
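To check your answer numerically, here is a minimal Monte Carlo sketch. It is an illustration rather than part of the lecture: the function name `conditional_moments_near`, the band width, the number of draws, and the seed are all hypothetical choices. Because \(\mathbb{P}(X = x_0) = 0\), it approximates conditioning on \(X = x_0\) by keeping only draws with \(X\) in a narrow band around \(x_0\).

```python
import random

def conditional_moments_near(x0, width=0.01, n_draws=1_000_000, seed=302):
    """Monte Carlo estimates of E[Y | X ~ x0] and Var(Y | X ~ x0) when (X, Y)
    has joint density f(x, y) = 2 on 0 < y < x < 1 (hypothetical helper)."""
    rng = random.Random(seed)
    ys = []
    for _ in range(n_draws):
        # A uniform point in the unit square, kept only if y < x, is a draw
        # from the joint density f(x, y) = 2 on the triangle (rejection sampling).
        x, y = rng.random(), rng.random()
        # Approximate conditioning on X = x0 with a narrow band around x0.
        if y < x and abs(x - x0) < width:
            ys.append(y)
    mean = sum(ys) / len(ys)
    var = sum((y - mean) ** 2 for y in ys) / (len(ys) - 1)
    return mean, var, len(ys)

mean, var, kept = conditional_moments_near(0.6)
print(f"kept {kept} draws: mean {mean:.4f}, variance {var:.4f}")
```

The estimated mean should sit near \(0.6/2 = 0.3\); compare the estimated variance with your formula evaluated at \(x = 0.6\).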
Sometimes we are directly given information about the conditional distribution. If this is a “known” distribution, we can just use the properties of that distribution.
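Suppose \(\Theta \sim {\mathrm{Unif}}(0, 1)\) and \(Y | \Theta = \theta \sim {\mathrm{Binom}}(n, \theta)\). What are \(\mathbb{E}[Y|\Theta=\theta]\) and \(\operatorname{Var}(Y|\Theta=\theta)\)?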
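Because the conditional distribution of \(Y\) given \(\Theta = \theta\) is \({\mathrm{Binom}}(n, \theta)\), the binomial mean and variance formulas give \[\begin{aligned} \mathbb{E}[Y|\Theta=\theta] &= n\theta, & \operatorname{Var}(Y|\Theta=\theta) &= n\theta(1-\theta). \end{aligned}\]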
This is much easier than finding the PMF/PDF of Y.
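For a quick numerical check of the binomial facts used here, namely \(\mathbb{E}[Y|\Theta=\theta] = n\theta\) and \(\operatorname{Var}(Y|\Theta=\theta) = n\theta(1-\theta)\), the sketch below computes both directly from the \({\mathrm{Binom}}(n, \theta)\) PMF. The helper name `binom_mean_var` and the values \(n = 10\), \(\theta = 0.3\) are illustrative assumptions.

```python
from math import comb

def binom_mean_var(n, theta):
    """Mean and variance of Binom(n, theta), summed directly over its PMF (hypothetical helper)."""
    pmf = [comb(n, k) * theta**k * (1 - theta) ** (n - k) for k in range(n + 1)]
    mean = sum(k * p for k, p in enumerate(pmf))
    var = sum((k - mean) ** 2 * p for k, p in enumerate(pmf))
    return mean, var

# Conditional on Theta = 0.3 with n = 10; compare with n*theta and n*theta*(1 - theta).
n, theta = 10, 0.3
mean, var = binom_mean_var(n, theta)
print(mean, n * theta)
print(var, n * theta * (1 - theta))
```

Summing over the PMF works, but it is exactly the bookkeeping that recognizing the named distribution lets us skip.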
Important
Quick knowledge check. Are conditional expectations and variances random variables?
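Conditional expectations and variances have the usual properties of expectations and variances. But they also have some additional properties, because \(\mathbb{E}[X|\Theta]\) and \(\operatorname{Var}(X|\Theta)\) are functions of \(\Theta\): they are themselves random variables with their own distributions (in contrast, \(\mathbb{E}[X|\Theta=\theta]\) is a number).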
Let \(\Theta \sim {\mathrm{Unif}}(0, 1)\) and \(Y | \Theta = \theta \sim {\mathrm{Binom}}(n, \theta)\).
We refer to this general setup as a hierarchical model.
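Suppose \(\Lambda\) has density \(f_{\Lambda}(\lambda) = \lambda e^{-\lambda}\) for \(\lambda \geq 0\), that is, \(\Lambda \sim \mathrm{Gamma}(2, 1)\), and that, given \(\Lambda = \lambda\), \(X\) is exponential with mean \(\lambda\). Find the distribution of \(W = \mathbb{E}[X|\Lambda]\) and \(\mathbb{E}[W]\).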
By the definition of the conditional density, the joint density of \(X\) and \(\Lambda\) is \[\begin{aligned} f_{X,\Lambda}(x, \lambda) &= f_{X|\Lambda}(x|\lambda) f_{\Lambda}(\lambda) \\ &= \frac{1}{\lambda} e^{-x/\lambda} \cdot \frac{1}{\Gamma(2)} \lambda e^{-\lambda} I_{[0,\infty)}(x)I_{[0,\infty)}(\lambda) \\ &= e^{-x/\lambda}e^{-\lambda} I_{[0,\infty)}(x)I_{[0,\infty)}(\lambda), \end{aligned}\] using \(\Gamma(2) = 1\).
Using the definition of expectation, we can compute \(\mathbb{E}[X]\) directly: \[\begin{aligned} \mathbb{E}[X] &= \int_0^\infty \int_0^\infty x e^{-x/\lambda}e^{-\lambda} I_{[0,\infty)}(x)I_{[0,\infty)}(\lambda) \mathsf{d}\lambda \mathsf{d}x = \cdots. \end{aligned}\]
That is: \[\mathbb{E}[X] = \mathbb{E}[W] = \mathbb{E}[\mathbb{E}[X|\Lambda]].\]
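A simulation sketch of this hierarchy, reading the model off the joint density above: \(\Lambda\) has density \(\lambda e^{-\lambda}\) on \([0, \infty)\) (Gamma with shape 2 and scale 1) and, given \(\Lambda = \lambda\), \(X\) is exponential with mean \(\lambda\). Since an exponential with mean \(\lambda\) has expectation \(\lambda\), \(W = \mathbb{E}[X|\Lambda] = \Lambda\), so \(W\) has the same Gamma distribution as \(\Lambda\) and \(\mathbb{E}[W] = 2\). The sketch estimates \(\mathbb{E}[W]\) and \(\mathbb{E}[X]\) from draws; the function name, sample size, and seed are illustrative choices.

```python
import random

def simulate_lambda_hierarchy(n_draws=200_000, seed=302):
    """Estimate E[W] = E[E[X | Lambda]] and E[X] by simulation (hypothetical helper)."""
    rng = random.Random(seed)
    total_w = total_x = 0.0
    for _ in range(n_draws):
        lam = rng.gammavariate(2.0, 1.0)   # Lambda: shape 2, scale 1, density lam * exp(-lam)
        x = rng.expovariate(1.0 / lam)     # X | Lambda = lam: exponential with mean lam (rate 1/lam)
        total_w += lam                     # W = E[X | Lambda] = Lambda
        total_x += x
    return total_w / n_draws, total_x / n_draws

mean_w, mean_x = simulate_lambda_hierarchy()
print(f"E[W] ~ {mean_w:.3f}, E[X] ~ {mean_x:.3f}")
```

Both estimates should land close to 2, in line with \(\mathbb{E}[X] = \mathbb{E}[\mathbb{E}[X|\Lambda]]\).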
Let \(X\) and \(Y\) be two random variables. Then, \[\mathbb{E}[X] = \mathbb{E}[\mathbb{E}[X|Y]]\] and \[\operatorname{Var}(X) = \mathbb{E}[\operatorname{Var}(X|Y)] + \operatorname{Var}(\mathbb{E}[X|Y]).\]
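As an illustration, apply both laws to the hierarchical model \(\Theta \sim {\mathrm{Unif}}(0,1)\), \(Y | \Theta = \theta \sim {\mathrm{Binom}}(n, \theta)\), using the binomial facts \(\mathbb{E}[Y|\Theta] = n\Theta\) and \(\operatorname{Var}(Y|\Theta) = n\Theta(1-\Theta)\): \(\mathbb{E}[Y] = \mathbb{E}[n\Theta] = n/2\) and \(\operatorname{Var}(Y) = \mathbb{E}[n\Theta(1-\Theta)] + \operatorname{Var}(n\Theta) = n\left(\tfrac12 - \tfrac13\right) + \tfrac{n^2}{12} = \tfrac{n}{6} + \tfrac{n^2}{12}\). The sketch below checks this by simulation for \(n = 10\), where these formulas give \(\mathbb{E}[Y] = 5\) and \(\operatorname{Var}(Y) = 10\); the helper name, sample size, and seed are illustrative.

```python
import random

def simulate_hierarchical_binomial(n, n_draws=100_000, seed=302):
    """Sample Theta ~ Unif(0, 1), then Y | Theta ~ Binom(n, Theta);
    return the sample mean and sample variance of Y (hypothetical helper)."""
    rng = random.Random(seed)
    ys = []
    for _ in range(n_draws):
        theta = rng.random()                              # Theta ~ Unif(0, 1)
        y = sum(rng.random() < theta for _ in range(n))   # count of n Bernoulli(theta) successes
        ys.append(y)
    mean = sum(ys) / n_draws
    var = sum((y - mean) ** 2 for y in ys) / (n_draws - 1)
    return mean, var

mean_y, var_y = simulate_hierarchical_binomial(10)
print(f"E[Y] ~ {mean_y:.3f} (theory 5), Var(Y) ~ {var_y:.3f} (theory 10)")
```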
Let \(X\) and \(U\) be random variables such that \(U \sim {\mathrm{Unif}}(0, 1)\) and \(\mathbb{E}[X | U] = 3U^2\). Find \(\mathbb{E}[X]\).
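One way to finish this example is a short application of the law of total expectation with \(U \sim {\mathrm{Unif}}(0, 1)\): \[\mathbb{E}[X] = \mathbb{E}[\mathbb{E}[X|U]] = \mathbb{E}[3U^2] = \int_0^1 3u^2 \, \mathsf{d}u = 1.\]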
Let \(X\) be a random variable with \(\mathbb{P}(X \geq 0) = 1\). Then, for any \(a > 0\), \[\mathbb{P}( X \geq a ) \le \frac{\mathbb{E}[ X ] }{a}.\]
Note that this implies that for any random variable \(Y\),
\[\mathbb{P}(|Y| \geq a) \le \mathbb{E}[|Y|] / a.\]
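Let \(Z = a I_{[a,\infty)}(X)\), so \(Z = a\) when \(X \geq a\) and \(Z = 0\) otherwise. Since \(X \geq 0\) almost surely, we have \(Z \leq X\) almost surely (if \(X \geq a\) then \(Z = a \leq X\); if \(X < a\) then \(Z = 0 \leq X\)). Hence, by monotonicity of expectation, \[\begin{aligned} \mathbb{E}[X] &\geq \mathbb{E}[Z]\\ &= a \mathbb{P}(Z = a) + 0 \cdot \mathbb{P}(Z = 0)\\ & = a \mathbb{P}(Z = a)\\ &= a\mathbb{P}(X \geq a). \end{aligned}\] Dividing both sides by \(a > 0\) gives the result.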
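To see how loose or tight the bound can be, the sketch below compares Markov's bound with the exact tail of an exponential random variable with mean 1. This distribution is an illustrative choice: it is non-negative, \(\mathbb{E}[X] = 1\), and \(\mathbb{P}(X \geq a) = e^{-a}\). The helper name is hypothetical.

```python
import math

def markov_vs_exact(a_values, mean=1.0):
    """For X exponential with the given mean, return (a, exact tail, Markov bound) rows."""
    rows = []
    for a in a_values:
        exact = math.exp(-a / mean)   # P(X >= a) for an exponential with this mean
        bound = mean / a              # Markov: E[X] / a
        rows.append((a, exact, bound))
    return rows

rows = markov_vs_exact([0.5, 1.0, 2.0, 4.0])
for a, exact, bound in rows:
    print(f"a = {a}: exact {exact:.4f}, Markov bound {bound:.4f}")
```

For \(a \leq 1\) the bound is at least 1 and carries no information; for larger \(a\) it decays like \(1/a\), while the true tail here decays exponentially.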
Let \(X\) be a random variable with finite mean \(\mu\).
Then, for any \(a > 0\), \[\mathbb{P}(| X - \mu| \geq a ) \leq \ \frac{\operatorname{Var}(X)}{a^2}.\]
We have \[\begin{aligned} \mathbb{P}( |X - \mu| \geq a) &=\mathbb{P}( (X - \mu) ^2 \geq a^2) \\ &\leq \frac{\mathbb{E}[(X - \mu) ^2 ] }{a^{2}} & \textrm{(by Markov's ineq.)}\\ &=\frac{\operatorname{Var}(X)}{a^{2}}. \end{aligned}\]
Let \(X\) be a non-negative random variable with mean \(\mu > 0\) and variance \(\mu\).
We want to examine the bounds on \(\mathbb{P}((X - \mu)/\mu \geq 1)\) given by Markov’s and Chebyshev’s inequalities.
Markov’s inequality gives \[\mathbb{P}((X - \mu)/\mu \geq 1) = \mathbb{P}( X \geq 2\mu ) \leq \ \frac{\mathbb{E}[X]}{2\mu} = \frac{\mu}{2\mu} = \frac{1}{2}.\]
Chebyshev’s inequality gives \[\mathbb{P}((X - \mu)/\mu \geq 1) = \mathbb{P}(X - \mu \geq \mu) \leq \mathbb{P}(|X - \mu| \geq \mu) \leq \frac{\operatorname{Var}(X)}{\mu^2} = \frac{\mu}{\mu^2} = \frac{1}{\mu}.\]
So for any non-negative random variable with mean \(\mu\) and variance \(\mu\), Chebyshev’s inequality gives the tighter bound whenever \(\mu > 2\), since then \(1/\mu < 1/2\).
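A Poisson random variable is one example of a non-negative random variable whose mean and variance are equal. The sketch below, assuming \(X \sim {\mathrm{Poisson}}(\mu)\) with the illustrative value \(\mu = 5\), computes the exact value of \(\mathbb{P}(X \geq 2\mu)\) and compares it with the two bounds.

```python
import math

def poisson_upper_tail(mu, k):
    """P(X >= k) for X ~ Poisson(mu) and integer k >= 0, computed as 1 - P(X <= k - 1)."""
    cdf = sum(math.exp(-mu) * mu**j / math.factorial(j) for j in range(k))
    return 1.0 - cdf

mu = 5                                   # an integer, so 2 * mu is an integer cutoff
exact = poisson_upper_tail(mu, 2 * mu)   # P((X - mu) / mu >= 1) = P(X >= 2 * mu)
markov = 1 / 2
chebyshev = 1 / mu
print(f"exact {exact:.4f}, Markov {markov:.4f}, Chebyshev {chebyshev:.4f}")
```

Here \(\mu = 5 > 2\), so Chebyshev's bound (\(0.2\)) beats Markov's (\(0.5\)), though both sit well above the exact probability.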
Suppose you flip a fair coin 100 times. Use Markov’s and Chebyshev’s inequalities to bound the probability of seeing 60 or more heads.
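A sketch of the computation, assuming \(X\) counts the heads in 100 independent flips of a fair coin, so \(X \sim {\mathrm{Binom}}(100, 1/2)\) with \(\mathbb{E}[X] = 50\) and \(\operatorname{Var}(X) = 25\). Markov gives \(\mathbb{P}(X \geq 60) \leq 50/60 = 5/6\); Chebyshev gives \(\mathbb{P}(X \geq 60) \leq \mathbb{P}(|X - 50| \geq 10) \leq 25/10^2 = 1/4\). The code also computes the exact binomial probability for comparison.

```python
from math import comb

n, p, k = 100, 0.5, 60
mean = n * p                 # E[X] = 50
var = n * p * (1 - p)        # Var(X) = 25

markov = mean / k                         # P(X >= 60) <= E[X] / 60
chebyshev = var / (k - mean) ** 2         # P(X >= 60) <= P(|X - 50| >= 10) <= Var(X) / 10^2
exact = sum(comb(n, j) for j in range(k, n + 1)) / 2**n   # fair coin: each outcome has prob 2^-n

print(f"Markov: {markov:.4f}, Chebyshev: {chebyshev:.4f}, exact: {exact:.4f}")
```

Both inequalities are valid here but conservative: the exact probability is far smaller than either bound.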
How many measurements should the astronomer make if they want the probability of a mismeasurement larger than 1 light year to be no more than 0.01?
Hint: recall that \(\mathbb{E}[ \overline{X}_n ] = \mathbb{E}[X_1]\) and \(\operatorname{Var}( \overline{X}_n ) = \operatorname{Var}(X_1)/n\).
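The hint combines with Chebyshev's inequality: if each measurement has variance \(\sigma^2\) (take its value from the problem statement), then \(\mathbb{P}(|\overline{X}_n - \mathbb{E}[X_1]| \geq \varepsilon) \leq \sigma^2/(n\varepsilon^2)\), so it suffices to take \(n \geq \sigma^2/(\delta\varepsilon^2)\) with \(\varepsilon = 1\) light year and \(\delta = 0.01\). The sketch below evaluates this; the helper name `chebyshev_sample_size` and the variance value in the usage line are hypothetical.

```python
import math
from fractions import Fraction

def chebyshev_sample_size(sigma2, eps, delta):
    """Smallest n with sigma2 / (n * eps**2) <= delta (hypothetical helper).

    Inputs are converted to Fractions so the ceiling is not affected by
    floating-point rounding; pass decimals as strings, e.g. "0.01".
    """
    sigma2, eps, delta = Fraction(sigma2), Fraction(eps), Fraction(delta)
    return math.ceil(sigma2 / (delta * eps**2))

# sigma2 = 4 is a HYPOTHETICAL per-measurement variance; use the problem's value.
print(chebyshev_sample_size(sigma2=4, eps=1, delta="0.01"))
```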
Cauchy-Schwarz inequality for random variables
Let \(X\) and \(Y\) be two random variables with finite second moments. Then, \[|\mathbb{E}[XY]| \leq \sqrt{\mathbb{E}[X^2] \,\mathbb{E}[Y^2]}.\]
Let \(X\) and \(Y\) be two random variables with finite second moments. Then, \[|\operatorname{Cov}(X, Y)| \leq \sqrt{\operatorname{Var}(X) \, \operatorname{Var}(Y)}.\]
If, in addition, \(\operatorname{Var}(X) > 0\) and \(\operatorname{Var}(Y) > 0\), then \[|\operatorname{Corr}(X, Y)| \leq 1.\]
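A small numerical check of both inequalities on a made-up joint PMF for \((X, Y)\); the probabilities are arbitrary illustrative values that sum to 1, and the helper `expect` is hypothetical.

```python
import math

# Hypothetical joint PMF of (X, Y): {(x, y): P(X = x, Y = y)}.
joint = {(0, 0): 0.2, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.3, (2, 1): 0.3}

def expect(g):
    """E[g(X, Y)] under the joint PMF above."""
    return sum(p * g(x, y) for (x, y), p in joint.items())

e_xy = expect(lambda x, y: x * y)
e_x2 = expect(lambda x, y: x**2)
e_y2 = expect(lambda x, y: y**2)
e_x, e_y = expect(lambda x, y: x), expect(lambda x, y: y)

cov = e_xy - e_x * e_y
var_x, var_y = e_x2 - e_x**2, e_y2 - e_y**2
corr = cov / math.sqrt(var_x * var_y)

print(abs(e_xy) <= math.sqrt(e_x2 * e_y2))    # |E[XY]| <= sqrt(E[X^2] E[Y^2])
print(abs(cov) <= math.sqrt(var_x * var_y))   # |Cov(X, Y)| <= sqrt(Var(X) Var(Y))
print(f"Corr(X, Y) = {corr:.4f}")
```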
Recall that a function \(f: {\mathbb{R}}\to {\mathbb{R}}\) is convex if for any \(x, y \in {\mathbb{R}}\) and \(\lambda \in [0, 1]\), we have \[f(\lambda x + (1 - \lambda) y) \leq \lambda f(x) + (1 - \lambda) f(y).\]
Let \(X\) be a random variable with finite mean and let \(f: {\mathbb{R}}\to {\mathbb{R}}\) be a convex function. Then, \[f(\mathbb{E}[X]) \leq \mathbb{E}[f(X)].\]
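A quick illustration of Jensen's inequality, assuming \(X\) is uniform on the faces \(\{1, \dots, 6\}\) of a fair die and using two convex functions, \(e^x\) and \(|x - 4|\); both choices, and the helper name `jensen_sides`, are illustrative.

```python
import math

def jensen_sides(f, values):
    """Return (f(E[X]), E[f(X)]) for X uniform over `values` (hypothetical helper)."""
    mean = sum(values) / len(values)
    return f(mean), sum(f(x) for x in values) / len(values)

faces = [1, 2, 3, 4, 5, 6]
for name, f in [("exp(x)", math.exp), ("|x - 4|", lambda x: abs(x - 4))]:
    lhs, rhs = jensen_sides(f, faces)
    print(f"{name}: f(E[X]) = {lhs:.3f} <= E[f(X)] = {rhs:.3f}")
```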
Your friend offers to play the following game with you.
How many times should you play this game? Justify your answer.
Show that the variance of a random variable is always non-negative.
Stat 302 - Winter 2025/26