Lecture 14

Conditional Expectations and Inequalities

Grace Tompkins

Last modified — 21 Jun 2026

Learning Outcomes

By the end of this lecture, students should be able to:

Calculate conditional expectations from conditional and joint distributions
Use inequalities to bound probabilities, expectations, and variances

1 Conditional Expectation

Conditional Expectation

If $X$ and $Y$ are two random variables, then the conditional expectation of $X$ given $Y = y$ is \[\begin{aligned} \mathbb{E}\left[ X | Y= y \right] &= \int_{-\infty }^{\infty } x f_{X|Y}\left( x | y \right) \mathsf{d}x && \text{(continuous case)},\\ \mathbb{E}\left[ X | Y= y \right] &= \sum_{x} x p_{X|Y}(x|y) && \text{(discrete case)}. \end{aligned}\]

This is the same definition we saw previously for expectation, just with the conditional distribution.

Conditional Expectation

Let $X$ and $Y$ be random variables with joint density $f(x, y) = 2$ for $0 < x < 1, 0 < y < x$. What is the conditional expectation of $Y$ given $X$?

Conditional Expectation
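
As a rough numerical check of this example (a sketch, not the lecture's worked solution): for fixed $x$, the marginal density is $f_X(x) = \int_0^x 2 \,\mathsf{d}y = 2x$, so $f_{Y|X}(y|x) = 1/x$ on $0 < y < x$, that is $Y | X = x \sim {\mathrm{Unif}}(0, x)$ and $\mathbb{E}[Y | X = x] = x/2$. The code below approximates the two integrals with a midpoint rule. The function names and the grid size are illustrative choices, not part of the lecture.

```python
def joint_density(x, y):
    """Joint density from the example: f(x, y) = 2 on 0 < y < x < 1, else 0."""
    return 2.0 if 0.0 < y < x < 1.0 else 0.0


def conditional_mean_y_given_x(x, steps=10_000):
    """Approximate E[Y | X = x] as (int y f(x, y) dy) / (int f(x, y) dy)."""
    h = 1.0 / steps
    ys = [(k + 0.5) * h for k in range(steps)]  # midpoints of a grid on (0, 1)
    marginal = sum(joint_density(x, y) for y in ys) * h       # approximates f_X(x) = 2x
    numerator = sum(y * joint_density(x, y) for y in ys) * h  # approximates x^2
    return numerator / marginal


for x in (0.2, 0.5, 0.9):
    print(x, round(conditional_mean_y_given_x(x), 4), x / 2)
```

Each printed row shows $x$, the numerical approximation, and the closed form $x/2$ side by side.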

Conditional Variance

If $X$ and $Y$ are two random variables, then the conditional variance of $X$ given $Y = y$ is \[\begin{aligned} \operatorname{Var}(X | Y = y ) &= \int_{-\infty }^{\infty } ( x - \mathbb{E}[X|Y=y] ) ^{2}f_{X|Y}\left( x|y \right) \mathsf{d}x && \text{(continuous case)},\\ \operatorname{Var}(X | Y = y ) &= \sum_{x} ( x - \mathbb{E}[X|Y=y] ) ^{2}p_{X|Y}(x|y) && \text{(discrete case)}. \end{aligned}\]

Conditional Variance

Let $X$ and $Y$ be random variables with joint density $f(x, y) = 2$ for $0 < x < 1, 0 < y < x$. What is the conditional variance of $Y$ given $X$?

Try this on your own.

Conditional Expectation and Variance

Sometimes we are directly given information about the conditional distribution. If this is a “known” distribution, we can just use the properties of that distribution.

Let $\Theta \sim {\mathrm{Unif}}(0, 1)$
Let $Y | \Theta = \theta \sim {\mathrm{Binom}}(n, \theta)$

What are $\mathbb{E}[Y|\Theta=\theta]$ and $\operatorname{Var}(Y|\Theta=\theta)$?

Because the conditional distribution of $Y$ given $\Theta = \theta$ is ${\mathrm{Binom}}(n, \theta)$, we know:

$\mathbb{E}[Y|\Theta=\theta] = n\theta$ and
$\operatorname{Var}(Y|\Theta=\theta) = n\theta(1-\theta)$.

This is much easier than finding the PMF/PDF of Y.

Conditional Expectation and Variance

Important

Quick knowledge check. Are conditional expectations and variances random variables?

Conditional Expectation and Variance

The properties of expectation and variance that we have seen before also hold for conditional expectation and variance.

But there are some additional properties as well, because $\mathbb{E}[X|\Theta]$ and $\operatorname{Var}(X|\Theta)$ are themselves random variables and have their own distributions.

Let $\Theta \sim {\mathrm{Unif}}(0, 1), \text{ and } Y | \Theta = \theta \sim {\mathrm{Binom}}(n, \theta)$

$W = \mathbb{E}[Y|\Theta] = n\Theta$ is a random variable that depends on $\Theta$.
But $\Theta \sim {\mathrm{Unif}}(0, 1)$, so $W \sim {\mathrm{Unif}}(0, n)$!
Using the Jacobian method, we can show that the PDF of $V = \operatorname{Var}(Y|\Theta) = n\Theta(1-\Theta)$ is given by \[f_V(v) = \frac{2}{n\sqrt{1 - 4v/n}}, \quad 0 < v < n/4.\]
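
The stated density of $V$ can be sanity-checked numerically. Integrating it gives the CDF $F_V(v) = 1 - \sqrt{1 - 4v/n}$ for $0 < v < n/4$, which should equal $\mathbb{P}(n\Theta(1-\Theta) \leq v)$ with $\Theta \sim {\mathrm{Unif}}(0, 1)$. The sketch below compares the two on a grid. The value $n = 10$, the grid size, and the function names are assumptions made only for illustration.

```python
import math


def cdf_v_closed_form(v, n):
    """Integral of the stated PDF 2 / (n * sqrt(1 - 4t/n)) from t = 0 to t = v."""
    return 1.0 - math.sqrt(1.0 - 4.0 * v / n)


def cdf_v_from_theta(v, n, steps=200_000):
    """P(n * Theta * (1 - Theta) <= v) for Theta ~ Unif(0, 1), via a midpoint grid."""
    h = 1.0 / steps
    hits = 0
    for k in range(steps):
        theta = (k + 0.5) * h
        if n * theta * (1.0 - theta) <= v:
            hits += 1
    return hits * h  # fraction of (0, 1) on which the event holds


n = 10  # illustrative choice; V then takes values in (0, 2.5)
for v in (0.5, 1.0, 2.0):
    print(v, round(cdf_v_closed_form(v, n), 4), round(cdf_v_from_theta(v, n), 4))
```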

Hierarchical Models

We refer to this general setup as a hierarchical model.

We first draw $\Theta$ from some distribution.
Then we draw $Y$ from a distribution that depends on $\Theta$.
We can then find the distribution of $Y | \Theta$, as well as the distribution of its conditional expectation $\mathbb{E}[Y | \Theta]$.

Let $\Lambda \sim {\mathrm{Gam}}(1, 2)$.
Let $X | \Lambda \sim {\mathrm{Exp}}(1/\Lambda)$.

Find the distribution of $W = \mathbb{E}[X|\Lambda]$ and $\mathbb{E}[W]$.

Hierarchical Models
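
A sketch of one possible solution, taking the densities as they are written in the joint density used later in this lecture, namely $f_{X|\Lambda}(x|\lambda) = \frac{1}{\lambda} e^{-x/\lambda}$ for $x > 0$ and $f_{\Lambda}(\lambda) = \lambda e^{-\lambda}$ for $\lambda > 0$. If your parametrization of ${\mathrm{Gam}}$ or ${\mathrm{Exp}}$ differs, the numbers change accordingly. The conditional distribution has mean $\lambda$, so \[W = \mathbb{E}[X | \Lambda] = \Lambda,\] which means $W$ has the same distribution as $\Lambda$, and \[\mathbb{E}[W] = \int_0^\infty \lambda \cdot \lambda e^{-\lambda} \,\mathsf{d}\lambda = \Gamma(3) = 2.\]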

Law of Total Expectation

Writing the joint density of $X$ and $\Lambda$ as the product of the conditional and marginal densities, we get \[\begin{aligned} f_{X,\Lambda}(x, \lambda) &= f_{X|\Lambda}(x|\lambda) f_{\Lambda}(\lambda) \\ &= \frac{1}{\lambda} e^{-x/\lambda} \cdot \frac{1}{\Gamma(2)} \lambda e^{-\lambda} I_{[0,\infty)}(x)I_{[0,\infty)}(\lambda) \\ &= e^{-x/\lambda}e^{-\lambda} I_{[0,\infty)}(x)I_{[0,\infty)}(\lambda). \end{aligned}\]

Using our definition of expectation, we can find $\mathbb{E}[X]$: \[\begin{aligned} \mathbb{E}[X] &= \int_0^\infty \int_0^\infty x e^{-x/\lambda}e^{-\lambda} I_{[0,\infty)}(x)I_{[0,\infty)}(\lambda) \mathsf{d}\lambda \mathsf{d}x = \cdots. \end{aligned}\]

That is: \[\mathbb{E}[X] = \mathbb{E}[W] = \mathbb{E}[\mathbb{E}[X|\Lambda]].\]
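
One way to fill in the omitted steps (a sketch; it swaps the order of integration, which is allowed here because the integrand is non-negative): \[\begin{aligned} \mathbb{E}[X] &= \int_0^\infty e^{-\lambda} \left( \int_0^\infty x e^{-x/\lambda} \,\mathsf{d}x \right) \mathsf{d}\lambda \\ &= \int_0^\infty \lambda^2 e^{-\lambda} \,\mathsf{d}\lambda \\ &= \Gamma(3) = 2. \end{aligned}\] The integrand in the second line is $\lambda \cdot \lambda e^{-\lambda}$, that is $\lambda$ times the density of $\Lambda$ appearing in the joint density, so this integral is exactly $\mathbb{E}[\Lambda] = \mathbb{E}[\mathbb{E}[X|\Lambda]]$.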

Law of Total Expectation and Variance (Tower Property)

Let $X$ and $Y$ be two random variables. Then, \[\mathbb{E}[X] = \mathbb{E}[\mathbb{E}[X|Y]]\] and \[\operatorname{Var}(X) = \mathbb{E}[\operatorname{Var}(X|Y)] + \operatorname{Var}(\mathbb{E}[X|Y]).\]

The first equation shows what we just saw, but it is general and holds for any $X$ and $Y$.
The second equation is a bit more complicated, but it is also very useful.
It shows that the variance of $X$ can be decomposed into two parts: the expected value of the conditional variance of $X$ given $Y$, and the variance of the conditional expectation of $X$ given $Y$.
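
Both identities can be checked exactly on the earlier example $\Theta \sim {\mathrm{Unif}}(0, 1)$, $Y | \Theta = \theta \sim {\mathrm{Binom}}(n, \theta)$. There, $\mathbb{E}[\mathbb{E}[Y|\Theta]] = \mathbb{E}[n\Theta] = n/2$, $\mathbb{E}[\operatorname{Var}(Y|\Theta)] = n\,\mathbb{E}[\Theta(1-\Theta)] = n/6$, and $\operatorname{Var}(\mathbb{E}[Y|\Theta]) = n^2 \operatorname{Var}(\Theta) = n^2/12$. The sketch below computes the marginal PMF of $Y$ with the Beta integral $\int_0^1 \theta^k (1-\theta)^{n-k} \,\mathsf{d}\theta = k!(n-k)!/(n+1)!$ and compares its mean and variance with these values. The choice $n = 12$ and the function names are illustrative assumptions.

```python
from fractions import Fraction
from math import comb, factorial


def marginal_pmf(n):
    """P(Y = k) = C(n, k) * k! (n - k)! / (n + 1)!, obtained by integrating out Theta."""
    return [
        Fraction(comb(n, k) * factorial(k) * factorial(n - k), factorial(n + 1))
        for k in range(n + 1)
    ]


def mean_and_variance(pmf):
    """Mean and variance of a PMF given as a list indexed by the value k."""
    mean = sum(k * p for k, p in enumerate(pmf))
    second_moment = sum(k * k * p for k, p in enumerate(pmf))
    return mean, second_moment - mean ** 2


n = 12  # illustrative choice
mean, variance = mean_and_variance(marginal_pmf(n))

tower_mean = Fraction(n, 2)                             # E[E[Y | Theta]]
total_variance = Fraction(n, 6) + Fraction(n * n, 12)   # E[Var(Y | Theta)] + Var(E[Y | Theta])

print(mean == tower_mean, variance == total_variance)
```

Using `Fraction` keeps every quantity exact, so the comparison does not depend on floating-point tolerance.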

Law of Total Expectation and Variance (Tower Property)

Let $X$ and $U$ be random variables such that $U \sim {\mathrm{Unif}}(0, 1)$, and $\mathbb{E}[X\ \vert\ U] = 3U^2$. Find $\mathbb{E}[X]$.

Law of Total Expectation and Variance (Tower Property)
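
A sketch of the calculation, applying the law of total expectation and then the density of ${\mathrm{Unif}}(0, 1)$: \[\mathbb{E}[X] = \mathbb{E}[\mathbb{E}[X | U]] = \mathbb{E}[3U^2] = \int_0^1 3u^2 \,\mathsf{d}u = 1.\]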

2 Inequalities

Markov’s Inequality

Let $X$ be a random variable with $\mathbb{P}(X \geq 0) = 1$. Then, for any $a > 0$, \[\mathbb{P}( X \geq a ) \le \frac{\mathbb{E}[ X ] }{a}.\]

Note that this implies that for any random variable $Y$,

\[\mathbb{P}(|Y| \geq a) \le \mathbb{E}[|Y|] / a.\]

Proof of Markov’s Inequality

Let $Z = a I_{[a,\infty)}(X)$, so that $Z = a$ when $X \geq a$ and $Z = 0$ otherwise. Since $X \geq 0$ almost surely, we have $Z \leq X$ almost surely, and hence $\mathbb{E}[Z] \leq \mathbb{E}[X]$ by monotonicity of expectation. Therefore \[\begin{aligned} \mathbb{E}[X] &\geq \mathbb{E}[Z]\\ &= a \mathbb{P}(Z = a) + 0 \mathbb{P}(Z = 0)\\ & = a \mathbb{P}(Z = a)\\ &= a\mathbb{P}(X \geq a). \end{aligned}\] Dividing both sides by $a > 0$ gives the result.

Chebyshev’s Inequality

Let $X$ be a random variable with finite mean $\mu$ and finite variance.

Then, for any $a > 0$, \[\mathbb{P}(| X - \mu| \geq a ) \leq \ \frac{\operatorname{Var}(X)}{a^2}.\]

Proof of Chebyshev’s Inequality

We have \[\begin{aligned} \mathbb{P}( |X - \mu| \geq a) &=\mathbb{P}( (X - \mu) ^2 \geq a^2) \\ &\leq \frac{\mathbb{E}[(X - \mu) ^2 ] }{a^{2}} & \textrm{(by Markov's ineq.)}\\ &=\frac{\operatorname{Var}(X)}{a^{2}}. \end{aligned}\]

Comparing Markov and Chebyshev

Let $X$ be a non-negative random variable with mean $\mu > 0$ and variance $\mu$.

We want to examine the bounds on $\mathbb{P}((X - \mu)/\mu \geq 1)$ given by Markov’s and Chebyshev’s inequalities.

Markov’s inequality gives \[\mathbb{P}((X - \mu)/\mu \geq 1) = \mathbb{P}( X \geq 2\mu ) \leq \ \frac{\mathbb{E}[X]}{2\mu} = \frac{\mu}{2\mu} = \frac{1}{2}.\]

Chebyshev’s inequality gives \[\mathbb{P}((X - \mu)/\mu \geq 1) = \mathbb{P}(X - \mu \geq \mu) \leq \mathbb{P}(|X - \mu| \geq \mu) \leq \frac{\operatorname{Var}(X)}{\mu^2} = \frac{\mu}{\mu^2} = \frac{1}{\mu}.\]

So for any non-negative random variable with mean $\mu$ and variance $\mu$, Chebyshev’s inequality gives a tighter bound whenever $\mu > 2$.
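
To see how loose both bounds can be, here is a small sketch using a Poisson random variable, which is non-negative with mean and variance both equal to $\mu$. The Poisson choice, the integer values of $\mu$, and the function names are assumptions made for illustration; the comparison above does not depend on them.

```python
import math


def poisson_upper_tail(mu, threshold):
    """P(X >= threshold) for X ~ Poisson(mu), with an integer threshold."""
    below = sum(math.exp(-mu) * mu ** k / math.factorial(k) for k in range(threshold))
    return 1.0 - below


def bounds(mu):
    """Exact P(X >= 2 mu) with the Markov (1/2) and Chebyshev (1/mu) bounds."""
    return poisson_upper_tail(mu, 2 * mu), 0.5, 1.0 / mu


for mu in (1, 2, 4, 8):
    exact, markov, chebyshev = bounds(mu)
    print(mu, round(exact, 4), markov, chebyshev)
```

Each row prints $\mu$, the exact tail probability, the Markov bound, and the Chebyshev bound, so the crossover at $\mu = 2$ is visible directly.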

Binomial Bounds

Suppose you flip a fair coin 100 times. Use Markov’s and Chebyshev’s inequalities to bound the probability of seeing 60 or more heads.

Binomial Bounds
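
A sketch of the two bounds next to the exact probability, assuming the number of heads is $X \sim {\mathrm{Binom}}(100, 1/2)$, so $\mathbb{E}[X] = 50$ and $\operatorname{Var}(X) = 25$. Markov gives $\mathbb{P}(X \geq 60) \leq 50/60$, and Chebyshev gives $\mathbb{P}(X \geq 60) \leq \mathbb{P}(|X - 50| \geq 10) \leq 25/10^2$. The variable names are illustrative.

```python
from math import comb

n, p, threshold = 100, 0.5, 60
mean = n * p                 # E[X] = 50
variance = n * p * (1 - p)   # Var(X) = 25

markov_bound = mean / threshold                        # E[X] / a with a = 60
chebyshev_bound = variance / (threshold - mean) ** 2   # Var(X) / a^2 with a = 10

# Exact tail: sum of C(100, k) / 2^100 over k = 60, ..., 100.
exact = sum(comb(n, k) for k in range(threshold, n + 1)) / 2 ** n

print(markov_bound, chebyshev_bound, exact)
```

Both inequalities hold, but the exact probability is far smaller than either bound, which is typical: these inequalities use only the mean or the variance, not the full distribution.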

Far Away Stars

Suppose that a radio telescope can measure the distance to a star.
But due to atmospheric conditions, instrumental error, and movements of the earth, each measurement is a random variable with mean $\mu$ light years (the true distance) and variance $4$ (square) light years.
An astronomer plans to take $n$ independent measurements of the distance and use their average $\overline{X}_n$ as an estimate for the true distance.

How many measurements should the astronomer make if they want the probability that the estimate is off by more than 1 light year to be no more than 0.01?

Hint: recall that $\mathbb{E}[ \overline{X}_n ] = \mathbb{E}[X_1]$ and $\operatorname{Var}( \overline{X}_n ) = \operatorname{Var}(X_1)/n$.

Far Away Stars
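
A sketch of one approach, using Chebyshev’s inequality together with the hint: \[\mathbb{P}( |\overline{X}_n - \mu| > 1 ) \leq \mathbb{P}( |\overline{X}_n - \mu| \geq 1 ) \leq \frac{\operatorname{Var}(\overline{X}_n)}{1^2} = \frac{4}{n}.\] Requiring $4/n \leq 0.01$ gives $n \geq 400$. This is a sufficient number of measurements under Chebyshev’s bound; since the bound is usually not tight, fewer measurements may be enough if more is known about the distribution of the errors.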

Cauchy-Schwarz Inequality

Cauchy-Schwarz for random variables

Let $X$ and $Y$ be two random variables with finite second moments. Then, \[|\mathbb{E}[XY]| \leq \sqrt{\mathbb{E}[X^2] \,\mathbb{E}[Y^2]}.\]

Applying this to the centred variables $X - \mathbb{E}[X]$ and $Y - \mathbb{E}[Y]$ gives \[|\operatorname{Cov}(X, Y)| \leq \sqrt{\operatorname{Var}(X) \, \operatorname{Var}(Y)}.\]

If, in addition, $\operatorname{Var}(X) > 0$ and $\operatorname{Var}(Y) > 0$, then \[|\operatorname{Corr}(X, Y)| \leq 1.\]
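
The inequalities can be checked exactly on a small discrete example. The joint PMF below is hypothetical, chosen only for illustration, and the helper name `expect` is likewise an assumption. Squares are compared instead of square roots so that every quantity stays an exact fraction.

```python
from fractions import Fraction

# Hypothetical joint PMF on pairs (x, y), chosen only for illustration.
joint_pmf = {
    (0, 0): Fraction(1, 4),
    (0, 1): Fraction(1, 4),
    (1, 1): Fraction(1, 2),
}


def expect(g):
    """E[g(X, Y)] under joint_pmf."""
    return sum(p * g(x, y) for (x, y), p in joint_pmf.items())


e_x = expect(lambda x, y: x)
e_y = expect(lambda x, y: y)
e_xy = expect(lambda x, y: x * y)
e_x2 = expect(lambda x, y: x * x)
e_y2 = expect(lambda x, y: y * y)

cov = e_xy - e_x * e_y
var_x = e_x2 - e_x ** 2
var_y = e_y2 - e_y ** 2

# |E[XY]| <= sqrt(E[X^2] E[Y^2]) and |Cov| <= sqrt(Var X Var Y), in squared form.
print(e_xy ** 2 <= e_x2 * e_y2, cov ** 2 <= var_x * var_y)
```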

Jensen’s Inequality

Recall that a function $f: {\mathbb{R}}\to {\mathbb{R}}$ is convex if for any $x, y \in {\mathbb{R}}$ and $\lambda \in [0, 1]$, we have \[f(\lambda x + (1 - \lambda) y) \leq \lambda f(x) + (1 - \lambda) f(y).\]

Let $X$ be a random variable with finite mean and let $f: {\mathbb{R}}\to {\mathbb{R}}$ be a convex function. Then, \[f(\mathbb{E}[X]) \leq \mathbb{E}[f(X)].\]

Questionable Friendships

Your friend offers to play the following game with you.

Your friend pays you $49 to roll 2 standard 6-sided dice.
If you see $x$ pips, you pay your friend $x^2$ dollars.
Repeat as many times as you like, and your friend will keep paying you $49 each time.

How many times should you play this game? Justify your answer.

Questionable Friendships
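
A sketch of the reasoning, assuming that "$x$ pips" means the total $X$ shown on the two dice. Since $f(x) = x^2$ is convex, Jensen’s inequality gives $\mathbb{E}[X^2] \geq (\mathbb{E}[X])^2 = 7^2 = 49$, so on average you pay at least as much as you receive. The enumeration below computes the exact expected profit per round; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes of two fair six-sided dice, as totals.
totals = [a + b for a, b in product(range(1, 7), repeat=2)]

expected_total = Fraction(sum(totals), len(totals))                  # E[X]
expected_square = Fraction(sum(t * t for t in totals), len(totals))  # E[X^2]
expected_profit = 49 - expected_square                               # your profit per round

print(expected_total, expected_square, expected_profit)
```

The expected profit per round is negative, and the gap $\mathbb{E}[X^2] - (\mathbb{E}[X])^2$ is exactly $\operatorname{Var}(X)$, so repeating the game only increases the expected loss.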

Followup on Jensen’s Inequality

Show that the variance of a random variable is always non-negative.

Followup on Jensen’s Inequality
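
A sketch of one argument, assuming $X$ has a finite second moment. The function $f(x) = x^2$ is convex, so Jensen’s inequality gives \[(\mathbb{E}[X])^2 = f(\mathbb{E}[X]) \leq \mathbb{E}[f(X)] = \mathbb{E}[X^2],\] and therefore \[\operatorname{Var}(X) = \mathbb{E}[X^2] - (\mathbb{E}[X])^2 \geq 0.\]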

To Do

Work on Assignment 3, due TONIGHT June 10, 11:59pm on Gradescope.
Read Chapter 4.2 - 4.3 before next class.
