Module 08

Expected values, variance, correlation, and generating functions


TC and DJM

Last modified — 16 Mar 2026

1 Expected values

Expected value of random variables

Definition
The expected value of a random variable \(g(X)\) is defined by \[\mathbb{E}[g(X)] = \begin{cases} \displaystyle\sum_{x} g(x) p_X(x) & \text{if $X$ is discrete}\\ \\ \displaystyle\int_{-\infty}^\infty g(x) f_X(x) \mathsf{d}x & \text{if $X$ is absolutely continuous} \end{cases}\] provided that the sum or integral exists.

  • The sum in the discrete case is over all \(x\) such that \(p_X(x) > 0\) (countable).
  • The sum/integral exists when \(\mathbb{E}[|g(X)|] < \infty\). Otherwise, we say that \(\mathbb{E}[g(X)]\) does not exist.
  • Note that you do not need to know the distribution/PMF/PDF/CDF of \(g(X)\) to compute \(\mathbb{E}[g(X)]\), only the distribution of \(X\).
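To make the definition concrete, the discrete case can be sketched in a few lines of Python (the fair-die PMF below is just an illustrative choice, not from the text):

```python
# Illustrative sketch: E[g(X)] for a discrete X, computed as a weighted
# sum over the support. Here X is a fair six-sided die.
pmf = {x: 1/6 for x in range(1, 7)}  # p_X(x) for each x in the support

def expect(g, pmf):
    """Sum g(x) * p_X(x) over the support of X."""
    return sum(g(x) * p for x, p in pmf.items())

print(expect(lambda x: x, pmf))     # E[X] ≈ 3.5
print(expect(lambda x: x**2, pmf))  # E[X^2] ≈ 91/6
```

Note that only the PMF of \(X\) is needed, not the distribution of \(g(X)\), exactly as the bullet above says.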

Heuristic for expected value

Given a random variable \(X\),

\(\mathbb{E}[g(X)]\) is the weighted average value of \(g(X)\)

where the weights are given by the probabilities of \(X\) taking on different values.

In primary school, you learned about sample averages of \(n\) data points.

If we were to construct a discrete random variable \(Y\) by giving each observed value probability \(1/n\),

then \(\mathbb{E}[Y] = \frac{1}{n}\sum_{i=1}^n y_i\) would be the sample average of the observed values.
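A minimal sketch of this observation, using made-up data points:

```python
# The sample average equals E[Y] when Y puts mass 1/n on each observed
# value (the data below are arbitrary, for illustration only).
data = [2.0, 5.0, 5.0, 8.0]
n = len(data)
expected_Y = sum(y * (1/n) for y in data)  # E[Y] under the empirical PMF
sample_mean = sum(data) / n
print(expected_Y, sample_mean)  # both 5.0
```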

Simplified calculation for Binomial

Let \(X \sim {\mathrm{Binom}}(n, \theta)\). We want to compute \(\mathbb{E}[X]\).

  • Your book (Example 3.1.7) gives a complicated calculation. But we can use “kernel matching”.

We have that \[\begin{aligned} \mathbb{E}[X] &= \sum_{x=0}^n x \binom{n}{x} \theta^x (1-\theta)^{n-x} = \sum_{x=1}^n x \binom{n}{x} \theta^x (1-\theta)^{n-x} && \text{sum is 0 when } x=0\\ &= \sum_{x=1}^n n \binom{n-1}{x-1} \theta^x (1-\theta)^{n-x} && \text{because } x\binom{n}{x} = n\binom{n-1}{x-1}\\ &= n \sum_{y=0}^{n-1} \binom{n-1}{y} \theta^{y+1} (1-\theta)^{(n-1)-y} && \text{substitute } x = y + 1 \\ &= n\theta \sum_{y=0}^{n-1} \binom{n-1}{y} \theta^y (1-\theta)^{(n-1)-y} && \text{compare to } {\mathrm{Binom}}(n-1, \theta)\\ &= n\theta. \end{aligned}\]
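A quick numerical check of \(\mathbb{E}[X] = n\theta\), summing the Binomial PMF directly (the values of \(n\) and \(\theta\) below are arbitrary choices):

```python
from math import comb

# Numerical check of E[X] = n * theta for X ~ Binom(n, theta).
n, theta = 10, 0.3
mean = sum(x * comb(n, x) * theta**x * (1 - theta)**(n - x)
           for x in range(n + 1))
print(mean)  # ≈ 3.0 = n * theta
```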

Gamma expectation

Let \(X \sim \textrm{Gamma}(\alpha, \lambda )\). We want to compute \(\mathbb{E}[X]\).

We have that \[\begin{aligned} \mathbb{E}[X] &= \int_0^\infty x \frac{\lambda^\alpha}{\Gamma(\alpha)} x^{\alpha - 1} e^{-\lambda x} \mathsf{d}x \\ &= \frac{\lambda^\alpha}{\Gamma(\alpha)} \int_0^\infty x^{\alpha} e^{-\lambda x} \mathsf{d}x \\ &= \frac{\lambda^\alpha}{\Gamma(\alpha)} \cdot \frac{\Gamma(\alpha + 1)}{\lambda^{\alpha + 1}} \int_0^\infty \frac{\lambda^{\alpha+1}}{\Gamma(\alpha+1)} x^{\alpha + 1- 1} e^{-\lambda x} \mathsf{d}x \\ &= \frac{\Gamma(\alpha + 1)}{\lambda \Gamma(\alpha)} && \text{integrates to 1 because Gamma}(\alpha + 1, \lambda)\\ &= \frac{\alpha \Gamma(\alpha)}{\lambda \Gamma(\alpha)} && \text{because } \Gamma(\alpha + 1) = \alpha \Gamma(\alpha) \\ &= \frac{\alpha}{\lambda}. \end{aligned}\]
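A Monte Carlo sanity check of \(\mathbb{E}[X] = \alpha/\lambda\); the values of \(\alpha\) and \(\lambda\) are arbitrary, and note that Python's `random.gammavariate` is parameterized by shape and *scale*, so the scale argument is \(1/\lambda\):

```python
import random

# Monte Carlo check of E[X] = alpha / lambda for X ~ Gamma(alpha, lambda).
# random.gammavariate takes (shape, scale) with scale = 1/lambda.
random.seed(0)
alpha, lam = 3.0, 2.0
N = 200_000
samples = [random.gammavariate(alpha, 1/lam) for _ in range(N)]
print(sum(samples) / N)  # ≈ alpha/lam = 1.5, up to Monte Carlo error
```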

More Gamma expectations

Let \(X \sim {\mathrm{Gam}}(\alpha, \lambda)\) where \(\alpha>0\) and \(\lambda > 0\). Recall that the PDF of a RV \(Y\sim{\mathrm{Gam}}(\theta, \beta)\) is given by \[f_Y(y) = \frac{\beta^\theta}{\Gamma(\theta)} y^{\theta - 1} e^{-\beta y} I_{(0,\infty)}(y).\]

Exercise 1
Let \(t < \lambda\). Find \(\mathbb{E}[\exp(tX)]\).

Important properties

(Where the expected value exists.)

Linearity
For any \(a, b, c \in {\mathbb{R}}\), any functions \(g\) and \(h\), and any random variables \(X\) and \(Y\), \[\mathbb{E}[a g(X) + b h(Y) + c] = a \mathbb{E}[g(X)] + b \mathbb{E}[h(Y)] + c.\]
Boundedness
If \(a< g(x) < b\) for all \(x\) in the support of \(X\), then \(a < \mathbb{E}[g(X)] < b.\)
Monotonicity
If \(g(x) \le h(x)\) for all \(x\) in the support of \(X\), then \(\mathbb{E}[g(X)] \le \mathbb{E}[h(X)].\)
Independence
If \(X\) and \(Y\) are independent, then \[\mathbb{E}[g(X) h(Y)] = \mathbb{E}[g(X)] \mathbb{E}[h(Y)].\]

For the last property, the converse is false.

Expectation of a function of two random variables

Exercise 2
Let \(X\sim U(0,\theta)\) and \(Y\sim{\mathrm{Exp}}(1)\) be independent.

Find \(\mathbb{E}\left[\frac{1}{2}(X + Y)^2\right]\).

Scalar-valued functions of multiple random variables

Theorem
Let \(g : {\mathbb{R}}^2 \to {\mathbb{R}}\) be a function.

If \(X\) and \(Y\) are both discrete random variables, then \[\begin{aligned} \mathbb{E}[g(X, Y)] &= \sum_{x} \sum_{y} g(x, y) p_{X,Y}(x, y). \end{aligned}\] If \(X\) and \(Y\) are jointly absolutely continuous random variables, then \[\begin{aligned} \mathbb{E}[g(X, Y)] &= \int_{-\infty}^\infty \int_{-\infty}^\infty g(x, y) f_{X,Y}(x, y) \mathsf{d}x \mathsf{d}y. \end{aligned}\]
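The discrete double sum can be sketched directly; the joint PMF below is a made-up example, not one from the text:

```python
# E[g(X, Y)] as a double sum over a small, illustrative joint PMF.
joint = {(0, 0): 0.2, (0, 1): 0.3, (1, 0): 0.1, (1, 1): 0.4}

def expect2(g, joint):
    """Sum g(x, y) * p_{X,Y}(x, y) over the joint support."""
    return sum(g(x, y) * p for (x, y), p in joint.items())

print(expect2(lambda x, y: x + y, joint))  # E[X + Y] ≈ 1.2
print(expect2(lambda x, y: x * y, joint))  # E[XY] ≈ 0.4
```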

Product of expectations

  • If \(X\) and \(Y\) are independent, then \(\mathbb{E}[g(X)h(Y)] = \mathbb{E}[g(X)]\mathbb{E}[h(Y)]\).

Proof
Suppose that \(X\) and \(Y\) are jointly absolutely continuous random variables with joint PDF \(f_{X,Y}(x, y)\). Note that \(g(X)h(Y)\) is a scalar-valued function of \(X\) and \(Y\).

\[\begin{aligned} \mathbb{E}[g(X)h(Y)] &= \int_{-\infty}^\infty \int_{-\infty} ^\infty g(x) h(y) f_{X,Y}(x, y) \mathsf{d}x \mathsf{d}y \\ &= \int_{-\infty}^\infty \int_{-\infty}^\infty g(x) h(y) f_X(x) f_Y(y) \mathsf{d}x\mathsf{d}y && \text{$X$ and $Y$ are independent} \\ &= \int_{-\infty}^\infty g(x) f_X(x) \mathsf{d}x \int_{-\infty}^\infty h(y) f_Y(y) \mathsf{d}y \\ &= \mathbb{E}[g(X)] \mathbb{E}[h(Y)]. \end{aligned}\]
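A Monte Carlo sketch of the product property, with illustrative choices \(X\sim U(0,1)\), \(Y\sim{\mathrm{Exp}}(1)\), \(g(x)=x^2\), and \(h(y)=y\) (none of these are from the text):

```python
import random

# Monte Carlo check that E[g(X) h(Y)] = E[g(X)] E[h(Y)] when X and Y
# are independent. Here E[X^2] = 1/3 and E[Y] = 1, so both sides ≈ 1/3.
random.seed(1)
N = 200_000
xs = [random.random() for _ in range(N)]        # X ~ U(0, 1)
ys = [random.expovariate(1.0) for _ in range(N)]  # Y ~ Exp(1), independent
g = lambda x: x**2
h = lambda y: y
lhs = sum(g(x) * h(y) for x, y in zip(xs, ys)) / N
rhs = (sum(map(g, xs)) / N) * (sum(map(h, ys)) / N)
print(lhs, rhs)  # both ≈ 1/3, up to Monte Carlo error
```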

Computing expectations using the joint distribution

Let \(X\) and \(Y\) have joint PDF \[f_{X,Y}(x,y) = 8xyI_{\{0 < x < y < 1\}}(x,y).\]

Exercise 3
Find \(\mathbb{E}[X]\), \(\mathbb{E}[Y]\), and \(\mathbb{E}[XY]\).

2 Variance

Variance

Definition
The variance of a random variable \(X\) is defined by \[\sigma^2_X = \operatorname{Var}(X) = \mathbb{E}[(X - \mathbb{E}[X])^2].\]

  • Careful: \(\mathbb{E}[X]\) is a number, not a random variable.
  • The variance is a measure of the spread of the distribution of \(X\) around its mean \(\mathbb{E}[X]\).
  • Note that \(g(X) = (X - \mathbb{E}[X])^2\) is a function of \(X\), so we can compute \(\operatorname{Var}(X)\) using the definition of expected value.
  • The “units” of \(\operatorname{Var}(X)\) are the square of the units of \(X\).

Definition
The standard deviation of a random variable \(X\) is defined by \(\sigma_X = \sqrt{\operatorname{Var}(X)}.\)

Properties of variance

(Where the variance exists.)

Scaling
For any \(a \in {\mathbb{R}}\), \(\operatorname{Var}(aX) = a^2 \operatorname{Var}(X)\).
Shift invariance
For any \(a \in {\mathbb{R}}\), \(\operatorname{Var}(X + a) = \operatorname{Var}(X)\).
Non-negativity
\(\operatorname{Var}(X) \ge 0\).

Tip

\[\begin{aligned} \operatorname{Var}(X) &= \mathbb{E}[(X - \mathbb{E}[X])^2] \\ &= \mathbb{E}[X^2 - 2X\mathbb{E}[X] + \mathbb{E}[X]^2] \\ &= \mathbb{E}[X^2] - 2\mathbb{E}[X]\mathbb{E}[X] + \mathbb{E}[X]^2 \\ &= \mathbb{E}[X^2] - \mathbb{E}[X]^2.\\ \Longrightarrow \operatorname{Var}(X) &\leq \mathbb{E}[X^2]. \end{aligned}\]
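Both formulas for the variance can be checked against each other on a small example (a fair six-sided die, an arbitrary illustrative choice):

```python
# Check that the definition Var(X) = E[(X - E[X])^2] agrees with the
# shortcut E[X^2] - E[X]^2, using a fair six-sided die.
pmf = {x: 1/6 for x in range(1, 7)}
mu = sum(x * p for x, p in pmf.items())                    # E[X] = 3.5
var_def = sum((x - mu)**2 * p for x, p in pmf.items())     # definition
var_short = sum(x**2 * p for x, p in pmf.items()) - mu**2  # shortcut
print(var_def, var_short)  # both ≈ 35/12
```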

Exponential variance

Exercise 4
Let \(X \sim {\mathrm{Exp}}(\lambda)\). Find \(\operatorname{Var}(X)\).

Hints: remember that \(\mathbb{E}[X] = 1/\lambda\) and that \(\Gamma(z) = (z-1)!\) for positive integers \(z\).

3 Covariance and correlation

Covariance

If we have two random variables \(X\) and \(Y\), we can measure the relationship between them.

Definition
The covariance between two random variables \(X\) and \(Y\) is defined by \[\operatorname{Cov}(X, Y) = \mathbb{E}[(X - \mathbb{E}[X])(Y - \mathbb{E}[Y])].\]

  • \(\operatorname{Cov}(X, Y)\) is a scalar-valued function of \(X\) and \(Y\).
  • The covariance measures the linear relationship between \(X\) and \(Y\).
  • If \(\operatorname{Cov}(X, Y) > 0\), then \(X\) and \(Y\) tend to increase together.

We can compute the covariance using the joint PMF/PDF of \(X\) and \(Y\): \[\begin{aligned} \operatorname{Cov}(X, Y) &= \sum_{x} \sum_{y} (x - \mathbb{E}[X])(y - \mathbb{E}[Y]) p_{X,Y}(x, y).\\ \operatorname{Cov}(X, Y) &= \int_{-\infty}^\infty \int_{-\infty}^\infty (x - \mathbb{E}[X])(y - \mathbb{E}[Y]) f_{X,Y}(x, y) \, \mathsf{d}x \, \mathsf{d}y. \end{aligned}\]
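The discrete double sum for the covariance, sketched on a made-up joint PMF (the same illustrative table style as earlier examples):

```python
# Cov(X, Y) from a small, illustrative joint PMF via the definition.
joint = {(0, 0): 0.2, (0, 1): 0.3, (1, 0): 0.1, (1, 1): 0.4}
EX = sum(x * p for (x, y), p in joint.items())  # E[X] = 0.5
EY = sum(y * p for (x, y), p in joint.items())  # E[Y] = 0.7
cov = sum((x - EX) * (y - EY) * p for (x, y), p in joint.items())
print(cov)  # ≈ E[XY] - E[X]E[Y] = 0.4 - 0.5 * 0.7 = 0.05
```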

Properties of covariance

Linearity
For any \(a, b, c \in {\mathbb{R}}\), and random variables \(X\), \(Y\), and \(Z\), \[\begin{aligned} \operatorname{Cov}(a X + b Y, c Z ) &= ac \operatorname{Cov}(X, Z) + bc \operatorname{Cov}(Y, Z). \end{aligned}\]
Easier calculation
\[\operatorname{Cov}(X, Y) = \mathbb{E}[XY] - \mathbb{E}[X]\mathbb{E}[Y].\]
Independence
If \(X\) and \(Y\) are independent, then \(\operatorname{Cov}(X, Y) = 0\). The converse is false.
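A standard counterexample shows why the converse fails: take \(X\) uniform on \(\{-1, 0, 1\}\) and \(Y = X^2\). Then \(Y\) is a function of \(X\), so they are clearly dependent, yet the covariance is zero:

```python
# Counterexample to the converse: X uniform on {-1, 0, 1} and Y = X^2
# are dependent (Y is determined by X), yet Cov(X, Y) = 0.
pmf = {-1: 1/3, 0: 1/3, 1: 1/3}
EX = sum(x * p for x, p in pmf.items())          # E[X] = 0
EY = sum(x**2 * p for x, p in pmf.items())       # E[Y] = E[X^2]
EXY = sum(x * x**2 * p for x, p in pmf.items())  # E[XY] = E[X^3] = 0
cov = EXY - EX * EY
print(cov)  # 0.0
```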

Heads or tails?

Exercise 5
Let \(X\) be the number of heads in 5 tosses of a fair coin, and let \(Y\) be the number of tails in the same 5 tosses. Find \(\operatorname{Cov}(X, Y)\).

Hint: If \(Z\sim {\mathrm{Binom}}(n, \theta)\), then \(\operatorname{Var}(Z) = n\theta(1-\theta)\).

Variance, covariance, and sums

Let \(X\) and \(Y\) be random variables with finite variances.

  • \[\operatorname{Var}(X + Y) = \operatorname{Var}(X) + \operatorname{Var}(Y) + 2\operatorname{Cov}(X, Y).\]
  • If \(X\) and \(Y\) are independent, then \[\operatorname{Var}(X + Y) = \operatorname{Var}(X) + \operatorname{Var}(Y).\]
  • More generally, if \(X_1, \ldots, X_n\) are independent random variables with finite variances, then \[\operatorname{Var}\left(\sum_{i=1}^n X_i\right) = \sum_{i=1}^n \operatorname{Var}(X_i).\]
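The first identity also holds exactly for sample moments, which gives an easy numerical check; the correlated pair below is an arbitrary construction for illustration:

```python
import random

# Check Var(X + Y) = Var(X) + Var(Y) + 2 Cov(X, Y) on sample moments,
# using a deliberately correlated pair: X ~ U(0,1), Y = X + noise.
random.seed(2)
N = 200_000
xs = [random.random() for _ in range(N)]
ys = [x + random.gauss(0, 0.5) for x in xs]

def var(zs):
    m = sum(zs) / len(zs)
    return sum((z - m)**2 for z in zs) / len(zs)

def cov(us, vs):
    mu, mv = sum(us) / len(us), sum(vs) / len(vs)
    return sum((u - mu) * (v - mv) for u, v in zip(us, vs)) / len(us)

sums = [x + y for x, y in zip(xs, ys)]
lhs = var(sums)
rhs = var(xs) + var(ys) + 2 * cov(xs, ys)
print(lhs, rhs)  # equal up to floating-point rounding
```

The identity holds exactly (not just approximately) for these sample quantities because the same algebra applies to the empirical distribution.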

Correlation

Covariance is not a standardized measure of the relationship between \(X\) and \(Y\).

For example, if we multiply \(X\) by 100, then \(\operatorname{Cov}(X, Y)\) will also be multiplied by 100.

Definition
The correlation between two random variables \(X\) and \(Y\) is defined by \[\rho_{XY} = \operatorname{Corr}(X, Y) = \frac{\operatorname{Cov}(X, Y)}{\sigma_X \sigma_Y}.\]

  • The correlation is a standardized measure of the linear relationship between \(X\) and \(Y\).
  • We’ll see later that \(-1 \le \rho_{XY} \le 1\).

Returning to the heads or tails example, we have that \[\begin{aligned} \rho_{XY} &= \frac{\operatorname{Cov}(X, Y)}{\sigma_X \sigma_Y} = \frac{-5/4}{\sqrt{5/4} \sqrt{5/4}} = -1. \end{aligned}\]
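This value can be verified exactly from the Binomial PMF, since \(Y = 5 - X\) with \(X \sim {\mathrm{Binom}}(5, 1/2)\):

```python
from math import comb, sqrt

# Exact check of the heads-or-tails example: X ~ Binom(5, 1/2) heads,
# Y = 5 - X tails, so Corr(X, Y) should be -1.
pmf = {x: comb(5, x) * 0.5**5 for x in range(6)}
EX = sum(x * p for x, p in pmf.items())
EY = sum((5 - x) * p for x, p in pmf.items())
cov = sum((x - EX) * ((5 - x) - EY) * p for x, p in pmf.items())
sx = sqrt(sum((x - EX)**2 * p for x, p in pmf.items()))
sy = sqrt(sum(((5 - x) - EY)**2 * p for x, p in pmf.items()))
rho = cov / (sx * sy)
print(rho)  # ≈ -1
```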

End of material for Midterm 2.