MGFs and conditional expectation
Last modified — 05 Apr 2026
Important
For now, we will skip generating functions and characteristic functions.
We saw in an Exercise earlier that if \(X \sim {\mathrm{Gam}}(\alpha, \lambda)\), then
\[\begin{aligned} m_X(t) &= \mathbb{E}[e^{tX}] = \left(1 - \frac{t}{\lambda}\right)^{-\alpha}, \quad t < \lambda. \end{aligned}\]
Then for any integer \(k \ge 1\), \[\mathbb{E}[X^k] = m_X^{(k)}(0) = \left.\frac{\mathsf{d}^k}{\mathsf{d}t^k} m_X(t)\right|_{t=0}.\]
Using the MGF \(m_X(t) = \left(1 - \frac{t}{\lambda}\right)^{-\alpha}\) of \(X \sim {\mathrm{Gam}}(\alpha, \lambda)\), we can compute the mean and variance of \(X\):
\[\begin{aligned} \mathbb{E}[X] &= m_X'(0) = \frac{\alpha}{\lambda}\left(1-\frac{t}{\lambda}\right)^{-\alpha-1}\bigg|_{t=0} = \frac{\alpha}{\lambda}.\\ \mathbb{E}[X^2] &= m_X''(0) = \frac{\alpha(\alpha+1)}{\lambda^2}\left(1-\frac{t}{\lambda}\right)^{-\alpha-2}\bigg|_{t=0} = \frac{\alpha(\alpha+1)}{\lambda^2}.\\ \Longrightarrow \operatorname{Var}(X) &= \mathbb{E}[X^2] - \mathbb{E}[X]^2 = \frac{\alpha(\alpha+1)}{\lambda^2} - \left(\frac{\alpha}{\lambda}\right)^2 = \frac{\alpha}{\lambda^2}. \end{aligned}\]
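The differentiation above can be verified symbolically. The sketch below (using sympy; not part of the notes) recomputes the first two moments of the Gamma distribution from its MGF:

```python
import sympy as sp

# alpha, lambda > 0 so the MGF (1 - t/lambda)^(-alpha) is defined for t < lambda.
alpha, lam, t = sp.symbols("alpha lambda t", positive=True)

# Gamma(alpha, lambda) MGF from the notes.
m = (1 - t / lam) ** (-alpha)

# Moments are derivatives of the MGF evaluated at t = 0.
EX = sp.diff(m, t).subs(t, 0)       # E[X]
EX2 = sp.diff(m, t, 2).subs(t, 0)   # E[X^2]
var = sp.simplify(EX2 - EX**2)      # Var(X)

print(EX, EX2, var)
```

Running this reproduces \(\mathbb{E}[X] = \alpha/\lambda\) and \(\operatorname{Var}(X) = \alpha/\lambda^2\).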
Now let \(X \sim \mathcal{N}(\mu, \sigma^2)\). Then
\[m_X(t) = \mathbb{E}[e^{tX}] = \exp\left(\mu t + \frac{1}{2} \sigma^2 t^2\right).\]
Find \(\mathbb{E}[X]\) and \(\operatorname{Var}(X)\) using the MGF of \(X\).
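One way to check your answer to this exercise is to differentiate the normal MGF symbolically, exactly as in the Gamma example (sympy sketch, illustrative only):

```python
import sympy as sp

# mu is any real number; sigma > 0; t real.
mu = sp.symbols("mu", real=True)
sigma, t = sp.symbols("sigma t", positive=True)

# Normal(mu, sigma^2) MGF from the notes.
m = sp.exp(mu * t + sp.Rational(1, 2) * sigma**2 * t**2)

EX = sp.diff(m, t).subs(t, 0)       # E[X]
EX2 = sp.diff(m, t, 2).subs(t, 0)   # E[X^2]
var = sp.simplify(EX2 - EX**2)      # Var(X)

print(EX, var)
```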
Let \(X\) and \(Y\) be independent random variables with MGFs \(m_X\) and \(m_Y\). Then the MGF of \(X + Y\) is given by \[\begin{aligned} m_{X+Y}(t) &= \mathbb{E}[e^{t(X + Y)}] = \mathbb{E}[e^{tX} e^{tY}] = \mathbb{E}[e^{tX}] \mathbb{E}[e^{tY}] && \text{$X$ and $Y$ are independent} \\ &= m_X(t) m_Y(t). \end{aligned}\]
This is a very important result, as it allows us to identify the distribution of a random variable from its MGF: when the MGF exists in a neighborhood of \(0\), it uniquely determines the distribution.
Let \(X_1, \dots, X_n\) be independent identically distributed (i.i.d.) random variables with \(X_i \sim \mathcal{N}(\mu, \sigma^2)\) for all \(i\).
Note that \(m_{X_i}(t) = \exp\left(\mu t + \frac{1}{2} \sigma^2 t^2\right)\) for \(i = 1, \ldots, n\).
Claim: if \(Y = X_1 + \cdots + X_n\), then \(Y \sim \mathcal{N}(n\mu, n\sigma^2)\). Indeed, by independence, \[m_Y(t) = \prod_{i=1}^n m_{X_i}(t) = \exp\left(n\mu t + \frac{1}{2} n\sigma^2 t^2\right),\] which is the MGF of a \(\mathcal{N}(n\mu, n\sigma^2)\) random variable.
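A quick Monte Carlo check that the sum of \(n\) i.i.d. \(\mathcal{N}(\mu, \sigma^2)\) variables has mean \(n\mu\) and variance \(n\sigma^2\) (the parameter values below are illustrative, not from the notes):

```python
import numpy as np

rng = np.random.default_rng(0)
n, mu, sigma = 5, 1.0, 2.0      # illustrative values
reps = 200_000

# Each row holds one realization of (X_1, ..., X_n); sum rows to get Y.
y = rng.normal(mu, sigma, size=(reps, n)).sum(axis=1)

# Sample mean and variance should be close to n*mu = 5 and n*sigma^2 = 20.
print(y.mean(), y.var())
```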
We already knew that \(\mathbb{E}[Y|\Theta=\theta] = n\theta\) and \(\operatorname{Var}(Y|\Theta=\theta) = n\theta(1-\theta)\) because \(Y|\Theta=\theta\) is a Binomial random variable with parameters \(n\) and \(\theta\).
But there are some additional properties as well.
Important
The key is that \(\mathbb{E}[X|\Theta]\) and \(\operatorname{Var}(X|\Theta)\) are themselves random variables, because they are functions of \(\Theta\).
In particular, \(\mathbb{E}[X|\Theta]\) and \(\operatorname{Var}(X|\Theta)\) have their own distributions.
We refer to this general setup as a hierarchical model.
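The Binomial example above can be simulated to see that \(\mathbb{E}[Y|\Theta] = n\Theta\) really is a random variable. The prior \(\Theta \sim \mathrm{Unif}(0,1)\) below is an assumption made purely for illustration; the notes leave the distribution of \(\Theta\) unspecified:

```python
import numpy as np

rng = np.random.default_rng(1)
n, reps = 10, 100_000

# ASSUMED prior for illustration: Theta ~ Uniform(0, 1).
theta = rng.uniform(0.0, 1.0, size=reps)

# E[Y | Theta] = n * Theta is a random variable: one value per draw of Theta.
cond_mean = n * theta

# Second stage of the hierarchy: Y | Theta = theta ~ Bin(n, theta).
y = rng.binomial(n, theta)

# Both sample means estimate E[Y] = n * E[Theta] = 5 under this prior.
print(cond_mean.mean(), y.mean())
```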
Find the distribution of \(W = \mathbb{E}[X|\Lambda]\) and \(\mathbb{E}[W]\).
Using the definition of the joint density of \(X\) and \(\Lambda\), we can show that \[\begin{aligned} f_{X,\Lambda}(x, \lambda) &= f_{X|\Lambda}(x|\lambda) f_{\Lambda}(\lambda) \\ &= \frac{1}{\lambda} e^{-x/\lambda} \cdot \frac{1}{\Gamma(2)} \lambda e^{-\lambda} I_{[0,\infty)}(x)I_{[0,\infty)}(\lambda) \\ &= e^{-x/\lambda}e^{-\lambda} I_{[0,\infty)}(x)I_{[0,\infty)}(\lambda). \end{aligned}\]
Using the definition of expectation, we can find \(\mathbb{E}[X]\): \[\begin{aligned} \mathbb{E}[X] &= \int_0^\infty \int_0^\infty x e^{-x/\lambda}e^{-\lambda} I_{[0,\infty)}(x)I_{[0,\infty)}(\lambda) \mathsf{d}\lambda \mathsf{d}x = \cdots. \end{aligned}\]
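The elided integral can be carried out symbolically (integrating over \(x\) first). The sketch below (sympy, not part of the notes) also confirms the joint density integrates to \(1\):

```python
import sympy as sp

x, lam = sp.symbols("x lambda", positive=True)

# Joint density of (X, Lambda) from the notes (on x > 0, lambda > 0).
joint = sp.exp(-x / lam) * sp.exp(-lam)

# Sanity check: the joint density integrates to 1.
total = sp.integrate(joint, (x, 0, sp.oo), (lam, 0, sp.oo))

# E[X]: integrate x * f(x, lambda) over x first, then lambda.
EX = sp.integrate(x * joint, (x, 0, sp.oo), (lam, 0, sp.oo))

print(total, EX)
```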
That is: \[\mathbb{E}[X] = \mathbb{E}[W] = \mathbb{E}[\mathbb{E}[X|\Lambda]].\]
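The tower identity can also be checked by simulating the hierarchy. Assuming \(\Lambda\) has density \(\lambda e^{-\lambda}\) on \((0,\infty)\), i.e. Gamma with shape \(2\) and rate \(1\) as in the joint density above, \(W = \mathbb{E}[X|\Lambda] = \Lambda\) and both sample means estimate the same quantity:

```python
import numpy as np

rng = np.random.default_rng(2)
reps = 1_000_000

# Lambda ~ Gamma(shape = 2, rate = 1); numpy uses scale = 1/rate.
lam = rng.gamma(shape=2.0, scale=1.0, size=reps)

# X | Lambda = lambda is Exponential with mean lambda (scale = lambda).
x = rng.exponential(scale=lam)

# W = E[X | Lambda] = Lambda, so E[X] = E[W] = E[Lambda] = 2.
print(x.mean(), lam.mean())
```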
Stat 302 - Winter 2025/26