Convergence of random variables
Last modified — 05 Apr 2026
In probability, we study the limit of a sequence of random variables \(X_n\) as \(n\) goes to infinity. This is more delicate than for sequences of numbers, because there are several distinct modes of convergence. We will discuss three of them: convergence in probability, almost sure convergence, and convergence in distribution.
Let \(U \sim {\mathrm{Unif}}(0,1)\) and define
\[X_n = U + B_n,\]
where \(B_n\sim {\mathrm{Bern}}(1/n)\) are independent Bernoulli random variables, also independent of \(U\).
Then \(X_n\overset{p}{\to}U\).
Indeed, for any \(\epsilon \in (0,1)\), \[\mathbb{P}(|X_n - U| > \epsilon) = \mathbb{P}\left(|B_n| > \epsilon\right) = \mathbb{P}(B_n = 1) = 1/n \to 0.\]
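This computation can be sanity-checked by simulation. A minimal Monte Carlo sketch (function and parameter names are hypothetical, not from the course):

```python
import random

def estimate_exceedance(n, eps=0.5, trials=100_000, seed=0):
    """Monte Carlo estimate of P(|X_n - U| > eps) for X_n = U + B_n."""
    rng = random.Random(seed)
    count = 0
    for _ in range(trials):
        u = rng.random()                      # U ~ Unif(0,1)
        b = 1 if rng.random() < 1 / n else 0  # B_n ~ Bern(1/n), independent of U
        if abs((u + b) - u) > eps:            # |X_n - U| = B_n
            count += 1
    return count / trials

for n in (10, 100, 1000):
    print(n, estimate_exceedance(n))  # estimates should track 1/n
```

The estimates shrink like \(1/n\), matching the calculation above.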
Let \(U_1, U_2, \ldots\) be i.i.d. \({\mathrm{Unif}}(0,1)\) and define \(Y_n = \max\{U_1, \ldots, U_n\}\). Show that \(Y_n \overset{p}{\to}1\).
Hint: \(|Y_n - 1| > \epsilon\) if and only if \(Y_n < 1 - \epsilon\).
Let \(U \sim {\mathrm{Unif}}(0,1)\) and define
\[X_n = U + B_n\]
where \(B_n\sim {\mathrm{Bern}}(1/n)\) are independent Bernoulli random variables, also independent of \(U\).
Then \(X_n\) does NOT converge almost surely to \(U\). The proof is similar to Question 6 on the first exam.
\[\begin{aligned} \mathbb{P}(\{\omega : \lim_{n \to \infty} X_n(\omega) = U(\omega)\}) &= \mathbb{P}\left(\{\omega : \lim_{n \to \infty} B_n(\omega) = 0\}\right)\\ &= \mathbb{P}(\left\{\omega : \text{there exists } m \text{ such that, for all } n \geq m, B_n(\omega) = 0\right\})\\ &= \mathbb{P}\left(\bigcup_{m=1}^\infty \bigcap_{n=m}^\infty \{\omega: B_n(\omega) = 0\}\right) \\ &= \lim_{m \to \infty} \mathbb{P}\left(\bigcap_{n=m}^\infty \{B_n = 0\}\right)\\ &= \lim_{m \to \infty} \prod_{n=m}^\infty \mathbb{P}(B_n = 0) = \lim_{m \to \infty} \prod_{n=m}^\infty (1 - 1/n) = 0 \neq 1. \end{aligned}\]
The step to the limit uses continuity of probability (the intersections increase with \(m\)), and the product form uses independence. Each infinite product is \(0\) because it telescopes: \(\prod_{n=m}^{N}(1-1/n) = \prod_{n=m}^{N}\frac{n-1}{n} = \frac{m-1}{N} \to 0\) as \(N \to \infty\).
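The key probability can also be checked numerically. A small sketch (helper name hypothetical) estimating \(\mathbb{P}(B_n = 0 \text{ for all } m \le n \le N)\), whose exact value telescopes to \((m-1)/N\):

```python
import random

def prob_all_zero(m, N, trials=200_000, seed=1):
    """Estimate P(B_n = 0 for all m <= n <= N), B_n independent Bern(1/n)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # rng.random() >= 1/n happens with probability 1 - 1/n, i.e. {B_n = 0}
        if all(rng.random() >= 1 / n for n in range(m, N + 1)):
            hits += 1
    return hits / trials

# exact value: prod_{n=m}^{N} (1 - 1/n) = (m - 1) / N
print(prob_all_zero(10, 100), (10 - 1) / 100)
```

Since \((m-1)/N \to 0\) as \(N \to \infty\) for every \(m\), eventually some \(B_n = 1\) with probability one, so the sample paths keep jumping.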
Let \(U \sim {\mathrm{Unif}}(0,1)\), and let \(U_n\sim {\mathrm{Unif}}(0,1)\) all independent. Define
\[X_n = U_n + B_n\]
where \(B_n\sim {\mathrm{Bern}}(1/n)\) are independent Bernoullis, also independent of \(U, U_1, U_2, \ldots\).
Then \(X_n\overset{d}{\to}U\) but \(X_n\) does NOT converge in probability to \(U\).
We have, for all \(t \neq 0\), \[m_{X_n}(t) = m_{U_n}(t)\, m_{B_n}(t) = \frac{e^t - 1}{t} \left(1-\frac{1}{n} + \frac{1}{n}e^t\right) \to \frac{e^t - 1}{t} = m_U(t).\]
However, \[\begin{aligned} \mathbb{P}(|X_n - U| > \epsilon) &= \mathbb{P}\left(|U_n + B_n - U| > \epsilon\right) \\ &= (1-1/n)\mathbb{P}\left(|U_n - U| > \epsilon\right) + (1/n)\mathbb{P}\left(|U_n +1- U| > \epsilon\right) \\ &= (1-1/n)(1-\epsilon)^2 + (1/n)a \quad\quad\text{for some $a\in[0,1]$}\\ &\rightarrow (1-\epsilon)^2 \neq 0. \end{aligned}\]
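A simulation illustrates this failure. A sketch (names hypothetical) estimating \(\mathbb{P}(|X_n - U| > \epsilon)\) for \(\epsilon = 1/2\), which approaches \((1-\epsilon)^2 = 1/4\) rather than \(0\):

```python
import random

def exceedance(n, eps=0.5, trials=200_000, seed=2):
    """Estimate P(|X_n - U| > eps) for X_n = U_n + B_n, with U_n, U independent."""
    rng = random.Random(seed)
    count = 0
    for _ in range(trials):
        u, un = rng.random(), rng.random()      # U and U_n, independent Unif(0,1)
        b = 1 if rng.random() < 1 / n else 0    # B_n ~ Bern(1/n)
        if abs(un + b - u) > eps:
            count += 1
    return count / trials

print(exceedance(1000), (1 - 0.5) ** 2)  # stays near 0.25, does not vanish
```

Even for large \(n\), \(X_n\) is a fresh uniform, so it matches \(U\) in distribution while remaining far from \(U\) as a random variable.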
Hint: Recall that the MGF of \(\mathcal{N}(\mu, \sigma^2)\) is \(m(t) = e^{\mu t + \sigma^2 t^2/2}\).
Interpreting convergence
Let \(U_1, U_2, \ldots\) be i.i.d. \({\mathrm{Unif}}(0,1)\) random variables. Define \(Y_n = \max\{U_1, \ldots, U_n\}\).
Show that \(n(1-Y_n) \overset{d}{\to}{\mathrm{Exp}}(1)\).
Hints: \(\mathbb{P}(Y_n \leq y) = y^n\) for \(y \in [0,1]\), and \((1 - x/n)^n \to e^{-x}\).
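A Monte Carlo sanity check of the claimed limit (not a proof; names hypothetical), comparing the tail of \(n(1-Y_n)\) to the \(\mathrm{Exp}(1)\) tail \(e^{-x}\):

```python
import math
import random

def tail_prob(n, x, trials=100_000, seed=3):
    """Estimate P(n * (1 - max(U_1, ..., U_n)) > x) for i.i.d. Unif(0,1)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        y = max(rng.random() for _ in range(n))
        if n * (1 - y) > x:
            hits += 1
    return hits / trials

# Exp(1) tail: P(Exp(1) > x) = e^{-x}
print(tail_prob(100, 1.0), math.exp(-1.0))
```

The two printed values should agree to about two decimal places for moderate \(n\).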
The weak law of large numbers (WLLN): if \(X_1, X_2, \ldots\) are i.i.d. random variables with mean \(\mu\), then \[\overline{X}_n = \frac{1}{n} \, \sum_{i=1}^n X_i \overset{p}{\to}\mu.\]
Interpretation
The distribution of \(\overline{X}_n\) gets more and more concentrated around \(\mu\) as \(n\) increases.
Roll a fair die \(n\) times and let \(X_n\) be the sum of the squared results. That is, \[X_n = \sum_{i=1}^n X_{n,i}^2,\] where \(X_{n,i}\) is the result of the \(i\)-th die roll.
Show that \(n^{-1}X_n \overset{p}{\to}m\) for some \(m\) (find \(m\) explicitly).
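By the WLLN, the average of the squared rolls should settle down as \(n\) grows. A quick simulation check of the exercise (not a substitute for finding \(m\) analytically; names hypothetical):

```python
import random

def average_squared_rolls(n, seed=4):
    """Simulate X_n / n, the average of n squared fair-die rolls."""
    rng = random.Random(seed)
    return sum(rng.randint(1, 6) ** 2 for _ in range(n)) / n

for n in (100, 10_000, 1_000_000):
    print(n, average_squared_rolls(n))  # values should stabilize near m
```

The printed values should cluster around a single limit, which you can compare with your analytic answer for \(m\).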
Proof (assuming \(\operatorname{Var}(X_i) = \sigma^2 < \infty\)): since \(\mathbb{E}[\overline{X}_n] = \mu\) and \(\operatorname{Var}(\overline{X}_n) = \sigma^2/n\), Chebyshev’s inequality gives, for all \(\epsilon > 0\), \[\begin{aligned} \mathbb{P}\left( \left\vert \overline{X}_n-\mu \right\vert \geq \epsilon \right) &\leq \frac{\operatorname{Var}(\overline{X}_n)}{\epsilon^2} = \frac{\sigma^2/n}{\epsilon^2} \to 0. \end{aligned}\]
The central limit theorem (CLT): if \(X_1, X_2, \ldots\) are i.i.d. with mean \(\mu\) and variance \(\sigma^2 < \infty\), then \[ \frac{\sqrt{n}(\overline{X}_n-\mu )}{\sigma } \overset{d}{\to}\mathcal{N}\left( 0,1\right). \]
Interpretation
Probability statements about \(\overline{X}_n\) can be approximated using a Normal distribution. It’s the probability statements that we are approximating, not the random variable itself.
Warning
What happens if we don’t normalize?
The WLLN tells us that \(\overline{X}_n \overset{p}{\to}\mu\).
Defining \(Y_i = (X_i - \mu)/\sigma\) for each \(i\), we have \(\overline{Y}_n = (\overline{X}_n - \mu)/\sigma\).
But we also know that \(\mathbb{E}[Y_i] = 0\) and \(\operatorname{Var}(Y_i) = 1\) for all \(i\).
So the WLLN tells us that \(\overline{Y}_n \overset{p}{\to}0\).
Interpretation
Multiplying by \(\sqrt{n}\) prevents the distribution of \(\overline{Y}_n\) from collapsing to a point mass at \(0\) as \(n\) increases, and allows us to get a non-degenerate limit distribution.
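This scaling effect can be seen empirically. A sketch (names hypothetical) using standardized fair-coin flips \(Y_i \in \{-1, +1\}\): the sample SD of \(\overline{Y}_n\) shrinks like \(1/\sqrt{n}\), while that of \(\sqrt{n}\,\overline{Y}_n\) stays near \(1\):

```python
import math
import random

def sample_sd(values):
    """Sample standard deviation (population formula is fine here)."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))

def ybar_draws(n, reps=5_000, seed=5):
    """Draw Ybar_n repeatedly, for standardized coin flips Y_i in {-1, +1}."""
    rng = random.Random(seed)
    return [sum(rng.choice((-1, 1)) for _ in range(n)) / n for _ in range(reps)]

for n in (25, 100, 400):
    sd = sample_sd(ybar_draws(n))
    # sd of Ybar_n shrinks like 1/sqrt(n); sqrt(n) * sd stays near 1
    print(n, sd, math.sqrt(n) * sd)
```

The third column staying near \(1\) is exactly the non-degenerate limit the CLT describes.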
Define \[Z_n = \frac{\sqrt{n}(\overline{X}_n-\mu )}{\sigma }.\]
You should not write things like \[\overline{X}_n \overset{d}{\to}\mathcal{N}(\mu, \sigma^2/n),\] because a limiting distribution cannot depend on \(n\). The correct statement is \(Z_n \overset{d}{\to}\mathcal{N}(0,1)\).
In many situations, the exact distribution of \(\overline{X}_n\), i.e. the function \(x \mapsto \mathbb{P}(\overline X_n \leq x)\), is hard to determine.
The CLT allows us to approximate this value by
\[\mathbb{P}(\overline X_n \leq x) \approx \Phi\Big ( \frac{\sqrt{n}(x-\mu)}{\sigma}\Big )\] with a respectable precision when \(n\) is large.
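For a concrete check, take \(X_i \sim \mathrm{Bern}(p)\), where \(\mathbb{P}(\overline X_n \le x)\) is an exact Binomial sum. A sketch (function names hypothetical) comparing the exact value with the CLT approximation:

```python
import math

def phi(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def exact_mean_cdf(n, p, x):
    """Exact P(Xbar_n <= x) for X_i ~ Bern(p): a Binomial(n, p) tail sum."""
    k_max = math.floor(x * n + 1e-9)  # small guard against float round-off
    return sum(math.comb(n, k) * p**k * (1 - p)**(n - k) for k in range(k_max + 1))

def clt_mean_cdf(n, p, x):
    """CLT approximation: Phi(sqrt(n) * (x - mu) / sigma)."""
    mu, sigma = p, math.sqrt(p * (1 - p))
    return phi(math.sqrt(n) * (x - mu) / sigma)

n, p, x = 100, 0.3, 0.35
print(exact_mean_cdf(n, p, x), clt_mean_cdf(n, p, x))
```

With \(n = 100\) the two values already agree to about two decimal places.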
Chebyshev’s inequality suggests \[n \ge 400\] independent observations.
CLT suggests \[n \ge 27\] independent observations.
Both are correct, but the CLT is more precise.
To be fair, it used more information (the asymptotic distribution of the sample mean), which may or may not be accurate.
Chebyshev’s bound doesn’t use any approximation; it’s a guarantee.
Let \(Z_n = \frac{\sqrt{n}(\overline{X}_n-\mu )}{\sigma}\) as before.
Define \(Y_i = (X_i - \mu)/\sigma\) for all \(i\).
Suppose that \(Y_i\) has moment generating function \(m_Y(t)\).
Since \(Z_n = n^{-1/2}\sum_{i=1}^n Y_i\) and the \(Y_i\) are independent, the moment generating function of \(Z_n\) is \(m_{Z_n}(t) = m_Y(t/\sqrt{n})^n\).
Now, \(m'_Y(0) = \mathbb{E}[Y_i] = 0\) and \(m''_Y(0) = \mathbb{E}[Y_i^2] = 1\).
By Taylor’s theorem, for all \(t\), \[\begin{aligned} m_Y(t) &= m_Y(0) + m'_Y(0)t + \frac{1}{2}m''_Y(0)t^2 + \cdots\\ &= 1 + 0 + \frac{t^2}{2} + \frac{t^3}{3!}m'''_Y(0) + \cdots \\ &= 1 + \frac{t^2}{2} + \frac{t^3}{3!}m'''_Y(0)+ \cdots.\\ \Longrightarrow \quad m_{Z_n}(t) &= m_Y(t/\sqrt{n})^n \ = \left(1 + \frac{\frac{t^2}{2} + \frac{t^3}{3!n^{1/2}}m'''_Y(0) + \cdots}{n}\right)^n \to e^{t^2/2} = m_Z(t). \end{aligned}\]
[We used the fact that \(\lim_{n \to \infty} (1 + a_n/n)^n = e^a\) when \(a_n \to a\).]
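The convergence of MGFs can be checked numerically for a concrete \(Y\). A sketch (names hypothetical) using the standardized uniform \(Y = \sqrt{12}\,(U - 1/2)\), whose MGF is \(m_Y(t) = e^{-\sqrt{12}\,t/2}\,(e^{\sqrt{12}\,t} - 1)/(\sqrt{12}\,t)\):

```python
import math

A = math.sqrt(12.0)  # Y = sqrt(12) * (U - 1/2) has mean 0, variance 1

def m_Y(t):
    """MGF of the standardized Unif(0,1) variable Y."""
    if t == 0.0:
        return 1.0
    return (math.exp(t * A) - 1.0) / (t * A) * math.exp(-t * A / 2.0)

t = 1.0
for n in (10, 100, 10_000):
    print(n, m_Y(t / math.sqrt(n)) ** n)  # should approach e^{t^2 / 2}
print(math.exp(t * t / 2.0))
```

As \(n\) grows, \(m_Y(t/\sqrt{n})^n\) approaches \(e^{t^2/2}\), the standard normal MGF, matching the derivation above.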
The CLT states that when \(n\) is large, the distribution of \[\frac{\overline{X}_n - \mu }{\sigma /\sqrt{n}} \quad \text{is approximately} \quad \mathcal{N}\left( 0,1\right).\]
This implies that when \(n\) is large, we can also say something about the distribution of \(S_n = \sum_{i=1}^{n}X_{i}\).
\[\begin{aligned} 1-\Phi(z) &= \lim_{n\rightarrow\infty}\mathbb{P}\left( \frac{\overline X_n - \mu} {\sigma/\sqrt{n}} > z \right)\\ &=\lim_{n\rightarrow\infty}\mathbb{P}\left( \frac{(n\overline X_n - n\mu)} {n\sigma/\sqrt{n}} > z \right)\\ &=\lim_{n\rightarrow\infty}\mathbb{P}\left( \frac{S_n - n\mu} {\sqrt{n}\sigma} > z \right) \end{aligned}\]
A restaurant’s daily sales are a random variable with mean $2500 and SD $500.
Assume that daily sales are independent random variables.
Leave your answer in terms of \(\Phi\) (the CDF of a standard Gaussian).
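As an illustration of the \(S_n\) version of the CLT (the numbers \(n = 30\) days and a target of 78,000 dollars are hypothetical, not part of the original question):

```python
import math

def phi(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Hypothetical numbers: n = 30 days, target total sales of 78,000 dollars.
mu, sigma, n, target = 2500.0, 500.0, 30, 78_000.0
z = (target - n * mu) / (math.sqrt(n) * sigma)  # standardize S_n
prob = 1.0 - phi(z)                             # CLT approx of P(S_n > target)
print(prob)
```

The same standardization \((S_n - n\mu)/(\sqrt{n}\,\sigma)\) works for any threshold, giving answers in terms of \(\Phi\).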
The field of Statistics turns this operation around: instead of deriving the behavior of data from a known probability model, it uses observed data to infer the unknown model.
Summary
Stat302 gives (gave?) you the tools to understand the behavior of random variables — the foundation for understanding statistical inference.
Stat 302 - Winter 2025/26