Families of random variables
Last modified — 04 Feb 2026
A random variable is a function that maps from the sample space to \({\mathbb{R}}\).
Typically denoted by \(X\), \(Y\), \(Z\) (letters near the end of the alphabet)
The distribution of a random variable is the collection of probabilities associated with every (nice enough) subset of \({\mathbb{R}}\) (we ignore the measure-theoretic details).
Why is this useful?
Random variables will allow us to quit worrying about the sample space, and simply deal with the abstraction.
Any properties we learn about the distribution of the random variable apply to any sample space that could possibly be “modelled” with that random variable.
We can leave the world of dice and coins and just talk about math.
We start with discrete random variables.
A random variable \(X\) is discrete if there is a countable set \(K \subset {\mathbb{R}}\) such that
\[\mathbb{P}(X \in K) = 1.\]
That is, we can count the values \(x\) that have positive probability.
Toss a fair coin 3 times, count the total number of Heads.
Roll a 6-sided die until you see 6, count the number of rolls.
Count the number of people that arrive at the bus stop in some amount of time.
Will it be rainy (\(-6\)), cloudy (2), or sunny (\(+10\)) on campus today?
The probability mass function (PMF) of a discrete RV \(X\) is \(p_X(x) = \mathbb{P}(X = x).\) It is occasionally written \(f_X(x) = \mathbb{P}(X = x).\)
Suppose it rains 30% of the time, is cloudy but not raining 30%, and sunny 40% of the time.
Your happiness is given by the random variable \(Z\) with \(Z=-6, 2,\) and 10 respectively.
What is the support of \(Z\)?
What is the PMF of \(Z\)?
Toss a fair coin 3 times. Let \(X\) be the number of heads.
The PMF can be used to find the probabilities of events.
We have that for some event \(A\) and a RV \(X\),
\[\mathbb{P}(X \in A) = \sum_{x \in A} p_X(x).\]
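As a sanity check, the fair-coin example can be worked out by brute-force enumeration. This is an illustrative Python sketch (not part of the notes; the names `omega`, `p_X`, `A` are just labels):

```python
from itertools import product

# Sample space: all 2^3 sequences of H/T; a fair coin makes each equally likely.
omega = list(product("HT", repeat=3))

# The random variable X maps each outcome to its number of heads.
def X(outcome):
    return outcome.count("H")

# PMF of X by counting outcomes: p_X(x) = #{outcomes with x heads} / 8.
p_X = {x: sum(1 for w in omega if X(w) == x) / len(omega) for x in range(4)}

# P(X in A) as a sum of the PMF over A, e.g. A = "at least 2 heads".
A = {2, 3}
prob_A = sum(p_X[x] for x in A)
```

This reproduces the familiar \(1/8, 3/8, 3/8, 1/8\) probabilities without any formula.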
For a fixed number \(\theta \in [0, 1]\), define a discrete random variable \(X\) with PMF given by:
\[ p_X\left( x;\ \theta\right) =\begin{cases} \left( 1-\theta\right) ^{2} & x=0, \\ 2\theta\left( 1-\theta\right) & x=1, \\ \theta^{2} & x=2, \\ 0 & \mbox{else.} \end{cases} \]
\[ p_X\left( x;\ \theta = 0.1\right) =\begin{cases} 0.81 & x=0, \\ 0.18 & x=1, \\ 0.01 & x=2, \\ 0 & \mbox{else.} \end{cases} \]
By including a parameter \(\theta\), we are able to express many distributions (a family of distributions) with a single functional form.
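The parametric PMF transcribes directly to code. A small Python sketch (the function name `p_X` is illustrative) that also checks the \(\theta = 0.1\) case shown above:

```python
def p_X(x, theta):
    """PMF of the number of successes in 2 independent trials
    with success probability theta (the family defined above)."""
    if x == 0:
        return (1 - theta) ** 2
    if x == 1:
        return 2 * theta * (1 - theta)
    if x == 2:
        return theta ** 2
    return 0.0  # zero outside the support

# Plugging in theta = 0.1 recovers (approximately, in floating point)
# the specific distribution 0.81, 0.18, 0.01.
vals = [p_X(x, 0.1) for x in (0, 1, 2)]
```

One function, many distributions: varying `theta` sweeps through the whole family.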
An experiment with 2 outcomes (success/failure).
A random variable \(X\) with PMF
\[p_X(x; \theta) = \theta^x(1-\theta)^{1-x}I_{\{0,1\}}(x),\]
is said to have the \({\mathrm{Bern}}(\theta)\) distribution.
If \(\theta = 0.5\), then we have our favourite fair-coin friend.
But we can let \(\theta\) be numbers other than 0.5.
A sequence of \(n\) independent Bernoulli trials, each with the same probability of success \(\theta\); count the successes.
A random variable \(X\) with PMF
\[p_X(x; n, \theta) = \binom{n}{x} \theta^x (1-\theta)^{n-x}I_{\{0,1,\dots,n\}}(x),\]
is said to have the \({\mathrm{Binom}}(n, \theta)\) distribution.
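A minimal Python sketch of the Binomial PMF using only the standard library (`binom_pmf` is an illustrative name); it checks that the PMF sums to 1 and that \({\mathrm{Binom}}(1,\theta)\) is \({\mathrm{Bern}}(\theta)\):

```python
from math import comb

def binom_pmf(x, n, theta):
    """PMF of Binom(n, theta): number of successes in n independent
    Bernoulli(theta) trials."""
    if x not in range(n + 1):   # the indicator I_{0,...,n}(x)
        return 0.0
    return comb(n, x) * theta**x * (1 - theta) ** (n - x)

# The PMF sums to 1 over the support (binomial theorem).
total = sum(binom_pmf(x, 10, 0.3) for x in range(11))
```

Setting `n = 1` collapses the formula to \(\theta^x(1-\theta)^{1-x}\), the Bernoulli PMF.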
A random variable \(Y\) with PMF
\[p_Y(y; \lambda) = \frac{\lambda^y}{y!}e^{-\lambda} I_{\{0,1,2,\dots\}}(y),\]
is said to have the \({\mathrm{Poiss}}(\lambda)\) distribution. It is a standard model for counts.
This is not the only distribution that is good for modelling counts, and we’ll see reasons to choose a different one later.
Sometimes written \[p_Y(y; \lambda t) = \frac{(\lambda t)^y}{y!}e^{-(\lambda t)} I_{\{0,1,2,\dots\}}(y),\] where \(\lambda\) is “per unit time” and \(t\) is the number of seconds/minutes/days/etc.
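A quick Python sketch of the Poisson PMF, including the “rate per unit time” parameterization (the rate \(\lambda = 2\) per minute and window \(t = 3\) minutes below are arbitrary illustrative choices):

```python
from math import exp, factorial

def pois_pmf(y, lam):
    """PMF of Poiss(lam), evaluated at a nonnegative integer y."""
    if y < 0 or y != int(y):    # the indicator I_{0,1,2,...}(y)
        return 0.0
    return lam**y / factorial(int(y)) * exp(-lam)

# Rate lam = 2 arrivals per minute, watched for t = 3 minutes:
# the count over the window is Poiss(lam * t) = Poiss(6).
p_no_arrivals = pois_pmf(0, 2 * 3)   # this is exp(-6)
```

The infinite support is handled by the formula itself; summing the PMF over a long enough range of \(y\) gets numerically close to 1.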
\[p_Y(y; \lambda) = \frac{\lambda^y}{y!}e^{-\lambda} I_{\{0,1,2,\dots\}}(y)\]
Hint: Take the \(\log\)!
\({\mathrm{Geom}}(\theta)\) — Flip a coin with \(\mathbb{P}(\text{success}) = \theta\) until you see the first success. How many flips?
\({\mathrm{NegBinom}}(r, \theta)\) — Flip a coin with \(\mathbb{P}(\text{success}) = \theta\) until you see \(r\) successes. How many flips?
\({\mathrm{HypGeom}}(N, M, n)\) — An urn with \(M\) red balls and \(N-M\) white balls. Draw \(n\) balls without replacement. How many are red?
\({\mathrm{Categorical}}(\{p_1,\dots,p_K\})\) — Generalization of \({\mathrm{Bern}}(\theta)\) to \(K\) outcomes instead of 2.
\({\mathrm{Multinom}}(n, \{p_1,\dots,p_K\})\) — Generalization of \({\mathrm{Binom}}(n, \theta)\) to \(K\) outcomes instead of 2.
For each of these (and many others), we don’t need to “count” stuff. We just ask the math formula (or software) to calculate the probabilities of outcomes.
The trick is to translate the word problem to the correct named distribution.
For each of the following experiments, name the distribution that would make an appropriate model, and if obvious, identify the value(s) of any parameters.
This is the most rigorous way to define continuous RVs, but it doesn’t provide much intuition.
So while \(\mathbb{P}(X=x)=0\), it will often be the case that for an open ball \(B(x,\delta)\) of radius \(\delta>0\), we have \(\mathbb{P}(X\in B(x,\delta))>0\).
We need a bit more to make our intuition match the math.
We call such a density function a probability density function (PDF) and will often use the notation \(f_X(x)\) to denote its relationship to \(X\).
Consider some set \(A \subset {\mathbb{R}}\).
Let \(X\) be a random variable.
Probability of \(A\) for discrete RV:
\[\mathbb{P}(X \in A) = \sum_{x \in A} p_X(x).\]
Probability of \(A\) for an (absolutely) continuous RV:
\[\mathbb{P}(X \in A) = \int_{A} f_X(x) \mathsf{d}x.\]
Support of discrete RV \(X\):
\[\operatorname{supp}(X) = \{x : \mathbb{P}(X=x) > 0\}, \qquad \mbox{with} \qquad \sum_{x \in \operatorname{supp}(X)} \mathbb{P}(X=x) = 1.\]
Support of an (absolutely) continuous RV \(X\):
\[\operatorname{supp}(X) = \{x : \mathbb{P}(X \in B(x,r)) > 0,\ \forall r > 0 \} = \{x : f_X(x) > 0 \}.\]
Let \(X\) be a RV with PDF \[f_X(x) = 2 x^{-3} I_{[1,\infty)}(x).\]
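A numeric sanity check of this density, using a simple midpoint Riemann sum (a sketch; the cutoff at 100 and the step count are arbitrary choices, and the exact tail mass \(\int_{100}^\infty 2x^{-3}\,\mathsf{d}x = 100^{-2}\) is added back by hand):

```python
def f(x):
    # PDF of X: f_X(x) = 2 x^{-3} on [1, infinity), 0 elsewhere.
    return 2 * x**-3 if x >= 1 else 0.0

def integrate(g, a, b, n=200_000):
    """Midpoint Riemann sum approximation of the integral of g over [a, b]."""
    h = (b - a) / n
    return sum(g(a + (i + 0.5) * h) for i in range(n)) * h

# P(1 < X < 2) = integral of the PDF over (1, 2); exactly 1 - 1/4 = 0.75.
p_12 = integrate(f, 1, 2)

# Total mass: integrate up to a cutoff, then add the exact tail b^{-2}.
total = integrate(f, 1, 100) + 100**-2
```

This mirrors the defining property: probabilities of intervals are integrals of the PDF.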
Let \(X\sim{\mathrm{Unif}}(L, R)\), i.e. \(X\) has PDF \[f_X(x; L, R) = \frac{1}{R-L} I_{[L,R]}(x).\] Then \[\begin{aligned} \mathbb{P}(a< X < b) &= \frac{1}{R-L}\int_a^b I_{[L,R]}(x) \,\mathsf{d}x \\ &= \frac{b-a}{R-L}, \end{aligned}\] whenever \(L\leq a<b\leq R\).

Let \(X\sim{\mathrm{Unif}}(L, R)\). What is the median of \(X\)?
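One way to check your pen-and-paper answer numerically for a specific choice of \(L\) and \(R\): solve \(F_X(m) = 1/2\) by bisection (a sketch; \(L = 2, R = 5\) is an arbitrary example):

```python
def unif_cdf(x, L, R):
    """CDF of Unif(L, R): P(X <= x) = (x - L) / (R - L) for L <= x <= R."""
    if x <= L:
        return 0.0
    if x >= R:
        return 1.0
    return (x - L) / (R - L)

def median(cdf, lo, hi, tol=1e-10):
    """Bisection solve for m with cdf(m) = 1/2 (cdf increasing on [lo, hi])."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if cdf(mid) < 0.5:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

m = median(lambda x: unif_cdf(x, 2, 5), 2, 5)
```

The numeric answer should agree with whatever closed form you derive by hand.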

A random variable \(X\) with PDF \[f_X(x; \lambda) = \lambda e^{-\lambda x}I_{[0,\infty)}(x)\] is said to have the \({\mathrm{Exp}}(\lambda)\) distribution.
Note: \(Z\sim{\mathrm{Gam}}(1,\lambda) \Rightarrow Z\sim{\mathrm{Exp}}(\lambda)\).
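The note can be verified numerically by evaluating the Gamma density (given on the next slide) with \(\alpha = 1\) against the Exponential density at a few points (a sketch; the test points and \(\lambda = 2\) are arbitrary choices):

```python
from math import exp, gamma

def exp_pdf(x, lam):
    """PDF of Exp(lam) on [0, infinity)."""
    return lam * exp(-lam * x) if x >= 0 else 0.0

def gamma_pdf(z, alpha, lam):
    """PDF of Gam(alpha, lam) on [0, infinity)."""
    if z < 0:
        return 0.0
    return lam**alpha / gamma(alpha) * z ** (alpha - 1) * exp(-lam * z)

# With alpha = 1: Gamma(1) = 1 and z^0 = 1, so the densities coincide.
pts = [0.1, 0.5, 1.0, 3.0]
diffs = [abs(gamma_pdf(z, 1, 2.0) - exp_pdf(z, 2.0)) for z in pts]
```

Of course this is also immediate algebraically: \(\frac{\lambda^1}{\Gamma(1)} z^{0} e^{-\lambda z} = \lambda e^{-\lambda z}\).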

A random variable \(Z\) with PDF \[f_Z(z; \alpha, \lambda) = \frac{\lambda^\alpha}{\Gamma(\alpha)} z^{\alpha-1}e^{-\lambda z} I_{[0,\infty)}(z)\] is said to have the \({\mathrm{Gam}}(\alpha, \lambda)\) distribution.
We know that \[1 = \int_0^\infty \frac{\lambda^\alpha}{\Gamma(\alpha)} z^{\alpha-1}e^{-\lambda z} \mathsf{d}z \Longrightarrow \frac{\Gamma(\alpha)}{\lambda^\alpha} = \int_0^\infty z^{\alpha-1}e^{-\lambda z} \mathsf{d}z.\]
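This identity can be checked numerically with a midpoint Riemann sum (a sketch; \(\alpha = 2.5\), \(\lambda = 1.5\), and the truncation at 60 are arbitrary choices, and the discarded tail is negligible since the integrand decays like \(e^{-\lambda z}\)):

```python
from math import exp, gamma

def integrand(z, alpha, lam):
    # z^(alpha - 1) e^(-lam z): the unnormalized Gamma density.
    return z ** (alpha - 1) * exp(-lam * z)

def integrate(g, a, b, n=200_000):
    """Midpoint Riemann sum approximation of the integral of g over [a, b]."""
    h = (b - a) / n
    return sum(g(a + (i + 0.5) * h) for i in range(n)) * h

alpha, lam = 2.5, 1.5
lhs = gamma(alpha) / lam**alpha                              # Gamma(alpha) / lam^alpha
rhs = integrate(lambda z: integrand(z, alpha, lam), 0.0, 60.0)
```

The trick of reading a normalizing constant off a known density (“the PDF integrates to 1, therefore...”) recurs constantly, so it is worth internalizing.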
Stat 302 - Winter 2025/26