Discrete Random Variables
Last modified — 23 May 2026
By the end of this lecture, students are anticipated to be able to:
A random variable is a function that maps from the sample space to \({\mathbb{R}}\).
Typically denoted by \(X\), \(Y\), \(Z\) (letters near the end of the alphabet)
The distribution of a random variable is the collection of probabilities associated with every subset of \({\mathbb{R}}\) (we ignore details).
Why is this useful?
Random variables will allow us to quit worrying about the sample space, and simply deal with the abstraction.
Any properties we learn about the distribution of a random variable apply to every sample space that can be “modelled” with that random variable.
We can leave the world of dice and coins and just talk about math.
We start with discrete random variables.
We say that a random variable \(X\) is discrete if there exists a countable set \(K = \{x_1,x_2,\dots\}\) such that
\[\mathbb{P}(X \in K) = 1.\]
I can count the \(x\) that have positive probability.
Toss a fair coin 3 times, count the total number of Heads.
Roll a 6-sided die until you see 6, count the number of rolls.
Count the number of people that arrive at the bus stop in some amount of time.
Will it be rainy (\(-6\)), cloudy (2), or sunny (\(+10\)) on campus today?
For a discrete random variable, its probability (mass) function is the function \(p_X : {\mathbb{R}}\rightarrow [0,1]\) defined by \[p_X(x) = \mathbb{P}(X = x).\]
It is occasionally written \(f_X(x) = \mathbb{P}(X = x).\)
Suppose it rains 30% of the time, is cloudy but not raining 30% of the time, and sunny 40% of the time.
Your happiness is given by the random variable \(Z\) with \(Z=-6, 2,\) and 10 respectively.
What is the support of \(Z\)?
What is the PMF of \(Z\)?
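One way to think about these two questions is to store the PMF as a lookup table. The sketch below is illustrative only: the names `pmf_Z`, `support_Z`, and `p_Z` are assumptions made for this example, and "support" is taken to mean the set of values with positive probability.

```python
# PMF of Z as a mapping from value to probability (numbers from the example).
pmf_Z = {-6: 0.3, 2: 0.3, 10: 0.4}

# Support: the values that have positive probability.
support_Z = sorted(z for z, p in pmf_Z.items() if p > 0)


def p_Z(z):
    """Return P(Z = z); values outside the support get probability 0."""
    return pmf_Z.get(z, 0.0)


print(support_Z)       # [-6, 2, 10]
print(p_Z(2), p_Z(5))  # 0.3 0.0
```

The PMF is defined on all of \({\mathbb{R}}\), which is why `p_Z` returns 0 for any value not listed in the table.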
Toss a fair coin 3 times. Let \(X\) be the number of heads.
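The PMF of \(X\) can be found by listing the sample space and applying the random variable to each outcome. A minimal sketch, assuming the 8 sequences of heads and tails are equally likely (the names `sample_space`, `X`, and `pmf_X` are chosen for this illustration):

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Sample space: all 2^3 sequences of H/T, assumed equally likely.
sample_space = list(product("HT", repeat=3))


def X(outcome):
    """The random variable: maps an outcome to its number of heads."""
    return outcome.count("H")


# Count how many outcomes map to each value of X, then normalise.
counts = Counter(X(w) for w in sample_space)
pmf_X = {x: Fraction(c, len(sample_space)) for x, c in sorted(counts.items())}

for x, p in pmf_X.items():
    print(x, p)
# 0 1/8
# 1 3/8
# 2 3/8
# 3 1/8
```

Note that `X` really is a function from the sample space to the real numbers; the PMF only records how much probability lands on each value.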
The PMF can be used to find the probabilities of events.
We have that for any set \(A \subseteq {\mathbb{R}}\) and a discrete RV \(X\),
\[\mathbb{P}(X \in A) = \sum_{x \in A} p_X(x).\]
Recall: it rains 30% of the time, is cloudy but not raining 30% of the time, and sunny 40% of the time. Your happiness is given by the random variable \(Z\) with \(Z=-6, 2,\) and 10 respectively.
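As a sketch of how the summation formula is applied to this example, the helper below (a hypothetical `prob` function, not part of any library) adds up the PMF over the support points that belong to an event:

```python
# PMF of Z from the weather example.
pmf_Z = {-6: 0.3, 2: 0.3, 10: 0.4}


def prob(pmf, event):
    """P(X in A): sum the PMF over support points x for which event(x) is True."""
    return sum(p for x, p in pmf.items() if event(x))


p_positive = prob(pmf_Z, lambda z: z > 0)    # cloudy or sunny
p_at_most_2 = prob(pmf_Z, lambda z: z <= 2)  # rainy or cloudy

print(round(p_positive, 4), round(p_at_most_2, 4))  # 0.7 0.6
```

Here the event \(A\) is described by a predicate rather than by listing its elements, which is convenient when \(A\) is an interval such as \((0, \infty)\).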
For a fixed number \(\theta \in [0, 1]\), define a discrete random variable \(X\) with PMF given by:
\[ p_X\left( x;\ \theta\right) =\begin{cases} \left( 1-\theta\right) ^{2} & x=0, \\ 2\theta\left( 1-\theta\right) & x=1, \\ \theta^{2} & x=2, \\ 0 & \mbox{else.} \end{cases} \]
\[ p_X\left( x;\ \theta = 0.1\right) =\begin{cases} 0.81 & x=0, \\ 0.18 & x=1, \\ 0.01 & x=2, \\ 0 & \mbox{else.} \end{cases} \]
By including a parameter \(\theta\), we are able to express many distributions (a family of distributions) with a single functional form.
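The idea of a family of distributions can be illustrated by writing the PMF above as a function of both \(x\) and \(\theta\). This is a minimal sketch; the function name `p_X` simply mirrors the notation and is an assumption of this example.

```python
def p_X(x, theta):
    """PMF of the family above: one functional form, one distribution per theta."""
    if not 0.0 <= theta <= 1.0:
        raise ValueError("theta must lie in [0, 1]")
    if x == 0:
        return (1 - theta) ** 2
    if x == 1:
        return 2 * theta * (1 - theta)
    if x == 2:
        return theta ** 2
    return 0.0


# Two members of the family: theta = 0.1 (as above) and theta = 0.5.
for theta in (0.1, 0.5):
    print(theta, [round(p_X(x, theta), 4) for x in (0, 1, 2)])
# 0.1 [0.81, 0.18, 0.01]
# 0.5 [0.25, 0.5, 0.25]
```

Changing `theta` changes the probabilities but not the formula, and for every valid `theta` the three values sum to 1.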
An experiment with 2 outcomes (success/failure).
Suppose the probability of success is \(\theta \in (0,1)\). A RV \(X\) with PMF given by
\[p_X(x; \theta) = \theta^x(1-\theta)^{1-x}I_{\{0,1\}}(x),\]
is said to have the \({\mathrm{Bern}}(\theta)\) distribution.
If \(\theta = 0.5\), then we have our favourite fair-coin friend.
But we can let \(\theta\) be numbers other than 0.5.
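The Bernoulli PMF translates almost directly into code. In this sketch the indicator \(I_{\{0,1\}}(x)\) becomes an explicit membership check, and `bern_pmf` is a name made up for the illustration.

```python
def bern_pmf(x, theta):
    """theta^x * (1 - theta)^(1 - x) for x in {0, 1}, and 0 otherwise."""
    if x not in (0, 1):  # plays the role of the indicator I_{0,1}(x)
        return 0.0
    return theta ** x * (1 - theta) ** (1 - x)


print(bern_pmf(1, 0.5), bern_pmf(0, 0.5))  # 0.5 0.5  (fair coin)
print(bern_pmf(1, 0.2), bern_pmf(2, 0.2))  # 0.2 0.0  (biased coin; x = 2 is impossible)
```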
Models the number of “successes” (\(x\)) in a sequence of \(n\) independent Bernoulli trials, each with the same probability of success (\(\theta\)).
Suppose the probability of success is \(\theta \in (0,1)\) and the number of trials is a positive integer \(n\). A RV \(X\) with PMF given by
\[p_X(x; n, \theta) = \binom{n}{x} \theta^x (1-\theta)^{n-x}I_{\{0,1,\dots,n\}}(x),\]
is said to have the \({\mathrm{Binom}}(n, \theta)\) distribution.
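A minimal sketch of the binomial PMF using only the Python standard library (`math.comb` computes the binomial coefficient; `binom_pmf` is a name chosen for this example). As a sanity check, \({\mathrm{Binom}}(3, 0.5)\) should reproduce the PMF of the number of heads in 3 tosses of a fair coin.

```python
from math import comb


def binom_pmf(x, n, theta):
    """C(n, x) * theta^x * (1 - theta)^(n - x) for x in {0, ..., n}, else 0."""
    if x not in range(n + 1):  # indicator I_{0,1,...,n}(x)
        return 0.0
    return comb(n, x) * theta ** x * (1 - theta) ** (n - x)


# Number of heads in 3 tosses of a fair coin.
print([binom_pmf(x, 3, 0.5) for x in range(4)])  # [0.125, 0.375, 0.375, 0.125]
```

With \(n = 1\) the formula reduces to the \({\mathrm{Bern}}(\theta)\) PMF, since \(\binom{1}{x} = 1\) for \(x \in \{0, 1\}\).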
Let \(\lambda > 0\). A RV \(Y\) with PMF given by
\[p_Y(y; \lambda) = \frac{\lambda^y}{y!}e^{-\lambda} I_{\{0,1,2,\dots\}}(y),\]
is said to have the \({\mathrm{Poiss}}(\lambda)\) distribution.
Sometimes the PMF is written as: \[p_Y(y; \lambda t) = \frac{(\lambda t)^y}{y!}e^{-(\lambda t)} I_{\{0,1,2,\dots\}}(y),\] where \(\lambda\) is “per unit time” and \(t\) is the number of seconds/minutes/days/etc.
A coffee shop receives an average of 3 customers per minute during the morning rush. What is the probability that exactly 5 customers arrive in a given minute?
Here, \(\lambda\) is the average number of customers per minute. If we let \(X\) be the number of customers in a given minute, then \(X \sim {\mathrm{Poiss}}(3)\). Using the formula:
\[ p_X(x; \lambda) = \frac{\lambda^x}{x!}e^{-\lambda} I_{\{0,1,2,\dots\}}(x) \]
\[ p_X(5; 3) = \frac{3^5}{5!}e^{-3} \approx 0.1008 \]
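The same calculation can be checked numerically. The sketch below assumes a hypothetical helper `poiss_pmf` with an optional time argument `t`, so that it also covers the \(\lambda t\) form of the PMF.

```python
from math import exp, factorial


def poiss_pmf(y, lam, t=1.0):
    """PMF of Poiss(lam * t) at y: rate lam per unit time, observed for t units."""
    if not (isinstance(y, int) and y >= 0):  # indicator I_{0,1,2,...}(y)
        return 0.0
    mean = lam * t
    return mean ** y / factorial(y) * exp(-mean)


# Exactly 5 customers in one minute at 3 customers per minute.
print(round(poiss_pmf(5, 3), 4))  # 0.1008

# Exactly 5 customers in a two-minute window at the same rate: Poiss(6) at 5.
p_two_minutes = poiss_pmf(5, 3, t=2)
```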
Suppose that \(Y\sim {\mathrm{Poiss}}(\lambda)\), and you observe \(Y=5\). What value of \(\lambda\) makes \(\mathbb{P}(Y=5)\) as large as possible?
\[p_Y(y; \lambda) = \frac{\lambda^y}{y!}e^{-\lambda} I_{\{0,1,2,\dots\}}(y)\]
Hint: Take the \(\log\)!
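One way to carry out the hint, sketched here as a worked derivation: with \(y = 5\) fixed, treat the PMF as a function of \(\lambda\) and take the logarithm, which does not change where the maximum occurs.

\[\log p_Y(5; \lambda) = 5\log\lambda - \lambda - \log(5!), \qquad \frac{d}{d\lambda}\log p_Y(5; \lambda) = \frac{5}{\lambda} - 1.\]

Setting the derivative to zero gives \(\lambda = 5\). The second derivative is \(-5/\lambda^2 < 0\), so this is a maximum: the value of \(\lambda\) that makes the observation \(Y = 5\) most probable is \(\lambda = 5\).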
Suppose the probability of success is \(\theta \in (0, 1)\). A RV \(X\) with PMF given by
\[ p_X(x; \theta) = (1 - \theta)^x\theta I_{\{0,1,2,\dots\}}(x) \] is said to have a \({\mathrm{Geom}}(\theta)\) distribution.
What is the probability of flipping exactly 3 heads with a fair coin before seeing the first tails?
We can let \(X\) be the number of heads (failures) before the first tail (success). Then:
\[ p_X(x = 3; 0.5) = (1 - 0.5)^3(0.5) = 0.0625 \]
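A short numerical check of this calculation, assuming the "number of failures before the first success" form of the geometric PMF given above (`geom_pmf` is a name chosen for this sketch):

```python
def geom_pmf(x, theta):
    """(1 - theta)^x * theta for x in {0, 1, 2, ...}, else 0."""
    if not (isinstance(x, int) and x >= 0):  # indicator I_{0,1,2,...}(x)
        return 0.0
    return (1 - theta) ** x * theta


# 3 heads (failures) before the first tail (success) with a fair coin.
print(geom_pmf(3, 0.5))  # 0.0625
```

Some references instead count the number of trials up to and including the first success, which shifts the support to \(\{1, 2, \dots\}\); the function above follows the convention used in this lecture.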
Suppose again the probability of success is \(\theta \in (0,1).\) The random variable \(Y\) with PMF given by
\[ p_Y(y; \theta, r) = {{r - 1 + y}\choose {y}}\theta^{r}(1 - \theta)^yI_{\{0,1,2,\dots\}}(y) \]
is said to have a \({\mathrm{NegBinom}}(r, \theta)\) distribution.
This distribution is useful in determining the probability that \(y\) failures appear before the \(r\)th success in a repeated experiment, like a coin flip.
This is a generalization of the geometric distribution.
What is the probability of flipping 3 heads before seeing the 2nd tails?
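This question can be worked through with the negative binomial PMF, taking tails as the success, \(r = 2\), \(y = 3\), and assuming a fair coin (\(\theta = 0.5\)). The helper name `negbinom_pmf` is an assumption of this sketch.

```python
from math import comb


def negbinom_pmf(y, r, theta):
    """C(r - 1 + y, y) * theta^r * (1 - theta)^y for y in {0, 1, 2, ...}, else 0."""
    if not (isinstance(y, int) and y >= 0):  # indicator I_{0,1,2,...}(y)
        return 0.0
    return comb(r - 1 + y, y) * theta ** r * (1 - theta) ** y


# 3 heads (failures) before the 2nd tail (success) with a fair coin.
print(negbinom_pmf(3, 2, 0.5))  # 0.125

# With r = 1 the formula reduces to the geometric PMF.
print(negbinom_pmf(3, 1, 0.5))  # 0.0625
```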
Suppose we have a finite population of size \(N\) containing \(K\) successes, and a sample size of \(n\). A random variable \(X\) with PMF
\[ p_X(x; N, K, n) = \frac{{K\choose x}{{N - K}\choose{n-x}}}{{N \choose n}}I_{\{0,1,2,\dots\}}(x) \]
is said to have a hypergeometric distribution with parameters \(N\), \(K\), and \(n\).
A shipment of 50 laptops arrives at a warehouse, 15 of which are defective. A quality control inspector randomly selects 10 laptops to inspect. What is the probability that exactly 3 of the selected laptops are defective?
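A sketch of how the laptop question maps onto the PMF above, treating a defective laptop as a "success" so that \(N = 50\), \(K = 15\), \(n = 10\), and \(x = 3\). The helper name `hypergeom_pmf` is an assumption of this example.

```python
from math import comb


def hypergeom_pmf(x, N, K, n):
    """C(K, x) * C(N - K, n - x) / C(N, n) for x in {0, ..., n}, else 0."""
    if not (isinstance(x, int) and 0 <= x <= n):
        return 0.0
    # math.comb returns 0 when the lower index exceeds the upper one,
    # so impossible values of x automatically get probability 0.
    return comb(K, x) * comb(N - K, n - x) / comb(N, n)


# Exactly 3 defective laptops among the 10 inspected.
p_three_defective = hypergeom_pmf(3, N=50, K=15, n=10)
print(round(p_three_defective, 4))  # 0.2979
```

So under this model there is roughly a 30% chance that the inspector finds exactly 3 defective laptops.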
\({\mathrm{Categorical}}(\{p_1,\dots,p_K\})\) — Generalization of \({\mathrm{Bern}}(\theta)\) to \(K\) outcomes instead of 2.
\({\mathrm{Multinom}}(n, \{p_1,\dots,p_K\})\) — Generalization of \({\mathrm{Binom}}(n, \theta)\) to \(K\) outcomes instead of 2.
For each of these (and many others), we don’t need to “count” stuff. We just ask the math formula (or software) to calculate the probabilities of outcomes.
The trick is to translate the word problem to the correct named distribution.
For each of the following experiments, name the distribution that would make an appropriate model, and if obvious, identify the value(s) of any parameters.
Stat 302 - Winter 2025/26