Lecture 6

Discrete Random Variables

Grace Tompkins

Last modified — 23 May 2026

Learning Outcomes

By the end of this lecture, students should be able to:

Define a discrete random variable
Define and identify a probability mass function
Define, identify, and apply common families of discrete distributions

Review of Random Variables

A random variable is a function that maps from the sample space to \({\mathbb{R}}\).
Typically denoted by \(X\), \(Y\), \(Z\) (letters near the end of the alphabet)
The distribution of a random variable is the collection of probabilities associated with every subset of \({\mathbb{R}}\) (we ignore details).

Why is this useful?

Random variables will allow us to quit worrying about the sample space, and simply deal with the abstraction.
Any properties we learn about the distribution of the random variable apply to any sample space that could possibly be “modelled” with that random variable.
We can leave the world of dice and coins and just talk about math.

We start with discrete random variables.

1 Discrete random variables

Discrete random variables

We say that a random variable \(X\) is discrete if there exists a countable set \(K = \{x_1,x_2,\dots\}\) such that

\[\mathbb{P}(X \in K) = 1.\]

In particular, this means that \(\{x : P(X=x)>0\}\) is countable:

I can count the \(x\) that have positive probability.

For discrete RVs, we call the set \(\{x : P(X=x) > 0\}\) the support of \(X\).
These are the types of RVs we’ve worked with so far. We’ll visit the other extreme next class.

Examples of Experiments Described by Discrete Random Variables

Toss a fair coin 3 times, count the total number of Heads.
Roll a 6-sided die until you see 6, count the number of rolls.
Count the number of people that arrive at the bus stop in some amount of time.
Will it be rainy (\(-6\)), cloudy (2), or sunny (\(+10\)) on campus today?

Probability Mass Function (PMF)

For discrete random variables, we can be explicit about the probability associated with each value in its support.

For a discrete random variable, its probability (mass) function is the function \(p_X : {\mathbb{R}}\rightarrow [0,1]\) defined by \[p_X(x) = \mathbb{P}(X = x).\]

It is occasionally written \(f_X(x) = \mathbb{P}(X = x).\)

Each of the examples of discrete random variables has a PMF. Once we write it down, we know everything there is to know about the distribution of that random variable.

A Rainy Day

Suppose it rains 30% of the time, is cloudy but not raining 30% of the time, and sunny 40% of the time.

Your happiness is given by the random variable \(Z\) with \(Z=-6, 2,\) and 10 respectively.

What is the support of \(Z\)?
What is the PMF of \(Z\)?

Fair Coins

Toss a fair coin 3 times. Let \(X\) be the number of heads.

What is the support of \(X\)?
What is \(p_X(x)\)?
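
One way to check an answer to the questions above is to enumerate the sample space directly. The sketch below assumes the 8 sequences of heads and tails are equally likely (a fair coin, independent tosses); the names `outcomes`, `head_counts`, and `pmf_X` are illustrative only.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Enumerate the 2**3 equally likely outcomes of three fair coin tosses.
outcomes = list(product("HT", repeat=3))

# X maps each outcome to its number of heads; count how often each value occurs.
head_counts = Counter(outcome.count("H") for outcome in outcomes)

# Each outcome has probability 1/8, so p_X(x) = (#outcomes with x heads) / 8.
pmf_X = {x: Fraction(count, len(outcomes)) for x, count in sorted(head_counts.items())}

for x, p in pmf_X.items():
    print(f"p_X({x}) = {p}")
```

The keys of `pmf_X` are the support of \(X\), and the values sum to 1, as any PMF must.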

Finding Probabilities of Events

The PMF can be used to find the probabilities of events.

We have that for some event \(A\) and a RV \(X\),

\[\mathbb{P}(X \in A) = \sum_{x \in A} p_X(x).\]

Recall: it rains 30% of the time, is cloudy but not raining 30% of the time, and sunny 40% of the time. Your happiness is given by the random variable \(Z\) with \(Z=-6, 2,\) and 10 respectively.

Let \(A\) be the event that you are happier than 0. Find \(\mathbb{P}(Z \in A)\).
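
The sum formula above can be sketched in a few lines. This is a minimal illustration, assuming the PMF of \(Z\) is stored as a dictionary from support points to probabilities; the helper name `prob` is hypothetical.

```python
# PMF of Z (happiness): rainy -> -6, cloudy -> 2, sunny -> 10.
pmf_Z = {-6: 0.3, 2: 0.3, 10: 0.4}


def prob(pmf, event):
    """Return P(X in A) by summing the PMF over support points x where event(x) is True."""
    return sum(p for x, p in pmf.items() if event(x))


# A = {z : z > 0}, i.e. "happier than 0".
p_happy = prob(pmf_Z, lambda z: z > 0)
print(f"P(Z in A) = {p_happy:.2f}")
```

Only the support points 2 and 10 fall in \(A\), so the sum is \(0.3 + 0.4\).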

Another PMF

For a fixed number \(\theta \in [0, 1]\), define a discrete random variable \(X\) with PMF given by:

\[ p_X\left( x;\ \theta\right) =\begin{cases} \left( 1-\theta\right) ^{2} & x=0, \\ 2\theta\left( 1-\theta\right) & x=1, \\ \theta^{2} & x=2, \\ 0 & \mbox{else.} \end{cases} \]

Different values of \(\theta\) will give different PMFs.

\[ p_X\left( x;\ \theta = 0.1\right) =\begin{cases} 0.81 & x=0, \\ 0.18 & x=1, \\ 0.01 & x=2, \\ 0 & \mbox{else.} \end{cases} \]

By including a parameter \(\theta\), we are able to express many distributions (a family of distributions) with a single functional form.
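
To see the "family" idea concretely, the PMF above can be written as one function of both \(x\) and \(\theta\). This is only a sketch; the function name `pmf_theta` and the particular values of \(\theta\) in the loop are chosen for illustration.

```python
def pmf_theta(x, theta):
    """p_X(x; theta) from the slide: support {0, 1, 2}, zero elsewhere."""
    if x == 0:
        return (1 - theta) ** 2
    if x == 1:
        return 2 * theta * (1 - theta)
    if x == 2:
        return theta ** 2
    return 0.0


# Each theta picks out one member of the family; every member sums to 1.
for theta in (0.1, 0.5, 0.9):
    values = [pmf_theta(x, theta) for x in (0, 1, 2)]
    print(theta, [round(v, 4) for v in values], "sum =", round(sum(values), 4))
```

The probabilities sum to 1 for every \(\theta\) because \((1-\theta)^2 + 2\theta(1-\theta) + \theta^2 = \big((1-\theta) + \theta\big)^2 = 1\).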

2 Discrete families

Bernoulli

An experiment with 2 outcomes (success/failure).

Suppose the probability of success is \(\theta \in (0,1)\). A RV \(X\) with PMF given by

\[p_X(x; \theta) = \theta^x(1-\theta)^{1-x}I_{\{0,1\}}(x),\]

is said to have the \({\mathrm{Bern}}(\theta)\) distribution.

If \(\theta = 0.5\), then we have our favourite fair-coin friend.
But we can let \(\theta\) be numbers other than 0.5.
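
The Bernoulli PMF translates almost symbol for symbol into code. A minimal sketch, with the indicator \(I_{\{0,1\}}(x)\) handled by an explicit check (the function name `bernoulli_pmf` is illustrative):

```python
def bernoulli_pmf(x, theta):
    """p_X(x; theta) = theta**x * (1 - theta)**(1 - x) on {0, 1}, zero elsewhere."""
    if x not in (0, 1):
        return 0.0  # the indicator I_{0,1}(x) is zero off the support
    return theta ** x * (1 - theta) ** (1 - x)


print(bernoulli_pmf(1, 0.5), bernoulli_pmf(0, 0.5))  # fair coin
print(bernoulli_pmf(1, 0.3), bernoulli_pmf(0, 0.3))  # biased coin
```

Plugging in \(x = 1\) leaves \(\theta\) and plugging in \(x = 0\) leaves \(1 - \theta\), which is all the exponent trick in the formula is doing.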

Binomial

Models the number of “successes” (\(x\)) in a sequence of \(n\) independent Bernoulli trials, each with the same probability of success (\(\theta\)).

Suppose the probability of success is \(\theta \in (0,1)\). A RV \(X\) with PMF given by

\[p_X(x; n, \theta) = \binom{n}{x} \theta^x (1-\theta)^{n-x}I_{\{0,1,\dots,n\}}(x),\]

is said to have the \({\mathrm{Binom}}(n, \theta)\) distribution.

Using the Binomial

\[p_X(x; n, \theta) = \binom{n}{x} \theta^x (1-\theta)^{n-x}I_{\{0,1,\dots,n\}}(x)\]

Suppose Midterm 1 has 5 True/False questions. What is the probability that you get at least 4 correct answers by random guessing?
Suppose Midterm 1 has 2 multiple choice questions with 5 options. What is the probability that you get both correct by random guessing?
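
Both questions can be checked numerically. The sketch below assumes each guess is independent, a True/False guess is correct with probability 0.5, and each multiple choice question has exactly one correct option out of 5 (so \(\theta = 0.2\)); the source does not state these explicitly. The function name `binom_pmf` is illustrative.

```python
from math import comb


def binom_pmf(x, n, theta):
    """Binom(n, theta) PMF; x is assumed to be an integer, zero off {0, ..., n}."""
    if x not in range(n + 1):
        return 0.0
    return comb(n, x) * theta ** x * (1 - theta) ** (n - x)


# At least 4 of 5 True/False questions correct: P(X = 4) + P(X = 5).
p_at_least_4 = binom_pmf(4, 5, 0.5) + binom_pmf(5, 5, 0.5)

# Both of 2 five-option multiple choice questions correct: P(X = 2).
p_both = binom_pmf(2, 2, 0.2)

print(f"P(at least 4 of 5) = {p_at_least_4}")
print(f"P(both correct)    = {p_both:.2f}")
```

Under these assumptions the first answer is \(5/32 + 1/32 = 0.1875\) and the second is \(0.2^2 = 0.04\).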

Poisson

Let \(\lambda > 0\). A RV \(Y\) with PMF given by

\[p_Y(y; \lambda) = \frac{\lambda^y}{y!}e^{-\lambda} I_{\{0,1,2,\dots\}}(y),\]

is said to have the \({\mathrm{Poiss}}(\lambda)\) distribution.

Sometimes the PMF is written as: \[p_Y(y; \lambda t) = \frac{(\lambda t)^y}{y!}e^{-(\lambda t)} I_{\{0,1,2,\dots\}}(y),\] where \(\lambda\) is the rate “per unit time” and \(t\) is the number of time units (seconds, minutes, days, etc.) observed.

Poisson

The Poisson distribution is good for modelling counts of things, often per unit of time. This is not the only distribution that is good for modelling counts, and we’ll see reasons to potentially choose a different one later.
\(\lambda\) is often the average number of events per unit of time, and \(Y\) is then the count of events in one such unit of time.

Poisson

A coffee shop receives an average of 3 customers per minute during the morning rush. What is the probability that exactly 5 customers arrive in a given minute?

Here, \(\lambda\) is the average number of customers per minute. If we let \(X\) be the number of customers in a given minute, then \(X \sim {\mathrm{Poiss}}(3)\). Using the formula:

\[ p_X(x; \lambda) = \frac{\lambda^x}{x!}e^{-\lambda} I_{\{0,1,2,\dots\}}(x) \]

\[ p_X(x = 5; 3) = \frac{3^5}{5!}e^{-3} \approx 0.1008 \]
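
The same calculation can be reproduced with the standard library. A minimal sketch; the function name `poisson_pmf` is illustrative, and the integer check plays the role of the indicator in the formula.

```python
from math import exp, factorial


def poisson_pmf(y, lam):
    """Poiss(lam) PMF: lam**y / y! * exp(-lam) for y in {0, 1, 2, ...}, else 0."""
    if not (isinstance(y, int) and y >= 0):
        return 0.0
    return lam ** y / factorial(y) * exp(-lam)


# Coffee shop: lambda = 3 customers per minute, exactly 5 arrivals in one minute.
p5 = poisson_pmf(5, 3)
print(round(p5, 4))
```

Summing `poisson_pmf(y, 3)` over a long enough range of `y` gets arbitrarily close to 1, which is a quick sanity check that the formula was typed correctly.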

Poisson Maximization

Suppose that \(Y\sim {\mathrm{Poiss}}(\lambda)\), and you observe \(Y=5\). What value of \(\lambda\) makes \(\mathbb{P}(Y=5)\) as large as possible?

\[p_Y(y; \lambda) = \frac{\lambda^y}{y!}e^{-\lambda} I_{\{0,1,2,\dots\}}(y)\]

Hint: Take the \(\log\)!
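
Following the hint, \(\log p_Y(5; \lambda) = 5\log\lambda - \lambda - \log(5!)\), and setting the derivative \(5/\lambda - 1\) to zero suggests \(\lambda = 5\). The sketch below is a numerical sanity check of that calculus, not a replacement for it: it evaluates the log-PMF on a grid of candidate \(\lambda\) values (the grid spacing of 0.01 and the upper limit of 15 are arbitrary choices) and reports the best one.

```python
from math import factorial, log


def poisson_log_pmf(y, lam):
    """log of the Poiss(lam) PMF at a non-negative integer y, for lam > 0."""
    return y * log(lam) - lam - log(factorial(y))


y_obs = 5

# Candidate values lambda = 0.01, 0.02, ..., 15.00.
grid = [i / 100 for i in range(1, 1501)]

# The log is increasing, so maximizing the log-PMF also maximizes the PMF.
best_lam = max(grid, key=lambda lam: poisson_log_pmf(y_obs, lam))
print(best_lam)
```

Changing `y_obs` and rerunning shows the maximizer tracking the observed count, in line with the general solution \(\lambda = y\).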

Geometric Distribution

Suppose the probability of success is \(\theta \in (0, 1)\). A RV \(X\) with PMF given by

\[ p_X(x; \theta) = (1 - \theta)^x\theta I_{\{0,1,2,\dots\}}(x) \] is said to have a \({\mathrm{Geom}}(\theta)\) distribution.

\(X\) could be the number of failures before the first success in a sequence of independent Bernoulli trials, each with the same probability of success.

Geometric Distribution

What is the probability of flipping 3 heads with a fair coin before seeing the first tails?

We can let \(X\) be the number of heads (failures) before the first tail (success). Then:

\[ p_X(x = 3; 0.5) = (1 - 0.5)^3(0.5) = 0.0625 \]
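
A short sketch of the same calculation, using the "failures before the first success" form of the PMF given above (the function name `geom_pmf` is illustrative):

```python
def geom_pmf(x, theta):
    """Geom(theta) PMF: (1 - theta)**x * theta for x in {0, 1, 2, ...}, else 0."""
    if not (isinstance(x, int) and x >= 0):
        return 0.0
    return (1 - theta) ** x * theta


# Three heads (failures) and then the first tail (success), fair coin.
p3 = geom_pmf(3, 0.5)
print(p3)  # 0.0625
```

Some textbooks and software instead count the number of trials up to and including the first success, which shifts the support to \(\{1, 2, \dots\}\); it is worth checking which convention a given tool uses before comparing numbers.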

Negative Binomial

Suppose again the probability of success is \(\theta \in (0,1).\) The random variable \(Y\) with PMF given by

\[ p_Y(y; r, \theta) = {{r - 1 + y}\choose {y}}\theta^{r}(1 - \theta)^yI_{\{0,1,2,\dots\}}(y) \]

is said to have a \({\mathrm{NegBinom}}(r, \theta)\) distribution.

This distribution is useful in determining the probability that \(y\) failures appear before the \(r\)th success in a repeated experiment, like a coin flip.
This is a generalization of the geometric distribution.

Negative Binomial

What is the probability of flipping 3 heads before seeing the 2nd tails?
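
One possible way to work this out numerically, assuming a fair coin with tails counted as "success", so \(y = 3\), \(r = 2\), and \(\theta = 0.5\) (the function name `negbinom_pmf` is illustrative):

```python
from math import comb


def negbinom_pmf(y, r, theta):
    """NegBinom(r, theta) PMF: P(y failures before the r-th success)."""
    if not (isinstance(y, int) and y >= 0):
        return 0.0
    return comb(r - 1 + y, y) * theta ** r * (1 - theta) ** y


# Three heads (failures) before the 2nd tail (success), fair coin.
p = negbinom_pmf(3, 2, 0.5)
print(p)  # 0.125

# With r = 1 the formula reduces to the geometric PMF (1 - theta)**y * theta.
print(negbinom_pmf(3, 1, 0.5))  # 0.0625
```

Here \(\binom{4}{3} = 4\) counts the ways to arrange the 3 heads among the first 4 flips, with the 5th flip forced to be the 2nd tail.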

Hypergeometric

Suppose we have a finite population of size \(N\) containing \(K\) successes, and a sample size of \(n\). A random variable \(X\) with PMF

\[ p_X(x; N, K, n) = \frac{{K\choose x}{{N - K}\choose{n-x}}}{{N \choose n}}I_{\{\max(0,\, n - N + K),\dots,\min(n, K)\}}(x) \]

is said to have a hypergeometric distribution. It gives the probability of \(x\) successes in \(n\) draws without replacement.

Hypergeometric

A shipment of 50 laptops arrives at a warehouse, 15 of which are defective. A quality control inspector randomly selects 10 laptops to inspect. What is the probability that exactly 3 of the selected laptops are defective?
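
A minimal sketch of this calculation, treating a defective laptop as a "success" so that \(N = 50\), \(K = 15\), \(n = 10\), and \(x = 3\) (the function name `hypergeom_pmf` is illustrative):

```python
from math import comb


def hypergeom_pmf(x, N, K, n):
    """P(x successes in n draws without replacement from N items, K of them successes)."""
    lowest = max(0, n - (N - K))  # cannot draw more failures than exist
    highest = min(n, K)           # cannot draw more successes than exist
    if not (isinstance(x, int) and lowest <= x <= highest):
        return 0.0
    return comb(K, x) * comb(N - K, n - x) / comb(N, n)


# Exactly 3 defective laptops among the 10 inspected.
p = hypergeom_pmf(3, 50, 15, 10)
print(round(p, 4))
```

The numerator counts samples with 3 of the 15 defective and 7 of the 35 working laptops; the denominator counts all samples of size 10. The result is a little under 0.30.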

Other important (named) discrete distributions

\({\mathrm{Categorical}}(\{p_1,\dots,p_K\})\) — Generalization of \({\mathrm{Bern}}(\theta)\) to \(K\) outcomes instead of 2.
\({\mathrm{Multinom}}(n, \{p_1,\dots,p_K\})\) — Generalization of \({\mathrm{Binom}}(n, \theta)\) to \(K\) outcomes instead of 2.

For each of these (and many others), we don’t need to “count” stuff. We just ask the math formula (or software) to calculate the probabilities of outcomes.

The trick is to translate the word problem to the correct named distribution.

Translating Word Problems

For each of the following experiments, name the distribution that would make an appropriate model, and if obvious, identify the value(s) of any parameters.

How many fries can you throw at the seagulls on Granville Island before they eat six?
Roll a die until you see 6. How many rolls?
Given 1000 bank transactions, an auditor samples 50 to check for fraud. If all 50 are legitimate, what is the probability that the remainder contains fraud?
How many patients arrive at the ER in 3 hours?

To do:

Review calculus: ensure you can solve these problems (link).
Read Chapter 2.4 before Tuesday’s class
Assignment 2 due May 27th, 11:59pm.