Module 4


Matias Salibian Barrera

Last modified — 29 Sep 2025

Random variables

  • A random variable is a feature (generally numeric) measured from the outcome of a random experiment.
  • We use capital letters (near the end of the alphabet) to denote random variables (\(X\), \(Y\), \(Z\), etc.)
  • \(X=\) number of defective items in a batch.
  • \(T=\) time until a certain signal occurs.
  • \(Y=\) number of daily website visits.


  • A random variable is a function with domain the sample space and range a subset of the real line:

    \[X \, : \, \Omega \ \rightarrow {\mathbb{R}}\]

    \[X( \omega ) \in {\mathbb{R}}\]

Example

  • Experiment: Toss a coin 5 times.
  • Sample space:

\[\Omega \, = \, \Bigl\{ ( x_{1}, x_{2}, \ldots, x_{5} ) \, : \ x_{i} \in \{ H, T \} \ \Bigr\}\]

  • Let \(X( \omega )=\) number of heads in \(\omega\).

\[\text{For example: } \qquad X \bigl( \, (T, H, T, H, H) \, \bigr) = 3\]

  • Let \(Y( \omega )=\) longest run of heads in \(\omega\)

\[\text{For example: } \qquad Y \bigl( \, (T, H, T, H, H) \, \bigr) = 2\]

Random variables and events

  • Random variables are naturally used to describe events of interest.
  • For example \(\Bigl\{ X > 3 \Bigr\}\) or \(\Bigl\{ Y \le 2 \Bigr\}\)
  • Formally, this notation means:

\[\Bigl\{ X > 3 \Bigr\} \, = \, \Bigl\{ \omega \in \Omega \, : \, X \left( \omega \right) > 3 \Bigr\} \subseteq \Omega\]

and

\[\Bigl\{ Y \le 2 \Bigr\} \, = \, \Bigl\{ \omega \in \Omega \, : \, Y \left( \omega \right) \le 2 \Bigr\} \subseteq \Omega\]

In other words, \(\Bigl\{ X > 3 \Bigr\}\) and \(\Bigl\{ Y \le 2 \Bigr\}\) are events.

The collection of events for which our probability can be calculated is denoted as \({\cal B}\).

Technical digression

  • \({\cal B}\) need not be \(2^{\Omega}\) (technical issues arise)

  • We will assume that \({\cal B}\) is the smallest set of events that includes all events of the form \(\left\{ X \leq x \right\}\), and that is closed under complements and (countable) unions.

  • This ensures that we can calculate

    \[\mathbb{P}(X \leq 3) \qquad \mathbb{P}(X > \pi)\]

    etc.

  • Interested in more details? Check MATH 420

Range/support of random variables

  • The range of \(X\) is the set of all possible values for \(X\), call it “\({\cal R}\)”.

  • \(X =\) # defective items in a collection of size \(N\):

    \[{\cal R} \, = \, \mbox{Range of } X \, = \, \left\{ 0, 1, \ldots, N \right\} \subset \mathbb{N}\]

  • \(Y =\) # website visits in a day:

    \[{\cal R} \, = \, \mbox{Range of } Y \, = \, \left\{ 0, 1, \ldots, \right\} \, = \, \mathbb{N}.\]

  • \(Z\) = Waiting time until next earthquake:

    \[{\cal R} \, = \, \mbox{Range of } Z \, = \, \left[ 0, \infty \right) \, \subset \, \mathbb{R}.\]

Discrete random variables

  • A random variable \(X\) is discrete if

    • its range \({\cal R}\) is finite

      • \({\cal R} \, = \, \left\{ 1, 2, \ldots, N \right\}\)
      • \({\cal R} \, = \, \left\{ 0, 1 \right\}\)
    • its range is countable

      • \({\cal R} \, = \, \mathbb{N}\)
      • \({\cal R} \, = \, \mathbb{Z}.\)
  • Roughly, a random variable \(X\) is continuous if its range \({\cal R}\) is an interval such as

    • \({\cal R} \, = \, \left[0, 1 \right],\)
    • \({\cal R} \, = \, \left( -2 , \infty \right)\)
    • \({\cal R} \, = \, \left( -\infty , \infty \right) \, = \, {\mathbb{R}}\)

    A more precise definition will be given below.

PMF and CDF of a discrete random variable

  • PMF: When \(X\) is a discrete random variable, its probability mass function is \[f_X(x) \, = \, \mathbb{P}\left( X = x \right) \, , \qquad \mbox{ for } x \in {\mathbb{R}}.\] Note that \[f_X(x) = 0 \quad \text{ if } \quad x \notin {\cal R}\]
  • CDF: The cumulative distribution function of a random variable \(X\) is \[F_X(x) \, = \, \mathbb{P}\left( X \le x \right) \, , \qquad \mbox{ for } x \in {\mathbb{R}}\]

Example

  • Consider an experiment of tossing a coin 3 times.
  • Let \(X =\) # of Tails. Then \({\cal R} \, = \, \left\{ 0, 1, 2, 3 \right\}\)


PMF

\(x\) \(f_X(x)\)
0 1/8
1 3/8
2 3/8
3 1/8
else 0

CDF

\(x\) \(F_X(x)\)
0 1/8
1 4/8
2 7/8
3 1

Example (cont)

  • NOTE:
    • \(f_X(0.3) = \mathbb{P}(X = 0.3) = 0\)
    • \(F_X(-1) = \mathbb{P}(X \leq -1) = 0\)
    • \(F_X(1.3) = \mathbb{P}(X \leq 1.3) = 4/8 = 1/2\).


  • Since \(f_X(x) = 0\) for \(x \notin {\cal R}\) we only need to list its values on \({\cal R}\)

  • Although \(F_X(x)\) is defined for all \(x \in {\mathbb{R}}\), note that \(F_X(a)\) remains constant for values of \(a\) “between” two consecutive elements of \({\cal R}\), so we do not really need to list \(F_X(x)\) for all \(x \in {\mathbb{R}}\) (which would be impossible, of course)

Properties of CDF

Let \(F_W(x)\) be the CDF of a random variable \(W\).

We have

  1. \(\lim_{x \to -\infty} F_W(x) = 0\) and \(\lim_{x \to \infty} F_W(x) = 1\)
  2. \(F_W(x)\) is non-decreasing
  3. \(F_W(x)\) is right continuous: \[\lim_{x \, \downarrow \, a} F_W(x) = F_W(a) \qquad \text{for all } a \in {\mathbb{R}}\]

Properties of PMF

  • Suppose \(f_X(x)\) is a PMF
  • Then \(f_X(x) \geq 0\) and \(\sum_{x \in {\cal R}} f_X(x) = 1\).
  • Often, discrete random variables take integer values: \({\cal R} \subset \mathbb{N}\)
  • In this case, \(f_X(k) = F_X(k) - F_X(k-1)\) for any \(k \in \mathbb{N}\), and \(f_X(x) = 0\) for \(x \notin \mathbb{N}\):

\[\begin{align*} f_X(k) &= \mathbb{P}\left( X = k \right) = \mathbb{P}\left( k-1 < X \le k \right) \\ & \\ &= \mathbb{P}\left( (X \le k) \cap (X > k-1) \right) = \mathbb{P}\left( (X \le k) \cap (X \le k-1)^c \right) \\ & \\ &= \mathbb{P}\left( X \le k \right) - \mathbb{P}\left( (X \le k) \cap (X \le k-1) \right) \\ & \\ &= \mathbb{P}\left( X \le k \right) - \mathbb{P}\left( X \le k-1 \right) = F_X(k) - F_X(k-1) \end{align*}\]

Example: rolling two dice

  • Consider the experiment of rolling two fair dice.
  • Let \(X\) be their sum
  • \(X\) is a discrete random variable, \({\cal R} = \{2, 3, \ldots, 12\}\).
  • Its PMF and CDF are:
\(x\) 2 3 4 5 6 7 8 9 10 11 12
\(f_X(x)\) 1/36 2/36 3/36 4/36 5/36 6/36 5/36 4/36 3/36 2/36 1/36
\(F_X(x)\) 1/36 3/36 6/36 10/36 15/36 21/36 26/36 30/36 33/36 35/36 1
  • \(f_X(x) = 0\) for all other \(x\in{\mathbb{R}}\).
  • What about \(F_X(x)\)?

Table or algebraic expression?

  • This pmf \(f_X(x)\) can also be written as \[f_X(x) = \frac{6 - |7-x|}{36}\] for \(x \in {\cal R}\), \(f_X(x)=0\) otherwise.

  • Both approaches (table & closed-form formula) are equally useful: you can calculate \(f_X(x)\) and \(F_X(x)\) for any \(x \in \mathbb{R}\).

  • It is very important to specify \({\cal R}\) whenever you write the PMF or CDF.
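As a quick computational check of the closed-form expression above, here is a short sketch in Python (standard library only; it is not part of the course material and the variable names are ours) that enumerates all 36 equally likely outcomes and compares the counts with the formula:

    from fractions import Fraction
    from itertools import product
    from collections import Counter

    # Enumerate the 36 equally likely outcomes of two fair dice and tabulate the sum.
    counts = Counter(d1 + d2 for d1, d2 in product(range(1, 7), repeat=2))

    for x in range(2, 13):
        table_value = Fraction(counts[x], 36)         # PMF obtained by counting
        formula_value = Fraction(6 - abs(7 - x), 36)  # (6 - |7 - x|) / 36
        assert table_value == formula_value
    print("table and formula agree for x = 2, ..., 12")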

Another example of PMF and CDF

For a fixed number \(p \in [0, 1]\), define a discrete random variable \(X\) with PMF and CDF given by:

PMF

\[ f\left( x;\ p\right) =\begin{cases} \left( 1-p\right) ^{2} & x=0, \\ 2p\left( 1-p\right) & x=1, \\ p^{2} & x=2, \\ 0 & \mbox{else.} \end{cases} \]

CDF

\[ F\left( x;\ p\right) = \begin{cases} 0 & x<0, \\ \left( 1-p\right) ^{2} & 0\leq x<1, \\ 1-p^{2} & 1\leq x<2, \\ 1 & x\geq 2. \end{cases} \]

Another example of PMF and CDF

Different values of \(p\) will give different PMFs and CDFs

For example, if \(p = 0.10\)

PMF

\[ f\left( x;\ p\right) =\begin{cases} 0.81 & x=0, \\ 0.18 & x=1, \\ 0.01 & x=2 \\ 0 & \mbox{else.} \end{cases} \]

CDF

\[ F\left( x;\ p\right) = \begin{cases} 0 & x<0 \\ 0.81 & 0\leq x<1 \\ 0.99 & 1\leq x<2 \\ 1 & x\geq 2 \end{cases} \]

By including a parameter \(p\), we are able to express many distributions (a family of distributions) with a single functional form.

A simple example

  • Consider an experiment consisting of flipping 5 coins

  • The sample space of this experiment is

\[\Omega \, = \, \left\{ \left( x_{1},x_{2},...,x_{5}\right) \, : \, x_{i} \in \left\{ H,T \right\} \, \right\}\]

  • Then \(\# \Omega = 2^5 = 32\). We will assume that all outcomes (\(\omega \in \Omega\)) are equally likely

  • We are interested in the number of Tails we obtain in our toss

  • Define a random variable

    \[Y = \left\{ \mbox{ number of Tails } \right\}\]

  • For example \(\left\{ Y = 3 \right\}\) is

\[ \left\{ Y = 3 \right\} \, = \, \left\{ \text{ there are exactly 3 Tails in the 5 tosses } \right\} \]

CDF and PMF of a random variable

  • Using what we have learned so far, you can prove that

    \[f_Y(3) = \mathbb{P}(Y=3) = {5 \choose 3} (1/2)^5.\]

  • Moreover, for any \(k = 0, 1, \ldots, 5\) we have

\[f_Y(k) = \mathbb{P}(Y= k ) = {5 \choose k} (1/2)^5\]

  • Finally

\[f_Y(k) = \mathbb{P}(Y = k) = 0 \qquad \text{ if } \quad k \notin \{ 0, 1, 2, 3, 4, 5 \}\]

Example

  • We can check that indeed

\[\sum_{x \in {\cal R} } f_Y(x) = 1\]

specifically

\[\sum_{x=0}^{5} f_Y ( x ) = \sum_{b=0}^{5} { 5 \choose b} \left( \frac{1}{2}\right) ^{5} =1\]

Note that we used \(x\) and \(b\) to list possible values of \(Y\).

One can use any symbol for the auxiliary variable that lists the values \(Y\) can take.
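A short computational verification of this identity (a Python sketch using only the standard library; not part of the original slides):

    from fractions import Fraction
    from math import comb

    # f_Y(k) = C(5, k) (1/2)^5 for k = 0, ..., 5; the probabilities must add up to 1.
    pmf = {k: Fraction(comb(5, k), 2**5) for k in range(6)}
    print(pmf)                # each value shown as a Fraction in lowest terms
    print(sum(pmf.values()))  # 1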

Urn example

  • An urn contains \(n\) numbered balls \((1, 2, \ldots, n)\).
  • Randomly draw \(k\) balls \((1<k<n)\) without replacement.
  • Let \(Y\) represent the largest number among them.
  1. What is \(\mathcal{R}\), the range of \(Y\)?
  2. Find \(F_{Y}\left(y\right)\) for all \(y\in \mathcal{R}\)
  3. Find \(f_{Y}\left(y\right)\) for all \(y\in \mathcal{R}\)

What is the range of \(Y\)?

  • The smallest possible value of \(Y\) is \(k\).

  • This corresponds to the event that the \(k\) drawn numbers are \(\left\{ 1, 2, \ldots, k \right\}\).

  • The largest possible value of \(Y\) is \(n\).

  • Therefore the range of \(Y\) is \[\mathcal{R}=\left\{ k,k+1,...,n\right\}\]

Find \(F_Y(y)\)

Find \(f_Y (y)\)

Properties

Properties of discrete PMF’s

  1. \(f_X(x) \ge 0\)

  2. \(\sum_{x\in \mathcal{R}} f_X(x) = 1\), where \(\mathcal{R} =\) range of \(X\)

  3. \(\mathbb{P}\left( X \in A \right) =\sum_{x\in A} f_X(x)\), where \(A\subset \mathbb{R}\)

Properties of discrete CDF’s

  1. \(0 \le F_X(x) \le 1\)

  2. \(F\left( x \right)\) is non-decreasing and right-continuous

  3. \(\lim_{a \to -\infty} F_X\left( a \right) =0\), \(\lim_{a \to +\infty} F_X\left( a \right) =1\)

  4. \(\mathbb{P}\left( a<X\leq b\right) =F\left( b\right) -F\left( a\right)\) for any \(a<b\)

  5. \(f\left( k\right) =F\left( k\right) -F\left( k-1\right)\) for all \(k \in {\cal R}\), when \(X\) takes integer values (\({\cal R} \subseteq \mathbb{Z}\))

Expected values

  • Suppose \(X\) is a discrete random variable with PMF \(f(x)\). Then, its expected value is defined as

    \[\mathbb{E}[X] \ = \ \sum_{k \in \mathcal{R}} k \, P ( X = k ) \, = \, \sum_{k \in \mathcal{R}} k \, f_X(k)\]

  • Let \(g : \mathcal{R} \rightarrow \mathcal{D}\) be a function, and let \(Y = g(X)\)

  • For example, \(g(x) = \sin(x)\) or \(g(x) = (x - 3)^2\).

  • Then \(Y = g(X)\) is itself a function on \(\Omega\):

\[ Y : \Omega \to {\cal D} \qquad \text{ with } \qquad Y( \omega ) = g \left( X ( \omega ) \right) \] and hence \(Y\) is a discrete random variable, with its own PMF, \(f_Y\), say.

Expected values

  • To compute \(\mathbb{E}(Y)\) we should calculate

\[ \mathbb{E}(Y) \, = \, \sum_{z \in {\cal D}} z \, f_Y(z) \, = \, \sum_{z \in {\cal D} } z \, \mathbb{P}( Y = z) \, = \, \sum_{z \in {\cal D} } z \, \mathbb{P}( g(X) = z)\]

  • We have the following result:

\[\mathbb{E}[g(X)] = \sum_{k \in \mathcal{R}} g(k) \, f_X(k)\]

  • In other words, we do not need to compute \(f_Y\) to calculate \(\mathbb{E}(Y)\)
  • The proof is beyond the scope of this class.

Calculate expectation in a simple situation

  • Let \(X\) be a random variable with PMF given by
\(k\) \(f_X \left( k \right)\)
0 0.15
1 0.25
2 0.30
3 0.20
4 0.10
else 0
  1. Calculate \(\mathbb{E}[ X]\)
  2. Calculate \(\mathbb{E}[X^2]\)

Solution

\[\mathbb{E}\left[ X \right] = \sum_{k=0}^4 k \, f_X(k) = 0 \times 0.15 + 1 \times 0.25 + 2 \times 0.30 + 3 \times 0.20 + 4\times 0.10 = 1.85.\]

\[\begin{aligned}\mathbb{E}\left[ X^{2}\right] &=\sum_{x=0}^{4} x^{2} \, f_X \left( x\right) \\ &= 0 \times 0.15 + 1 \times 0.25 + 4 \times 0.30 + 9 \times 0.20 + 16 \times 0.10 \\ &=4.85.\end{aligned}\]
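The same calculation can be done on a computer; here is a minimal Python sketch (standard library only, floating-point arithmetic) that reproduces \(\mathbb{E}[X]\) and \(\mathbb{E}[X^2]\) from the tabulated PMF:

    # PMF from the table above; keys are the values of X, values are probabilities.
    pmf = {0: 0.15, 1: 0.25, 2: 0.30, 3: 0.20, 4: 0.10}

    mean = sum(k * p for k, p in pmf.items())
    second_moment = sum(k**2 * p for k, p in pmf.items())

    print(mean)           # 1.85 (up to floating-point rounding)
    print(second_moment)  # 4.85 (up to floating-point rounding)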

Properties and notation

  • We often denote the expectation of \(X\) by \(\mu\), or \(\mu_X\), or \(\mathbb{E}\left[ X \right]\).

  • Expectation is a linear operator: for any \(a, b \in \mathbb{R}\), we have

    \[\mathbb{E}[ a \, X + b ] \ = \ a \, \mathbb{E}[ X ] + b\]

  • Key point: \(a\) and \(b\) are fixed numbers, not random. For example:

    \[\mathbb{E}[ 3.2 \, X - 2.5 ] \ = \ 3.2 \, \mathbb{E}[ X ] - 2.5\]

Proof of the linearity of \(\mathbb{E}\)

Let \(X\) be a random variable and \(a\) and \(b\) be fixed real numbers.

Then, applying the result above for functions of \(X\) with \(g(x) = a \, x + b\), we have

\[\begin{align*} \mathbb{E}[ a \, X + b ] &= \sum_{k \in \mathcal{R}} \left( a \, k + b \right) \, f_X( k ) \\ &= \sum_{k \in \mathcal{R}} a \, k \, f_X ( k ) \, + \, \sum_{k \in \mathcal{R}} b \, f_X( k ) \\ &= a \, \sum_{k \in \mathcal{R}} \, k \, f_X( k ) \, + \, b \, \sum_{k \in \mathcal{R}} \, f_X ( k )\\ &= \, a \, \mathbb{E}[ X ] + b. \end{align*}\]

Optimal prediction

  • Let \(f(k)\) be the pmf of a random variable \(X\).

  • For each \(t \in \mathbb{R}\), consider the average squared distance from \(X\) to \(t\):

    \[m( t) = \mathbb{E}[ ( X-t) ^{2}] = \sum_{k \in {\cal R}} \, (k - t )^2 \, f_X ( k )\]

  • What is the value of \(t \in {\mathbb{R}}\) that is closest to \(X\) on average?

  • What is the value of \(t \in \mathbb{R}\) that minimizes \(m ( t )\)?

Optimal prediction

  • We can find the minimum point by finding \(t_0 \in \mathbb{R}\) that solves

\[\begin{align*} \left. {\frac{{\partial}^{} m(t) }{\partial{ t }^{}}} \right|_{t = t_0} &= \left. \left[ - 2 \sum_{x} (x-t) f_X(x) \right] \right|_{t = t_0} \overset{set}{=} 0 \\ &\Leftrightarrow \overset{ \mathbb{E}[X] }{\overbrace{\sum_{x} x \, f_X(x) }} = t_0 \ \overset{1}{\overbrace{\sum_{x} f_X(x) }} \\ &\Rightarrow t_0 = \mathbb{E}[X] \end{align*}\]

  • Since \(m^{\prime \prime } (t_0) = 2 > 0\), then \(t_0\) is the unique minimum of the function \(m\):

    \[\mathbb{E}\left[ ( X - \mathbb{E}[X] )^2 \right] \, \le \, \mathbb{E}\left[ ( X-t )^2 \right] \quad \mbox{ for all } t \in \mathbb{R}\]

This result shows that \(\mathbb{E}[X]\) is the value that minimizes the mean squared prediction error for \(X\).

An alternative proof

For any \(t \in \mathbb{R}\)

\[\begin{align*} (X - t)^2 &= (X - \mathbb{E}[X] + \mathbb{E}[X] - t)^2 = \left( (X - \mathbb{E}[X]) + (\mathbb{E}[X] - t) \right)^2 \\ & \\ &= (X - \mathbb{E}[X])^2 + (\mathbb{E}[X] - t)^2 + 2\, (X - \mathbb{E}[X]) \times (\mathbb{E}[X] - t) \\ \end{align*}\]

Thus

\[\begin{align*} \mathbb{E}(X - t)^2 &= \mathbb{E}\left[ (X - \mathbb{E}[X])^2 \right] + \mathbb{E}\left[ (\mathbb{E}[X] - t)^2 \right] + \mathbb{E}\left[ 2\, (X - \mathbb{E}[X]) \times (\mathbb{E}[X] - t) \right] \\ & \\ &= \mathbb{E}\left[ (X - \mathbb{E}[X])^2 \right] + (\mathbb{E}[X] - t)^2 \ge \mathbb{E}\left[ (X - \mathbb{E}[X])^2 \right] \end{align*}\]

because \(\mathbb{E}[X] \in {\mathbb{R}}\), \((\mathbb{E}[X] - t) \in {\mathbb{R}}\), and \(\mathbb{E}\left[ X - \mathbb{E}[X] \right] = 0\), thus \[ \mathbb{E}\left[ 2\, (X - \mathbb{E}[X]) \times (\mathbb{E}[X] - t) \right] = 2 \, (\mathbb{E}[X] - t) \, \mathbb{E}\left[ X - \mathbb{E}[X] \right] = 2 \, (\mathbb{E}[X] - t) \, 0 = 0 \]

An alternative proof

Hence, for any \(t \in {\mathbb{R}}\)

\[ \mathbb{E}(X - t)^2 \ge \mathbb{E}\left[ (X - \mathbb{E}[X])^2 \right] \] so \(t_0 = \mathbb{E}[X]\) clearly minimizes the left-hand side.

Common misunderstanding of “expectation”

  • If \(X\) is the output of rolling a fair die, \[\mu_X = \mathbb{E}[X] = (1/6) \{ 1 + 2 + \cdots + 6\} = 3.5.\]
  • But 3.5 can never be observed when you roll a die. \(\mathbb{P}(X=3.5) = 0\).
  • Moreover, every time you roll it, the value \(X\) takes will be different (and never 3.5)
  • Yet, it is still meaningful to ask how far, on average, an observed value of \(X\) is from its expected value \(\mu_X\).

Variance

  • Let \(X\) be a random variable with expected value (or mean) \(\mathbb{E}[X]\), also written \(\mu_X\).

  • We call \[\mbox{Var}[X] = \mathbb{E}\left[ \left(X - \mathbb{E}[X]\right) ^2 \right]\] the variance of \(X\), provided the expectation of this squared term exists.

  • It is usually denoted by \(\sigma^2_X\), or \(\sigma^2\)

Variance

  • The variance is the expected squared distance from the random variable \(X\) to its mean \(\mu_X\).

  • Suppose \(X\) is the weight of a newborn baby in grams.

  • Its expectation is the mean weight of newborn babies in grams.

  • The variance of \(X\), \(\sigma^2_X\), is the expected squared deviation of newborn weights from their mean.

  • Its units are “squared grams”.

Standard deviation

  • The variance is the expected squared prediction error if we predict \(X\) by its mean \(\mu_X = \mathbb{E}[X]\)

  • The standard deviation is the square root of the variance

Example
If \(X\) = height in meters, then

\[\mathbb{E}\left[ X \right] \, = \, 5 m \quad \mbox{ say, } \quad \operatorname{Var}\left[ X \right] \, = 4 m^2\] so

\[\sigma_X = \text{SD}(X) = \sqrt{ \operatorname{Var}\left[ X \right] } \, = \, 2 m\]

Properties of variance

  • If \(X\) is a random variable, and \(a, b \in \mathbb{R}\) are fixed constants, then \[\operatorname{Var}[ a \, X + b \,] \ = \ a^2 \, \operatorname{Var}[ X ].\]

  • Let \(\mu_X = \mathbb{E}\left[ X \right]\) then \[\operatorname{Var}\left[ X \right] \, = \, \mathbb{E}\left[ X^2 \right] - \mu_X^2\]

  • It is often easier to calculate \(\mathbb{E}\left[ X^2 \right]\) than \(\mathbb{E}[ \left( X - \mu_X \right)^2 ]\)

Proof of \(\operatorname{Var}[a X + b] = a^2\operatorname{Var}[ X]\).

Proof of \(\operatorname{Var}[ X ] = \mathbb{E}[X^2] - \mathbb{E}[X]^2\)

Computation Example

  • Suppose \(X\) has PMF

\[f\left( k\right) = \begin{cases} 0.3414 \times \frac{1}{k} & k = 1, 2, \ldots, 10\\ 0 & \text{else}.\end{cases}\]

  • Its expectation is given by

\[\mu = 0.3414 \sum_{k=1}^{10} k \frac{1}{k} = 0.3414 \times 10 = 3.414\]

  • Similarly: \(\mathbb{E}\left[ X^{2}\right] = 0.3414 \sum_{k=1}^{10} k^{2} \, \frac{1}{k} = 0.3414 \times 55 = 18.777\)

  • Thus, its variance is given by \[\operatorname{Var}\left[ X \right] = 18.777 - 3.414^2 = 7.122\]
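These numbers are easy to reproduce numerically; a Python sketch (standard library only; the constant 0.3414 is taken from the slide and is approximately \(1 / \sum_{k=1}^{10} 1/k\)):

    c = 0.3414                                     # normalizing constant from the slide
    ks = range(1, 11)

    mean = sum(k * c / k for k in ks)              # 0.3414 * 10 = 3.414
    second_moment = sum(k**2 * c / k for k in ks)  # 0.3414 * 55 = 18.777
    variance = second_moment - mean**2             # approximately 7.122
    print(mean, second_moment, variance)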

PMF given by table

\(x\) \(f(x)\) \(xf(x)\) \(x^2 f(x)\)
0 0.15 0.00 0.00
1 0.25 0.25 0.25
2 0.30 0.60 1.20
3 0.20 0.60 1.80
4 0.10 0.40 1.60
else 0 0 0
  • Find \(\mathbb{E}[X]\) and \(\operatorname{Var}[X]\) and \(\text{SD}[X]\).

Return of the Urn

  • An urn contains \(n\) numbered balls \((1, 2, \ldots, n)\).

  • Randomly draw \(k\) balls \((1<k<n)\) without replacement.

  • Let \(Y\) represent the largest number among them.

  • Calculate the mean and variance of \(Y\) when \(n = 10\) and \(k = 5\).

  • Use the computer to find these values.
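One possible way to do the computer calculation suggested above is to enumerate all \(\binom{10}{5}\) equally likely draws; a Python sketch (standard library, exact arithmetic; the variable names are ours):

    from fractions import Fraction
    from itertools import combinations
    from collections import Counter
    from math import comb

    n, k = 10, 5
    # Every k-subset of {1, ..., n} is equally likely; record the maximum of each.
    counts = Counter(max(draw) for draw in combinations(range(1, n + 1), k))
    total = comb(n, k)

    pmf = {y: Fraction(c, total) for y, c in counts.items()}
    mean = sum(y * p for y, p in pmf.items())
    variance = sum((y - mean) ** 2 * p for y, p in pmf.items())
    print(mean, variance)   # exact Fractions; wrap in float(...) for decimals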

Bernoulli trials

  • Consider an event \(A\) such as

    \[A \, = \, \left\{ \mbox{Coin lands ``H''} \right\}\]

    or

    \[A \, = \, \left\{ \mbox{Max annual wind speed > 100 km/h}\right\}.\]

  • Every toss or every year, we check whether \(A\) occurred.

  • These checks are called trials

Bernoulli trials

  • Occurrences of \(A\) are arbitrarily called successes

  • Suppose \(\mathbb{P}\left( A\right)\) remains constant across trials

    \[\mathbb{P}\left( A \right) = \mathbb{P}( \text{ a success } ) = p \in [0, 1]\]

    and that the outcomes of trials are independent

  • We call such trials Bernoulli trials.

Three necessary conditions:

  1. Binary outcomes
  2. Constant probability of success
  3. Independence across trials

Bernoulli random variables

  • Let

\[Y_i = \begin{cases} 1 & \text{if $A$ occurs in the $i^{th}$ trial}\\ 0 & \text{if $A$ doesn't occur in the $i^{th}$ trial.}\end{cases} \]

  • The PMF of \(Y_{i}\) is

\[\begin{aligned}f_{Y_i}\left( a\right) &= \begin{cases} p & \text{ if } \ a=1 \\ & \\ 1-p & \text{ if } \ a=0\\ & \\ 0 & \text{otherwise}\end{cases} \\ & \\ & \\ f_{Y_i}\left( a\right) & = p^{a}( 1-p)^{1-a} \quad \text{ for } a \in\{0,1\}\end{aligned} \]

Bernoulli random variables

  • Notation: \(Y_{i} \sim {\mathrm{Bern}}\left( p\right)\)

  • The mean of \(Y_i\) is

\[\mathbb{E}[ Y_{i} ] =0 \times f_{Y_i}(0) +1 \times f_{Y_i}(1) = p\]

  • The variance of \(Y_i\) is

\[\operatorname{Var}[ Y_i ] = \mathbb{E}\left[ \left( Y_i - \mu_{Y_i} \right)^2 \right] = \mathbb{E}[Y_i^2]- \mathbb{E}[Y_i]^2 = p - p^2 = p (1-p)\] because \(\mathbb{E}[Y_i^2] = 0^2 \times f_{Y_i}(0) + 1^2 \times f_{Y_i}(1) = 0 \times (1 - p) + 1 \times p = p\)

Binomial random variables

  • Consider an experiment consisting of a fixed number of Bernoulli trials (\(n \ge 1\)).

  • Let \(X\) be the number of successes out of the \(n\) trials.

  • Let \(p = \mathbb{P}( \mbox{ success in any trial } )\).

  • Convention: we say that \(X\) is binomially distributed, or that it has a binomial distribution.

  • We denote

\[X \, \thicksim \, {\mathrm{Binom}}\left( n, p\right)\]

where \(n\) is the number of trials, and \(p\) is the probability of success in each trial (which is the same for all trials).

Binomial random variables

  • The range of \(X\) is \(\left\{ 0, 1, \ldots, n \right\}\).

  • To calculate its PMF, consider the sample space

\[\Omega \, = \, \Bigl\{ \left(y_1, y_2, \ldots, y_n \right) \, : \ y_j \in \left\{ S, F \right\} \Bigr\} \qquad \text{ and } \quad \# \Omega = 2^n \]

  • Independent trials imply that for any \(\omega \in \Omega\) we have

\[\mathbb{P}\left( \left\{ \omega \right\} \right) = p^{n_S} (1-p)^{n - n_S} \qquad \omega \in \Omega\]

where \(n_S\) = number of successes in \(\omega\)

  • For example

\[\mathbb{P}\Bigl( \left( S, S, F, F, \ldots, F, S \right) \Bigr) = p^3 (1-p)^{n - 3}\]

Binomial random variables

  • Hence, \(f_X(k) = \mathbb{P}\left( X = k \right)\) is

\[f_X(k) \, = \, \mathbb{P}\left( \omega \in \Omega \ \mbox{ with exactly } k \ \mbox{successes} \right)\]

  • How many \(\omega \in \Omega\) have exactly \(k\) successes?

\[\text{There are } {n \choose k} \ \text{such } \omega's \qquad \text{ (Why? Prove it!)}\]

  • Hence,

\[f_X(k) \, = \begin{cases} \binom{n}{k} \, p^k \, \left( 1 - p \right)^{n-k} & \text{ if } k \in \{0,\dots,n\} \\ & \\ 0 & \text{otherwise} \end{cases}\]

Binomial theorem

  • Binomial Theorem: for any \(a\), \(b \in {\mathbb{R}}\) and \(n \in \mathbb{N}\):

\[\left( a + b \right)^n \, = \, \sum_{k=0}^n \, \left( \begin{array}{c} n \\ k \end{array} \right) \, a^k \, b^{n - k} \]

Mean and variance

  • When \(X \sim {\mathrm{Binom}}(n, p)\), we have \[\mathbb{E}[ X ] = \sum_{k=0}^n \, k {n \choose k} p^k (1-p)^{n-k} = n p.\] and \[\mathbb{E}[ X^2 ] = \sum_{k=0}^n \, k^2 {n \choose k} p^k (1-p) ^{n-k} = p^2 ( n-1 ) n + n p.\]

  • Therefore, \[\operatorname{Var}[X] = n p (1- p).\]
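A quick numerical check of these two formulas, for one (arbitrarily chosen) pair \(n\) and \(p\), in Python (standard library only; a sketch, not part of the original slides):

    from math import comb

    n, p = 12, 0.3   # illustrative values only
    pmf = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

    mean = sum(k * f for k, f in enumerate(pmf))
    variance = sum(k**2 * f for k, f in enumerate(pmf)) - mean**2

    print(mean, n * p)                # both approximately 3.6
    print(variance, n * p * (1 - p))  # both approximately 2.52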

Expectation of a Binomial Random Variable

One approach to find expectation is as follows:

\[\begin{aligned} \mathbb{E}[X] &= \sum_{k=0}^n k {n \choose k} p^k (1-p)^{n-k} \\ &= \sum_{k=0}^n k \frac{n!}{k! (n-k)!} p^k (1-p)^{n-k} \\ &= n p \sum_{k=1}^n \frac{(n-1)!}{(k-1)! ((n-1) -(k-1))!} p^{k-1} (1-p)^{n-k} \\ &= n p \sum_{j=0}^{n-1} {n-1 \choose j} p^{j} (1-p)^{(n-1) - j} \qquad (\text{setting } j = k-1) \\ &= np, \end{aligned}\]

since the \(k=0\) term vanishes and the last sum equals \(\bigl( p + (1-p) \bigr)^{n-1} = 1\) by the Binomial Theorem.

Variance of a Binomial Random Variable

Start with \[\begin{aligned} \mathbb{E}[X(X-1)] &= \sum_{k=0}^n k(k-1) {n \choose k} p^k (1-p)^{n-k} \\ &= \sum_{k=0}^n k(k-1) \frac{n!}{k! (n-k)!} p^k (1-p)^{n-k} \\ &= n(n-1) p^2 \sum_{k=2}^n {n-2 \choose k-2} p^{k-2} (1-p)^{(n-2) -(k-2)}\\ &= n(n-1) p^2.\end{aligned} \]

  • Note that \(X^2 = X + X(X-1)\).
  • Hence \(\mathbb{E}[X^2] = np + n(n-1)p^2.\)
  • Finally, \(\operatorname{Var}[X] = \mathbb{E}[X^2] - \mathbb{E}[X]^2 = np + n(n-1)p^2 - n^2 p^2 = np(1-p).\)

Attention shoppers!

Important

Both of the previous calculations used a trick.

  • Inside the sum, we made something look like a PMF
  • We know PMF’s sum to 1.
  • Sometimes called “kernel matching”
  • We will use similar ideas to calculate unpleasant integrals

Example of a Binomial random variable

  • Suppose that finding oil when digging in a certain region has probability \(p = 0.10\) of success.

  • Assume that digging attempts (trials) are independent.

    1. How many wells should be dug so that the probability of finding oil is at least 0.95?

    2. How many wells should be dug so that the probability of finding at least 2 successful wells is at least 0.95?
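A brute-force computational sketch for these two questions (Python, standard library only; this is not necessarily the analytic approach intended in class, and the function name is ours):

    from math import comb

    def prob_at_least(n, p, m):
        """P(X >= m) when X ~ Binom(n, p)."""
        return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(m, n + 1))

    p = 0.10
    # Smallest n with P(at least one success) >= 0.95, and with P(at least two) >= 0.95.
    n_a = next(n for n in range(1, 1000) if prob_at_least(n, p, 1) >= 0.95)
    n_b = next(n for n in range(2, 1000) if prob_at_least(n, p, 2) >= 0.95)
    print(n_a, n_b)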

Example (a)

Example (b)

Another discrete distribution - Geometric random variables

  • Consider a sequence of independent Bernoulli trials.

  • Constant probability of success: \(p \in (0, 1)\) (the cases \(p=0\) and \(p=1\) require attention)

  • We repeat the Bernoulli trial until we find the first success

  • What is \(\Omega\)? What is \(\# \Omega\)? Are these outcomes equally likely?

  • Let \(X\) be the number of trials needed to find the first success. What is the range of \(X\) (\({\cal R}\))?

  • Examples:

\[\Bigl\{ X = 3 \Bigr\} \, = \ \Bigl\{ \left( F, F, S \right) \Bigr\}\]

\[\Bigl\{ X = 6 \Bigr\} \, = \ \Bigl\{ \left( F, F, F, F, F, S \right) \Bigr\}\]

Note: there is only one \(\omega \in \Omega\) for which \(X(\omega) = 3\). How many \(\omega \in \Omega\) satisfy \(X(\omega) = k\), where \(k \in \mathbb{N}\)?

Geometric random variables

  • Because the trials are independent

    \[\mathbb{P}\left( X = 3 \right) \, = \, \mathbb{P}\left( \left( F, F, S \right) \right) \, = \, (1-p)^2 \, p.\]

  • In general, we have

\[f_X( k) \, = \mathbb{P}( X = k) = \mathbb{P}( ( F, F, \ldots, F, S)) = \begin{cases} (1-p)^{k-1}p & k \in \{1, 2, 3, \ldots \}\\ & \\ 0 & \text{else}\end{cases}\]

  • Sanity check, for any \(p \in (0, 1)\):

\[\sum_{k =1}^\infty f_X(k) = \sum_{k =1}^\infty (1-p)^{k-1} \, p = 1\]

Geometric random variables

  • Note that this sum means that the probability of eventually seeing a success \((X < \infty)\) is 1. The probability that we never (ever) see a success is \(\mathbb{P}( (X < \infty)^c ) = 0\).

  • The CDF of \(X\) for \(k \in \mathbb{N}\) is

\[\begin{aligned} F_X( k ) & = \mathbb{P}( X \le k ) = \sum_{m=1}^k f_X(m) = \sum_{m=1}^k (1-p)^{m-1} \, p \\ & = p \, \sum_{m=1}^k (1-p)^{m-1} \\ & = p \, \left( \frac{ 1 - (1-p)^k }{ 1 - (1-p) } \right) \, = \, 1 - (1-p)^k \end{aligned} \]

Geometric random variables

  • Combining those results into one formula, the CDF of \(X\) is:

\[ F_X(k) = \left\{ \begin{array}{ll} p \, \left( \frac{ 1 - (1-p)^k }{ 1 - (1-p) } \right) \, = \, 1 - (1-p)^k & \text{ if } k \in \mathbb{N} \\ & \\ 0 & \text{ if } k < 1 \end{array} \right. \]

  • Trick question: How much is \(F(3.4)\)?

More oil wells

  • Suppose that finding oil when digging in a certain region has probability \(\theta = 0.10\) of success.

  • Assume that digging attempts (trials) are independent (what does this mean???)

  1. What is the probability that we need to dig 5 wells before we find oil for the first time?

  2. What is the probability that we find oil before digging the \(30^{th}\) well?

More oil (solution)

Expectation and Variance of a Geometric RV

  • If \(X \sim {\mathrm{Geom}}(p)\) with \(p \in (0, 1)\), then:

\[\mathbb{E}\left[ X \right] = \frac{1}{p} \qquad \mbox{and} \qquad \operatorname{Var}\left[ X \right] = \frac{1-p}{p^2}\]

\[ \mathbb{E}\left[ X \right] = \sum_{k =1 }^\infty \ k \, f(k) = \sum_{k =1 }^\infty \ k \, (1-p)^{k-1} p = (-p) \, {\frac{{\partial}^{} }{\partial{p}^{}}} \left\{ \sum_{k =1 }^\infty (1 - p)^k \right\} \]

  • Now note that for \(|a| < 1\)

\[\sum_{k=1}^\infty \, a^k \, = \, \frac{a}{1-a}\]

Expectation and Variance of a Geometric RV (cont’d)

  • Thus

\[ {\frac{{\partial}^{} }{\partial{p}^{}}} \left\{ \sum_{k =1 }^\infty (1 - p)^k \right\} = {\frac{{\partial}^{} }{\partial{p}^{}}} \left\{ \frac{1-p}{p} \right\} = -\frac{1}{p^2}\]

  • Finally

\[\mathbb{E}\left[ X \right] = (-p) \, {\frac{{\partial}^{} }{\partial{p}^{}}} \left\{ \sum_{k =1 }^\infty (1 - p)^k \right\} = (-p) \, {\frac{{\partial}^{} }{\partial{p}^{}}} \left\{ \frac{1-p}{p} \right\} = (-p) \times (-\frac{1}{p^2}) = \frac{1}{p}\]

Variance of a Geometric RV

  • To calculate \(\operatorname{Var}\left[ X \right]\), first check that: \(\mathbb{E}\left[ X \left( X - 1 \right) \right] \, = \, \mathbb{E}\left[ X^2 - X \right] \, = \, \mathbb{E}\left[ X^2 \right] - \mathbb{E}\left[ X \right]\)

Expectation and Variance of a Geometric RV (cont’d)

  • Similarly to what we did for \(\mathbb{E}[X]\)

\[\mathbb{E}\left[ X \, \left( X - 1 \right) \right] \, = \, 2 \, \left( \frac{1-p}{p^2} \right)\]

  • Thus

\[\mathbb{E}\left[ X^2 \right] = \mathbb{E}\left[ X \, \left( X - 1 \right) \right] + \mathbb{E}[X] = 2 \, \left( \frac{1-p}{p^2} \right) + \frac{1}{p} \, = \, \frac{2-p}{p^2}\]

  • And finally,

\[\operatorname{Var}\left[ X \right] = \mathbb{E}\left[ X^2 \right] - \left( \mathbb{E}[ X ] \right)^2 = \mathbb{E}\left[ X^2 \right] - \frac{1}{p^2} = \frac{1-p}{p^2}\]
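As a numerical sanity check of \(\mathbb{E}[X] = 1/p\) and \(\operatorname{Var}[X] = (1-p)/p^2\), here is a Python sketch that truncates the infinite sums at a large cutoff (standard library only; the value of \(p\) is chosen only for illustration):

    p = 0.3
    cutoff = 2000   # (1 - p)^cutoff is negligible, so truncating here is harmless

    pmf = [(1 - p) ** (k - 1) * p for k in range(1, cutoff + 1)]
    mean = sum(k * f for k, f in enumerate(pmf, start=1))
    variance = sum(k**2 * f for k, f in enumerate(pmf, start=1)) - mean**2

    print(mean, 1 / p)               # both approximately 3.3333
    print(variance, (1 - p) / p**2)  # both approximately 7.7778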

Example (job applicants)

  • Job applicants are randomly selected for an interview.

  • Assume that 10% of the applicants are suitable

    1. Calculate the probability that it will take more than 10 interviews to fill the position

    2. Each interview costs $50. Calculate the expected cost of the search and the standard deviation of the cost.

    3. Suppose that 6 applicants have already been interviewed and the position is still open. What is the probability that it will take more than 10 additional interviews to fill the position?

Example (a)

Example (b)

Example (c)

Lack of memory

  • If \(X \thicksim {\mathrm{Geom}}(p)\), \(p \in (0,1)\), then

\[\mathbb{P}\left( X > k + m \ \vert\ X > k \right) = \mathbb{P}\left( X > m \right) \qquad k, m \in \mathbb{N}\]

  • In words: \(X\) does not “remember” having already waited \(k\) trials

  • The proof is simple

\[\begin{aligned} \mathbb{P}( X > k + m \ \vert\ X > k ) &= \frac{ \mathbb{P}\left( X > k + m \right) }{ \mathbb{P}\left( X > k \right) } \\ &= \frac{ 1 - \mathbb{P}\left( X \le k + m \right) }{ 1 - \mathbb{P}\left( X \le k \right) } \\ &= \frac{ \left(1 - p \right)^{k + m} }{ \left( 1 - p \right)^k } = \left( 1 - p \right)^m \\ & = \mathbb{P}( X > m) \end{aligned}\]
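The same property can be seen in a small Monte Carlo experiment; a Python sketch (standard library only; the values of \(p\), \(k\), \(m\) and the seed are arbitrary):

    import random

    def geometric(p):
        """Number of independent Bernoulli(p) trials until the first success."""
        n = 1
        while random.random() >= p:
            n += 1
        return n

    random.seed(302)
    p, k, m = 0.2, 4, 3
    draws = [geometric(p) for _ in range(200_000)]

    given_k = [x for x in draws if x > k]
    lhs = sum(x > k + m for x in given_k) / len(given_k)  # estimates P(X > k+m | X > k)
    rhs = sum(x > m for x in draws) / len(draws)          # estimates P(X > m)
    print(lhs, rhs, (1 - p) ** m)                         # all close to 0.512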

Poisson random variables

  • Let us count the number of occurrences of a type of incident during a given period of time.

  • Examples:

    • Number of earthquakes (mag. 5+) in a year;

    • Car collisions at Oak and Cambie in a month.

  • Let \(\lambda > 0\) be the rate of occurrence of the incident:

    • \(\lambda = 7\) per year;

    • \(\lambda = 50\) per month.

Poisson random variables

  • Define \(X =\) number of incidents in a unit time interval.

  • The range of \(X\) is the set \(\{0, 1, 2, ...\}\)

  • One choice for a PMF to model such a random variable is given by

\[f_X( k ) = \mathbb{P}( X = k ) = \begin{cases} \frac{ e^{-\lambda} \, \lambda^k }{k!} & k = 0, 1, 2, 3, \ldots \\ & \\ 0 &\text{else.}\end{cases}\]

The mathematics behind the derivation of this PMF will not be discussed in Stat 302.

Sanity check

We can check that \(f_X\) defined above is indeed a proper PMF.

  • It satisfies \(f_X(k) \ge 0\) for \(k \in \mathbb{N} \cup \{ 0 \}\)

  • Also:

\[\begin{aligned} \mathbb{P}(\Omega) &= \sum_{x=0}^{\infty }\frac{e^{-\lambda }\lambda ^{x}}{x!} \\ &=e^{-\lambda} \sum_{x=0}^{ \infty }\frac{\lambda ^{x}}{x!} \\ &= e^{-\lambda }\overset{\text{Taylor expansion of }e^{\lambda }}{\overbrace{\left[ 1+\lambda +\frac{\lambda ^{2}}{2}+\frac{\lambda ^{3}}{3!}+\cdots \right] }} \\ &=e^{-\lambda }e^{\lambda }=1.\end{aligned}\]

Mean and variance of Poisson distribution

The mean and variance of a Poisson (\(\lambda\)) random variable are

\[\mathbb{E}[X] = \operatorname{Var}[X] = \lambda\]

Proof:

\[\begin{aligned} \mathbb{E}[X] &= \sum_{k=0}^\infty \, k \frac{ e^{-\lambda} \lambda^k}{k!} = \sum_{k=1}^\infty \, \frac{ e^{-\lambda} \lambda^k}{(k-1)!} = \lambda \, \sum_{k=1}^\infty \, \frac{ e^{-\lambda} \lambda^{k-1}}{(k-1)!} = \lambda \, \sum_{j=0}^\infty \, \frac{ e^{-\lambda} \lambda^{j}}{j!} = \lambda. \end{aligned}\]

For \(\operatorname{Var}[X]\), first check that \(\mathbb{E}[ X ( X - 1) ] = \lambda^2\), and then follow the same steps we used for the Geometric distribution.
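A quick numerical check of \(\mathbb{E}[X] = \operatorname{Var}[X] = \lambda\) (a Python sketch, standard library only; \(\lambda = 3.6\) is borrowed from the earthquake example that follows, and the recursion \(f(k) = f(k-1)\,\lambda/k\) is used to avoid large factorials):

    from math import exp

    lam = 3.6
    cutoff = 60   # P(X > 60) is negligible when lambda = 3.6

    pmf = [exp(-lam)]                  # f(0) = e^{-lambda}
    for k in range(1, cutoff + 1):
        pmf.append(pmf[-1] * lam / k)  # f(k) = f(k-1) * lambda / k

    mean = sum(k * f for k, f in enumerate(pmf))
    variance = sum(k**2 * f for k, f in enumerate(pmf)) - mean**2
    print(sum(pmf), mean, variance)    # approximately 1, 3.6, 3.6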

Example earthquakes

  • \(Y\) = # of earthquakes over 5.0 in magnitude in a given area

  • Assume \(Y\thicksim {\mathrm{Poiss}}\left(\lambda \right)\) with \(\lambda =3.6\) per year.

  1. What is the probability of having at least 2 earthquakes over 5.0 in the next 6 months?

  2. What is the probability of having 1 earthquake over 5.0 next month?

  3. There were two earthquakes (over 5.0) in the first 6 months. What is the probability that we will have more than 4 this year?

  4. What is the probability of waiting more than 3 months for the next earthquake over 5.0 in that area?

Simplifying assumptions

Let us assume for the moment that \[\begin{aligned} 1\text{ year } &=12\text{ months }=365\text{ days} \\ 1\text{ month } &=4\text{ weeks }=\text{ }30\text{ days} \\ \end{aligned}\]

These assumptions do not hold, but we will use them for illustration.

Implicit assumptions

  • The number of incidents in any period of time is random and has Poisson distribution.

  • The parameter of any Poisson distribution in this question is proportional to the length of the period.

  • The assumption is justifiable under certain conditions, spelled out in the definition of a Poisson process.

  • We will not discuss Poisson processes here; we will provide the information needed while working on this example.
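As an illustration of the proportionality assumption, here is a sketch of how the rescaled rates and one of the required probabilities could be computed in Python (standard library only; the helper function is ours, and the final line corresponds to the first question above):

    from math import exp

    lam_year = 3.6
    lam_6months = lam_year / 2    # rate proportional to the length of the period
    lam_month = lam_year / 12

    def poisson_pmf(k, lam):
        """f(k) = e^{-lambda} lambda^k / k!, computed without large factorials."""
        f = exp(-lam)
        for j in range(1, k + 1):
            f *= lam / j
        return f

    # P(at least 2 earthquakes over 5.0 in the next 6 months)
    print(1 - poisson_pmf(0, lam_6months) - poisson_pmf(1, lam_6months))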

2 earthquakes in 6 months

1 earthquake next month

4 more earthquakes

Wait 3 months for an earthquake