Module 4


Matias Salibian Barrera

Last modified — 29 Sep 2025

Random variables

  • A random variable is a feature (generally numeric) measured from the outcome of a random experiment.
  • We use capital letters (near the end of the alphabet) to denote random variables (\(X\), \(Y\), \(Z\), etc.)
  • \(X=\) number of defective items in a batch.
  • \(T=\) time until a certain signal occurs.
  • \(Y=\) number of daily website visits.


  • A random variable is a function with domain the sample space and range a subset of the real line:

    \[X \, : \, \Omega \ \rightarrow {\mathbb{R}}\]

    \[X( \omega ) \in {\mathbb{R}}\]

Example

  • Experiment: Toss a coin 5 times.
  • Sample space:

\[\Omega \, = \, \Bigl\{ ( x_{1}, x_{2}, \ldots, x_{5} ) \, : \ x_{i} \in \{ H, T \} \ \Bigr\}\]

  • Let \(X( \omega )=\) number of heads in \(\omega\).

\[\text{For example: } \qquad X \bigl( \, (T, H, T, H, H) \, \bigr) = 3\]

  • Let \(Y( \omega )=\) longest run of heads in \(\omega\)

\[\text{For example: } \qquad Y \bigl( \, (T, H, T, H, H) \, \bigr) = 2\]

Random variables and events

  • Random variables are naturally used to describe events of interest.
  • For example \(\Bigl\{ X > 3 \Bigr\}\) or \(\Bigl\{ Y \le 2 \Bigr\}\)
  • Formally, this notation means:

\[\Bigl\{ X > 3 \Bigr\} \, = \, \Bigl\{ \omega \in \Omega \, : \, X \left( \omega \right) > 3 \Bigr\} \subseteq \Omega\]

and

\[\Bigl\{ Y \le 2 \Bigr\} \, = \, \Bigl\{ \omega \in \Omega \, : \, Y \left( \omega \right) \le 2 \Bigr\} \subseteq \Omega\]

In other words, \(\Bigl\{ X > 3 \Bigr\}\) and \(\Bigl\{ Y \le 2 \Bigr\}\) are events.

The collection of events for which our probability can be calculated is denoted as \({\cal B}\).

Technical digression

  • \({\cal B}\) need not be \(2^{\Omega}\) (technical issues arise)

  • We will assume that \({\cal B}\) is the smallest set of events that includes all events of the form \(\left\{ X \leq x \right\}\), and that is closed under complements and (countable) unions.

  • This ensures that we can calculate

    \[\mathbb{P}(X \leq 3) \qquad \mathbb{P}(X > \pi)\]

    etc.

  • Interested in more details? Check MATH 420

Range/support of random variables

  • The range of \(X\) is the set of all possible values for \(X\), call it “\({\cal R}\)”.

  • \(X =\) # defective items in a collection of size \(N\):

    \[{\cal R} \, = \, \mbox{Range of } X \, = \, \left\{ 0, 1, \ldots, N \right\} \subset \mathbb{N}\]

  • \(Y =\) # website visits in a day:

    \[{\cal R} \, = \, \mbox{Range of } Y \, = \, \left\{ 0, 1, \ldots, \right\} \, = \, \mathbb{N}.\]

  • \(Z\) = Waiting time until next earthquake:

    \[{\cal R} \, = \, \mbox{Range of } Z \, = \, \left[ 0, \infty \right) \, \subset \, \mathbb{R}.\]

Discrete random variables

  • A random variable \(X\) is discrete if

    • its range \({\cal R}\) is finite

      • \({\cal R} \, = \, \left\{ 1, 2, \ldots, N \right\}\)
      • \({\cal R} \, = \, \left\{ 0, 1 \right\}\)
    • its range is countable

      • \({\cal R} \, = \, \mathbb{N}\)
      • \({\cal R} \, = \, \mathbb{Z}.\)
  • Roughly, a random variable \(X\) is continuous if its range \({\cal R}\) is an interval such as

    • \({\cal R} \, = \, \left[0, 1 \right],\)
    • \({\cal R} \, = \, \left( -2 , \infty \right)\)
    • \({\cal R} \, = \, \left( -\infty , \infty \right) \, = \, {\mathbb{R}}\)

    A more precise definition will be given below.

PMF and CDF of a discrete random variable

  • PMF: When \(X\) is a discrete random variable, its probability mass function is \[f_X(x) \, = \, \mathbb{P}\left( X = x \right) \, , \qquad \mbox{ for } x \in {\mathbb{R}}.\] Note that \[f_X(x) = 0 \quad \text{ if } \quad x \notin {\cal R}\]
  • CDF: The cumulative distribution function of a random variable \(X\) is \[F_X(x) \, = \, \mathbb{P}\left( X \le x \right) \, , \qquad \mbox{ for } x \in {\mathbb{R}}\]

Example

  • Consider an experiment of tossing a coin 3 times.
  • Let \(X =\) # of Tails. Then \({\cal R} \, = \, \left\{ 0, 1, 2, 3 \right\}\)


PMF

\(x\) \(f_X(x)\)
0 1/8
1 3/8
2 3/8
3 1/8
else 0

CDF

\(x\) \(F_X(x)\)
0 1/8
1 4/8
2 7/8
3 1

Example (cont)

  • NOTE:
    • \(f_X(0.3) = \mathbb{P}(X = 0.3) = 0\)
    • \(F_X(-1) = \mathbb{P}(X \leq -1) = 0\)
    • \(F_X(1.3) = \mathbb{P}(X \leq 1.3) = 4/8 = 1/2\).


  • Since \(f_X(x) = 0\) for \(x \notin {\cal R}\) we only need to list its values on \({\cal R}\)

  • Although \(F_X(x)\) is defined for all \(x \in {\mathbb{R}}\), note that \(F_X(a)\) remains constant for values of \(a\) “between” two consecutive elements of \({\cal R}\), so we do not really need to list \(F_X(x)\) for all \(x \in {\mathbb{R}}\) (which would be impossible, of course)

Properties of CDF

Let \(F_W(x)\) be the CDF of a random variable \(W\).

We have

  1. \(\lim_{x \to -\infty} F_W(x) = 0\) and \(\lim_{x \to \infty} F_W(x) = 1\)
  2. \(F_W(x)\) is non-decreasing
  3. \(F_W(x)\) is right continuous: \[\lim_{x \, \downarrow \, a} F_W(x) = F_W(a) \qquad \text{for all } a \in {\mathbb{R}}\]

Properties of PMF

  • Suppose \(f_X(x)\) is a PMF
  • Then \(f_X(x) \geq 0\) and \(\sum_{x \in {\cal R}} f_X(x) = 1\).
  • Often, discrete random variables take integer values: \({\cal R} \subset \mathbb{N}\)
  • In this case, \(f_X(k) = F_X(k) - F_X(k-1)\) for any \(k \in \mathbb{N}\), and \(f_X(x) = 0\) for \(x \notin \mathbb{N}\):

\[\begin{align*} f_X(k) &= \mathbb{P}\left( X = k \right) = \mathbb{P}\left( k-1 < X \le k \right) \\ & \\ &= \mathbb{P}\left( (X \le k) \cap (X > k-1) \right) = \mathbb{P}\left( (X \le k) \cap (X \le k-1)^c \right) \\ & \\ &= \mathbb{P}\left( X \le k \right) - \mathbb{P}\left( (X \le k) \cap (X \le k-1) \right) \\ & \\ &= \mathbb{P}\left( X \le k \right) - \mathbb{P}\left( X \le k-1 \right) = F_X(k) - F_X(k-1) \end{align*}\]

Example: rolling two dice

  • Consider the experiment of rolling two fair dice.
  • Let \(X\) be their sum
  • \(X\) is a discrete random variable, \({\cal R} = \{2, 3, \ldots, 12\}\).
  • Its PMF and CDF are:
\(x\) 2 3 4 5 6 7 8 9 10 11 12
\(f_X(x)\) 1/36 2/36 3/36 4/36 5/36 6/36 5/36 4/36 3/36 2/36 1/36
\(F_X(x)\) 1/36 3/36 6/36 10/36 15/36 21/36 26/36 30/36 33/36 35/36 1
  • \(f_X(x) = 0\) for all other \(x\in{\mathbb{R}}\).
  • What about \(F_X(x)\)?

Table or algebraic expression?

  • This pmf \(f_X(x)\) can also be written as \[f_X(x) = \frac{6 - |7-x|}{36}\] for \(x \in {\cal R}\), \(f_X(x)=0\) otherwise.

  • Both approaches (table & closed-form formula) are equally useful: you can calculate \(f_X(x)\) and \(F_X(x)\) for any \(x \in \mathbb{R}\).

  • It is very important to specify \({\cal R}\) whenever you write the PMF or CDF.
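As a quick computational check of the closed-form expression above, here is a short sketch in Python (standard library only; it is not part of the course material and the variable names are ours) that enumerates all 36 equally likely outcomes and compares the counts with the formula:

    from fractions import Fraction
    from itertools import product
    from collections import Counter

    # Enumerate the 36 equally likely outcomes of two fair dice and tabulate the sum.
    counts = Counter(d1 + d2 for d1, d2 in product(range(1, 7), repeat=2))

    for x in range(2, 13):
        table_value = Fraction(counts[x], 36)         # PMF obtained by counting
        formula_value = Fraction(6 - abs(7 - x), 36)  # (6 - |7 - x|) / 36
        assert table_value == formula_value
    print("table and formula agree for x = 2, ..., 12")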

Another example of PMF and CDF

For a fixed number \(p \in [0, 1]\), define a discrete random variable \(X\) with PMF and CDF given by:

PMF

\[ f\left( x;\ p\right) =\begin{cases} \left( 1-p\right) ^{2} & x=0, \\ 2p\left( 1-p\right) & x=1, \\ p^{2} & x=2, \\ 0 & \mbox{else.} \end{cases} \]

CDF

\[ F\left( x;\ p\right) = \begin{cases} 0 & x<0, \\ \left( 1-p\right) ^{2} & 0\leq x<1, \\ 1-p^{2} & 1\leq x<2, \\ 1 & x\geq 2. \end{cases} \]

Another example of PMF and CDF

Different values of \(p\) will give different PMFs and CDFs

For example, if \(p = 0.10\)

PMF

\[ f\left( x;\ p\right) =\begin{cases} 0.81 & x=0, \\ 0.18 & x=1, \\ 0.01 & x=2 \\ 0 & \mbox{else.} \end{cases} \]

CDF

\[ F\left( x;\ p\right) = \begin{cases} 0 & x<0 \\ 0.81 & 0\leq x<1 \\ 0.99 & 1\leq x<2 \\ 1 & x\geq 2 \end{cases} \]

By including a parameter \(p\), we are able to express many distributions (a family of distributions) with a single functional form.

A simple example

  • Consider an experiment consisting of flipping 5 coins

  • The sample space of this experiment is

\[\Omega \, = \, \left\{ \left( x_{1},x_{2},...,x_{5}\right) \, : \, x_{i} \in \left\{ H,T \right\} \, \right\}\]

  • Then \(\# \Omega = 2^5 = 32\). We will assume that all outcomes (\(\omega \in \Omega\)) are equally likely

  • We are interested in the number of Tails we obtain in our toss

  • Define a random variable

    \[Y = \left\{ \mbox{ number of Tails } \right\}\]

  • For example \(\left\{ Y = 3 \right\}\) is

\[ \left\{ Y = 3 \right\} \, = \, \left\{ \text{ there are exactly 3 Tails in the 5 tosses } \right\} \]

CDF and PMF of a random variable

  • Using what we have learned so far, you can prove that

    \[f_Y(3) = \mathbb{P}(Y=3) = {5 \choose 3} (1/2)^5.\]

  • Moreover, for any \(k = 0, 1, \ldots, 5\) we have

\[f_Y(k) = \mathbb{P}(Y= k ) = {5 \choose k} (1/2)^5\]

  • Finally

\[f_Y(k) = \mathbb{P}(Y = k) = 0 \qquad \text{ if } \quad k \notin \{ 0, 1, 2, 3, 4, 5 \}\]

Example

  • We can check that indeed

\[\sum_{x \in {\cal R} } f_Y(x) = 1\]

specifically

\[\sum_{x=0}^{5} f_Y ( x ) = \sum_{b=0}^{5} { 5 \choose b} \left( \frac{1}{2}\right) ^{5} =1\]

Note that we used \(x\) and \(b\) to list possible values of \(Y\).

One can use any symbol for the auxiliary variable that lists the values \(Y\) can take.
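A short computational verification of this identity (a Python sketch using only the standard library; not part of the original slides):

    from fractions import Fraction
    from math import comb

    # f_Y(k) = C(5, k) (1/2)^5 for k = 0, ..., 5; the probabilities must add up to 1.
    pmf = {k: Fraction(comb(5, k), 2**5) for k in range(6)}
    print(pmf)                # each value shown as a Fraction in lowest terms
    print(sum(pmf.values()))  # 1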

Urn example

  • An urn contains \(n\) numbered balls \((1, 2, \ldots, n)\).
  • Randomly draw \(k\) balls \((1<k<n)\) without replacement.
  • Let \(Y\) represent the largest number among them.
  1. What is \(\mathcal{R}\), the range of \(Y\)?
  2. Find \(F_{Y}\left(y\right)\) for all \(y\in \mathcal{R}\)
  3. Find \(f_{Y}\left(y\right)\) for all \(y\in \mathcal{R}\)

What is the range of \(Y\)?

  • The smallest possible value of \(Y\) is \(k\).

  • This corresponds to the event that the \(k\) drawn numbers are \(\left\{ 1, 2, \ldots, k \right\}\).

  • The largest possible value of \(Y\) is \(n\).

  • Therefore the range of \(Y\) is \[\mathcal{R}=\left\{ k,k+1,...,n\right\}\]

Find \(F_Y(y)\)

Find \(f_Y (y)\)

Properties

Properties of discrete PMF’s

  1. \(f_X(x) \ge 0\)

  2. \(\sum_{x\in \mathcal{R}} f_X(x) = 1\), where \(\mathcal{R} =\) range of \(X\)

  3. \(\mathbb{P}\left( X \in A \right) =\sum_{x\in A} f_X(x)\), where \(A\subset \mathbb{R}\)

Properties of discrete CDF’s

  1. \(0 \le F_X(x) \le 1\)

  2. \(F\left( x \right)\) is non-decreasing and right-continuous

  3. \(\lim_{a \to -\infty} F_X\left( a \right) =0\), \(\lim_{a \to +\infty} F_X\left( a \right) =1\)

  4. \(\mathbb{P}\left( a<X\leq b\right) =F\left( b\right) -F\left( a\right)\) for any \(a<b\)

  5. \(f\left( k\right) =F\left( k\right) -F\left( k-1\right)\) for all \(k \in {\cal R}\), when \(X\) takes integer values (\({\cal R} \subseteq \mathbb{Z}\))

Expected values

  • Suppose \(X\) is a discrete random variable with PMF \(f(x)\). Then, its expected value is defined as

    \[\mathbb{E}[X] \ = \ \sum_{k \in \mathcal{R}} k \, P ( X = k ) \, = \, \sum_{k \in \mathcal{R}} k \, f_X(k)\]

  • Let \(g : \mathcal{R} \rightarrow \mathcal{D}\) be a function, and let \(Y = g(X)\)

  • For example, \(g(x) = \sin(x)\) or \(g(x) = (x - 3)^2\).

  • Then \(Y = g(X)\) is itself a function on \(\Omega\):

\[ Y : \Omega \to {\cal D} \qquad \text{ with } \qquad Y( \omega ) = g \left( X ( \omega ) \right) \] and hence \(Y\) is a discrete random variable, with its own PMF, \(f_Y\), say.

Expected values

  • To compute \(\mathbb{E}(Y)\) we should calculate

\[ \mathbb{E}(Y) \, = \, \sum_{z \in {\cal D}} z \, f_Y(z) \, = \, \sum_{z \in {\cal D} } z \, \mathbb{P}( Y = z) \, = \, \sum_{z \in {\cal D} } z \, \mathbb{P}( g(X) = z)\]

  • We have the following result:

\[\mathbb{E}[g(X)] = \sum_{k \in \mathcal{R}} g(k) \, f_X(k)\]

  • In other words, we do not need to compute \(f_Y\) to calculate \(\mathbb{E}(Y)\)
  • The proof is beyond the scope of this class.

Calculate expectation in a simple situation

  • Let \(X\) be a random variable with PMF given by
\(k\) \(f_X \left( k \right)\)
0 0.15
1 0.25
2 0.30
3 0.20
4 0.10
else 0
  1. Calculate \(\mathbb{E}[ X]\)
  2. Calculate \(\mathbb{E}[X^2]\)

Solution

\[\mathbb{E}\left[ X \right] = \sum_{k=0}^4 k \, f_X(k) = 0 \times 0.15 + 1 \times 0.25 + 2 \times 0.30 + 3 \times 0.20 + 4\times 0.10 = 1.85.\]

\[\begin{aligned}\mathbb{E}\left[ X^{2}\right] &=\sum_{x=0}^{4} x^{2} \, f_X \left( x\right) \\ &= 0 \times 0.15 + 1 \times 0.25 + 4 \times 0.30 + 9 \times 0.20 + 16 \times 0.10 \\ &=4.85.\end{aligned}\]
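The same calculation can be done on a computer; here is a minimal Python sketch (standard library only, floating-point arithmetic) that reproduces \(\mathbb{E}[X]\) and \(\mathbb{E}[X^2]\) from the tabulated PMF:

    # PMF from the table above; keys are the values of X, values are probabilities.
    pmf = {0: 0.15, 1: 0.25, 2: 0.30, 3: 0.20, 4: 0.10}

    mean = sum(k * p for k, p in pmf.items())
    second_moment = sum(k**2 * p for k, p in pmf.items())

    print(mean)           # 1.85 (up to floating-point rounding)
    print(second_moment)  # 4.85 (up to floating-point rounding)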

Properties and notation

  • We often denote the expectation of \(X\) by \(\mu\), or \(\mu_X\), or \(\mathbb{E}\left[ X \right]\).

  • Expectation is a linear operator: for any \(a, b \in \mathbb{R}\), we have

    \[\mathbb{E}[ a \, X + b ] \ = \ a \, \mathbb{E}[ X ] + b\]

  • Key point: \(a\) and \(b\) are fixed numbers, not random. For example:

    \[\mathbb{E}[ 3.2 \, X - 2.5 ] \ = \ 3.2 \, \mathbb{E}[ X ] - 2.5\]

Proof of the linearity of \(\mathbb{E}\)

Let \(X\) be a random variable and \(a\) and \(b\) be fixed real numbers.

Then, applying the result above for functions of \(X\) with \(g(x) = a \, x + b\), we have

\[\begin{align*} \mathbb{E}[ a \, X + b ] &= \sum_{k \in \mathcal{R}} \left( a \, k + b \right) \, f_X( k ) \\ &= \sum_{k \in \mathcal{R}} a \, k \, f_X ( k ) \, + \, \sum_{k \in \mathcal{R}} b \, f_X( k ) \\ &= a \, \sum_{k \in \mathcal{R}} \, k \, f_X( k ) \, + \, b \, \sum_{k \in \mathcal{R}} \, f_X ( k )\\ &= \, a \, \mathbb{E}[ X ] + b. \end{align*}\]

Optimal prediction

  • Let \(f(k)\) be the pmf of a random variable \(X\).

  • For each \(t \in \mathbb{R}\), consider the average squared distance from \(X\) to \(t\):

    \[m( t) = \mathbb{E}[ ( X-t) ^{2}] = \sum_{k \in {\cal R}} \, (k - t )^2 \, f_X ( k )\]

  • What is the value of \(t \in {\mathbb{R}}\) that is closest to \(X\) on average?

  • What is the value of \(t \in \mathbb{R}\) that minimizes \(m ( t )\)?

Optimal prediction

  • We can find the minimum point by finding \(t_0 \in \mathbb{R}\) that solves

\[\begin{align*} \left. {\frac{{\partial}^{} m(t) }{\partial{ t }^{}}} \right|_{t = t_0} &= \left. \left[ - 2 \sum_{x} (x-t) f_X(x) \right] \right|_{t = t_0} \overset{set}{=} 0 \\ &\Leftrightarrow \overset{ \mathbb{E}[X] }{\overbrace{\sum_{x} x \, f_X(x) }} = t_0 \ \overset{1}{\overbrace{\sum_{x} f_X(x) }} \\ &\Rightarrow t_0 = \mathbb{E}[X] \end{align*}\]

  • Since \(m^{\prime \prime } (t_0) = 2 > 0\), then \(t_0\) is the unique minimum of the function \(m\):

    \[\mathbb{E}\left[ ( X - \mathbb{E}[X] )^2 \right] \, \le \, \mathbb{E}\left[ ( X-t )^2 \right] \quad \mbox{ for all } t \in \mathbb{R}\]

This result shows that \(\mathbb{E}[X]\) is the value that minimizes the mean squared prediction error for \(X\).

An alternative proof

For any \(t \in \mathbb{R}\)

\[\begin{align*} (X - t)^2 &= (X - \mathbb{E}[X] + \mathbb{E}[X] - t)^2 = \left( (X - \mathbb{E}[X]) + (\mathbb{E}[X] - t) \right)^2 \\ & \\ &= (X - \mathbb{E}[X])^2 + (\mathbb{E}[X] - t)^2 + 2\, (X - \mathbb{E}[X]) \times (\mathbb{E}[X] - t) \\ \end{align*}\]

Thus

\[\begin{align*} \mathbb{E}(X - t)^2 &= \mathbb{E}\left[ (X - \mathbb{E}[X])^2 \right] + \mathbb{E}\left[ (\mathbb{E}[X] - t)^2 \right] + \mathbb{E}\left[ 2\, (X - \mathbb{E}[X]) \times (\mathbb{E}[X] - t) \right] \\ & \\ &= \mathbb{E}\left[ (X - \mathbb{E}[X])^2 \right] + (\mathbb{E}[X] - t)^2 \ge \mathbb{E}\left[ (X - \mathbb{E}[X])^2 \right] \end{align*}\]

because \(\mathbb{E}[X] \in {\mathbb{R}}\), \((\mathbb{E}[X] - t) \in {\mathbb{R}}\), and \(\mathbb{E}\left[ X - \mathbb{E}[X] \right] = 0\), thus \[ \mathbb{E}\left[ 2\, (X - \mathbb{E}[X]) \times (\mathbb{E}[X] - t) \right] = 2 \, (\mathbb{E}[X] - t) \, \mathbb{E}\left[ X - \mathbb{E}[X] \right] = 2 \, (\mathbb{E}[X] - t) \, 0 = 0 \]

An alternative proof

Hence, for any \(t \in {\mathbb{R}}\)

\[ \mathbb{E}(X - t)^2 \ge \mathbb{E}\left[ (X - \mathbb{E}[X])^2 \right] \] so \(t_0 = \mathbb{E}[X]\) clearly minimizes the left-hand side.

Common misunderstanding of “expectation”

  • If \(X\) is the output of rolling a fair die, \[\mu_X = \mathbb{E}[X] = (1/6) \{ 1 + 2 + \cdots + 6\} = 3.5.\]
  • But 3.5 can never be observed when you roll a die. \(\mathbb{P}(X=3.5) = 0\).
  • Moreover, every time you roll it, the value \(X\) takes will be different (and never 3.5)
  • Yet, it is still meaningful to ask how far, on average, an observed value of \(X\) is from its expected value \(\mu_X\).

Variance

  • Let \(X\) be a random variable with expected value (or mean) \(\mathbb{E}[X]\), also written \(\mu_X\).

  • We call \[\mbox{Var}[X] = \mathbb{E}\left[ \left(X - \mathbb{E}[X]\right) ^2 \right]\] the variance of \(X\), provided the expectation of this squared term exists.

  • It is usually denoted by \(\sigma^2_X\), or \(\sigma^2\)

Variance

  • The variance is the expected squared distance from the random variable \(X\) to its mean \(\mu_X\).

  • Suppose \(X\) is the weight of a newborn baby in grams.

  • Its expectation is the mean weight of newborn babies in grams.

  • The variance of \(X\), \(\sigma^2_X\), is the expected squared deviation of newborn weights from their mean.

  • Its units are “squared grams”.

Standard deviation

  • The variance is the expected squared prediction error if we predict \(X\) by its mean \(\mu_X = \mathbb{E}[X]\)

  • The standard deviation is the square root of the variance

Example
If \(X\) = height in meters, then

\[\mathbb{E}\left[ X \right] \, = \, 5 m \quad \mbox{ say, } \quad \operatorname{Var}\left[ X \right] \, = 4 m^2\] so

\[\sigma_X = \text{SD}(X) = \sqrt{ \operatorname{Var}\left[ X \right] } \, = \, 2 m\]

Properties of variance

  • If \(X\) is a random variable, and \(a, b \in \mathbb{R}\) are fixed constants, then \[\operatorname{Var}[ a \, X + b \,] \ = \ a^2 \, \operatorname{Var}[ X ].\]

  • Let \(\mu_X = \mathbb{E}\left[ X \right]\) then \[\operatorname{Var}\left[ X \right] \, = \, \mathbb{E}\left[ X^2 \right] - \mu_X^2\]

  • It is often easier to calculate \(\mathbb{E}\left[ X^2 \right]\) than \(\mathbb{E}[ \left( X - \mu_X \right)^2 ]\)

Proof of \(\operatorname{Var}[a X + b] = a^2\operatorname{Var}[ X]\).

Proof of \(\operatorname{Var}[ X ] = \mathbb{E}[X^2] - \mathbb{E}[X]^2\)

Computation Example

  • Suppose \(X\) has PMF

\[f\left( k\right) = \begin{cases} 0.3414 \times \frac{1}{k} & k = 1, 2, \ldots, 10\\ 0 & \text{else}.\end{cases}\]

  • Its expectation is given by

\[\mu = 0.3414 \sum_{k=1}^{10} k \frac{1}{k} = 0.3414 \times 10 = 3.414\]

  • Similarly: \(\mathbb{E}\left[ X^{2}\right] = 0.3414 \sum_{k=1}^{10} k^{2} \, \frac{1}{k} = 0.3414 \times 55 = 18.777\)

  • Thus, its variance is given by \[\operatorname{Var}\left[ X \right] = 18.777 - 3.414^2 = 7.122\]
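These numbers are easy to reproduce numerically; a Python sketch (standard library only; the constant 0.3414 is taken from the slide and is approximately \(1 / \sum_{k=1}^{10} 1/k\)):

    c = 0.3414                                     # normalizing constant from the slide
    ks = range(1, 11)

    mean = sum(k * c / k for k in ks)              # 0.3414 * 10 = 3.414
    second_moment = sum(k**2 * c / k for k in ks)  # 0.3414 * 55 = 18.777
    variance = second_moment - mean**2             # approximately 7.122
    print(mean, second_moment, variance)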

PMF given by table

\(x\) \(f(x)\) \(xf(x)\) \(x^2 f(x)\)
0 0.15 0.00 0.00
1 0.25 0.25 0.25
2 0.30 0.60 1.20
3 0.20 0.60 1.80
4 0.10 0.40 1.60
else 0 0 0
  • Find \(\mathbb{E}[X]\) and \(\operatorname{Var}[X]\) and \(\text{SD}[X]\).

Return of the Urn

  • An urn contains \(n\) numbered balls \((1, 2, \ldots, n)\).

  • Randomly draw \(k\) balls \((1<k<n)\) without replacement.

  • Let \(Y\) represent the largest number among them.

  • Calculate the mean and variance of \(Y\) when \(n = 10\) and \(k = 5\).

  • Use the computer to find these values.
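One possible way to do the computer calculation suggested above is to enumerate all \(\binom{10}{5}\) equally likely draws; a Python sketch (standard library, exact arithmetic; the variable names are ours):

    from fractions import Fraction
    from itertools import combinations
    from collections import Counter
    from math import comb

    n, k = 10, 5
    # Every k-subset of {1, ..., n} is equally likely; record the maximum of each.
    counts = Counter(max(draw) for draw in combinations(range(1, n + 1), k))
    total = comb(n, k)

    pmf = {y: Fraction(c, total) for y, c in counts.items()}
    mean = sum(y * p for y, p in pmf.items())
    variance = sum((y - mean) ** 2 * p for y, p in pmf.items())
    print(mean, variance)   # exact Fractions; wrap in float(...) for decimals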

Bernoulli trials

  • Consider an event \(A\) such as

    \[A \, = \, \left\{ \mbox{Coin lands ``H''} \right\}\]

    or

    \[A \, = \, \left\{ \mbox{Max annual wind speed > 100 km/h}\right\}.\]

  • Every toss or every year, we check whether \(A\) occurred.

  • These checks are called trials

Bernoulli trials

  • Occurrences of \(A\) are arbitrarily called successes

  • Suppose \(\mathbb{P}\left( A\right)\) remains constant across trials

    \[\mathbb{P}\left( A \right) = \mathbb{P}( \text{ a success } ) = p \in [0, 1]\]

    and that the outcomes of trials are independent

  • We call such trials Bernoulli trials.

Three necessary conditions:

  1. Binary outcomes
  2. Constant probability of success
  3. Independence across trials

Bernoulli random variables

  • Let

\[Y_i = \begin{cases} 1 & \text{if $A$ occurs in the $i^{th}$ trial}\\ 0 & \text{if $A$ doesn't occur in the $i^{th}$ trial.}\end{cases} \]

  • The PMF of \(Y_{i}\) is

\[\begin{aligned}f_{Y_i}\left( a\right) &= \begin{cases} p & \text{ if } \ a=1 \\ & \\ 1-p & \text{ if } \ a=0\\ & \\ 0 & \text{otherwise}\end{cases} \\ & \\ & \\ f_{Y_i}\left( a\right) & = p^{a}( 1-p)^{1-a} \quad \text{ for } a \in\{0,1\}\end{aligned} \]

Bernoulli random variables

  • Notation: \(Y_{i} \sim {\mathrm{Bern}}\left( p\right)\)

  • The mean of \(Y_i\) is

\[\mathbb{E}[ Y_{i} ] =0 \times f_{Y_i}(0) +1 \times f_{Y_i}(1) = p\]

  • The variance of \(Y_i\) is

\[\operatorname{Var}[ Y_i ] = \mathbb{E}\left[ \left( Y_i - \mu_{Y_i} \right)^2 \right] = \mathbb{E}[Y_i^2]- \mathbb{E}[Y_i]^2 = p - p^2 = p (1-p)\] because \(\mathbb{E}[Y_i^2] = 0^2 \times f_{Y_i}(0) + 1^2 \times f_{Y_i}(1) = 0 \times (1 - p) + 1 \times p = p\)

Binomial random variables

  • Consider an experiment consisting of a fixed number of Bernoulli trials (\(n \ge 1\)).

  • Let \(X\) be the number of successes out of the \(n\) trials.

  • Let \(p = \mathbb{P}( \mbox{ success in any trial } )\).

  • Convention: we say that \(X\) is binomially distributed, or that it has a binomial distribution.

  • We denote

\[X \, \thicksim \, {\mathrm{Binom}}\left( n, p\right)\]

where \(n\) is the number of trials, and \(p\) is the probability of success in each trial (which is the same for all trials).

Binomial random variables

  • The range of \(X\) is \(\left\{ 0, 1, \ldots, n \right\}\).

  • To calculate its PMF, consider the sample space

\[\Omega \, = \, \Bigl\{ \left(y_1, y_2, \ldots, y_n \right) \, : \ y_j \in \left\{ S, F \right\} \Bigr\} \qquad \text{ and } \quad \# \Omega = 2^n \]

  • Independent trials imply that for any \(\omega \in \Omega\) we have

\[\mathbb{P}\left( \left\{ \omega \right\} \right) = p^{n_S} (1-p)^{n - n_S} \qquad \omega \in \Omega\]

where \(n_S\) = number of successes in \(\omega\)

  • For example

\[\mathbb{P}\Bigl( \left( S, S, F, F, \ldots, F, S \right) \Bigr) = p^3 (1-p)^{n - 3}\]

Binomial random variables

  • Hence, \(f_X(k) = \mathbb{P}\left( X = k \right)\) is

\[f_X(k) \, = \, \mathbb{P}\left( \omega \in \Omega \ \mbox{ with exactly } k \ \mbox{successes} \right)\]

  • How many \(\omega \in \Omega\) have exactly \(k\) successes?

\[\text{There are } {n \choose k} \ \text{such } \omega's \qquad \text{ (Why? Prove it!)}\]

  • Hence,

\[f_X(k) \, = \begin{cases} \binom{n}{k} \, p^k \, \left( 1 - p \right)^{n-k} & \text{ if } k \in \{0,\dots,n\} \\ & \\ 0 & \text{otherwise} \end{cases}\]

Binomial theorem

  • Binomial Theorem: for any \(a\), \(b \in {\mathbb{R}}\) and \(n \in \mathbb{N}\):

\[\left( a + b \right)^n \, = \, \sum_{k=0}^n \, \left( \begin{array}{c} n \\ k \end{array} \right) \, a^k \, b^{n - k} \]

Mean and variance

  • When \(X \sim {\mathrm{Binom}}(n, p)\), we have \[\mathbb{E}[ X ] = \sum_{k=0}^n \, k {n \choose k} p^k (1-p)^{n-k} = n p.\] and \[\mathbb{E}[ X^2 ] = \sum_{k=0}^n \, k^2 {n \choose k} p^k (1-p) ^{n-k} = p^2 ( n-1 ) n + n p.\]

  • Therefore, \[\operatorname{Var}[X] = n p (1- p).\]
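A quick numerical check of these two formulas, for one (arbitrarily chosen) pair \(n\) and \(p\), in Python (standard library only; a sketch, not part of the original slides):

    from math import comb

    n, p = 12, 0.3   # illustrative values only
    pmf = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

    mean = sum(k * f for k, f in enumerate(pmf))
    variance = sum(k**2 * f for k, f in enumerate(pmf)) - mean**2

    print(mean, n * p)                # both approximately 3.6
    print(variance, n * p * (1 - p))  # both approximately 2.52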

Expectation of a Binomial Random Variable

One approach to find expectation is as follows:

\[\begin{aligned} \mathbb{E}[X] &= \sum_{k=0}^n k {n \choose k} p^k (1-p)^{n-k} \\ &= \sum_{k=0}^n k \frac{n!}{k! (n-k)!} p^k (1-p)^{n-k} \\ &= n p \sum_{k=1}^n \frac{(n-1)!}{(k-1)! ((n-1) -(k-1))!} p^{k-1} (1-p)^{n-k} \\ &= n p \sum_{j=0}^{n-1} {n-1 \choose j} p^{j} (1-p)^{(n-1) - j} \qquad (\text{setting } j = k-1) \\ &= np, \end{aligned}\]

since the \(k=0\) term vanishes and the last sum equals \(\bigl( p + (1-p) \bigr)^{n-1} = 1\) by the Binomial Theorem.

Variance of a Binomial Random Variable

Start with \[\begin{aligned} \mathbb{E}[X(X-1)] &= \sum_{k=0}^n k(k-1) {n \choose k} p^k (1-p)^{n-k} \\ &= \sum_{k=0}^n k(k-1) \frac{n!}{k! (n-k)!} p^k (1-p)^{n-k} \\ &= n(n-1) p^2 \sum_{k=2}^n {n-2 \choose k-2} p^{k-2} (1-p)^{(n-2) -(k-2)}\\ &= n(n-1) p^2.\end{aligned} \]

  • Note that \(X^2 = X + X(X-1)\).
  • Hence \(\mathbb{E}[X^2] = np + n(n-1)p^2.\)
  • Finally, \(\operatorname{Var}[X] = \mathbb{E}[X^2] - \mathbb{E}[X]^2 = np + n(n-1)p^2 - n^2 p^2 = np(1-p).\)

Attention shoppers!

Important

Both of the previous calculations used a trick.

  • Inside the sum, we made something look like a PMF
  • We know PMF’s sum to 1.
  • Sometimes called “kernel matching”
  • We will use similar ideas to calculate unpleasant integrals

Example of a Binomial random variable

  • Suppose that finding oil when digging in a certain region has probability \(p = 0.10\) of success.

  • Assume that digging attempts (trials) are independent.

    1. How many wells should be dug so that the probability of finding oil is at least 0.95?

    2. How many wells should be dug so that the probability of finding at least 2 successful wells is at least 0.95?
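A brute-force computational sketch for these two questions (Python, standard library only; this is not necessarily the analytic approach intended in class, and the function name is ours):

    from math import comb

    def prob_at_least(n, p, m):
        """P(X >= m) when X ~ Binom(n, p)."""
        return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(m, n + 1))

    p = 0.10
    # Smallest n with P(at least one success) >= 0.95, and with P(at least two) >= 0.95.
    n_a = next(n for n in range(1, 1000) if prob_at_least(n, p, 1) >= 0.95)
    n_b = next(n for n in range(2, 1000) if prob_at_least(n, p, 2) >= 0.95)
    print(n_a, n_b)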

Example (a)

Example (b)

Another discrete distribution - Geometric random variables

  • Consider a sequence of independent Bernoulli trials.

  • Constant probability of success: \(p \in (0, 1)\) (the cases \(p=0\) and \(p=1\) require attention)

  • We repeat the Bernoulli trial until we find the first success

  • What is \(\Omega\)? What is \(\# \Omega\)? Are these outcomes equally likely?

  • Let \(X\) be the number of trials needed to find the first success. What is the range of \(X\) (\({\cal R}\))?

  • Examples:

\[\Bigl\{ X = 3 \Bigr\} \, = \ \Bigl\{ \left( F, F, S \right) \Bigr\}\]

\[\Bigl\{ X = 6 \Bigr\} \, = \ \Bigl\{ \left( F, F, F, F, F, S \right) \Bigr\}\]

Note: there is only one \(\omega \in \Omega\) for which \(X(\omega) = 3\). How many \(\omega \in \Omega\) satisfy \(X(\omega) = k\), where \(k \in \mathbb{N}\)?

Geometric random variables

  • Because the trials are independent

    \[\mathbb{P}\left( X = 3 \right) \, = \, \mathbb{P}\left( \left( F, F, S \right) \right) \, = \, (1-p)^2 \, p.\]

  • In general, we have

\[f_X( k) \, = \mathbb{P}( X = k) = \mathbb{P}( ( F, F, \ldots, F, S)) = \begin{cases} (1-p)^{k-1}p & k \in \{1, 2, 3, \ldots \}\\ & \\ 0 & \text{else}\end{cases}\]

  • Sanity check, for any \(p \in (0, 1)\):

\[\sum_{k =1}^\infty f_X(k) = \sum_{k =1}^\infty (1-p)^{k-1} \, p = 1\]

Geometric random variables

  • Note that this sum means that the probability of eventually seeing a success \((X < \infty)\) is 1. The probability that we never (ever) see a success is \(\mathbb{P}( (X < \infty)^c ) = 0\).

  • The CDF of \(X\) for \(k \in \mathbb{N}\) is

\[\begin{aligned} F_X( k ) & = \mathbb{P}( X \le k ) = \sum_{m=1}^k f_X(m) = \sum_{m=1}^k (1-p)^{m-1} \, p \\ & = p \, \sum_{m=1}^k (1-p)^{m-1} \\ & = p \, \left( \frac{ 1 - (1-p)^k }{ 1 - (1-p) } \right) \, = \, 1 - (1-p)^k \end{aligned} \]

Geometric random variables

  • Combining those results into one formula, the CDF of \(X\) is:

\[ F_X(k) = \left\{ \begin{array}{ll} p \, \left( \frac{ 1 - (1-p)^k }{ 1 - (1-p) } \right) \, = \, 1 - (1-p)^k & \text{ if } k \in \mathbb{N} \\ & \\ 0 & \text{ if } k < 1 \end{array} \right. \]

  • Trick question: How much is \(F(3.4)\)?

More oil wells

  • Suppose that finding oil when digging in a certain region has probability \(\theta = 0.10\) of success.

  • Assume that digging attempts (trials) are independent (what does this mean???)

  1. What is the probability that we need to dig 5 wells before we find oil for the first time?

  2. What is the probability that we find oil before digging the \(30^{th}\) well?

More oil (solution)

Expectation and Variance of a Geometric RV

  • If \(X \sim {\mathrm{Geom}}(p)\) with \(p \in (0, 1)\), then:

\[\mathbb{E}\left[ X \right] = \frac{1}{p} \qquad \mbox{and} \qquad \operatorname{Var}\left[ X \right] = \frac{1-p}{p^2}\]

\[ \mathbb{E}\left[ X \right] = \sum_{k =1 }^\infty \ k \, f(k) = \sum_{k =1 }^\infty \ k \, (1-p)^{k-1} p = (-p) \, {\frac{{\partial}^{} }{\partial{p}^{}}} \left\{ \sum_{k =1 }^\infty (1 - p)^k \right\} \]

  • Now note that for \(|a| < 1\)

\[\sum_{k=1}^\infty \, a^k \, = \, \frac{a}{1-a}\]

Expectation and Variance of a Geometric RV (cont’d)

  • Thus

\[ {\frac{{\partial}^{} }{\partial{p}^{}}} \left\{ \sum_{k =1 }^\infty (1 - p)^k \right\} = {\frac{{\partial}^{} }{\partial{p}^{}}} \left\{ \frac{1-p}{p} \right\} = -\frac{1}{p^2}\]

  • Finally

\[\mathbb{E}\left[ X \right] = (-p) \, {\frac{{\partial}^{} }{\partial{p}^{}}} \left\{ \sum_{k =1 }^\infty (1 - p)^k \right\} = (-p) \, {\frac{{\partial}^{} }{\partial{p}^{}}} \left\{ \frac{1-p}{p} \right\} = (-p) \times (-\frac{1}{p^2}) = \frac{1}{p}\]

Variance of a Geometric RV

  • To calculate \(\operatorname{Var}\left[ X \right]\), first check that: \(\mathbb{E}\left[ X \left( X - 1 \right) \right] \, = \, \mathbb{E}\left[ X^2 - X \right] \, = \, \mathbb{E}\left[ X^2 \right] - \mathbb{E}\left[ X \right]\)

Expectation and Variance of a Geometric RV (cont’d)

  • Similarly to what we did for \(\mathbb{E}[X]\)

\[\mathbb{E}\left[ X \, \left( X - 1 \right) \right] \, = \, 2 \, \left( \frac{1-p}{p^2} \right)\]

  • Thus

\[\mathbb{E}\left[ X^2 \right] = \mathbb{E}\left[ X \, \left( X - 1 \right) \right] + \mathbb{E}[X] = 2 \, \left( \frac{1-p}{p^2} \right) + \frac{1}{p} \, = \, \frac{2-p}{p^2}\]

  • And finally,

\[\operatorname{Var}\left[ X \right] = \mathbb{E}\left[ X^2 \right] - \left( \mathbb{E}[ X ] \right)^2 = \mathbb{E}\left[ X^2 \right] - \frac{1}{p^2} = \frac{1-p}{p^2}\]
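As a numerical sanity check of \(\mathbb{E}[X] = 1/p\) and \(\operatorname{Var}[X] = (1-p)/p^2\), here is a Python sketch that truncates the infinite sums at a large cutoff (standard library only; the value of \(p\) is chosen only for illustration):

    p = 0.3
    cutoff = 2000   # (1 - p)^cutoff is negligible, so truncating here is harmless

    pmf = [(1 - p) ** (k - 1) * p for k in range(1, cutoff + 1)]
    mean = sum(k * f for k, f in enumerate(pmf, start=1))
    variance = sum(k**2 * f for k, f in enumerate(pmf, start=1)) - mean**2

    print(mean, 1 / p)               # both approximately 3.3333
    print(variance, (1 - p) / p**2)  # both approximately 7.7778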

Example (job applicants)

  • Job applicants are randomly selected for an interview.

  • Assume that 10% of the applicants are suitable

    1. Calculate the probability that it will take more than 10 interviews to fill the position

    2. Each interview costs $50. Calculate the expected cost of the search and the standard deviation of the cost.

    3. Suppose that 6 applicants have already been interviewed and the position is still open. What is the probability that it will take more than 10 additional interviews to fill the position?

Example (a)

Example (b)

Example (c)

Lack of memory

  • If \(X \thicksim {\mathrm{Geom}}(p)\), \(p \in (0,1)\), then

\[\mathbb{P}\left( X > k + m \ \vert\ X > k \right) = \mathbb{P}\left( X > m \right) \qquad k, m \in \mathbb{N}\]

  • In words: \(X\) does not “remember” having already waited \(k\) trials

  • The proof is simple

\[\begin{aligned} \mathbb{P}( X > k + m \ \vert\ X > k ) &= \frac{ \mathbb{P}\left( X > k + m \right) }{ \mathbb{P}\left( X > k \right) } \\ &= \frac{ 1 - \mathbb{P}\left( X \le k + m \right) }{ 1 - \mathbb{P}\left( X \le k \right) } \\ &= \frac{ \left(1 - p \right)^{k + m} }{ \left( 1 - p \right)^k } = \left( 1 - p \right)^m \\ & = \mathbb{P}( X > m) \end{aligned}\]
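The same property can be seen in a small Monte Carlo experiment; a Python sketch (standard library only; the values of \(p\), \(k\), \(m\) and the seed are arbitrary):

    import random

    def geometric(p):
        """Number of independent Bernoulli(p) trials until the first success."""
        n = 1
        while random.random() >= p:
            n += 1
        return n

    random.seed(302)
    p, k, m = 0.2, 4, 3
    draws = [geometric(p) for _ in range(200_000)]

    given_k = [x for x in draws if x > k]
    lhs = sum(x > k + m for x in given_k) / len(given_k)  # estimates P(X > k+m | X > k)
    rhs = sum(x > m for x in draws) / len(draws)          # estimates P(X > m)
    print(lhs, rhs, (1 - p) ** m)                         # all close to 0.512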

Poisson random variables

  • Let us count the number of occurrences of a type of incident during a given period of time.

  • Examples:

    • Number of earthquakes (mag. 5+) in a year;

    • Car collisions at Oak and Cambie in a month.

  • Let \(\lambda > 0\) be the rate of occurrence of the incident:

    • \(\lambda = 7\) per year;

    • \(\lambda = 50\) per month.

Poisson random variables

  • Define \(X =\) number of incidents in a unit time interval.

  • The range of \(X\) is the set \(\{0, 1, 2, ...\}\)

  • One choice for a PMF to model such a random variable is given by

\[f_X( k ) = \mathbb{P}( X = k ) = \begin{cases} \frac{ e^{-\lambda} \, \lambda^k }{k!} & k = 0, 1, 2, 3, \ldots \\ & \\ 0 &\text{else.}\end{cases}\]

The mathematics behind the derivation of this PMF will not be discussed in Stat 302.

Sanity check

We can check that \(f_X\) defined above is indeed a proper PMF.

  • It satisfies \(f_X(k) \ge 0\) for \(k \in \mathbb{N} \cup \{ 0 \}\)

  • Also:

\[\begin{aligned} \mathbb{P}(\Omega) &= \sum_{x=0}^{\infty }\frac{e^{-\lambda }\lambda ^{x}}{x!} \\ &=e^{-\lambda} \sum_{x=0}^{ \infty }\frac{\lambda ^{x}}{x!} \\ &= e^{-\lambda }\overset{\text{Taylor expansion of }e^{\lambda }}{\overbrace{\left[ 1+\lambda +\frac{\lambda ^{2}}{2}+\frac{\lambda ^{3}}{3!}+\cdots \right] }} \\ &=e^{-\lambda }e^{\lambda }=1.\end{aligned}\]

Mean and variance of Poisson distribution

The mean and variance of a Poisson (\(\lambda\)) random variable are

\[\mathbb{E}[X] = \operatorname{Var}[X] = \lambda\]

Proof:

\[\begin{aligned} \mathbb{E}[X] &= \sum_{k=0}^\infty \, k \frac{ e^{-\lambda} \lambda^k}{k!} = \sum_{k=1}^\infty \, \frac{ e^{-\lambda} \lambda^k}{(k-1)!} = \lambda \, \sum_{k=1}^\infty \, \frac{ e^{-\lambda} \lambda^{k-1}}{(k-1)!} = \lambda \, \sum_{j=0}^\infty \, \frac{ e^{-\lambda} \lambda^{j}}{j!} = \lambda. \end{aligned}\]

For \(\operatorname{Var}[X]\), first check that \(\mathbb{E}[ X ( X - 1) ] = \lambda^2\), and then follow the same steps we used for the Geometric distribution.
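A quick numerical check of \(\mathbb{E}[X] = \operatorname{Var}[X] = \lambda\) (a Python sketch, standard library only; \(\lambda = 3.6\) is borrowed from the earthquake example that follows, and the recursion \(f(k) = f(k-1)\,\lambda/k\) is used to avoid large factorials):

    from math import exp

    lam = 3.6
    cutoff = 60   # P(X > 60) is negligible when lambda = 3.6

    pmf = [exp(-lam)]                  # f(0) = e^{-lambda}
    for k in range(1, cutoff + 1):
        pmf.append(pmf[-1] * lam / k)  # f(k) = f(k-1) * lambda / k

    mean = sum(k * f for k, f in enumerate(pmf))
    variance = sum(k**2 * f for k, f in enumerate(pmf)) - mean**2
    print(sum(pmf), mean, variance)    # approximately 1, 3.6, 3.6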

Example earthquakes

  • \(Y\) = # of earthquakes over 5.0 in magnitude in a given area

  • Assume \(Y\thicksim {\mathrm{Poiss}}\left(\lambda \right)\) with \(\lambda =3.6\) per year.

  1. What is the probability of having at least 2 earthquakes over 5.0 in the next 6 months?

  2. What is the probability of having 1 earthquake over 5.0 next month?

  3. There were two earthquakes (over 5.0) in the first 6 months. What is the probability that we will have more than 4 this year?

  4. What is the probability of waiting more than 3 months for the next earthquake over 5.0 in that area?

Simplifying assumptions

Let us assume for the moment that \[\begin{aligned} 1\text{ year } &=12\text{ months }=365\text{ days} \\ 1\text{ month } &=4\text{ weeks }=\text{ }30\text{ days} \\ \end{aligned}\]

These assumptions do not hold, but we will use them for illustration.

Implicit assumptions

  • The number of incidents in any period of time is random and has Poisson distribution.

  • The parameter of any Poisson distribution in this question is proportional to the length of the period.

  • The assumption is justifiable under certain conditions, spelled out in the definition of a Poisson process.

  • We will not discuss Poisson processes here; we will provide the information needed while working on this example.
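As an illustration of the proportionality assumption, here is a sketch of how the rescaled rates and one of the required probabilities could be computed in Python (standard library only; the helper function is ours, and the final line corresponds to the first question above):

    from math import exp

    lam_year = 3.6
    lam_6months = lam_year / 2    # rate proportional to the length of the period
    lam_month = lam_year / 12

    def poisson_pmf(k, lam):
        """f(k) = e^{-lambda} lambda^k / k!, computed without large factorials."""
        f = exp(-lam)
        for j in range(1, k + 1):
            f *= lam / j
        return f

    # P(at least 2 earthquakes over 5.0 in the next 6 months)
    print(1 - poisson_pmf(0, lam_6months) - poisson_pmf(1, lam_6months))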

2 earthquakes in 6 months

1 earthquake next month

4 more earthquakes

Wait 3 months for an earthquake