Last modified — 29 Sep 2025
A random variable is a function whose domain is the sample space and whose range is a subset of the real line:
\[X \, : \, \Omega \ \rightarrow {\mathbb{R}}\]
\[X( \omega ) \in {\mathbb{R}}\]
For example, toss a coin 5 times, and let \(X\) be the number of Heads and \(Y\) the number of Tails. The sample space is
\[\Omega \, = \, \Bigl\{ ( x_{1}, x_{2}, \ldots, x_{5} ) \, : \ x_{i} \in \{ H, T \} \ \Bigr\}\]
\[\text{For example: } \qquad X \bigl( \, (T, H, T, H, H) \, \bigr) = 3\]
\[\text{For example: } \qquad Y \bigl( \, (T, H, T, H, H) \, \bigr) = 2\]
\[\Bigl\{ X > 3 \Bigr\} \, = \, \Bigl\{ \omega \in \Omega \, : \, X \left( \omega \right) > 3 \Bigr\} \subseteq \Omega\]
and
\[\Bigl\{ Y \le 2 \Bigr\} \, = \, \Bigl\{ \omega \in \Omega \, : \, Y \left( \omega \right) \le 2 \Bigr\} \subseteq \Omega\]
In other words, \(\Bigl\{ X > 3 \Bigr\}\) and \(\Bigl\{ Y \le 2 \Bigr\}\) are events.
The collection of events for which our probability can be calculated is denoted as \({\cal B}\).
\({\cal B}\) need not be \(2^{\Omega}\) (technical issues arise)
We will assume that \({\cal B}\) is the smallest set of events that includes all events of the form \(\left\{ X \leq x \right\}\), and that is closed under complements and (countable) unions.
This ensures that we can calculate
\[\mathbb{P}(X \leq 3) \qquad \mathbb{P}(X > \pi)\]
etc.
Interested in more details? Check MATH 420
The range of \(X\) is the set of all possible values for \(X\), call it “\({\cal R}\)”.
\(X =\) # defective items in a collection of size \(N\):
\[{\cal R} \, = \, \mbox{Range of } X \, = \, \left\{ 0, 1, \ldots, N \right\} \subset \mathbb{N}\]
\(Y =\) # website visits in a day:
\[{\cal R} \, = \, \mbox{Range of } Y \, = \, \left\{ 0, 1, 2, \ldots \right\} \, = \, \mathbb{N}.\]
\(Z\) = Waiting time until next earthquake:
\[{\cal R} \, = \, \mbox{Range of } Z \, = \, \left[ 0, \infty \right) \, \subset \, \mathbb{R}.\]
A random variable \(X\) is discrete if its range \({\cal R}\) is finite or countable.
Roughly, a random variable \(X\) is continuous if its range \({\cal R}\) is an interval, such as \([0, \infty)\) or \((a, b)\).
A more precise definition will be given below.
For example, let \(X\) be the number of Heads in 3 tosses of a fair coin. Its PMF and CDF are given by the following tables.

PMF
\(x\) | \(f_X(x)\) |
---|---|
0 | 1/8 |
1 | 3/8 |
2 | 3/8 |
3 | 1/8 |
else | 0 |
CDF
\(x\) | \(F_X(x)\) |
---|---|
0 | 1/8 |
1 | 4/8 |
2 | 7/8 |
3 | 1 |
Since \(f_X(x) = 0\) for \(x \notin {\cal R}\) we only need to list its values on \({\cal R}\)
Although \(F_X(x)\) is defined for all \(x \in {\mathbb{R}}\), note that \(F_X(a)\) remains constant for values of \(a\) between two consecutive elements of \({\cal R}\). Hence we only need to list \(F_X(x)\) at the points of \({\cal R}\) (listing it for every \(x \in {\mathbb{R}}\) would be impossible anyway).
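As a quick check, here is a minimal Python sketch (ours, not part of the notes) that rebuilds the two tables above as the PMF and CDF of the number of Heads in 3 tosses of a fair coin, by listing the 8 equally likely outcomes:

```python
from fractions import Fraction
from itertools import product

# The 2^3 = 8 equally likely outcomes of tossing a fair coin 3 times
outcomes = list(product("HT", repeat=3))
p_omega = Fraction(1, len(outcomes))          # each outcome has probability 1/8

# PMF of X = number of Heads, and its CDF at the points of the range
pmf = {k: sum(p_omega for w in outcomes if w.count("H") == k) for k in range(4)}
cdf = {k: sum(pmf[j] for j in range(k + 1)) for k in range(4)}

for k in range(4):
    print(k, pmf[k], cdf[k])   # reproduces the tables (fractions shown in lowest terms)
```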
Let \(F_X(x)\) be the CDF of an integer-valued discrete random variable \(X\). Then, for any \(k \in {\cal R}\), we have
\[\begin{align*} f_X(k) &= \mathbb{P}\left( X = k \right) = \mathbb{P}\left( k-1 < X \le k \right) \\ & \\ &= \mathbb{P}\left( (X \le k) \cap (X > k-1) \right) = \mathbb{P}\left( (X \le k) \cap (X \le k-1)^c \right) \\ & \\ &= \mathbb{P}\left( X \le k \right) - \mathbb{P}\left( (X \le k) \cap (X \le k-1) \right) \\ & \\ &= \mathbb{P}\left( X \le k \right) - \mathbb{P}\left( X \le k-1 \right) = F_X(k) - F_X(k-1) \end{align*}\]
For example, let \(X\) be the sum of two fair dice:

\(x\) | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
---|---|---|---|---|---|---|---|---|---|---|---|
\(f_X(x)\) | 1/36 | 2/36 | 3/36 | 4/36 | 5/36 | 6/36 | 5/36 | 4/36 | 3/36 | 2/36 | 1/36 |
\(F_X(x)\) | 1/36 | 3/36 | 6/36 | 10/36 | 15/36 | 21/36 | 26/36 | 30/36 | 33/36 | 35/36 | 1 |
This pmf \(f_X(x)\) can also be written as \[f_X(x) = \frac{6 - |7-x|}{36}\] for \(x \in {\cal R}\), \(f_X(x)=0\) otherwise.
Both approaches (table & closed-form formula) are equally useful: you can calculate \(f_X(x)\) and \(F_X(x)\) for any \(x \in \mathbb{R}\).
It is very important to specify \({\cal R}\) whenever you write the PMF or CDF.
The PMF and CDF each contain sufficient information to evaluate the probability of events that involve values of \(X\).
For example:
\[\begin{align*} \mathbb{P}\left( X \le 5.5 \right) &= F_X(5.5) = 10/36 \\ & \\ & \\ \mathbb{P}\left( 4 < X \le 9 \right) &= \mathbb{P}\left( (X \le 9) \cap (X > 4) \right) = \mathbb{P}\left( (X \le 9) \cap (X \le 4)^c \, \right) \\ & \\ &= \mathbb{P}\left( X \le 9 \right) - \mathbb{P}\left( X \le 4 \right) \qquad \text{ (why? prove it!) } \\ & \\ &= F_X(9) - F_X(4) = 30/36 - 6/36 = 24/36. \end{align*}\]
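The same calculations can be checked numerically. The sketch below (the helper names are ours) builds the PMF of the sum of two fair dice by enumeration and reproduces the two probabilities above:

```python
from fractions import Fraction
from itertools import product

# PMF of X = sum of two fair dice, by enumerating the 36 equally likely pairs
pmf = {s: Fraction(0) for s in range(2, 13)}
for d1, d2 in product(range(1, 7), repeat=2):
    pmf[d1 + d2] += Fraction(1, 36)

def cdf(x):
    """F_X(x) = P(X <= x), defined for any real x."""
    return sum(p for k, p in pmf.items() if k <= x)

print(cdf(5.5))          # 5/18, i.e. 10/36
print(cdf(9) - cdf(4))   # 2/3,  i.e. 24/36
```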
For a fixed number \(p \in [0, 1]\), define a discrete random variable \(X\) with PMF and CDF given by:
PMF
\[ f\left( x;\ p\right) =\begin{cases} \left( 1-p\right) ^{2} & x=0, \\ 2p\left( 1-p\right) & x=1, \\ p^{2} & x=2, \\ 0 & \mbox{else.} \end{cases} \]
CDF
\[ F\left( x;\ p\right) = \begin{cases} 0 & x<0, \\ \left( 1-p\right) ^{2} & 0\leq x<1, \\ 1-p^{2} & 1\leq x<2, \\ 1 & x\geq 2. \end{cases} \]
Different values of \(p\) will give different PMFs and CDFs
For example, if \(p = 0.10\)
PMF
\[ f\left( x;\ p\right) =\begin{cases} 0.81 & x=0, \\ 0.18 & x=1, \\ 0.01 & x=2 \\ 0 & \mbox{else.} \end{cases} \]
CDF
\[ F\left( x;\ p\right) = \begin{cases} 0 & x<0 \\ 0.81 & 0\leq x<1 \\ 0.99 & 1\leq x<2 \\ 1 & x\geq 2 \end{cases} \]
By including a parameter \(p\), we are able to express many distributions (a family of distributions) with a single functional form.
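For instance, one can tabulate several members of this family at once. The sketch below (our own helper functions, based on the closed-form PMF above) prints the PMF for a few values of \(p\) and reproduces the \(p = 0.10\) tables:

```python
def pmf(x, p):
    """PMF f(x; p) of the family above, for a fixed p in [0, 1]."""
    return {0: (1 - p) ** 2, 1: 2 * p * (1 - p), 2: p ** 2}.get(x, 0.0)

def cdf(x, p):
    """CDF F(x; p): sum the PMF over the support values that are <= x."""
    return sum(pmf(k, p) for k in (0, 1, 2) if k <= x)

# Each value of p gives a different member of the family
for p in (0.10, 0.25, 0.50):
    print(p, [round(pmf(k, p), 4) for k in (0, 1, 2)])

print(round(cdf(1, 0.10), 4))   # 0.99, matching the p = 0.10 table above
```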
Consider an experiment consisting of flipping 5 coins
The sample space of this experiment is
\[\Omega \, = \, \left\{ \left( x_{1},x_{2},...,x_{5}\right) \, : \, x_{i} \in \left\{ H,T \right\} \, \right\}\]
Then \(\# \Omega = 2^5 = 32\). We will assume that all outcomes (\(\omega \in \Omega\)) are equally likely
We are interested in the number of Tails obtained in the 5 tosses
Define a random variable
\[Y = \left\{ \mbox{ number of Tails } \right\}\]
For example \(\left\{ Y = 3 \right\}\) is
\[ \left\{ Y = 3 \right\} \, = \, \left\{ \text{ there are exactly 3 Tails in the 5 tosses } \right\} \]
Using what we have learned so far, you can prove that
\[f_Y(3) = \mathbb{P}(Y=3) = {5 \choose 3} (1/2)^5.\]
Moreover, for any \(k \in \{0, 1, 2, 3, 4, 5\}\) we have
\[f_Y(k) = \mathbb{P}(Y= k ) = {5 \choose k} (1/2)^5\]
\[f_Y(k) = \mathbb{P}(Y = k) = 0 \qquad \text{ if } \quad k \notin \{ 0, 1, 2, 3, 4, 5 \}\]
\[\sum_{x \in {\cal R} } f_Y(x) = 1\]
specifically
\[\sum_{x=0}^{5} f_Y ( x ) = \sum_{b=0}^{5} { 5 \choose b} \left( \frac{1}{2}\right) ^{5} =1\]
Note that we used \(x\) and \(b\) to list possible values of \(Y\).
One can use any symbol for the auxiliary variables listing the values that \(Y\) can take
Another example: an urn contains \(n\) numbered balls, we draw \(k\) of them without replacement, and \(Y\) is the largest number drawn (see the exercise below).
The smallest possible value of \(Y\) is \(k\); it corresponds to the event that the \(k\) drawn numbers are \(\left\{ 1,2,...,k\right\}\).
The largest possible value for \(Y\) is \(n\).
Therefore the range of \(Y\) is \[\mathcal{R}=\left\{ k,k+1,...,n\right\}\]
Properties of a PMF:
\(f_X(x) \ge 0\)
\(\sum_{x\in \mathcal{R}} f_X(x) = 1\), where \(\mathcal{R} =\) range of \(X\)
\(\mathbb{P}\left( X \in A \right) =\sum_{x\in A} f_X(x)\), where \(A\subset \mathbb{R}\)
Properties of a CDF:
\(0 \le F_X(x) \le 1\)
\(F\left( x \right)\) is non-decreasing and right-continuous
\(\lim_{a \to -\infty} F_X\left( a \right) =0\), \(\lim_{a \to +\infty} F_X\left( a \right) =1\)
\(\mathbb{P}\left( a<X\leq b\right) =F\left( b\right) -F\left( a\right)\) for any \(a<b\)
\(f\left( k\right) =F\left( k\right) -F\left( k-1\right)\) for all \(k \in {\cal R}\)
Suppose \(X\) is a discrete random variable with PMF \(f(x)\). Then, its expected value is defined as
\[\mathbb{E}[X] \ = \ \sum_{k \in \mathcal{R}} k \, P ( X = k ) \, = \, \sum_{k \in \mathcal{R}} k \, f_X(k)\]
Let \(g : \mathcal{R} \rightarrow \mathcal{D}\) be a function, and let \(Y = g(X)\)
For example, \(g(x) = \sin(x)\) or \(g(x) = (x - 3)^2\).
Then \(Y = g(X)\) is itself a function on \(\Omega\):
\[ Y : \Omega \to {\cal D} \qquad \text{ with } \qquad Y( \omega ) = g \left( X ( \omega ) \right) \] and hence \(Y\) is a discrete random variable, with its own PMF, \(f_Y\), say.
\[ \mathbb{E}(Y) \, = \, \sum_{z \in {\cal D}} z \, f_Y(z) \, = \, \sum_{z \in {\cal D} } z \, \mathbb{P}( Y = z) \, = \, \sum_{z \in {\cal D} } z \, \mathbb{P}( g(X) = z)\]
Fortunately, one can show that this can be computed directly from the PMF of \(X\):
\[\mathbb{E}[g(X)] = \sum_{k \in \mathcal{R}} g(k) \, f_X(k)\]
For example, let \(X\) be a discrete random variable with PMF:

\(k\) | \(f_X \left( k \right)\) |
---|---|
0 | 0.15 |
1 | 0.25 |
2 | 0.30 |
3 | 0.20 |
4 | 0.10 |
else | 0 |
\[\mathbb{E}\left[ X \right] = \sum_{k=0}^4 k \, f_X(k) = 0 \times 0.15 + 1 \times 0.25 + 2 \times 0.30 + 3 \times 0.20 + 4\times 0.10 = 1.85.\]
\[\begin{aligned}\mathbb{E}\left[ X^{2}\right] &=\sum_{x=0}^{4} x^{2} \, f_X \left( x\right) \\ &= 0 \times 0.15 + 1 \times 0.25 + 4 \times 0.30 + 9 \times 0.20 + 16 \times 0.10 \\ &=4.85.\end{aligned}\]
We often denote the expectation of \(X\) by \(\mu\), or \(\mu_X\), or \(\mathbb{E}\left[ X \right]\).
Expectation is a linear operator: for any \(a, b \in \mathbb{R}\), we have
\[\mathbb{E}[ a \, X + b ] \ = \ a \, \mathbb{E}[ X ] + b\]
Key point: \(a\) and \(b\) are fixed numbers, not random. For example:
\[\mathbb{E}[ 3.2 \, X - 2.5 ] \ = \ 3.2 \, \mathbb{E}[ X ] - 2.5\]
Let \(X\) be a random variable and \(a\) and \(b\) be fixed real numbers.
Then, applying the result above for functions of \(X\) with \(g(x) = a \, x + b\), we have
\[\begin{align*} \mathbb{E}[ a \, X + b ] &= \sum_{k \in \mathcal{R}} \left( a \, k + b \right) \, f_X( k ) \\ &= \sum_{k \in \mathcal{R}} a \, k \, f_X ( k ) \, + \, \sum_{k \in \mathcal{R}} b \, f_X( k ) \\ &= a \, \sum_{k \in \mathcal{R}} \, k \, f_X( k ) \, + \, b \, \sum_{k \in \mathcal{R}} \, f_X ( k )\\ &= \, a \, \mathbb{E}[ X ] + b. \end{align*}\]
Let \(f(k)\) be the pmf of a random variable \(X\).
For each \(t \in \mathbb{R}\), consider the average squared distance from \(X\) to \(t\):
\[m( t) = \mathbb{E}[ ( X-t) ^{2}] = \sum_{k \in {\cal R}} \, (k - t )^2 \, f_X ( k )\]
What is the value of \(t \in {\mathbb{R}}\) that is closest to \(X\) on average?
What is the value of \(t \in \mathbb{R}\) that minimizes \(m ( t )\)?
\[\begin{align*} \left. \frac{\partial m(t)}{\partial t} \right|_{t = t_0} &= \left. \left[ - 2 \sum_{x} (x-t) f_X(x) \right] \right|_{t = t_0} \overset{set}{=} 0 \\ &\Leftrightarrow \overset{ \mathbb{E}[X] }{\overbrace{\sum_{x} x \, f_X(x) }} = t_0 \ \overset{1}{\overbrace{\sum_{x} f_X(x) }} \\ &\Rightarrow t_0 = \mathbb{E}[X] \end{align*}\]
Since \(m^{\prime \prime } (t) = 2 > 0\) for all \(t\), the critical point \(t_0 = \mathbb{E}[X]\) is the unique minimizer of the function \(m\):
\[\mathbb{E}\left[ ( X - \mathbb{E}[X] )^2 \right] \, \le \, \mathbb{E}\left[ ( X-t )^2 \right] \quad \mbox{ for all } t \in \mathbb{R}\]
This result shows that \(\mathbb{E}[X]\) is the value that minimizes the mean squared prediction error for \(X\).
For any \(t \in \mathbb{R}\)
\[\begin{align*} (X - t)^2 &= (X - \mathbb{E}[X] + \mathbb{E}[X] - t)^2 = \left( (X - \mathbb{E}[X]) + (\mathbb{E}[X] - t) \right)^2 \\ & \\ &= (X - \mathbb{E}[X])^2 + (\mathbb{E}[X] - t)^2 + 2\, (X - \mathbb{E}[X]) \times (\mathbb{E}[X] - t) \\ \end{align*}\]
Thus
\[\begin{align*} \mathbb{E}(X - t)^2 &= \mathbb{E}\left[ (X - \mathbb{E}[X])^2 \right] + \mathbb{E}\left[ (\mathbb{E}[X] - t)^2 \right] + \mathbb{E}\left[ 2\, (X - \mathbb{E}[X]) \times (\mathbb{E}[X] - t) \right] \\ & \\ &= \mathbb{E}\left[ (X - \mathbb{E}[X])^2 \right] + (\mathbb{E}[X] - t)^2 \ge \mathbb{E}\left[ (X - \mathbb{E}[X])^2 \right] \end{align*}\]
because \(\mathbb{E}[X] \in {\mathbb{R}}\), \((\mathbb{E}[X] - t) \in {\mathbb{R}}\), and \(\mathbb{E}\left[ X - \mathbb{E}[X] \right] = 0\), thus \[ \mathbb{E}\left[ 2\, (X - \mathbb{E}[X]) \times (\mathbb{E}[X] - t) \right] = 2 \, (\mathbb{E}[X] - t) \, \mathbb{E}\left[ X - \mathbb{E}[X] \right] = 2 \, (\mathbb{E}[X] - t) \, 0 = 0 \]
Hence, for any \(t \in {\mathbb{R}}\)
\[ \mathbb{E}(X - t)^2 \ge \mathbb{E}\left[ (X - \mathbb{E}[X])^2 \right] \] so \(t_0 = \mathbb{E}[X]\) clearly minimizes the left-hand side.
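A small numerical illustration of this fact (a sketch of ours, reusing the PMF from the expectation example above; the grid search is just for illustration):

```python
# PMF from the expectation example above (E[X] = 1.85)
pmf = {0: 0.15, 1: 0.25, 2: 0.30, 3: 0.20, 4: 0.10}
mean = sum(k * f for k, f in pmf.items())

def m(t):
    """m(t) = E[(X - t)^2], the mean squared distance from X to t."""
    return sum((k - t) ** 2 * f for k, f in pmf.items())

# Evaluate m(t) on a grid; the minimizer should be (essentially) E[X]
grid = [i / 100 for i in range(0, 401)]
t_best = min(grid, key=m)
print(mean, t_best, m(t_best))   # ~1.85, 1.85, and m(1.85) ~ 1.4275
```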
Let \(X\) be a random variable with expected value (mean) \(\mathbb{E}[X]\), also denoted \(\mu_X\).
We call \[\mbox{Var}[X] = \mathbb{E}\left[ \left(X - \mathbb{E}[X]\right) ^2 \right]\] the variance of \(X\) if the expectation of the squared term exists
It is usually denoted by \(\sigma^2_X\), or \(\sigma^2\)
The variance is the expected squared distance from the random variable \(X\) to its mean \(\mu_x\).
Suppose \(X\) is the weight of a newborn baby in grams.
Its expectation is the mean weight of newborn babies in grams.
The variance of \(X\), \(\sigma^2_X\), is the expected squared deviation of a newborn's weight from that mean weight.
Its units are “squared grams”.
The variance is the expected squared prediction error if we predict \(X\) by its mean \(\mu_X = \mathbb{E}[X]\)
The standard deviation is the square root of the variance
For example, if \(X\) is measured in metres and \[\mathbb{E}\left[ X \right] \, = \, 5 \ \text{m} \quad \mbox{ and } \quad \operatorname{Var}\left[ X \right] \, = \, 4 \ \text{m}^2,\] then
\[\sigma_X = \text{SD}(X) = \sqrt{ \operatorname{Var}\left[ X \right] } \, = \, 2 \ \text{m}\]
If \(X\) is a random variable, and \(a, b \in \mathbb{R}\) are fixed constants, then \[\operatorname{Var}[ a \, X + b \,] \ = \ a^2 \, \operatorname{Var}[ X ].\]
Let \(\mu_X = \mathbb{E}\left[ X \right]\) then \[\operatorname{Var}\left[ X \right] \, = \, \mathbb{E}\left[ X^2 \right] - \mu_X^2\]
It is often easier to calculate \(\mathbb{E}\left[ X^2 \right]\) than \(\mathbb{E}[ \left( X - \mu_X \right)^2 ]\)
For example, consider a random variable \(X\) with PMF
\[f\left( k\right) = \begin{cases} 0.3414 \times \frac{1}{k} & k = 1, 2, \ldots, 10\\ 0 & \text{else}.\end{cases}\]
Its mean is
\[\mu = 0.3414 \sum_{k=1}^{10} k \, \frac{1}{k} = 0.3414 \times 10 = 3.414\]
Similarly: \(E\left( X^{2}\right) = 18.777\)
Thus, its variance is given by \[\operatorname{Var}\left[ X \right] = 18.777 - 3.414^2 = 7.122\]
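A short sketch (ours) that recomputes the normalizing constant and these moments; the notes use the rounded constant 0.3414, so the last decimals differ slightly:

```python
# f(k) = c / k for k = 1, ..., 10, where c makes the probabilities add up to 1
c = 1 / sum(1 / k for k in range(1, 11))
print(round(c, 4))                            # 0.3414

pmf = {k: c / k for k in range(1, 11)}
mean = sum(k * f for k, f in pmf.items())     # = 10 * c
ex2  = sum(k * k * f for k, f in pmf.items()) # = 55 * c
print(round(mean, 3), round(ex2, 3), round(ex2 - mean ** 2, 3))   # ~3.414, ~18.78, ~7.12
```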
Going back to the earlier example, the calculation of \(\mathbb{E}[X]\) and \(\mathbb{E}[X^2]\) can be organized in a table:

\(x\) | \(f(x)\) | \(xf(x)\) | \(x^2 f(x)\) |
---|---|---|---|
0 | 0.15 | 0.00 | 0.00 |
1 | 0.25 | 0.25 | 0.25 |
2 | 0.30 | 0.60 | 1.20 |
3 | 0.20 | 0.60 | 1.80 |
4 | 0.10 | 0.40 | 1.60 |
else | 0 | 0 | 0 |

Summing the last two columns gives \(\mathbb{E}[X] = 1.85\) and \(\mathbb{E}[X^2] = 4.85\), so \(\operatorname{Var}[X] = 4.85 - 1.85^2 = 1.4275\).
An urn contains \(n\) numbered balls \((1, 2, \ldots, n)\).
Randomly draw \(k\) balls \((1<k<n)\) without replacement.
Let \(Y\) represent the largest number among them.
Calculate the mean and variance of \(Y\) when \(n = 10\) and \(k = 5\).
Use the computer to find these values.
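One possible computer solution is sketched below (the approach is our own choice: enumerate all \(\binom{10}{5} = 252\) equally likely draws):

```python
from itertools import combinations

n, k = 10, 5
draws = list(combinations(range(1, n + 1), k))   # all C(10, 5) = 252 equally likely draws

# PMF of Y = largest number drawn, by direct enumeration
pmf = {y: sum(1 for d in draws if max(d) == y) / len(draws) for y in range(k, n + 1)}

mean = sum(y * f for y, f in pmf.items())
var  = sum(y * y * f for y, f in pmf.items()) - mean ** 2
print(pmf)         # e.g. P(Y = 10) = 126/252 = 0.5
print(mean, var)   # E[Y] ~ 9.167 and Var[Y] ~ 1.091
```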
Consider an event \(A\) such as
\[A \, = \, \left\{ \mbox{Coin lands ``H''} \right\}\]
or
\[A \, = \, \left\{ \mbox{Max annual wind speed > 100 km/h}\right\}.\]
Every toss or every year, we check whether \(A\) occurred.
These checks are called trials
Occurrences of \(A\) are arbitrarily called “successes”
Suppose \(\mathbb{P}\left( A\right)\) remains constant across trials
\[\mathbb{P}\left( A \right) = \mathbb{P}( \text{ a success } ) = p \in [0, 1]\]
and that the outcomes of trials are independent
We call such trials Bernoulli trials.
Three necessary conditions:
each trial has exactly two outcomes of interest (\(A\) occurs or it does not);
the probability of success \(p = \mathbb{P}(A)\) is the same for every trial;
the trials are independent.
For the \(i^{th}\) trial, define the indicator variable
\[Y_i = \begin{cases} 1 & \text{if $A$ occurs in the $i^{th}$ trial}\\ 0 & \text{if $A$ doesn't occur in the $i^{th}$ trial.}\end{cases} \]
Its PMF is
\[\begin{aligned}f_{Y_i}\left( a\right) &= \begin{cases} p & \text{ if } \ a=1 \\ & \\ 1-p & \text{ if } \ a=0\\ & \\ 0 & \text{otherwise}\end{cases} \\ & \\ & \\ f_{Y_i}\left( a\right) & = p^{a}( 1-p)^{1-a} \quad \text{ for } a \in\{0,1\}\end{aligned} \]
Notation: \(Y_{i} \sim {\mathrm{Bern}}\left( p\right)\)
The mean of \(Y_i\) is
\[\mathbb{E}[ Y_{i} ] =0 \times f_{Y_i}(0) +1 \times f_{Y_i}(1) = p\]
Its variance is
\[\operatorname{Var}[ Y_i ] = \mathbb{E}\left[ ( Y_i - \mu_{Y_i})^2 \right] = \mathbb{E}[Y_i^2]- \mathbb{E}[Y_i]^2 = p - p^2 = p (1-p)\] because \(\mathbb{E}[Y_i^2] = 0^2 \times f_{Y_i}(0) + 1^2 \times f_{Y_i}(1) = 0 \times (1 - p) + 1 \times p = p\)
Consider an experiment consisting of a fixed number of Bernoulli trials (\(n \ge 1\)).
Let \(X\) be the number of successes out of the \(n\) trials.
Let \(p = \mathbb{P}( \mbox{ success in any trial } )\).
Convention: we say that \(X\) is binomially distributed, or that it has a binomial distribution.
We denote
\[X \, \thicksim \, {\mathrm{Binom}}\left( n, p\right)\]
where \(n\) is the number of trials, and \(p\) is the probability of success in each trial (which is the same for all trials).
The range of \(X\) is \(\left\{ 0, 1, \ldots, n \right\}\).
To calculate its PMF, consider the sample space
\[\Omega \, = \, \Bigl\{ \left(Y_1, Y_2, \ldots, Y_n \right), \ Y_j \in \left\{ S, F \right\} \Bigr\} \qquad \text{ and } \quad \# \Omega = 2^n \]
\[\mathbb{P}\left( \left\{ \omega \right\} \right) = p^{n_S} (1-p)^{n - n_S} \qquad \omega \in \Omega\]
where \(n_S\) = number of successes in \(\omega\)
For example,
\[\mathbb{P}\Bigl( \left( S, S, F, F, \ldots, F, S \right) \Bigr) = p^3 (1-p)^{n - 3}\]
\[f_X(k) \, = \, \mathbb{P}\left( \omega \in \Omega \ \mbox{ with exactly } k \ \mbox{successes} \right)\]
\[\text{There are } {n \choose k} \ \text{such } \omega's \qquad \text{ (Why? Prove it!)}\]
\[f_X(k) \, = \begin{cases} \binom{n}{k} \, p^k \, \left( 1 - p \right)^{n-k} & \text{ if } k \in \{0,\dots,n\} \\ & \\ 0 & \text{otherwise} \end{cases}\]
Recall the binomial theorem:
\[\left( a + b \right)^n \, = \, \sum_{k=0}^n \, \left( \begin{array}{c} n \\ k \end{array} \right) \, a^k \, b^{n - k} \]
Taking \(a = p\) and \(b = 1-p\) shows that \(\sum_{k=0}^n f_X(k) = (p + 1 - p)^n = 1\), so \(f_X\) is a proper PMF.
When \(X \sim {\mathrm{Binom}}(n, p)\), we have \[\mathbb{E}[ X ] = \sum_{k=0}^n \, k {n \choose k} p^k (1-p)^{n-k} = n p.\] and \[\mathbb{E}[ X^2 ] = \sum_{k=0}^n \, k^2 {n \choose k} p^k (1-p) ^{n-k} = p^2 ( n-1 ) n + n p.\]
Therefore, \[\operatorname{Var}[X] = n p (1- p).\]
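A quick numeric check of these two formulas for one choice of \(n\) and \(p\) (the helper function is ours):

```python
from math import comb

def binom_pmf(k, n, p):
    """Binomial(n, p) PMF, as derived above."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k) if 0 <= k <= n else 0.0

n, p = 5, 0.5
mean = sum(k * binom_pmf(k, n, p) for k in range(n + 1))
ex2  = sum(k * k * binom_pmf(k, n, p) for k in range(n + 1))
print(mean, ex2 - mean ** 2)    # 2.5 and 1.25 ...
print(n * p, n * p * (1 - p))   # ... which match n*p and n*p*(1-p)
```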
One approach to finding the expectation is as follows:
\[\begin{aligned} \mathbb{E}[X] &= \sum_{k=0}^n k {n \choose k} p^k (1-p)^{n-k} \\ &= \sum_{k=0}^n k \frac{n!}{k! (n-k)!} p^k (1-p)^{n-k} \\ &= n p \sum_{k=1}^n \frac{(n-1)!}{(k-1)! ((n-1) -(k-1))!} p^{k-1} (1-p)^{n-k} \\ &= n p \sum_{j=0}^m {m \choose j} p^{j} (1-p)^{m - j} \qquad (\text{where } j = k-1 \text{ and } m = n-1) \\ &= np \end{aligned}\]
For \(\mathbb{E}[X^2]\), start with \[\begin{aligned} \mathbb{E}[X(X-1)] &= \sum_{k=0}^n k(k-1) {n \choose k} p^k (1-p)^{n-k} \\ &= \sum_{k=0}^n k(k-1) \frac{n!}{k! (n-k)!} p^k (1-p)^{n-k} \\ &= n(n-1) p^2 \sum_{k=2}^n {n-2 \choose k-2} p^{k-2} (1-p)^{(n-2) -(k-2)}\\ &= n(n-1) p^2,\end{aligned} \] so that \(\mathbb{E}[X^2] = \mathbb{E}[X(X-1)] + \mathbb{E}[X] = n(n-1)p^2 + np\) and \(\operatorname{Var}[X] = np(1-p)\).
Important
Both of the previous calculations used the same trick: factor out enough terms so that the remaining sum is a Binomial PMF (over \(n-1\) or \(n-2\) trials), which adds up to 1.
Suppose that finding oil when digging in a certain region has probability \(p = 0.10\) of success.
Assume that digging attempts (trials) are independent.
How many wells should be dug so that the probability of finding oil is at least 0.95?
How many wells should be dug so that the probability of finding at least 2 successful wells is at least 0.95?
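One way to answer such questions numerically is to search over \(n\) using the Binomial PMF. The sketch below is our own (and reads "finding oil" in the first question as "at least one success"):

```python
from math import comb

def binom_pmf(k, n, p):
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

p = 0.10

# Smallest n with P(at least 1 success) >= 0.95, i.e. 1 - (1-p)^n >= 0.95
n1 = next(n for n in range(1, 1000) if 1 - (1 - p) ** n >= 0.95)

# Smallest n with P(at least 2 successes) >= 0.95 for X ~ Binom(n, p)
n2 = next(n for n in range(2, 1000)
          if 1 - binom_pmf(0, n, p) - binom_pmf(1, n, p) >= 0.95)

print(n1, n2)   # 29 and 46 with these settings
```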
Consider a sequence of independent Bernoulli trials.
Constant probability of success: \(p \in (0, 1)\) (the cases \(p=0\) and \(p=1\) require attention)
We repeat the Bernoulli trial until we find the first success
What is \(\Omega\)? What is \(\# \Omega\)? Are these outcomes equally likely?
Let \(X\) be the number of trials needed to find the first success. What is the range of \(X\) (\({\cal R}\))?
Examples:
\[\Bigl\{ X = 3 \Bigr\} \, = \ \Bigl\{ \left( F, F, S \right) \Bigr\}\]
\[\Bigl\{ X = 6 \Bigr\} \, = \ \Bigl\{ \left( F, F, F, F, F, S \right) \Bigr\}\]
Note: there is only one \(\omega \in \Omega\) for which \(X(\omega) = 3\). How many \(\omega \in \Omega\) satisfy \(X(\omega) = k\), where \(k \in \mathbb{N}\)?
Because the trials are independent
\[\mathbb{P}\left( X = 3 \right) \, = \, \mathbb{P}\left( \left( F, F, S \right) \right) \, = \, (1-p)^2 \, p.\]
In general, we have
\[f_X( k) \, = \mathbb{P}( X = k) = \mathbb{P}( ( F, F, \ldots, F, S)) = \begin{cases} (1-p)^{k-1}p & k \in \{1, 2, 3, \ldots \}\\ & \\ 0 & \text{else}\end{cases}\]
\[\sum_{k =1}^\infty f_X(k) = \sum_{k =1}^\infty (1-p)^{k-1} \, p = 1\]
Note that this sum means that the probability of eventually seeing a success \((X < \infty)\) is 1. The probability that we never (ever) see a success is \(\mathbb{P}( (X < \infty)^c ) = 0\).
The CDF of \(X\) for \(k \in \mathbb{N}\) is
\[\begin{aligned} F_X( k ) & = \mathbb{P}( X \le k ) = \sum_{m=1}^k f_X(m) = \sum_{m=1}^k (1-p)^{m-1} \, p \\ & = p \, \sum_{m=1}^k (1-p)^{m-1} \\ & = p \, \left( \frac{ 1 - (1-p)^k }{ 1 - (1-p) } \right) \end{aligned} \]
Simplifying, \[ F_X(k) \, = \, 1 - \left( 1 - p \right)^k \qquad \text{ for } k \in \mathbb{N}, \] and in general \(F_X(x) = 1 - (1-p)^{\lfloor x \rfloor}\) for \(x \ge 1\), while \(F_X(x) = 0\) for \(x < 1\).
Suppose that finding oil when digging in a certain region has probability \(\theta = 0.10\) of success.
Assume that digging attempts (trials) are independent (what does this mean???)
What is the probability that we need to dig 5 wells before we find oil for the first time?
What is the probability that we find oil before digging the \(30^{th}\) well?
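A sketch for these two questions; the reading of the questions (\(\mathbb{P}(X=5)\) for the first and \(\mathbb{P}(X \le 29)\) for the second) is our interpretation:

```python
p = 0.10   # theta in the exercise

def geom_pmf(k, p):
    """P(X = k) = (1-p)^(k-1) p for X = trial of the first success."""
    return (1 - p) ** (k - 1) * p

def geom_cdf(k, p):
    """P(X <= k) = 1 - (1 - p)^k."""
    return 1 - (1 - p) ** k

print(geom_pmf(5, p))    # first success on exactly the 5th dig: 0.9^4 * 0.1 = 0.06561
print(geom_cdf(29, p))   # oil found before the 30th dig, i.e. within the first 29 digs
```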
If \(X\) has this Geometric distribution, then
\[\mathbb{E}\left[ X \right] = \frac{1}{p} \qquad \mbox{and} \qquad \operatorname{Var}\left[ X \right] = \frac{1-p}{p^2}\]
To see this, write
\[ \mathbb{E}\left[ X \right] = \sum_{k =1 }^\infty \ k \, f(k) = \sum_{k =1 }^\infty \ k \, (1-p)^{k-1} p = (-p) \, \frac{\partial}{\partial p} \left\{ \sum_{k =1 }^\infty (1 - p)^k \right\} \]
By the geometric series (valid for \(|a| < 1\))
\[\sum_{k=1}^\infty \, a^k \, = \, \frac{a}{1-a},\]
applied with \(a = 1-p\), we have \(\sum_{k=1}^\infty (1-p)^k = \frac{1-p}{p}\), so
\[ \frac{\partial}{\partial p} \left\{ \sum_{k =1 }^\infty (1 - p)^k \right\} = \frac{\partial}{\partial p} \left\{ \frac{1-p}{p} \right\} = -\frac{1}{p^2}\]
Therefore
\[\mathbb{E}\left[ X \right] = (-p) \, \frac{\partial}{\partial p} \left\{ \sum_{k =1 }^\infty (1 - p)^k \right\} = (-p) \, \frac{\partial}{\partial p} \left\{ \frac{1-p}{p} \right\} = (-p) \times \left( -\frac{1}{p^2} \right) = \frac{1}{p}\]
For the variance, a similar calculation (differentiating the geometric series twice) gives
\[\mathbb{E}\left[ X \, \left( X - 1 \right) \right] \, = \, 2 \, \left( \frac{1-p}{p^2} \right)\]
\[\mathbb{E}\left[ X^2 \right] = \mathbb{E}\left[ X \, \left( X - 1 \right) \right] + \mathbb{E}[X] = 2 \, \left( \frac{1-p}{p^2} \right) + \frac{1}{p} \, = \, \frac{2-p}{p^2}\]
\[\operatorname{Var}\left[ X \right] = \mathbb{E}\left[ X^2 \right] - \left( \mathbb{E}[ X ] \right)^2 = \frac{2-p}{p^2} - \frac{1}{p^2} = \frac{1-p}{p^2}\]
Job applicants are randomly selected and interviewed one at a time until a suitable applicant is found and the position is filled.
Assume that 10% of the applicants are suitable
Calculate the probability that it will take more than 10 interviews to fill the position
Each interview costs $ 50. Calculate the expected cost of the search and the standard deviation of the cost.
Suppose that 6 applicants have already been interviewed and the position is still open. What is the probability that it will take more than 10 additional interviews to fill the position?
The Geometric distribution is memoryless:
\[\mathbb{P}\left( X > k + m \ \vert\ X > k \right) = \mathbb{P}\left( X > m \right) \qquad k, m \in \mathbb{N}\]
In words: \(X\) does not “remember” waiting through the first \(k\) trials.
The proof is simple
\[\begin{aligned} \mathbb{P}( X > k + m \ \vert\ X > k ) &= \frac{ \mathbb{P}\left( X > k + m \right) }{ \mathbb{P}\left( X > k \right) } \\ &= \frac{ 1 - \mathbb{P}\left( X \le k + m \right) }{ 1 - \mathbb{P}\left( X \le k \right) } \\ &= \frac{ \left(1 - p \right)^{k + m} }{ \left( 1 - p \right)^k } = \left( 1 - p \right)^m \\ & = \mathbb{P}( X > m) \end{aligned}\]
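Returning to the interview example above, here is a sketch of the computations, assuming the number of interviews needed is \(X \sim \mathrm{Geom}(p)\) with \(p = 0.10\) and each interview costs \$50, so the total cost is \(50\,X\):

```python
from math import sqrt

p = 0.10   # probability that an applicant is suitable

def surv(k):
    """P(X > k) = (1 - p)^k for the Geometric distribution."""
    return (1 - p) ** k

# (1) More than 10 interviews are needed to fill the position
print(surv(10))                                    # 0.9^10 ~ 0.3487

# (2) Cost = 50 * X, so E[cost] = 50 * E[X] and SD[cost] = 50 * SD[X]
print(50 * (1 / p), 50 * sqrt((1 - p) / p ** 2))   # 500 and ~474.3

# (3) By memorylessness, P(X > 6 + 10 | X > 6) = P(X > 10)
print(surv(16) / surv(6), surv(10))                # both equal 0.9^10
```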
Let us count the number of occurrences of a type of incident during a given period of time.
Examples:
Number of earthquakes (mag. 5+) in a year;
Car collisions at Oak and Cambie in a month.
Let \(\lambda > 0\) be the rate of occurrence of the incident, for example:
\(\lambda = 7\) per year;
\(\lambda = 50\) per month.
Define \(X =\) number of incidents in a unit time interval.
The range of \(X\) is the set \(\{0, 1, 2, ...\}\)
One choice for a PMF to model such a random variable is given by
\[f_X( k ) = \mathbb{P}( X = k ) = \begin{cases} \frac{ e^{-\lambda} \, \lambda^k }{k!} & k = 0, 1, 2, 3, \ldots \\ & \\ 0 &\text{else.}\end{cases}\]
The mathematics behind the derivation of this PMF will not be discussed in Stat 302.
We can check that \(f_X\) defined above is indeed a proper PMF.
It satisfies \(f_X(k) \ge 0\) for \(k \in \mathbb{N} \cup \{ 0 \}\)
Also:
\[\begin{aligned} \mathbb{P}(\Omega) &= \sum_{x=0}^{\infty }\frac{e^{-\lambda }\lambda ^{x}}{x!} \\ &=e^{-\lambda} \sum_{x=0}^{ \infty }\frac{\lambda ^{x}}{x!} \\ &= e^{-\lambda }\overset{\text{Taylor expansion of }e^{\lambda }}{\overbrace{\left[ 1+\lambda +\frac{\lambda ^{2}}{2}+\frac{\lambda ^{3}}{3!}+\cdots \right] }} \\ &=e^{-\lambda }e^{\lambda }=1.\end{aligned}\]
The mean and variance of a Poisson (\(\lambda\)) random variable are
\[\mathbb{E}[X] = \operatorname{Var}[X] = \lambda\]
Proof:
\[\begin{aligned} \mathbb{E}[X] &= \sum_{k=0}^\infty \, k \frac{ e^{-\lambda} \lambda^k}{k!} = \sum_{k=1}^\infty \, \frac{ e^{-\lambda} \lambda^k}{(k-1)!} = \lambda \, \sum_{k=1}^\infty \, \frac{ e^{-\lambda} \lambda^{k-1}}{(k-1)!} = \lambda \, \sum_{j=0}^\infty \, \frac{ e^{-\lambda} \lambda^{j}}{j!} = \lambda. \end{aligned}\]
For \(\operatorname{Var}[X]\), first check that \(\mathbb{E}[ X ( X - 1) ] = \lambda^2\), and then follow the same steps we used for the Geometric distribution.
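A numeric sanity check (a sketch of ours; the infinite sums are truncated at a large value):

```python
from math import exp, factorial

def pois_pmf(k, lam):
    """Poisson(lambda) PMF."""
    return exp(-lam) * lam ** k / factorial(k)

lam, K = 3.6, 60   # truncate the sums at K (the tail beyond 60 is negligible here)
mean = sum(k * pois_pmf(k, lam) for k in range(K))
ex2  = sum(k * k * pois_pmf(k, lam) for k in range(K))
print(mean, ex2 - mean ** 2)   # both ~ lambda = 3.6
```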
\(Y\) = # of earthquakes over 5.0 in magnitude in a given area
Assume \(Y\thicksim {\mathrm{Poiss}}\left(\lambda \right)\) with \(\lambda =3.6\) per year.
What is the probability of having at least 2 earthquakes over 5.0 in the next 6 months?
What is the probability of having 1 earthquake over 5.0 next month?
There were two earthquakes (over 5.0) in the first 6 months. What is the probability that we will have more than 4 this year?
What is the probability of waiting more than 3 months for the next earthquake over 5.0 in that area?
Let us assume for the moment that \[\begin{aligned} 1\text{ year } &=12\text{ months }=365\text{ days} \\ 1\text{ month } &=4\text{ weeks }=\text{ }30\text{ days} \\ \end{aligned}\]
These assumptions do not hold, but we will use them for illustration.
The number of incidents in any period of time is random and has a Poisson distribution.
The parameter of each such Poisson distribution is proportional to the length of the period.
This assumption is justifiable under conditions spelled out for the Poisson process.
We will not discuss the Poisson process in detail; we will provide the needed information while working on this example.
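Under the proportionality assumption just stated (and, for the third question, assuming additionally that counts over disjoint periods are independent, a Poisson-process property), a sketch of the earthquake computations could look like this; the scaled rates and helper functions are ours:

```python
from math import exp, factorial

def pois_pmf(k, lam):
    return exp(-lam) * lam ** k / factorial(k)

def pois_cdf(k, lam):
    return sum(pois_pmf(j, lam) for j in range(k + 1))

lam_year = 3.6                                                       # rate per year
lam_6m, lam_1m, lam_3m = lam_year / 2, lam_year / 12, lam_year / 4   # scaled rates

print(1 - pois_cdf(1, lam_6m))   # at least 2 earthquakes in the next 6 months
print(pois_pmf(1, lam_1m))       # exactly 1 earthquake next month
print(pois_pmf(0, lam_3m))       # no earthquakes in the next 3 months = waiting > 3 months

# Third question, assuming counts over disjoint periods are independent:
# P(more than 4 this year | 2 in the first 6 months) = P(more than 2 in the last 6 months)
print(1 - pois_cdf(2, lam_6m))
```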
Stat 302 - Winter 2025/26