Last modified — 05 Sep 2025
Total: min(55, Quizzes + WebWork + Homework)
Note
The largest barriers to doing well are skipping class and office hours, and not doing coursework conscientiously.
Grade: min(10, Midterm)
Experiment: an action undertaken to make a discovery, test a hypothesis, or confirm a known fact.
Release your pen from 4.9 meters above the ground.
The pen will fall to the ground.
It will take about 1 second to reach the ground (from \(h = \tfrac{1}{2} g t^2\): \(t = \sqrt{2 \times 4.9 / 9.8} = 1\) s).
The pen did touch the ground; we are less sure whether it took exactly 1 second to do so.
The outcome of some experiments cannot be determined beforehand.
Roll a die: which side will show?
Draw a card from a well-shuffled deck: which one will you get?
How many students will be in the classroom today?
Even though die rolls are random, patterns emerge when we repeat the experiment many times.
Probability Theory describes such patterns via mathematical models.
It is a branch of Mathematics, and is based on a set of Axioms.
Axioms: Statements or propositions accepted to hold true
Theorems: Propositions which are established to hold true using sound logical reasoning.
The sample space is the set of all possible outcomes of an experiment. We denote it by \(\Omega\), and a generic outcome, also called a sample point, by \(\omega\) (i.e. \(\omega \in \Omega\)).
Roll a die: \(\Omega = \{ 1,2,3,4,5, 6 \} \subset \mathbb{N}\).
Draw a card from a poker deck: \(\Omega = \{ 2\spadesuit, 2\diamondsuit, \ldots, A\clubsuit ,A \heartsuit \}\)
Wind speed at YVR (km/h): \(\Omega =[0, \infty) \subset \mathbb{R}\).
Wait time for R4 at UBC (min): \(\Omega =[0, 720) \subset \mathbb{R}\)
Notation: We commonly use upper case letters (\(A\), \(B\), \(C\), …) for events.
Events are sets:
\(\omega \in A\) means “\(\omega\) is an element of \(A\)”.
\(C \subset D\) means “\(C\) is a subset of \(D\)”.
Events are often formed by outcomes sharing some property. It’s a good idea to practice listing explicitly the sample points of events described with words.
Roll a die: \(E = \text{"roll an even number"} = \{ 2, 4, 6 \}\)
Bus wait time: \(H = \text{"wait is less than half an hour"} = [0, 30)\)
Max-wind-speed: \(G = \text{"wind is over 80 km/hour"} = (80, \infty )\)
Suppose \(A\), \(B\) are events (subsets of \(\Omega\)).
Union: \(A \cup B\) \[\omega \in A \cup B \Leftrightarrow \omega \in A \mbox{ or } \omega \in B\]
Intersection: \(A \cap B\) \[\omega \in A \cap B \Leftrightarrow \omega \in A \mbox{ and } \omega \in B\]
Complement: \(A^c\) \[\omega \in A^c\Leftrightarrow \omega \notin A\]
Symmetric difference: \(A \, \triangle \, B\) \[A \, \triangle \, B \, = \, \left( A \cap B^c \right) \, \cup \, \left( A^c \cap B \right)\]
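These operations map directly onto set types in most programming languages. Below is a minimal Python sketch (the sample space and events are illustrative choices):

```python
# Set operations on a die-roll sample space (illustrative events).
omega = {1, 2, 3, 4, 5, 6}   # sample space
A = {2, 4, 6}                # "even number"
B = {4, 5, 6}                # "at least 4"

print(A | B)                 # union:                {2, 4, 5, 6}
print(A & B)                 # intersection:         {4, 6}
print(omega - A)             # complement of A:      {1, 3, 5}
print(A ^ B)                 # symmetric difference: {2, 5}
```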
Equality: \(A = B \ \Leftrightarrow \ A \subseteq B \mbox{ and } B \subseteq A\)
Commutative:
\(A \cup B \ = \ B \cup A\)
\(A \cap B \ = \ B \cap A\)
Associative:
\(A\cup B\cup C \, = \, \left( A\cup B\right) \cup C=A\cup \left( B\cup C\right)\)
\(A\cap B\cap C \, = \, \left( A\cap B\right) \cap C=A\cap \left( B\cap C\right)\)
Distributive:
\(\left( A\cup B\right) \cap C \, = \, \left( A\cap C\right) \cup \left( B\cap C\right)\)
\(\left( A\cap B\right) \cup C \, = \, \left( A\cup C\right) \cap \left( B\cup C\right)\)
To prove De Morgan's law \[( A\cup B )^{c} \ = \ A^{c}\cap B^{c}\] it is sufficient to show that \[ ( A\cup B )^{c} \subseteq A^{c}\cap B^{c} \] and that \[ A^{c}\cap B^{c} \subseteq ( A\cup B ) ^{c} \]
Take any \(\omega \in ( A\cup B )^c\), then we must have \(\omega \notin A \cup B\);
This implies that \(\omega \notin A\) and \(\omega \notin B\) (because if either \(\omega \in A\) or \(\omega \in B\) then we’d have \(\omega \in A \cup B\));
Hence: \(\omega \in A^c\) and \(\omega \in B^c\);
Which is the same as \(\omega \in A^c \cap B^c\);
Thus, we showed that \(\omega \in ( A\cup B )^c\) implies \(\omega \in A^c \cap B^c\). In other words, that \[( A\cup B ) ^{c} \, \subseteq \, A^{c}\cap B^{c} \, .\]
We leave it to you to prove that \(A^{c}\cap B^{c} \subseteq ( A\cup B ) ^{c}\).
The power set of \(\Omega\) (denoted \(2^\Omega\)) is the set of all possible subsets of \(\Omega\).
For example, if \(\Omega \ = \ \{ 1,2,3 \}\) then: \[2^{\Omega } \, = \Bigl\{ \varnothing , \{ 1\} , \{ 2 \} , \{ 3 \} , \{ 1,2 \} , \{ 1,3 \} , \{ 2,3 \}, \{ 1, 2, 3 \} \Bigr\}\]
The symbol \(\varnothing\) denotes the empty set.
If \(\Omega\) has \(n\) elements, then \(2^{\Omega }\) has \(2^{n}\) elements. In symbols: \[\# ( 2^{\Omega } ) \ = \ 2^{\#\Omega }\] How can we prove this when \(\#\Omega = n \in \mathbb{N}\)?
List the \(n\) elements of \(\Omega\) as \(\Omega =\left\{ \omega _{1},\omega _{2},...,\omega _{n}\right\}\).
Any event \(A \subseteq \Omega\) can be uniquely represented by a sequence of 0’s and 1’s: \[ x_i = \left\{ \begin{array}{ll} 1 & \text{ if } \omega_i \in A \, , \\ 0 & \text{ if } \omega_i \notin A \, . \end{array} \right. \quad 1 \le i \le n \, . \] To each event \(A\) we associate the string \((x_1, x_2, \ldots, x_n)\): \[ A \ \longrightarrow \ (x_1, x_2, \ldots, x_n) \] And, each sequence corresponds to an event: \((0, 1, 1, \ldots, 0, 1) \ \longrightarrow \ B\).
There are \(2^n\) distinct such \(n\)-long sequences. Each of them corresponds uniquely to one event, and every event corresponds to one such sequence. Hence there are the same number of events as there are sequences (\(2^n\)).
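This bijection is easy to see in code. A minimal Python sketch (the 3-element set is a made-up illustration): each integer \(0, \ldots, 2^n - 1\) supplies one bit string \((x_1, \ldots, x_n)\), and each bit string yields one subset.

```python
# Enumerate all subsets of a finite set via n-bit strings.
omega = ["w1", "w2", "w3"]   # a set with n = 3 elements
n = len(omega)

subsets = []
for mask in range(2 ** n):   # each mask encodes one bit string (x1, ..., xn)
    # bit i of mask is 1  <=>  omega[i] belongs to the subset
    subsets.append({omega[i] for i in range(n) if mask & (1 << i)})

print(len(subsets))          # 2**3 = 8, matching #(2^Omega) = 2^(#Omega)
```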
We will define what a probability is, as a mathematical object
We will derive and discuss some basic rules that are helpful for computing probabilities (using set operations)
Even though random outcomes cannot be predicted, in some cases we have an idea about the chance that an outcome occurs.
If you toss a fair coin, the chance of observing a head is the same as that of observing a tail.
If you buy a lottery ticket, the chance of winning is very small.
A probability function \(\mathbb{P}\) quantifies these chances.
Probability functions are computed on events \(A \in \mathcal{B}\). We calculate \(\mathbb{P}(A)\). Mathematically / formally, we have: \[ \mathbb{P}\, : \, \mathcal{B} \to [0, 1] \] where \(\mathcal{B}\) is a collection of possible events.
Probability functions need to do this “coherently”
Let \(\Omega\) be a sample space and \({\cal B}\) be a collection of events (i.e. subsets of \(\Omega\)).
A probability function is a function \(\mathbb{P}\) with domain \({\cal B}\) such that
Axiom 1: \(\mathbb{P}( \Omega ) = 1\);
Axiom 2: \(\mathbb{P}( A ) \geq 0\) for any \(A \in {\cal B}\);
Axiom 3: If \(\{ A_{n}\}_{n \ge 1}\) is a sequence of disjoint events, then \[\mathbb{P}\left( \bigcup_{n=1}^{\infty }A_{n}\right) \, = \sum_{n=1}^{\infty }\mathbb{P}( A_{n})\]
Note: \(\{ A_{n}\}_{n \ge 1}\) is a sequence of disjoint events when \(A_i \cap A_j = \varnothing\) if \(i \ne j\)
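Although Axiom 3 is stated for infinite sequences, it also yields finite additivity, which is used repeatedly below: taking every \(A_n = \varnothing\) in Axiom 3 forces \(\mathbb{P}(\varnothing) = 0\); then, for disjoint events \(A\) and \(B\), pad the sequence with empty sets:
\[
\mathbb{P}( A \cup B ) \,=\, \mathbb{P}( A \cup B \cup \varnothing \cup \varnothing \cup \cdots ) \,=\, \mathbb{P}(A) + \mathbb{P}(B) + 0 + 0 + \cdots \,=\, \mathbb{P}(A) + \mathbb{P}(B) \, .
\]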
Kolmogorov showed how one can construct such functions, and that a probability function only needs to satisfy those three properties to be a “coherent” probability function
In other words: every desirable property of a probability \(\mathbb{P}\) can be shown to hold using only Axioms 1, 2, and 3 (and logic).
Alternatively: any function \(\mathbb{P}\) that satisfies Axioms 1, 2, and 3 is a “proper” probability function.
In general, \(A\), \(B\), \(C\), etc. denote arbitrary events. \(\Omega\) is the sample space.
Probability of the complement: \[\mathbb{P}( A^{c} ) =1-\mathbb{P}( A )\]
Monotonicity: \[A\subset B\Rightarrow \mathbb{P}( A ) \leq \mathbb{P}(B )\]
Probability of the union: \[\mathbb{P}( A\cup B ) =\mathbb{P}( A ) +\mathbb{P}( B ) - \mathbb{P}( A\cap B )\]
Boole’s inequality: \[\mathbb{P}( \bigcup _{i=1}^{m}A_{i} ) \leq \sum_{i=1}^{m}\mathbb{P}( A_{i} )\]
We need to show that if \(\mathbb{P}\) satisfies Axioms 1, 2, and 3, and \(A\) is an arbitrary event, then necessarily \(\mathbb{P}( A^{c} ) \ = \ 1-\mathbb{P}( A )\).
\(\Omega = A \cup A^{c}\) and \(A\) and \(A^c\) are disjoint.
Using Axiom 1 \[ 1 = \mathbb{P}(\Omega) = \mathbb{P}( A \cup A^{c} )\]
Axiom 3 says that \(\mathbb{P}( A \cup A^{c} ) = \mathbb{P}(A ) +\mathbb{P}( A^c)\)
Putting the last 2 statements together we get \(1 = \mathbb{P}( \Omega ) = \mathbb{P}(A ) +\mathbb{P}( A^c)\).
Since \(\mathbb{P}(A)\) and \(\mathbb{P}(A^c)\) are just numbers, simple arithmetic gives \[1-\mathbb{P}( A ) = \mathbb{P}( A^{c} )\]
Prove that \(B = ( B\cap A ) \cup ( B\cap A^{c})\);
Since we know that \(A \subset B\), then \(B \cap A = A\) (prove it!). Hence the equation above is \[B = ( B\cap A ) \cup ( B\cap A^{c}) = A \cup ( B \cap A^{c})\]
Since \(A\) and \(B \cap A^{c}\) are disjoint (why?), Axiom 3 says \(\mathbb{P}( B ) = \mathbb{P}( A ) + \mathbb{P}( B\cap A^{c})\)
Axiom 2 ensures that \(\mathbb{P}( B\cap A^{c}) \ge 0\). Thus \[\mathbb{P}( B ) = \mathbb{P}( A ) + \mathbb{P}( B\cap A^{c}) \ge \mathbb{P}( A )\]
First prove that \(A\cup B = A\cup ( B\cap A^{c} )\). After that since \(A\) and \(B\cap A^{c}\) are disjoint events, Axiom 3 implies that
\[\mathbb{P}( A\cup B ) =\mathbb{P}( A ) +\mathbb{P}( B \cap A^{c} ) \tag{1}\]
“Slice” \(B\) into the part it shares with \(A\) and the rest: prove that \(B = ( B\cap A ) \cup ( B\cap A^c )\), where the events on the right-hand side are disjoint. Then Axiom 3 gives \(\mathbb{P}( B) = \mathbb{P}( B\cap A ) +\mathbb{P}( B\cap A^c )\)
The last equation can be re-written as \(\mathbb{P}( B\cap A^c ) = \mathbb{P}( B) - \mathbb{P}( B\cap A )\)
Replace this expression for \(\mathbb{P}( B\cap A^c )\) in (1) above to get \[\mathbb{P}( A\cup B ) =\mathbb{P}( A ) +\mathbb{P}( B \cap A^{c} ) = \mathbb{P}( A ) + \mathbb{P}( B) - \mathbb{P}( B\cap A )\]
Venn diagrams are not proofs. They are sometimes useful for intuition, but a diagram is not a proof or a logical argument.
This is not a proof: [Venn diagram omitted]
\[\mathbb{P}\left( \bigcup _{i=1}^{n}A_{i}\right) \leq \sum_{i=1}^{n}\mathbb{P}\left( A_{i}\right)\]
We will prove it by induction.
For \(n=2\): \[\begin{aligned} \mathbb{P}\left( A_{1}\cup A_{2}\right) & = \mathbb{P}\left( A_{1}\right) +\mathbb{P}\left( A_{2}\right) -\mathbb{P}\left( A_{1}\cap A_{2}\right) \\ \\ & \leq \mathbb{P}\left( A_{1}\right) +\mathbb{P}\left( A_{2}\right)\end{aligned}\] because \(\mathbb{P}\left( A_{1}\cap A_{2}\right) \geq 0\)
Inductive step: if the inequality holds for \(n\) events, then \[\mathbb{P}\Bigl( \bigcup_{i=1}^{n+1}A_{i}\Bigr) \,=\, \mathbb{P}\Bigl( \Bigl( \bigcup_{i=1}^{n}A_{i}\Bigr) \cup A_{n+1}\Bigr) \,\leq\, \mathbb{P}\Bigl( \bigcup_{i=1}^{n}A_{i}\Bigr) + \mathbb{P}( A_{n+1}) \,\leq\, \sum_{i=1}^{n+1}\mathbb{P}( A_{i}) \, ,\] using the \(n=2\) case first and then the induction hypothesis.
Marley borrows 2 books. Suppose that there is a 0.5 probability they like the first book, 0.4 that they like the second book, and 0.3 that they like both.
What is the probability that they will like neither book (i.e., that they will not like either of the two books)?
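A sketch of one way to set it up: let \(A = \text{"likes the first book"}\) and \(B = \text{"likes the second book"}\), so \(\mathbb{P}(A) = 0.5\), \(\mathbb{P}(B) = 0.4\), \(\mathbb{P}(A \cap B) = 0.3\). "Likes neither book" is \((A \cup B)^c\), so by the complement and union rules
\[
\mathbb{P}\bigl( (A \cup B)^c \bigr) \,=\, 1 - \mathbb{P}(A \cup B) \,=\, 1 - (0.5 + 0.4 - 0.3) \,=\, 0.4 \, .
\]
In context: there is a 0.4 probability that Marley likes neither of the two books.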
Jane must take two tests, call them \(T_1\) and \(T_2\). The probability that she passes test \(T_1\) is 0.8, that she passes test \(T_2\) is 0.7, and that of passing both tests is 0.6.
Calculate the probability of each of the following (a sketch of the solutions appears after the list):
She passes at least one test.
She passes at most one test.
She fails both tests.
She passes only one test.
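A sketch of the solutions, writing \(T_i\) also for the event "she passes test \(T_i\)", so \(\mathbb{P}(T_1) = 0.8\), \(\mathbb{P}(T_2) = 0.7\), \(\mathbb{P}(T_1 \cap T_2) = 0.6\):
\[
\begin{aligned}
\mathbb{P}(\text{at least one}) &= \mathbb{P}(T_1 \cup T_2) = 0.8 + 0.7 - 0.6 = 0.9 \, ,\\
\mathbb{P}(\text{at most one}) &= 1 - \mathbb{P}(T_1 \cap T_2) = 1 - 0.6 = 0.4 \, ,\\
\mathbb{P}(\text{fails both}) &= 1 - \mathbb{P}(T_1 \cup T_2) = 1 - 0.9 = 0.1 \, ,\\
\mathbb{P}(\text{exactly one}) &= \mathbb{P}(T_1 \cup T_2) - \mathbb{P}(T_1 \cap T_2) = 0.9 - 0.6 = 0.3 \, .
\end{aligned}
\]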
When there is a context (story) associated to the question, your answer MUST be a full sentence and present the result in the same context as the question.
You will lose points if your answer does not satisfy the above.
It is often useful to define relatively basic events for which you may have some information, and express other events of interest as combinations of the basic ones (unions, intersections, complements).
There is often more than one correct way to solve a problem.
There are many wrong answers, even if the final number happens to coincide with the correct one.
Suppose that \(\mathbb{P}\left( A\right) =0.85\) and \(\mathbb{P}\left( B\right) =0.75.\) Show that \[\mathbb{P}\left( A\cap B\right) \geq 0.60.\]
More generally, prove the Bonferroni inequality: \[\mathbb{P}\left( \bigcap _{i=1}^{n}A_{i}\right) \geq \sum_{i=1}^{n} \mathbb{P}\left( A_{i}\right) -\left( n-1\right) .\]
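A sketch for the two-event case: since \(\mathbb{P}(A \cup B) \le 1\), the union rule gives
\[
\mathbb{P}(A \cap B) \,=\, \mathbb{P}(A) + \mathbb{P}(B) - \mathbb{P}(A \cup B) \,\ge\, \mathbb{P}(A) + \mathbb{P}(B) - 1 \,=\, 0.85 + 0.75 - 1 \,=\, 0.60 \, .
\]
The general Bonferroni inequality then follows by induction on \(n\), applying the two-event bound to \(\bigcap_{i=1}^{n} A_i\) and \(A_{n+1}\).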
We will now construct probability functions in fairly simple cases – essentially where there are only finitely many possible outcomes, and they are all equally likely (think of possible outcomes when you roll a fair die, or pick a card from a shuffled deck, etc.)
We will discuss and use counting techniques, which are collectively known as combinatorics
Suppose the sample space \(\Omega\) is finite. \[\Omega \, = \, \Bigl\{ \omega _{1},\omega _{2}, \ldots,\omega _{n}\Bigr\}\]
In some cases these distinct outcomes are equally likely: \[\mathbb{P}( \{ \omega _{i} \} ) = a \ \text{ for all } 1 \le i \le n \, , \quad a \in (0, 1].\]
Since \(\Omega = \bigcup _{i=1}^{n} \left\{ \omega _{i}\right\}\), and \(\{\omega _{i}\}_{i=1}^n\) are disjoint events, by Axioms 1 and 3 we get
\[\begin{aligned} 1 &= \mathbb{P}\left( \Omega \right)= \mathbb{P}\Big ( \bigcup _{i=1}^{n}\left\{ \omega_{i}\right\} \Big ) = \sum_{i=1}^{n}\mathbb{P}\left( \left\{ \omega _{i}\right\} \right) =\sum_{i=1}^{n} a \, = \, n a. \end{aligned}\]
So \(a\) must be equal to \(1 / n\).
The number of elements of a set \(A\) (also known as its cardinal number) is denoted by \(\#A\) or \(|A|\).
If \(\# \Omega < \infty\), then for any event \(A \subseteq \Omega\) it holds that \[\begin{aligned} \mathbb{P}(A) &= \sum_{\omega _{i}\in A} \mathbb{P}( \{ \omega_{i} \}) = \sum_{\omega _{i}\in A}\frac{1}{n} = \frac{1}{n} \, \left( \sum_{\omega _{i}\in A} 1 \right) = \frac{\#A}{\#\Omega}\\ & \\ &= \frac{\#\{\mbox{outcomes in } A \}}{\#\{\mbox{outcomes in } \Omega \}} \\ & \\ &= \frac{\#\{\mbox{favourable outcomes} \}}{\#\{\mbox{possible outcomes}\}} \end{aligned}\]
In these cases (finite \(\Omega\), equally likely outcomes) to calculate probabilities it is sufficient to count the number of outcomes in the event of interest (\(\# A\)), and the number of total possible outcomes (\(\# \Omega\))
Counting the number of elements in a set can sometimes be surprisingly complicated.
Combinatorics helps!
We will learn some basic combinatorics techniques for this purpose.
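The recipe \(\mathbb{P}(A) = \#A / \#\Omega\) is straightforward to mirror in code. A minimal Python sketch (the event is an illustrative choice):

```python
# Probability by counting, for finitely many equally likely outcomes.
omega = range(1, 7)                    # die roll: outcomes 1, ..., 6
A = [w for w in omega if w % 2 == 0]   # event "even number" = {2, 4, 6}

print(len(A) / len(omega))             # #A / #Omega = 3/6 = 0.5
```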
In this example \(\#\Omega = \infty\), so we will not compute probabilities here; we will just count the number of elements in sets/events.
A die is rolled repeatedly until we see a 6.
Specify/describe the sample space.
Let \(E_{n}\) denote the event that the number of rolls is exactly \(n\) (\(n=1,2, \ldots\)). Describe the event \(E_{n}\).
Describe the events \(E_{1}\cup E_{2}\) and \(( \bigcup _{n=1}^{\infty} E_{n}) ^{c}\).
Verbal interpretations: \(E_{1}\cup E_{2}\) is "the first 6 appears within the first two rolls"; \(\bigcup_{n=1}^{\infty} E_{n}\) is "a 6 eventually appears"; and \(( \bigcup_{n=1}^{\infty } E_n) ^{c}\) is "a 6 never appears".
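One explicit encoding of these sets (a sketch; other equivalent descriptions are possible): write each outcome as the sequence of rolls up to and including the first 6, plus one extra outcome for "a 6 never appears". Then
\[
E_{n} \,=\, \bigl\{ (x_1, \ldots, x_n) \,:\, x_i \in \{1, \ldots, 5\} \text{ for } 1 \le i < n, \ x_n = 6 \bigr\} \, ,
\qquad
\Omega \,=\, \Bigl( \bigcup_{n=1}^{\infty} E_{n} \Bigr) \cup \bigl\{ \text{"6 never appears"} \bigr\} \, .
\]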
A system has 5 components, which can either work or fail.
The experiment consists of observing the status (W/F) of the 5 components.
Describe the sample space for this experiment.
What is the value of \(\# \Omega\)?
Suppose that the system will work if either components (1 and 2), or (3 and 4), or (1, 3 and 5) work. List the outcomes in the event \(D =\left\{ \mbox{The system works}\right\}\).
Let \(A= \{ \text{components 4 and 5 fail} \}\). What is \(\# A\)?
List the outcomes in \(A \cap D\).
(c) The system works if (1 and 2), or (3 and 4) or (1, 3 and 5) work.
Let \(D = \{ \mbox{The system works}\}\). Let us count its outcomes.
(d) \(A = \left\{ \text{4 and 5 have failed}\right\}\). How many outcomes are there in \(A\)?
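These counts can be brute-forced. A minimal Python sketch (the W/F encoding is an illustrative choice; the working condition is the one in the problem statement):

```python
from itertools import product

# Each outcome is a tuple of 5 statuses: 'W' (works) or 'F' (fails).
omega = list(product("WF", repeat=5))
print(len(omega))                      # #Omega = 2**5 = 32

def system_works(s):
    # Works if (1 and 2), or (3 and 4), or (1, 3 and 5) work; s[0] is component 1.
    c = [x == "W" for x in s]
    return (c[0] and c[1]) or (c[2] and c[3]) or (c[0] and c[2] and c[4])

D = [s for s in omega if system_works(s)]
A = [s for s in omega if s[3] == "F" and s[4] == "F"]  # components 4 and 5 fail

print(len(D))                          # #D = 15
print(len(A))                          # #A = 2**3 = 8 (components 1, 2, 3 free)
print(len([s for s in D if s in A]))   # #(A and D) = 2 (1, 2 work; 3 free)
```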
Two dice each have two sides painted red, two painted black, one painted yellow, and the remaining side painted white.
When this pair of dice is rolled, what is the probability that both dice show the same color facing up?
Remark: If not explicitly declared, this type of problem assumes "equally likely outcomes".
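Under that assumption the answer can be brute-forced by enumerating all \(6 \times 6 = 36\) equally likely face pairs. A Python sketch (the color labels are an illustrative encoding):

```python
from itertools import product

# The 6 faces of one die, labelled by color.
faces = ["red", "red", "black", "black", "yellow", "white"]

pairs = list(product(faces, faces))         # 36 equally likely ordered pairs
same = [p for p in pairs if p[0] == p[1]]   # both dice show the same color

print(len(same), len(pairs))                # 10 36
print(len(same) / len(pairs))               # 10/36 = 5/18 ≈ 0.278
```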
A small community consists of 20 families.
Four of them have 1 child, 8 have 2 children, 5 have 3 children, 2 have 4 children, and 1 has 5 children.
What is the probability that a randomly chosen family has \(i\) children, for each \(1 \le i \le 5\)?
What is the probability that a randomly chosen child comes from a family with \(i\) children, for each \(1 \le i \le 5\)?
NOTE: “randomly chosen” means that every possible choice is equally likely. In this case, we might be selecting families, or children.
We organize the information as follows:
| \(i\) | Families with \(i\) children | Children from families with \(i\) children |
|---|---|---|
| 1 | 4 | 4 |
| 2 | 8 | 16 |
| 3 | 5 | 15 |
| 4 | 2 | 8 |
| 5 | 1 | 5 |
| Total | 20 | 48 |
Catch: there are 20 families but 48 children in this community, so the two random choices are made over different sets of equally likely outcomes.
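From the table, both answers follow by counting (a sketch; try it yourself first). A family is chosen uniformly from the 20 families, while a child is chosen uniformly from the 48 children:
\[
\mathbb{P}(\text{family has } i \text{ children}) \,=\, \frac{4}{20},\ \frac{8}{20},\ \frac{5}{20},\ \frac{2}{20},\ \frac{1}{20} \quad \text{for } i = 1, \ldots, 5 \, ,
\]
\[
\mathbb{P}(\text{child comes from a family with } i \text{ children}) \,=\, \frac{4}{48},\ \frac{16}{48},\ \frac{15}{48},\ \frac{8}{48},\ \frac{5}{48} \, .
\]
For example, a randomly chosen family has 2 children with probability \(8/20 = 0.4\), but a randomly chosen child comes from a 2-child family with probability \(16/48 = 1/3\).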
Stat 302 - Winter 2025/26