Course introduction, set theory, and probability axioms
Last modified — 09 Jan 2026
Total: min(20, Pre-Class + In-Class)%
Midterm 1: 20%
Midterm 2: 20%
Final: 40%
All closed-book, (probably) no notes.
The pre-class and in-class exercises will be designed to prepare you for the kinds of questions we ask on exams.
So: actually do them! They’re there to help you practice. Don’t rely on solutions you find online or AI assistance.
Experiment: An action undertaken to make a discovery, test a hypothesis, or confirm a known fact.
Release your pen from 4.9 meters above the ground.
The pen will fall to the ground.
It will take about 1 sec to reach the ground.
The pen did touch the ground
Less sure if it took exactly 1 second to do so
The outcome of some experiments cannot be determined beforehand.
Roll a die: which side will show?
Draw a card from a well-shuffled deck: which one will you get?
How many students will be in the classroom today?
Even though die rolls are random, patterns emerge when we repeat the experiment many times.
Probability Theory describes such patterns via mathematical models.
It is a branch of Mathematics, and is based on a set of Axioms.
Axioms: Statements or propositions accepted to hold true
Theorems: Propositions which are established to hold true using sound logical reasoning.
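To illustrate how patterns emerge under repetition, here is a minimal simulation sketch. It assumes a fair six-sided die and uses Python's pseudo-random generator as a stand-in for real rolls; the function name `relative_frequencies` and the seed value are illustrative choices, not part of the course material.

```python
import random
from collections import Counter

def relative_frequencies(n_rolls, seed=302):
    """Simulate n_rolls of a fair six-sided die; return the share of rolls per face."""
    rng = random.Random(seed)  # fixed seed (arbitrary) so repeated runs agree
    counts = Counter(rng.randint(1, 6) for _ in range(n_rolls))
    return {face: counts[face] / n_rolls for face in range(1, 7)}

# Individual rolls are unpredictable, but the shares settle as n grows.
for n in (60, 6_000, 60_000):
    freqs = relative_frequencies(n)
    print(n, {face: round(share, 3) for face, share in freqs.items()})
```

With few rolls the shares can differ noticeably from face to face; with many rolls each share tends to be close to 1/6, which is the kind of pattern a probability model is meant to describe.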
We denote the sample space (the set of all possible outcomes of the experiment) by \(\Omega\), and a generic outcome, also called a sample point, by \(\omega\) (i.e. \(\omega \in \Omega\)).
Note
The text uses \(S\) and \(s \in S\).
Roll a die: \(\Omega = \{ 1,2,3,4,5, 6 \} \subset \mathbb{N}\).
Draw a card from a poker deck: \(\Omega = \{ 2\spadesuit, 2\diamondsuit, \ldots, A\clubsuit ,A \heartsuit \}\)
Wind speed at YVR (km/h): \(\Omega =[0, \infty) \subset \mathbb{R}\).
Wait time for R4 at UBC (min): \(\Omega =[0, 720) \subset \mathbb{R}\)
Notation: We commonly use upper case letters (\(A\), \(B\), \(C\), …) for events.
Events are sets:
\(\omega \in A\) means “\(\omega\) is an element of \(A\)”.
\(C \subset D\) means “\(C\) is a subset of \(D\)”.
Events are often formed by outcomes sharing some property. It’s a good idea to practice listing explicitly the sample points of events described with words.
Roll a die:
Bus wait time: \(H =\) “wait is less than half an hour” \(= [0, 30)\)
Max-wind-speed: \(G =\) “wind is over 80 km/hour” \(= (80, \infty )\)
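One way to practice listing sample points explicitly is to write events as Python sets. The sketch below assumes a single roll of a six-sided die; the events `even` and `at_least_5` are hypothetical examples chosen for illustration. For an uncountable sample space such as wind speed, the event cannot be listed, so it is represented here by a membership test instead.

```python
# Sample space for one roll of a six-sided die
omega = {1, 2, 3, 4, 5, 6}

# Hypothetical events, built from their verbal descriptions
even = {w for w in omega if w % 2 == 0}      # "the roll is even"
at_least_5 = {w for w in omega if w >= 5}    # "the roll is at least 5"

print(sorted(even))          # [2, 4, 6]
print(4 in even)             # True: the outcome 4 is an element of the event
print(at_least_5 <= omega)   # True: every event is a subset of omega

def in_G(speed_kmh):
    """Membership test for G = (80, infinity): wind over 80 km/h."""
    return speed_kmh > 80

print(in_G(95.0), in_G(80.0))  # True False (80 itself is not in the open interval)
```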
A die is rolled repeatedly until we see a 6.
Specify/describe the sample space.
Let \(E_{n}\) denote the event that the number of rolls is exactly \(n\) (\(n=1,2, \ldots\)). Describe the event \(E_{n}\).
A system has 5 components, which can either work or fail.
The experiment consists of observing the status (W/F) of the 5 components.
Describe the sample space for this experiment.
What is the value of \(\# \Omega\)?
Let \(A= \{ \text{components 4 and 5 fail} \}\). What is \(\# A\)?
Suppose \(A\), \(B\) are events (subsets of \(\Omega\)).
Union: \(A \cup B\) \[\omega \in A \cup B \Leftrightarrow \omega \in A \mbox{ or } \omega \in B\]
Intersection: \(A \cap B\) \[\omega \in A \cap B \Leftrightarrow \omega \in A \mbox{ and } \omega \in B\]
Complement: \(A^c\) \[\omega \in A^c\Leftrightarrow \omega \notin A\]
Symmetric difference: \(A \, \triangle \, B\) \[A \, \triangle \, B \, = \, \left( A \cap B^c \right) \, \cup \, \left( A^c \cap B \right)\]
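The four operations above map directly onto Python's built-in set operators. This is a small sketch on an assumed die-roll sample space; the particular events `A` and `B` are illustrative.

```python
omega = set(range(1, 7))   # assumed sample space: one roll of a die
A = {2, 4, 6}              # hypothetical event: roll is even
B = {4, 5, 6}              # hypothetical event: roll is at least 4

union = A | B              # outcomes in A or B
intersection = A & B       # outcomes in A and B
complement_A = omega - A   # outcomes not in A (relative to omega)
sym_diff = A ^ B           # outcomes in exactly one of A, B

print(sorted(union))         # [2, 4, 5, 6]
print(sorted(intersection))  # [4, 6]
print(sorted(complement_A))  # [1, 3, 5]
print(sorted(sym_diff))      # [2, 5]

# The operator ^ agrees with the definition (A ∩ B^c) ∪ (A^c ∩ B)
print(sym_diff == (A & (omega - B)) | ((omega - A) & B))  # True
```

Note that the complement only makes sense relative to a sample space, which is why it is written as `omega - A`.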
Equality
Commutative:
\(A \cup B \ = \ B \cup A\)
\(A \cap B \ = \ B \cap A\)
Associative:
\(A\cup B\cup C \, = \, \left( A\cup B\right) \cup C=A\cup \left( B\cup C\right)\)
\(A\cap B\cap C \, = \, \left( A\cap B\right) \cap C=A\cap \left( B\cap C\right)\)
Distributive:
\(\left( A\cup B\right) \cap C \, = \, \left( A\cap C\right) \cup \left( B\cap C\right)\)
\(\left( A\cap B\right) \cup C \, = \, \left( A\cup C\right) \cap \left( B\cup C\right)\)
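The identities above can be sanity-checked by brute force on a small sample space. The sketch below assumes \(\Omega = \{1, 2, 3\}\) and tests every triple of subsets; the helper name `laws_hold` is made up for this example. A check like this is not a proof, since it covers only one sample space, but it is a quick way to catch a misremembered rule.

```python
from itertools import combinations, product

omega = {1, 2, 3}
# All subsets of omega, from the empty set up to omega itself
subsets = [set(c) for r in range(len(omega) + 1)
           for c in combinations(sorted(omega), r)]

def laws_hold(A, B, C):
    """Check the commutative, associative, and distributive laws for one triple."""
    return (
        (A | B) == (B | A) and (A & B) == (B & A)
        and ((A | B) | C) == (A | (B | C))
        and ((A & B) & C) == (A & (B & C))
        and ((A | B) & C) == ((A & C) | (B & C))
        and ((A & B) | C) == ((A | C) & (B | C))
    )

all_ok = all(laws_hold(A, B, C) for A, B, C in product(subsets, repeat=3))
print(len(subsets) ** 3, all_ok)  # 512 True
```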
Hint: use the fact that \(B \cup B^c = \Omega\)
Hint: use the first rule above to express \(B\) in terms of \(B\cap A\) and \(B\cap A^c\)
To prove the theorem it is sufficient to show that \[ ( A\cup B )^{c} \subseteq A^{c}\cap B^{c} \] and that \[ A^{c}\cap B^{c} \subseteq ( A\cup B ) ^{c} \]
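As a sketch of how the first inclusion can be argued, take an arbitrary element and follow the definitions of complement, union, and intersection: \[ \omega \in ( A\cup B )^{c} \ \Rightarrow \ \omega \notin A \cup B \ \Rightarrow \ \omega \notin A \mbox{ and } \omega \notin B \ \Rightarrow \ \omega \in A^{c} \mbox{ and } \omega \in B^{c} \ \Rightarrow \ \omega \in A^{c}\cap B^{c} \]
The middle step uses the fact that “\(\omega \in A\) or \(\omega \in B\)” fails exactly when both parts fail. Each implication can also be read from right to left, which gives the second inclusion, so the two sets are equal.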
The power set of \(\Omega\) (denoted \(2^\Omega\)) is the set of all possible subsets of \(\Omega\).
For example, if \(\Omega \ = \ \{ 1,2,3 \}\) then: \[2^{\Omega } \, = \Bigl\{ \varnothing , \{ 1\} , \{ 2 \} , \{ 3 \} , \{ 1,2 \} , \{ 1,3 \} , \{ 2,3 \}, \{ 1, 2, 3 \} \Bigr\}\]
The symbol \(\varnothing\) denotes the empty set: \(\varnothing = \{ \}\).
The symbols \(\#\) and \(|\cdot|\) denote the size of a set (number of elements): \(\#\Omega\) and \(|\Omega|\) both mean the number of elements in \(\Omega\).
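For a finite sample space the power set can be enumerated directly. The following is a minimal sketch using `itertools.combinations`; the function name `power_set` is an illustrative choice, and `frozenset` is used so that the subsets could themselves be stored in a set.

```python
from itertools import combinations

def power_set(omega):
    """Return all subsets of a finite set omega, smallest first."""
    items = sorted(omega)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

subsets = power_set({1, 2, 3})
print([sorted(s) for s in subsets])
# [[], [1], [2], [3], [1, 2], [1, 3], [2, 3], [1, 2, 3]]
print(len(subsets))  # 8, which equals 2 ** 3
```

The count illustrates why the notation \(2^\Omega\) is natural: a set with \(n\) elements has \(2^n\) subsets, since each element is either in or out of a given subset.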
We will define what a probability is, as a mathematical object
We will derive and discuss some basic rules that are helpful for computing probabilities (using set operations)
Even though random outcomes cannot be predicted, in some cases we have an idea about the chance that an outcome occurs.
If you toss a fair coin, the chance of observing a head is the same as that of observing a tail.
If you buy a lottery ticket, the chance of winning is very small.
A probability function \(\mathbb{P}\) quantifies these chances.
Probability functions are computed on events \(A \in \mathcal{B}\). We calculate \(\mathbb{P}(A)\). Mathematically / formally, we have: \[ \mathbb{P}\, : \, \mathcal{B} \to [0, 1] \] where \(\mathcal{B}\) is a collection of possible events.
Probability functions need to do this “coherently”
Let \(\Omega\) be a sample space and \({\cal B}\) be a collection of events (i.e. subsets of \(\Omega\)).
A probability function is a function \(\mathbb{P}\) with domain \({\cal B}\) such that
Axiom 1: \(\mathbb{P}( \Omega ) = 1\);
Axiom 2: \(\mathbb{P}( A ) \geq 0\) for any \(A \in {\cal B}\);
Axiom 3: If \(\{ A_{n}\}_{n \ge 1}\) is a sequence of disjoint events, then \[\mathbb{P}\left( \bigcup_{n=1}^{\infty }A_{n}\right) \, = \sum_{n=1}^{\infty }\mathbb{P}( A_{n})\]
Note: \(\{ A_{n}\}_{n \ge 1}\) is a sequence of disjoint events when \(A_i \cap A_j = \varnothing\) for all \(i \ne j\).
Kolmogorov showed how one can construct such functions, and that a probability function only needs to satisfy those three properties to be a “coherent” probability function
In other words: every desirable property of a probability \(\mathbb{P}\) can be shown to hold using only Axioms 1, 2, and 3 (and logic).
Alternatively: any function \(\mathbb{P}\) that satisfies Axioms 1, 2, and 3 is a “proper” probability function.
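To make the axioms concrete, here is a sketch that checks them for one particular candidate function. It assumes a fair die with equally likely outcomes, so that \(\mathbb{P}(A) = \#A / \#\Omega\), and takes \(\mathcal{B}\) to be all subsets of \(\Omega\); both are assumptions made for this example. Because \(\Omega\) is finite here, a sequence of disjoint events has only finitely many non-empty members, so the check of Axiom 3 is done on pairs of disjoint events.

```python
from fractions import Fraction
from itertools import combinations

omega = frozenset(range(1, 7))  # assumed sample space: one roll of a fair die

def prob(event):
    """Equally likely outcomes model: P(A) = #A / #Omega (exact arithmetic)."""
    return Fraction(len(event), len(omega))

# The collection of events: every subset of omega
events = [frozenset(c) for r in range(len(omega) + 1)
          for c in combinations(sorted(omega), r)]

axiom_1 = prob(omega) == 1
axiom_2 = all(prob(A) >= 0 for A in events)
# Additivity, checked on every pair of disjoint events
axiom_3 = all(prob(A | B) == prob(A) + prob(B)
              for A in events for B in events if not (A & B))

print(axiom_1, axiom_2, axiom_3)  # True True True
```

Using `Fraction` rather than floating point keeps the equality checks exact.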
In general, \(A\), \(B\), \(C\), etc. denote arbitrary events. \(\Omega\) is the sample space.
Probability of the complement: \(\mathbb{P}( A^{c} ) =1-\mathbb{P}( A )\)
Monotonicity: \(A\subset B\Rightarrow \mathbb{P}( A ) \leq \mathbb{P}(B )\)
Probability of the union: \(\mathbb{P}( A\cup B ) =\mathbb{P}( A ) +\mathbb{P}( B ) - \mathbb{P}( A\cap B )\)
Boole’s inequality: \(\mathbb{P}( \bigcup _{i=1}^{m}A_{i} ) \leq \sum_{i=1}^{m}\mathbb{P}( A_{i} )\)
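The four properties can be checked numerically on a small example. The sketch below again assumes a fair die with \(\mathbb{P}(A) = \#A / \#\Omega\); the events `A`, `B`, and `C` are hypothetical. It illustrates the rules on one model and does not replace deriving them from the axioms.

```python
from fractions import Fraction

omega = frozenset(range(1, 7))  # assumed sample space: one roll of a fair die

def prob(event):
    """Equally likely outcomes model: P(A) = #A / #Omega."""
    return Fraction(len(event), len(omega))

A = frozenset({2, 4, 6})  # roll is even
B = frozenset({4, 5, 6})  # roll is at least 4
C = frozenset({4, 6})     # a subset of A

complement_rule = prob(omega - A) == 1 - prob(A)
monotonicity = C <= A and prob(C) <= prob(A)
union_rule = prob(A | B) == prob(A) + prob(B) - prob(A & B)
boole = prob(A | B | C) <= prob(A) + prob(B) + prob(C)

print(complement_rule, monotonicity, union_rule, boole)  # True True True True
print(prob(A | B))  # 2/3
```

Here \(\mathbb{P}(A) + \mathbb{P}(B) = 1\) overcounts the outcomes 4 and 6, which lie in both events; subtracting \(\mathbb{P}(A \cap B) = 1/3\) corrects for this.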
Marley borrows 2 books. Suppose that there is a 0.5 probability they like the first book, 0.4 that they like the second book, and 0.3 that they like both.
What is the probability that they will like neither book (i.e. that they will not like the first book and will not like the second)?
Jane must take two tests, call them \(T_1\) and \(T_2\). The probability that she passes test \(T_1\) is 0.8, that she passes test \(T_2\) is 0.7, and that of passing both tests is 0.6.
Calculate the probability that:
She passes at least one test.
She passes at most one test.
She fails both tests.
She passes only one test.
Suppose that \(\mathbb{P}\left( A\right) =0.85\) and \(\mathbb{P}\left( B\right) =0.75.\) Show that \[\mathbb{P}\left( A\cap B\right) \geq 0.60.\]
More generally, prove the Bonferroni inequality: \[\mathbb{P}\left( \bigcap _{i=1}^{n}A_{i}\right) \geq \sum_{i=1}^{n} \mathbb{P}\left( A_{i}\right) -\left( n-1\right) .\]
Stat 302 - Winter 2025/26