00 Intro to class

Stat 406

Daniel J. McDonald

Last modified – 10 December 2023

About me

Philosophy of the class

The TAs and I are here to help you learn. Ask questions.

We encourage engagement, curiosity and generosity

We favour steady work through the Term (vs. sleeping until finals)

The assessments attempt to reflect this ethos.

More philosophy

When the term ends, I want

  • You to be better at coding.
  • You to have an understanding of the variety of methods available to do prediction and data analysis.
  • You to articulate their strengths and weaknesses.
  • You to be able to choose between different methods using your intuition and the data.

I do not want

  • You to be under undue stress
  • You to feel the need to cheat, plagiarize, or drop the course
  • You to feel treated unfairly.

I promise

  • To grade/mark fairly. Good faith effort will be rewarded
  • To be flexible. This semester (like the last 4) is different for everyone.
  • To understand and adapt to issues.

I do not promise that you will all get the grade you want.

COVID considerations

On COVID

  • I work on COVID a lot.

  • Statistics is hugely important.

Policies (TL; DR)

  • I encourage you to wear a mask

  • Do NOT come to class if you are possibly sick

  • Be kind and considerate to others

  • The Marking scheme is flexible enough to allow some missed classes

Course map

Predictive models



1. Preprocessing

centering / scaling / factors-to-dummies / basis expansion / missing values / dimension reduction / discretization / transformations

2. Model fitting

Which box do you use?

3. Prediction

Repeat all the preprocessing on new data. But be careful.
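
One way to read "be careful" in step 3 is that preprocessing constants should come from the training data and be reused on new data. The slide does not spell this out, so treat the following as a minimal sketch under that assumption. The simulated data and all variable names are hypothetical.

```r
# Simulated data; every name here is hypothetical, for illustration only.
set.seed(406)
train <- data.frame(x1 = rnorm(100, mean = 5, sd = 2),
                    x2 = rnorm(100, mean = -1, sd = 0.5))
test  <- data.frame(x1 = rnorm(20, mean = 5, sd = 2),
                    x2 = rnorm(20, mean = -1, sd = 0.5))

# Step 1 (preprocessing): learn centering / scaling constants
# from the training data only.
train_means <- colMeans(train)
train_sds   <- apply(train, 2, sd)
train_scaled <- scale(train, center = train_means, scale = train_sds)

# Step 3 (prediction): apply the SAME constants to the new data,
# rather than recomputing means and sds on the test set.
test_scaled <- scale(test, center = train_means, scale = train_sds)
```

The training columns end up with mean 0 and standard deviation 1. The test columns generally do not, which is expected: they are measured on the training data's scale.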

4. Postprocessing, interpretation, and evaluation

Source: https://vas3k.com/blog/machine_learning/

6 modules

  1. Review (today and next week)
  2. Model accuracy and selection
  3. Regularization, smoothing, trees
  4. Classifiers
  5. Modern techniques (classification and regression)
  6. Unsupervised learning
  • Each module is approximately 2 weeks long

  • Each module is based on a collection of readings and lectures

  • Each module (except the review) has a homework assignment

Assessments

Effort-based

Total across three components: at most 65 points, earned any way you want

  1. Labs, up to 20 points (2 each)
  2. Assignments, up to 50 points (10 each)
  3. Clickers, up to 10 points
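
The arithmetic of the effort-based component can be sketched in R. This assumes the three components are summed and the total is capped at 65, which is how the list above reads. The function name `effort_score` and the example marks are hypothetical.

```r
# Hypothetical helper: combine the three effort-based components.
# Assumes each component is capped at its stated maximum and the
# overall effort total is capped at 65 points.
effort_score <- function(labs, assignments, clickers) {
  labs_total        <- min(sum(labs), 20)         # 2 points each, up to 20
  assignments_total <- min(sum(assignments), 50)  # 10 points each, up to 50
  clickers_total    <- min(clickers, 10)          # up to 10
  min(labs_total + assignments_total + clickers_total, 65)
}

# Full marks on labs, one skipped assignment, middling clicker score.
effort_score(labs = rep(2, 10), assignments = c(10, 8, 9, 0, 7), clickers = 5)
# 20 + 34 + 5 = 59

# Full marks everywhere: 80 points available, capped at 65.
effort_score(labs = rep(2, 10), assignments = rep(10, 5), clickers = 10)
# 65
```

Because 80 points are on offer but only 65 count, there is some slack for a missed lab or a weak assignment.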

Knowledge-based

Final Exam, 35 points

Why this scheme?

  • You stay on top of the material

  • You come to class and participate

  • You gain coding practice in the labs

  • You work hard on the assignments

Most of this is Effort Based

work hard, guarantee yourself 65%

Time expectations per week:

  • Coming to class – 3 hours

  • Reading the book – 1 hour

  • Labs – 1 hour

  • Homework – 4 hours

  • Study / thinking / playing – 1 hour

Labs / Assignments

The goal is to “Do the work”

Assignments

  • Not easy, especially the first 2, especially if you are unfamiliar with R / Rmarkdown / ggplot

  • You may revise an assignment to raise your score to 7/10, but only if you lost 3 or more points for content (penalties can’t be redeemed). See the Syllabus.

  • Don’t leave these for the last minute

Labs

  • Labs should give you practice, allow for questions with the TAs.

  • They are due at 23:00 on the day of your lab and are lightly graded.

  • You may do them at home, but then you must submit individually (in lab, you may share a submission).

Clickers

  • Questions are similar to the Final

  • 0 points for skipping, 2 points for trying, 4 points for correct

    • Average of 3 = 10 points (the max)
    • Average of 2 = 5 points
    • Average of 1 = 0 points
    • total = max(0, min(5 * points / N - 5, 10))
  • Be sure to sync your device in Canvas.
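
The clicker rule above can be sketched in R. Only the formula comes from the slide, with `points / N` read as the average score per question. The function name `clicker_total` and the example score vectors are hypothetical.

```r
# Hypothetical helper: convert per-question clicker scores into the
# clicker component (0 to 10 points), following
#   total = max(0, min(5 * points / N - 5, 10))
# where points / N is taken to be the average score per question.
clicker_total <- function(scores) {
  stopifnot(length(scores) > 0, all(scores %in% c(0, 2, 4)))
  avg <- sum(scores) / length(scores)
  max(0, min(5 * avg - 5, 10))
}

clicker_total(c(4, 2, 4, 2))  # average 3 -> 10
clicker_total(c(2, 2, 2, 2))  # average 2 -> 5
clicker_total(c(0, 2, 0, 2))  # average 1 -> 0
```

Trying every question (average 2) already earns half of the clicker points. Averages above 3 are capped at 10.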

Don’t do this!

An average below 1 drops your final mark by 1 letter grade.

A- becomes B-, C+ becomes D.

Final Exam

  • Scheduled by the university.

  • It is hard

  • The median last year was 50% \(\Rightarrow\) A-

Philosophy:

If you put in the effort, you’re guaranteed a C+.
But to get an A+, you should really deeply understand the material.

No penalty for skipping the final.

If you’re cool with C+ and hate tests, then that’s fine.

Advice

  • Skipping HW makes it difficult to get to 65

  • Come to class!

  • Yes it’s at 8am. I hate it too.

  • To compensate, I will record the class and post to Canvas.

  • Based on last year’s class, lecture attendance and active engagement (asking questions, coming to office hours, etc.) are the best predictors of success.

Questions?

Textbooks

An Introduction to Statistical Learning

James, Witten, Hastie, Tibshirani, 2013, Springer, New York. (denoted [ISLR])

Available free online: http://statlearning.com/

The Elements of Statistical Learning

Hastie, Tibshirani, Friedman, 2009, Second Edition, Springer, New York. (denoted [ESL])

Also available free online: https://web.stanford.edu/~hastie/ElemStatLearn/

It’s worth your time to read.

If you need more practice, read the Worksheets.

Computer

  • All coding in R

  • Suggest you use RStudio IDE

  • See https://ubc-stat.github.io/stat-406/ for instructions

  • It tells you how to install what you will need, hopefully all at once, for the whole Term.

  • We will use R and we assume some background knowledge.

  • Links to useful supplementary resources are available on the website.

This course is not an intro to R / Python / MongoDB / SQL.

Other resources

Canvas: Grades, links to videos from class
Course website: All the material (slides, extra worksheets) https://ubc-stat.github.io/stat-406
Slack: Discussion board, questions
GitHub: Homework / Lab submission

  • All lectures will be recorded and posted

  • I cannot guarantee that they will all work properly (sometimes I mess it up)

Some more words

  • Lectures are hard. It’s 8am, everyone’s tired.

  • Coding is hard. I hope you’ll get better at it.

  • I strongly urge you to get up at the same time every day. My plan is to go to the gym on MWF. It’s really hard to sleep in until 10 on MWF and make class at 8 on T/Th.


Let’s be kind and understanding to each other.

I have to give you a grade, but I want that grade to reflect your learning and effort, not other junk.


If you need help, please ask.

Questions?

https://ubc-stat.github.io/stat-406/

https://github.com/stat-406-2023/

Some things to see on the website

  • Read the syllabus (See also Quiz 0)
  • Links to slides, how to download / print, browse source code
  • Install the R package, read documentation, check your LaTeX installation
  • BE SURE to follow the Computer Setup instructions!
  • Worksheets for extra help.
  • Read the FAQ!
  • View the Course GitHub (once you have access)