00 Review and bonus clickers

Stat 406

Geoff Pleiss, Trevor Campbell

Last modified – 30 July 2024

Big picture

  • What is a model?
  • How do we evaluate models?
  • How do we decide which models to use?
  • How do we improve models?

General stuff

  • Linear algebra (SVD, matrix multiplication, matrix properties, etc.)
  • Optimization (derivative + set to 0, gradient descent, Newton’s method, etc.)
  • Probability (conditional probability, Bayes rule, etc.)
  • Statistics (likelihood, MLE, confidence intervals, etc.)

1. Model selection

  • What is a statistical model?
  • What is the goal of model selection?
  • What is the difference between training and test error?
  • What is overfitting?
  • What is the bias-variance tradeoff?
  • What is the difference between AIC / BIC / CV / Held-out validation?

2. Regression

  • What do we mean by regression?
  • What is the difference between linear and non-linear regression?
  • What are linear smoothers and why do we care?
  • What is feature creation?
  • What is regularization?
  • What is the difference between L1 and L2 regularization?

3. Classification

  • What is classification? Bayes Rule?
  • What are linear decision boundaries?
  • Compare logistic regression to discriminant analysis.
  • What are the positives and negatives of trees?
  • What about loss functions? How do we measure performance?

4. Modern methods

  • What is the difference between bagging and boosting?
  • What is the point of the bootstrap?
  • What is the difference between random forests and bagging?
  • How do we understand Neural Networks?

5. Unsupervised learning

  • What is unsupervised learning?
  • Can be used for feature creation / EDA.
  • Understanding linear vs. non-linear methods.
  • What does PCA / KPCA estimate?
  • Positives and negatives of clustering procedures.

Pause for course evals

Currently at 18/139.

A few clicker questions

The singular value decomposition applies to any matrix.


  1. True
  2. False
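The statement above can be explored numerically. The following is a minimal sketch, assuming base R's `svd()` and a small made-up matrix `a` that is neither square nor full rank; the names `a`, `s`, and `a_hat` are illustrative only.

```r
# A 4 x 3 matrix whose second column is twice the first, so its rank is 2.
a <- matrix(c(1, 2, 3, 4,
              2, 4, 6, 8,
              0, 1, 0, 1), nrow = 4)

# Thin SVD: a = U D V^T, with D holding the singular values.
s <- svd(a)

# Rebuild the matrix from its three factors.
a_hat <- s$u %*% diag(s$d) %*% t(s$v)

# Count the singular values that are numerically nonzero (the rank).
rank_a <- sum(s$d > 1e-8)

print(all.equal(a, a_hat))
print(rank_a)
```

The decomposition is computed without complaint even though `a` is rectangular and rank deficient; the rank deficiency shows up as a singular value that is numerically zero.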

Which of the following produces the ridge regression estimate of \(\beta\) with \(\lambda = 1\)?


  1. lm(y ~ x, lambda = 1)
  2. (crossprod(x) + diag(ncol(x))) %*% crossprod(x, y)
  3. solve(crossprod(x) + diag(ncol(x))) %*% crossprod(x, y)
  4. glmnet(x, y, lambda = 1, alpha = 0)
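For reference, the ridge estimate with penalty \(\lambda\) has the closed form \((X^\top X + \lambda I)^{-1} X^\top y\). Below is a minimal sketch in base R, assuming no intercept and no standardization of the columns of `x`. The simulated data, the seed, and the names `beta_ridge`, `beta_aug`, and `beta_ols` are assumptions made for illustration. As a sanity check, the sketch uses the fact that ridge is equivalent to ordinary least squares on data augmented with \(\sqrt{\lambda}\) times the identity.

```r
set.seed(406)

# Simulated data (illustrative only): n observations, p predictors.
n <- 50
p <- 3
x <- matrix(rnorm(n * p), n, p)
y <- drop(x %*% c(1, -2, 0.5) + rnorm(n))
lambda <- 1

# Closed-form ridge estimate: solve (X'X + lambda * I) beta = X'y.
beta_ridge <- solve(crossprod(x) + lambda * diag(p), crossprod(x, y))

# Same estimate via least squares on augmented data:
# stack sqrt(lambda) * I under x and p zeros under y.
x_aug <- rbind(x, sqrt(lambda) * diag(p))
y_aug <- c(y, rep(0, p))
beta_aug <- qr.solve(x_aug, y_aug)

# Ordinary least squares, for comparison of coefficient size.
beta_ols <- solve(crossprod(x), crossprod(x, y))

print(cbind(ridge = drop(beta_ridge), augmented = beta_aug, ols = drop(beta_ols)))
```

Penalized-regression packages often rescale the loss and standardize predictors by default, so the same numeric value of `lambda` need not reproduce these coefficients.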

If Classifier A has higher AUC than Classifier B, then Classifier A is preferred.


  1. True
  2. False

Which of the following is true about the bootstrap?


  1. It is a method for estimating the sampling distribution of a statistic.
  2. It is a method for estimating expected prediction error.
  3. It is a method for improving the performance of a classifier.
  4. It is a method for estimating the variance of a statistic.
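To make the idea of resampling concrete, here is a minimal sketch of the nonparametric bootstrap applied to the sample median, using base R only. The simulated data `x`, the number of resamples `B`, and the choice of the median as the statistic are all assumptions made for illustration.

```r
set.seed(406)

# Illustrative sample from a skewed distribution.
x <- rexp(100)
B <- 1000

# Resample the data with replacement B times and recompute the median each time.
# The collection of resampled medians approximates its sampling distribution.
boot_medians <- replicate(B, median(sample(x, replace = TRUE)))

# Summaries of that approximate sampling distribution:
boot_se <- sd(boot_medians)                          # bootstrap standard error
boot_ci <- quantile(boot_medians, c(0.025, 0.975))   # percentile interval

print(boot_se)
print(boot_ci)
```

The same loop works for any statistic that can be recomputed on a resampled data set; only the call to `median()` changes.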

Which campus eatery is the best place to celebrate the end of the term?


  1. Koerner’s
  2. Sports Illustrated Clubhouse (formerly Biercraft)
  3. Brown’s Crafthouse
  4. Rain or Shine