Stat 406
Geoff Pleiss, Trevor Campbell
Last modified – 24 November 2023
\[ \DeclareMathOperator*{\argmin}{argmin} \DeclareMathOperator*{\argmax}{argmax} \DeclareMathOperator*{\minimize}{minimize} \DeclareMathOperator*{\maximize}{maximize} \DeclareMathOperator*{\find}{find} \DeclareMathOperator{\st}{subject\,\,to} \newcommand{\E}{E} \newcommand{\Expect}[1]{\E\left[ #1 \right]} \newcommand{\Var}[1]{\mathrm{Var}\left[ #1 \right]} \newcommand{\Cov}[2]{\mathrm{Cov}\left[#1,\ #2\right]} \newcommand{\given}{\ \vert\ } \newcommand{\X}{\mathbf{X}} \newcommand{\x}{\mathbf{x}} \newcommand{\y}{\mathbf{y}} \newcommand{\P}{\mathcal{P}} \newcommand{\R}{\mathbb{R}} \newcommand{\norm}[1]{\left\lVert #1 \right\rVert} \newcommand{\snorm}[1]{\lVert #1 \rVert} \newcommand{\tr}[1]{\mbox{tr}(#1)} \newcommand{\brt}{\widehat{\beta}^R_{s}} \newcommand{\brl}{\widehat{\beta}^R_{\lambda}} \newcommand{\bls}{\widehat{\beta}_{ols}} \newcommand{\blt}{\widehat{\beta}^L_{s}} \newcommand{\bll}{\widehat{\beta}^L_{\lambda}} \newcommand{\U}{\mathbf{U}} \newcommand{\D}{\mathbf{D}} \newcommand{\V}{\mathbf{V}} \]
(We assume \(\X\) is already centered/scaled, with \(n\) rows and \(p\) columns.)
PCA:
The “embedding” is \(\U_M \D_M\).
(called the “Principal Components” or the “scores” or occasionally the “factors”)
The “loadings” or “weights” are \(\V_M\)
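A small numerical illustration (a sketch on toy data, not from the slides): since \(\X = \U\D\V^\top\), the scores \(\U_M\D_M\) can equivalently be computed as \(\X\V_M\):

```r
# Toy centered/scaled matrix; all names here are illustrative.
set.seed(406)
X <- scale(matrix(rnorm(100 * 5), 100, 5))
s <- svd(X)
M <- 2
scores <- s$u[, 1:M] %*% diag(s$d[1:M]) # the embedding U_M D_M
loadings <- s$v[, 1:M]                  # the loadings V_M
# X V_M = U D V^T V_M = U_M D_M
all.equal(scores, X %*% loadings, check.attributes = FALSE) # → TRUE
```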
KPCA:
The “embedding” is \(\U_M \D_M\).
There are no “loadings”
(\(\not\exists\ \mathbf{B}\) such that \(\X\mathbf{B} = \U_M \D_M\))
The “maximize variance” version of PCA:
\[\max_\alpha \Var{\X\alpha} \quad \textrm{ subject to } \quad \norm{\alpha}_2^2 = 1\]
( \(\Var{\X\alpha} = \alpha^\top\X^\top\X\alpha\) )
This is equivalent (via the Lagrangian) to solving the unconstrained problem:
\[\max_\alpha \alpha^\top\X^\top\X\alpha - \lambda\norm{\alpha}_2^2\]
Take the derivative with respect to \(\alpha\) and set it to 0:
\[0 = 2\X^\top\X\alpha - 2\lambda\alpha\]
This is an eigenproblem: \(\X^\top\X\alpha = \lambda\alpha\). The solution is \(\alpha=\V_1\), the first right singular vector of \(\X\), and the maximum is \(\D_1^2\).
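A numerical spot-check of this claim (toy data; a sketch, not from the slides):

```r
set.seed(406)
X <- scale(matrix(rnorm(200 * 3), 200, 3))
s <- svd(X)
v1 <- s$v[, 1]                               # candidate maximizer V_1
obj <- function(a) drop(t(a) %*% crossprod(X) %*% a)
all.equal(obj(v1), s$d[1]^2)                 # the maximum is D_1^2
# any other unit vector does no better:
a <- rnorm(3); a <- a / sqrt(sum(a^2))
obj(a) <= s$d[1]^2                           # TRUE
```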
PCA: (all 3 are equivalent)
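The three equivalent routes to the scores can be sketched as follows (toy data; columns may differ in sign between methods; an illustration, not the lecture's exact code):

```r
set.seed(406)
X <- scale(matrix(rnorm(100 * 5), 100, 5))
M <- 2

# 1. SVD of X: scores are U_M D_M
s <- svd(X)
scores_svd <- s$u[, 1:M] %*% diag(s$d[1:M])

# 2. Eigendecomposition of X^T X: scores are X V_M
e <- eigen(crossprod(X), symmetric = TRUE)
scores_eig <- X %*% e$vectors[, 1:M]

# 3. prcomp() (X is already centered/scaled)
scores_pr <- prcomp(X, center = FALSE, scale. = FALSE)$x[, 1:M]

# agreement up to column signs:
all.equal(abs(scores_svd), abs(scores_eig), check.attributes = FALSE)
all.equal(abs(scores_svd), abs(scores_pr), check.attributes = FALSE)
```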
KPCA:
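A hand-rolled KPCA sketch with a radial-basis kernel (the kernel, bandwidth, and toy data are illustrative assumptions, not the lecture's choices):

```r
set.seed(406)
X <- scale(matrix(rnorm(50 * 2), 50, 2))
n <- nrow(X)
gamma <- 1                                   # illustrative bandwidth
K <- exp(-gamma * as.matrix(dist(X))^2)      # n x n Gram matrix
J <- diag(n) - matrix(1 / n, n, n)
Kc <- J %*% K %*% J                          # double-center K
e <- eigen(Kc, symmetric = TRUE)
M <- 2
# embedding U_M D_M: here D_M^2 are the top eigenvalues of Kc
embedding <- e$vectors[, 1:M] %*% diag(sqrt(e$values[1:M]))
```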
Showing the first 10 PCA loadings:
library(tidyverse) # tibble, dplyr, ggplot2

# Parametric curve that traces an elephant; optionally add an "eye" point.
elephant <- function(eye = TRUE) {
  tib <- tibble(
    tt = -100:500 / 100,
    y = -(12 * cos(3 * tt) - 14 * cos(5 * tt) + 50 * sin(tt) + 18 * sin(2 * tt)),
    x = -30 * sin(tt) + 8 * sin(2 * tt) - 10 * sin(3 * tt) - 60 * cos(tt)
  )
  if (eye) tib <- add_row(tib, y = 20, x = 20)
  tib
}

ele <- elephant(FALSE)
set.seed(406) # any seed; just for reproducible noise
noisy_ele <- ele |>
  mutate(y = y + rnorm(n(), 0, 5), x = x + rnorm(n(), 0, 5))

ggplot(noisy_ele, aes(x, y, colour = tt)) +
  geom_point() +
  scale_color_viridis_c() +
  theme(legend.position = "none") +
  geom_path(data = ele, colour = "black", linewidth = 2)
Clustering
UBC Stat 406 - 2024