26 PCA v KPCA

class: center, middle, inverse, title-slide

.title[
# 26 PCA v KPCA
]
.author[
### STAT 406
]
.author[
### Daniel J. McDonald
]
.date[
### Last modified - 2022-11-23
]

---

## PCA v KPCA

`$$\newcommand{\Expect}[1]{\mathbb{E}\left[ #1 \right]}
\newcommand{\Var}[1]{\mathrm{Var}\left[ #1 \right]}
\newcommand{\Cov}[2]{\mathrm{Cov}\left[#1,\ #2\right]}
\newcommand{\given}{\ \vert\ }
\newcommand{\argmin}{\arg\min}
\newcommand{\argmax}{\arg\max}
\newcommand{\R}{\mathbb{R}}
\newcommand{\P}{P}
\newcommand{\norm}[1]{\left\lVert #1 \right\rVert}
\renewcommand{\hat}{\widehat}
\newcommand{\tr}[1]{\mbox{tr}(#1)}
\newcommand{\X}{\mathbf{X}}
\newcommand{\y}{\mathbf{y}}
\newcommand{\x}{\mathbf{x}}
\newcommand{\U}{\mathbf{U}}
\newcommand{\D}{\mathbf{D}}
\newcommand{\V}{\mathbf{V}}$$`

(We assume `$\X$` is already centered/scaled, `$n$` rows, `$p$` columns)

.pull-left[
__PCA:__

0. Start with data.
1. Decompose `$\X=\U\D\V^\top$` (SVD). 
2. Embed into `$M\leq p$` dimensions: 
`$$\U_M \D_M = \X\V_M$$`

The "embedding" is `$\U_M \D_M$`.

(called the "Principal Components" or the "scores" or occasionally the "factors")

The "loadings" are `$\V_M$`
]

.pull-right[
__KPCA:__

1. Choose `$k(x_i, x_{i'})$`. Create `$\mathbf{K}$`.
1. Double center `$\mathbf{K} = \mathbf{PKP}$`.
1. Decompose `$\mathbf{K} = \U \D^2 \U^\top$` (eigendecomposition).
1. Embed into `$M\leq p$` dimensions:
`$$\U_M \D_M$$`

The "embedding" is `$\U_M \D_M$`.

There are no "loadings"  
(there exists no matrix `$\mathbf{B}$` such that `$\X\mathbf{B} = \U_M \D_M$`)
]

---

## Why is this the solution?

The "maximize variance" version of PCA:

`$$\max_\alpha \Var{\X\alpha} \quad \textrm{ subject to } \quad \left|\left| \alpha \right|\right|_2^2 = 1$$`

( `$\Var{\X\alpha} = \alpha^\top\X^\top\X\alpha$` )

This is equivalent to solving (Lagrangian):

`$$\max_\alpha \alpha^\top\X^\top\X\alpha - \lambda\left|\left| \alpha \right|\right|_2^2$$`

Take derivative wrt `$\alpha$` and set to 0:

`$$0 = 2\X^\top\X\alpha - 2\lambda\alpha$$`

This is the equation for an eigenproblem. The solution is `$\alpha=\V_1$` and the maximum is `$\D_1^2$`.

---

## Example (not real unless there's code)

```r
data("mobility", package = "Stat406")
X <- mobility %>% select(Black:Married) %>% as.matrix()
not_missing <- X %>% complete.cases()
X <- scale(X[not_missing, ], center = TRUE, scale = TRUE)
colors <- mob$Mobility[not_missing] 
M <- 2 # embedding dimension
P <- diag(nrow(X)) - 1 / nrow(X)
```

.pull-left[
__PCA:__ (all 3 are equivalent)

```r
s <- svd(X) # use svd 
pca_loadings <- s$v[,1:M]
pca_scores <- X %*% pca_loadings
```

```r
s <- eigen(t(X) %*% X) # V D^2 V'
pca_loadings <- s$vectors[,1:M]
pca_scores <- X %*% pca_loadings
```

```r
s <- eigen(X %*% t(X)) # U D^2 U'
D <- sqrt(diag(s$values[1:M]))
U <- s$vectors[,1:M]
pca_scores <- U %*% D
pca_loadings = (1/D) %*% t(U) %*% X
```

]

.pull-right[
__KPCA:__

```r
d <- 2
K <- P %*% (1 + X %*% t(X))^d %*% P # polynomial
e <- eigen(K) # U D^2 U' 
# (different from the PCA one, K /= XX')
U <- e$vectors[,1:M]
D <- diag(sqrt(e$values[1:M]))
kpca_poly <- U %*% D
```

```r
K <- P %*% tanh(1 + X %*% t(X)) %*% P # sigmoid kernel
e <- eigen(K) # U D^2 U' 
# (different from the PCA one, K /= XX')
U <- e$vectors[,1:M]
D <- diag(sqrt(e$values[1:M]))
kpca_sigmoid <- U %*% D
```

]

---

## Plotting

---

## PCA loadings

Showing the first 10 PCA loadings:

* First column are the weights on the first score
* each number corresponds to a variable in the original data
* How much does that variable contribute to that score?

```r
head(round(pca_loadings,2), 10)
```

```
##        [,1]  [,2]
##  [1,]  0.25  0.07
##  [2,]  0.13 -0.14
##  [3,]  0.17 -0.34
##  [4,]  0.18 -0.33
##  [5,]  0.16 -0.34
##  [6,] -0.24  0.11
##  [7,] -0.04 -0.35
##  [8,]  0.28  0.00
##  [9,]  0.13 -0.14
## [10,]  0.29  0.10
```

---

## KPCA, feature map version

```r
p <- ncol(X)
scX <- scale(X)
width <- p*(p-1)/2 + p # ~630
Z <- matrix(NA, nrow(X), width)
k <-  0
for(i in 1:p){
  for(j in i:p){
    k <- k + 1
    Z[,k] <- X[,i] * X[,j]
  }
}
wideX <- scale(cbind(X, Z))
s <- RSpectra::svds(wideX, 2) # the whole svd would be super slow
fkpca_scores <- s$u %*% diag(s$d)
```

* Unfortunately, can't easily compare to check whether the result is the same
* Also can cause numerical issues
* But should be the "same" (assuming I didn't screw up...)
* Would also allow me to get the loadings, though they'd depend on polynomials