11 Local methods

class: center, middle, inverse, title-slide

.title[
# 11 Local methods
]
.author[
### STAT 406
]
.author[
### Daniel J. McDonald
]
.date[
### Last modified - 2022-10-03
]

---

## Last time...

.pull-left[
We looked at __feature maps__ as a way to do nonlinear regression:

We used new "features" `$\Phi(x) = \bigg(\phi_1(x),\ \phi_2(x),\ldots,\phi_k(x)\bigg)$`

Now we examine an alternative

Suppose I just look at the "neighbors" of some point (based on the `$x$`-values)

I just average the `$y$`'s at those locations together

Let's use 3 neighbors
]

.pull-right[
![](rmd_gfx/11-kernel-smoothers/load-lidar-1.svg)

]

---

## KNN

.pull-left[
![](rmd_gfx/11-kernel-smoothers/small-lidar-again-1.svg)
]

.pull-right[

```r
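library(dplyr) # for %>% and arrange()
library(FNN)   # for knn.reg()
data(lidar, package = "SemiPar")
# subsample 40 observations and sort them by range
lidar_unif <- lidar[sample.int(nrow(lidar), 40), ] %>%
  arrange(range)
# grid of 101 evenly spaced ranges at which to predict
new_range <- seq(min(lidar_unif$range),
                 max(lidar_unif$range),
                 length.out = 101)
knn3 <- knn.reg(
  train = lidar_unif$range,
  test = matrix(new_range, ncol = 1), # predict on the grid
  y = lidar_unif$logratio,
  k = 3)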
```

This method is `$K$`-nearest neighbors.

It's a __linear smoother__ just like in previous lectures: `$\widehat{\mathbf{y}} = S \mathbf{y}$` for some matrix `$S$`.

You should imagine what `$S$` looks like.

What is the effective degrees of freedom of KNN?

KNN averages the neighbors with equal weight.

But some neighbors are "closer" than other neighbors.
]
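One way to see what `$S$` looks like is to build it directly. The sketch below is illustrative only: `knn_smoother()` is a hypothetical helper (not part of `FNN`), the data are simulated, and it assumes there are no ties in `x`, so each point counts as its own nearest neighbor.

```r
# Hypothetical helper: smoother matrix for KNN regression at the training points.
knn_smoother <- function(x, k) {
  n <- length(x)
  dmat <- as.matrix(dist(x))
  S <- matrix(0, n, n)
  for (i in seq_len(n)) {
    nbrs <- order(dmat[i, ])[seq_len(k)] # k closest points, including i itself
    S[i, nbrs] <- 1 / k                  # equal weight on each neighbor
  }
  S
}

set.seed(406)
x <- sort(runif(20)) # simulated inputs, no ties
S <- knn_smoother(x, k = 3)
rowSums(S)   # every row sums to 1
sum(diag(S)) # trace of S
```

Each row has exactly `k` nonzero entries equal to `1/k`, so under these assumptions the trace is `n/k`.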

---

## Local averages

Instead of choosing the number of neighbors to average, we can average any observations within a certain distance.

The boxes have width 30.
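A minimal sketch of the corresponding smoother matrix, assuming a box of total width 30 centered at each point. The name `Sboxcar()` and the toy `x` values are made up for illustration.

```r
# Hypothetical helper: local-average (boxcar) smoother matrix.
Sboxcar <- function(x, width) {
  dmat <- as.matrix(dist(x))
  inside <- (dmat <= width / 2) * 1      # 1 if within half a box width, else 0
  sweep(inside, 1, rowSums(inside), "/") # equal weights, each row sums to 1
}

x <- c(0, 10, 20, 50, 100) # toy inputs
S <- Sboxcar(x, width = 30)
round(S, 2)
```

Unlike KNN, the number of points averaged changes from row to row: the isolated points at 50 and 100 only "see" themselves.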

---

## What is a "kernel" smoother?

* The mathematics:

> A kernel is any function `$K$` such that for any `$u$`, `$K(u) \geq 0$`, `$\int K(u)\,du=1$` and `$\int uK(u)\,du=0$`.

* The idea: a kernel is a nice way to take weighted averages. The kernel function gives the weights.

* The previous example is called the __boxcar__ kernel. It looks like this:

This one gives the same non-zero weight to all points within `$\pm 15$` range.
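The kernel conditions can be checked numerically. This is a sketch under an assumption: `boxcar()` here is rescaled to live on `$[-1, 1]$` rather than `$\pm 15$`, which changes the width but not the properties.

```r
boxcar <- function(u) 0.5 * (abs(u) <= 1) # constant weight on [-1, 1]

# Integrate over the support [-1, 1], where the function is constant.
total_mass <- integrate(boxcar, -1, 1)$value
first_moment <- integrate(function(u) u * boxcar(u), -1, 1)$value

# The same checks for the standard Gaussian density, over the whole real line.
gauss_mass <- integrate(dnorm, -Inf, Inf)$value
gauss_moment <- integrate(function(u) u * dnorm(u), -Inf, Inf)$value

c(total_mass, first_moment, gauss_mass, gauss_moment)
```

Both functions are nonnegative, integrate to 1, and have first moment 0, so both qualify as kernels.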

---

## Other kernels

Most of the time, we don't use the boxcar because the weights are weird: constant inside the box, then abruptly zero outside it.

A more common one is the Gaussian kernel:

For the plot, I made `$\sigma=7.5$`.

Now the weights "die away" for points farther from where we're predicting. (but all nonzero!!)
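For reference, the Gaussian kernel with bandwidth `$\sigma$` is just the normal density, and the weight on observation `$j$` when predicting at `$x_i$` is that density renormalized across observations. The indices `$i, j, l$` and the entry notation `$S_{ij}$` are chosen here to match `$\widehat{\mathbf{y}} = S\mathbf{y}$`.

`$$K_\sigma(u) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{u^2}{2\sigma^2}\right), \qquad S_{ij} = \frac{K_\sigma(x_i - x_j)}{\sum_{l=1}^n K_\sigma(x_i - x_l)}$$`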

---

## Other kernels

What if I made `$\sigma=15$`?

Before, points far from `$x_{15}$` got very small weights; now they have more influence.

For the Gaussian kernel, `$\sigma$` determines something like the "range" of the smoother.
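To put numbers on this, compare the weight given to a point 30 units away with the weight given to the prediction point itself. The distance 30 and the helper `rel_weight()` are chosen here purely for illustration.

```r
# Weight at distance d relative to the weight at distance 0.
rel_weight <- function(d, sigma) dnorm(d, sd = sigma) / dnorm(0, sd = sigma)

rel_weight(30, sigma = 7.5) # exp(-8), about 0.0003
rel_weight(30, sigma = 15)  # exp(-2), about 0.135
```

Doubling `$\sigma$` raises the relative weight of that distant point by a factor of roughly 400.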

---

## Many Gaussians

The following code gives `$S$` for Gaussian kernel smoothers with different `$\sigma$`

```r
dmat = as.matrix(dist(x))
Sgauss <- function(sigma) {
  gg <-  dnorm(dmat, sd = sigma) # not an argument, uses the global dmat
  sweep(gg, 1, rowSums(gg), '/') # make the rows sum to 1.
}
```
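A quick usage sketch on simulated inputs (the `x` values and the three bandwidths are arbitrary choices, not the lidar data). It repeats the definition of `Sgauss()` so that it runs on its own.

```r
set.seed(406)
x <- sort(runif(30, 0, 100)) # simulated inputs, for illustration only
dmat <- as.matrix(dist(x))
Sgauss <- function(sigma) {  # same definition as above
  gg <- dnorm(dmat, sd = sigma)
  sweep(gg, 1, rowSums(gg), "/")
}

# Trace of S for a small, medium, and large bandwidth.
edf <- sapply(c(1, 5, 25), function(s) sum(diag(Sgauss(s))))
edf # decreases as sigma grows
```

A larger `$\sigma$` spreads each row's weight over more points, so the diagonal entries (and hence the trace) shrink.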

---

## The bandwidth

* Choosing `$\sigma$` is __very__ important.

* This "range" parameter is called the __bandwidth__.

* It is way more important than which kernel you use.

* A popular alternative is the 'Epanechnikov' kernel (base R's `ksmooth()` only offers `"box"`, its default, and `"normal"`):

```r
epan <- function(x) 3/4 * (1 - x^2) * (abs(x) < 1)
```
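Swapping kernels only changes how the weights are computed from the distances. Here is a sketch of an Epanechnikov analogue of `Sgauss()`: the name `Sepan()`, the bandwidth argument `h`, and the toy grid are assumptions made for illustration.

```r
epan <- function(x) 3/4 * (1 - x^2) * (abs(x) < 1)

x <- seq(0, 100, by = 5) # illustrative grid, not the lidar data
dmat <- as.matrix(dist(x))
Sepan <- function(h) {   # hypothetical analogue of Sgauss()
  ee <- epan(dmat / h)   # zero weight once the distance reaches h
  sweep(ee, 1, rowSums(ee), "/")
}

S <- Sepan(h = 12)
sum(S[1, ] > 0) # 3: only the points at 0, 5, and 10 get weight in row 1
```

Unlike the Gaussian, this kernel has compact support, so most entries of `S` are exactly zero.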

---

## Choosing the bandwidth

As we have discussed, kernel smoothing (and KNN) are linear smoothers

`$$\widehat{\mathbf{y}} = S\mathbf{y}$$`

This has a convenient consequence: leave-one-out CV needs no refitting.

```r
loocv <- function(y, S){
  yhat = S %*% y
  mean( (y - yhat)^2 / (1 - diag(S))^2 )
}
```

The __effective degrees of freedom__ is `$\textrm{tr}(S)$`

Therefore we can use our model selection criteria from before
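The shortcut inside `loocv()` can be checked against brute-force leave-one-out. This is a sketch on simulated data with an arbitrary bandwidth of 5. "Brute force" here means dropping point `i`, renormalizing the remaining Gaussian weights, and predicting at `x[i]`.

```r
set.seed(406)
n <- 50
x <- sort(runif(n, 0, 100))           # simulated inputs
y <- sin(x / 10) + rnorm(n, sd = 0.3) # simulated responses
gg <- dnorm(as.matrix(dist(x)), sd = 5)
S <- sweep(gg, 1, rowSums(gg), "/")

# Shortcut: one fit, rescaled residuals.
yhat <- drop(S %*% y)
shortcut <- mean((y - yhat)^2 / (1 - diag(S))^2)

# Brute force: leave out point i and predict it from the others.
brute_resid <- vapply(seq_len(n), function(i) {
  w <- gg[i, -i]
  y[i] - sum(w * y[-i]) / sum(w)
}, numeric(1))
brute <- mean(brute_resid^2)

all.equal(shortcut, brute) # the two agree up to rounding
sum(diag(S))               # effective degrees of freedom
```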

---

## Smoothing the full Lidar data

```r
library(purrr) # for map_dbl()
dmat <- as.matrix(dist(lidar$range)) # Sgauss() reads this global
sigmas <- 10^(seq(log10(300), log10(.3), length.out = 100))
loocvs <- map_dbl(sigmas, ~ loocv(lidar$logratio, Sgauss(.x)))
best_s <- sigmas[which.min(loocvs)]
lidar$smoothed <- Sgauss(best_s) %*% lidar$logratio
```

I considered `$\sigma \in [0.3,\ 300]$` and used `$14.93$`.

---

## Smoothing manually

I did kernel smoothing "manually":

1. For a fixed bandwidth

2. Compute the smoothing matrix

3. Make the predictions

4. Repeat and compute CV using our "nice" formula

The point is to "show how it works". It's also really easy.

There are a number of other ways to do this in R

```r
loess()
ksmooth()
KernSmooth::locpoly()
mgcv::gam()
np::npreg()
```

These have tricks and ways of doing CV and other things automatically.
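As a rough illustration of two of these, here is a sketch using the built-in `cars` data as a stand-in. The bandwidth and span values are arbitrary, not tuned. Note that `loess()` fits local regressions rather than plain weighted averages.

```r
# Built-in `cars` data, used only as a small stand-in example.
ks <- ksmooth(cars$speed, cars$dist, kernel = "normal", bandwidth = 5)
lo <- loess(dist ~ speed, data = cars, span = 0.75, degree = 1)

str(ks)          # list with x (evaluation points) and y (fitted values)
head(fitted(lo)) # loess fitted values at the observed speeds
```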

Note: all I needed was the distance matrix `dist(x)`. Given any distance function (say, `$d(\mathbf{x}_i, \mathbf{x}_j) = \|\mathbf{x}_i - \mathbf{x}_j\|_2 + I(x_{i,3} \neq x_{j,3})$`), I can use these methods.
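A sketch of that idea with a hypothetical distance in the same spirit: Euclidean distance on two numeric features, plus a penalty of 1 when a categorical label differs. The data, the label `g`, and the bandwidth 0.25 are all invented for illustration.

```r
set.seed(406)
n <- 40
X <- cbind(x1 = runif(n), x2 = runif(n))    # simulated numeric features
g <- sample(c("a", "b"), n, replace = TRUE) # hypothetical categorical feature

# Euclidean distance on (x1, x2), plus 1 when the group labels differ.
dmat <- as.matrix(dist(X)) + outer(g, g, "!=")

# From here on, the recipe is unchanged: weights from distances, rows sum to 1.
gg <- dnorm(dmat, sd = 0.25)
S <- sweep(gg, 1, rowSums(gg), "/")
```

Nothing downstream needs to know how `dmat` was built, which is the point of the note above.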

---
class: middle, center, inverse

# Next time...

Why don't we just smooth everything all the time?