class: center, middle, inverse, title-slide

.title[
# 11 Local methods
]
.author[
### STAT 406
]
.author[
### Daniel J. McDonald
]
.date[
### Last modified - 2022-10-03
]

---

## Last time...

.pull-left[

We looked at __feature maps__ as a way to do nonlinear regression.

We used new "features" `\(\Phi(x) = \bigg(\phi_1(x),\ \phi_2(x),\ldots,\phi_k(x)\bigg)\)`.

Now we examine an alternative.

Suppose I just look at the "neighbors" of some point (based on the `\(x\)`-values).

I just average the `\(y\)`'s at those locations together.

Let's use 3 neighbors.

]

--

.pull-right[

<!-- -->

]

---

## KNN

.pull-left[

<!-- -->

]

.pull-right[

```r
library(dplyr) # for %>% and arrange()
library(FNN)   # for knn.reg()
data(lidar, package = "SemiPar")
# Subsample 40 observations and sort them by range
lidar_unif <- lidar[sample.int(nrow(lidar), 40), ] %>% arrange(range)
# Grid of 101 points at which to predict
new_range <- seq(min(lidar_unif$range), max(lidar_unif$range), length.out = 101)
knn3 <- knn.reg(
  train = lidar_unif$range,
  test = matrix(new_range, ncol = 1), # predict on the grid
  y = lidar_unif$logratio,
  k = 3)
```

This method is `\(K\)`-nearest neighbors.

It's a __linear smoother__ just like in previous lectures: `\(\widehat{\mathbf{y}} = S \mathbf{y}\)` for some matrix `\(S\)`.

You should imagine what `\(S\)` looks like.

What is the effective degrees of freedom of KNN?

KNN averages the neighbors with equal weight.

But some neighbors are "closer" than other neighbors.

]

---

## Local averages

Instead of choosing the number of neighbors to average, we can average any observations within a certain distance.

<img src="rmd_gfx/11-kernel-smoothers/unnamed-chunk-2-1.svg" style="display: block; margin: auto;" />

--

The boxes have width 30.

---

## What is a "kernel" smoother?

* The mathematics:

> A kernel is any function `\(K\)` such that for any `\(u\)`, `\(K(u) \geq 0\)`, `\(\int K(u)\,du=1\)` and `\(\int uK(u)\,du=0\)`.

* The idea: a kernel is a nice way to take weighted averages. The kernel function gives the weights.

* The previous example is called the __boxcar__ kernel.
It looks like this:

<img src="rmd_gfx/11-kernel-smoothers/boxcar-1.svg" style="display: block; margin: auto;" />

This one gives the same non-zero weight to all points within `\(\pm 15\)` of the target `range` value.

---

## Other kernels

Most of the time, we don't use the boxcar because the weights are weird (constant inside the box, zero outside).

A more common one is the Gaussian kernel:

<img src="rmd_gfx/11-kernel-smoothers/unnamed-chunk-3-1.svg" style="display: block; margin: auto;" />

For the plot, I made `\(\sigma=7.5\)`.

Now the weights "die away" for points farther from where we're predicting (but they are all nonzero!).

---

## Other kernels

What if I made `\(\sigma=15\)`?

<img src="rmd_gfx/11-kernel-smoothers/unnamed-chunk-4-1.svg" style="display: block; margin: auto;" />

Before, points far from `\(x_{15}\)` got very small weights; now they have more influence.

For the Gaussian kernel, `\(\sigma\)` determines something like the "range" of the smoother.

---

## Many Gaussians

The following code gives `\(S\)` for Gaussian kernel smoothers with different `\(\sigma\)`:

```r
dmat <- as.matrix(dist(x))
Sgauss <- function(sigma) {
  gg <- dnorm(dmat, sd = sigma) # not an argument, uses the global dmat
  sweep(gg, 1, rowSums(gg), '/') # make the rows sum to 1.
}
```

<img src="rmd_gfx/11-kernel-smoothers/unnamed-chunk-6-1.svg" style="display: block; margin: auto;" />

---

## The bandwidth

* Choosing `\(\sigma\)` is __very__ important.

* This "range" parameter is called the __bandwidth__.

* It is way more important than which kernel you use.
* Another popular kernel is the 'Epanechnikov' (note that `ksmooth()` itself offers only `"box"`, its default, and `"normal"`):

```r
epan <- function(x) 3/4 * (1 - x^2) * (abs(x) < 1)
```

<img src="rmd_gfx/11-kernel-smoothers/unnamed-chunk-8-1.svg" style="display: block; margin: auto;" />

---

## Choosing the bandwidth

As we have discussed, kernel smoothing (and KNN) are linear smoothers

`$$\widehat{\mathbf{y}} = S\mathbf{y}$$`

This has easy implications:

--

```r
loocv <- function(y, S) {
  yhat <- S %*% y # fitted values from the linear smoother
  # leave-one-out CV shortcut: rescale residuals by 1 - S_ii
  mean( (y - yhat)^2 / (1 - diag(S))^2 )
}
```

--

The __effective degrees of freedom__ is `\(\textrm{tr}(S)\)`.

Therefore we can use our model selection criteria from before.

---

## Smoothing the full Lidar data

```r
library(purrr) # for map_dbl()
dmat <- as.matrix(dist(lidar$range)) # Sgauss() reads this global
sigmas <- 10^(seq(log10(300), log10(.3), length.out = 100))
loocvs <- map_dbl(sigmas, ~ loocv(lidar$logratio, Sgauss(.x)))
best_s <- sigmas[which.min(loocvs)] # bandwidth with the smallest LOO-CV
lidar$smoothed <- Sgauss(best_s) %*% lidar$logratio
```

<img src="rmd_gfx/11-kernel-smoothers/smoothed-lidar-1.svg" style="display: block; margin: auto;" />

I considered `\(\sigma \in [0.3,\ 300]\)` and used `\(14.93\)`.

---

## Smoothing manually

I did kernel smoothing "manually":

1. For a fixed bandwidth
2. Compute the smoothing matrix
3. Make the predictions
4. Repeat and compute CV using our "nice" formula

The point is to "show how it works". It's also really easy.

There are a number of other ways to do this in R:

```r
loess()
ksmooth()
KernSmooth::locpoly()
mgcv::gam()
np::npreg()
```

These have tricks and ways of doing CV and other things automatically.

Note: all I needed was the distance matrix `dist(x)`.

Given a distance function (say, `\(d(\mathbf{x}_i, \mathbf{x}_j) = ||\mathbf{x}_i - \mathbf{x}_j||_2 + I(x_{i,3} \neq x_{j,3})\)`), I can use these methods.

---
class: middle, center, inverse

# Next time...

Why don't we just smooth everything all the time?
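
---

## Smoothing with a custom distance

A minimal sketch of the closing note from "Smoothing manually": build the distance matrix yourself, then reuse the same Gaussian-kernel recipe. Everything here is an assumption for illustration: the simulated data, the names `x_num`, `x_cat`, `custom_dmat`, and `S_from_dist()`, and the choice `sigma = 0.2`. The distance adds 1 whenever the categorical feature differs.

```r
# Hypothetical data: two numeric features and one categorical feature.
set.seed(406)
n <- 50
x_num <- cbind(runif(n), runif(n))
x_cat <- sample(c("a", "b"), n, replace = TRUE)
y <- sin(2 * pi * x_num[, 1]) + (x_cat == "b") + rnorm(n, sd = 0.2)

# Euclidean distance on the numeric part, plus 1 when the categories differ.
custom_dmat <- as.matrix(dist(x_num)) + outer(x_cat, x_cat, FUN = "!=")

# Gaussian-kernel smoother matrix from any distance matrix.
# (Hypothetical helper: takes dmat as an argument instead of a global.)
S_from_dist <- function(dmat, sigma) {
  gg <- dnorm(dmat, sd = sigma)
  sweep(gg, 1, rowSums(gg), "/") # make the rows sum to 1
}

S <- S_from_dist(custom_dmat, sigma = 0.2) # sigma chosen arbitrarily
yhat <- drop(S %*% y)                      # fitted values
edf <- sum(diag(S))                        # effective degrees of freedom, tr(S)
```

Unlike `Sgauss()`, this version takes the distance matrix as an argument, so any distance can be swapped in; the bandwidth could still be chosen with `loocv(y, S)`.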