class: center, middle, inverse, title-slide

.title[
# 13 GAMs and Trees
]
.author[
### STAT 406
]
.author[
### Daniel J. McDonald
]
.date[
### Last modified - 2022-10-03
]

---

## GAMs

Last time we discussed smoothing in multiple dimensions.

`$$\newcommand{\Expect}[1]{E\left[ #1 \right]} \newcommand{\Var}[1]{\mathrm{Var}\left[ #1 \right]} \newcommand{\Cov}[2]{\mathrm{Cov}\left[#1,\ #2\right]} \newcommand{\given}{\ \vert\ } \newcommand{\R}{\mathbb{R}} \newcommand{\tr}[1]{\mbox{tr}(#1)} \newcommand{\X}{\mathbf{X}} \newcommand{\y}{\mathbf{y}}$$`

Here we introduce the concept of GAMs (__G__eneralized __A__dditive __M__odels)

The basic idea is to imagine that the response is the sum of some functions of the predictors:

`$$\Expect{Y \given X=x} = \beta_0 + f_1(x_{1})+\cdots+f_p(x_{p}).$$`

Note that OLS __is__ a GAM (take `\(f_j(x_{j})=\beta_j x_{j}\)`):

`$$\Expect{Y \given X=x} = \beta_0 + \beta_1 x_{1}+\cdots+\beta_p x_{p}.$$`

--

These work by estimating each `\(f_j\)` using basis expansions in predictor `\(j\)`

The algorithm for fitting these things is called "backfitting":

.emphasis[

1. Center `\(\y\)` and `\(\X\)`.
2. Hold `\(f_k\)` fixed for all `\(k\neq j\)`; compute the partial residuals `\(\y - \sum_{k\neq j} f_k(x_{k})\)` and smooth them against `\(x_{j}\)` (with your favorite smoother) to update `\(f_j\)`.
3. Repeat for `\(1\leq j\leq p\)`.
4. Repeat steps 2 and 3 until the estimated functions "stop moving" (iterate)
5. Return the results.

]

---

## Very small example

.pull-left[

```r
library(mgcv)
library(tibble)
set.seed(12345)
n <- 500
simple <- tibble(
  x1 = runif(n, 0, 2 * pi),
  x2 = runif(n),
  y = 5 + 2 * sin(x1) + 8 * sqrt(x2) + rnorm(n, sd = .25))
```

<img src="rmd_gfx/13-gams-trees/simple-data-1.svg" style="display: block; margin: auto;" />

]

--

.pull-right[

Smooth each coordinate independently

```r
ex_smooth <- gam(y ~ s(x1) + s(x2), data = simple)
# s(z) means "smooth" z (uses splines)
plot(ex_smooth, pages = 1, scale = 0, shade = TRUE,
     resid = TRUE, se = 2, las = 1)
```

]

---

## Wherefore GAMs?

.pull-left[

If `\(\Expect{Y \given X=x} = \beta_0 + f_1(x_{1})+\cdots+f_p(x_{p}),\)`

then `\(\textrm{MSE}(\hat f) = \frac{Cp}{n^{4/5}} + \sigma^2.\)`

* Exponent no longer depends on `\(p\)`. Converges faster. (if truth separates like this)

* You could also use the same methods to include "some" interactions like

`$$\begin{aligned}&\Expect{Y \given X=x}\\ &= \beta_0 + f_{12}(x_{1},\ x_{2})+f_3(x_3)+\cdots+f_p(x_{p}),\end{aligned}$$`

]

.pull-right[

Smooth two coordinates together

```r
ex_smooth2 <- gam(y ~ s(x1, x2), data = simple)
plot(ex_smooth2, scheme = 2, scale = 0, shade = TRUE,
     resid = TRUE, se = 2, las = 1)
```

<img src="rmd_gfx/13-gams-trees/unnamed-chunk-2-1.svg" style="display: block; margin: auto;" />

]

---

## Regression trees

Trees involve stratifying or segmenting the predictor space into a number of simple regions.

Trees are simple and useful for interpretation.

Basic trees are not great at prediction. Modern methods that use trees are much better (Module 4)

--

Regression trees estimate piece-wise constant functions

The slabs are axis-parallel rectangles `\(R_1,\ldots,R_K\)` based on `\(\X\)`

In each region, we average the `\(y_i\)`'s: `\(\hat\mu_1,\ldots,\hat\mu_K\)`

Minimize `\(\sum_{k=1}^K \sum_{i\,:\, x_i \in R_k} (y_i-\mu_k)^2\)` over `\(R_k,\mu_k\)` for `\(k\in \{1,\ldots,K\}\)`

--

This sounds more complicated than it is.

The minimization is performed __greedily__ (like forward stepwise regression).
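---

## A greedy split, by hand

To see the greedy idea concretely, here is a minimal sketch of a single split on a single predictor: try each candidate cut point, compute the total within-region sum of squares, and keep the best cut. The helper `best_split()` is purely illustrative (it is not how `tree()` works internally); real software searches over all predictors and then recurses on the two child regions.

```r
# One greedy split on a single predictor x (illustration only)
best_split <- function(x, y) {
  xs <- sort(unique(x))
  cuts <- (xs[-1] + xs[-length(xs)]) / 2        # midpoints between observed values
  rss <- sapply(cuts, function(cc) {
    left <- y[x <= cc]                          # region {x <= cut}
    right <- y[x > cc]                          # region {x > cut}
    sum((left - mean(left))^2) + sum((right - mean(right))^2)
  })
  list(cut = cuts[which.min(rss)], rss = min(rss))
}
best_split(simple$x1, simple$y)                 # using the small example from before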
---

## Mobility data

```r
library(tree)     # tree(), prune.tree(), partition.tree()
library(maptree)  # draw.tree()
bigtree <- tree(Mobility ~ ., data = mob)
smalltree <- prune.tree(bigtree, k = .09)
draw.tree(smalltree, digits = 2)
```

<img src="rmd_gfx/13-gams-trees/small-tree-1.svg" style="display: block; margin: auto;" />

This is called the __dendrogram__

---

## Partition view

```r
mob$preds <- predict(smalltree)
par(mfrow = c(1, 2), mar = c(5, 3, 0, 0))
draw.tree(smalltree, digits = 2)
cols <- viridisLite::viridis(20, direction = -1)[cut(log(mob$Mobility), 20)]
plot(mob$Black, mob$Commute, pch = 19, cex = .4, bty = 'n', las = 1, col = cols,
     ylab = "Commute time", xlab = "% Black")
partition.tree(smalltree, add = TRUE, ordvars = c("Black", "Commute"))
```

<img src="rmd_gfx/13-gams-trees/partition-view-1.svg" style="display: block; margin: auto;" />

We predict all observations in a region with the same value.

`\(\bullet\)` The three regions correspond to the leaves of the tree.

---

<img src="rmd_gfx/13-gams-trees/big-tree-1.svg" style="display: block; margin: auto;" />

__Terminology__

We call each split or end point a node. Each terminal node is referred to as a leaf. The interior nodes lead to branches.

---

## Advantages and disadvantages of trees
* Trees are very easy to explain (much easier than even linear regression).

* Some people believe that decision trees mirror human decision-making.

* Trees can easily be displayed graphically no matter the dimension of the data.

* Trees can easily handle qualitative predictors without the need to create dummy variables.

* Trees aren't very good at prediction.

Full trees badly overfit, so we "prune" them using CV (a rough sketch of CV pruning is below).
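As a sketch only (this is not how the `k = .09` above was chosen), we could let cross-validation pick the tree size, reusing `bigtree` from the Mobility slide. `cv.tree()` in the __tree__ package cross-validates over the cost-complexity pruning sequence; we then prune to the size with the smallest CV deviance.

```r
set.seed(406)                                      # arbitrary seed, for reproducibility
cv_out <- cv.tree(bigtree, K = 10)                 # 10-fold CV over the pruning sequence
best_size <- cv_out$size[which.min(cv_out$dev)]    # tree size minimizing CV deviance
cv_tree <- prune.tree(bigtree, best = best_size)   # prune the full tree to that size
draw.tree(cv_tree, digits = 2)
```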
--

.hand[
We'll talk more about trees next module for Classification.
]

---

class: middle
background-image: url("gfx/proforhobo.png")
background-size: cover

.pull-left[
.hand[.secondary[.larger[Next time...]]]

.hand[.secondary[.larger[Module]]] .huge-orange-number[3]
]

.pull-right[.center[
.fourth-color[.hand[.large[classification: Professor or Hobo? You decide.]]]

.small[.fourth-color[
Source: proforhobo.com
]]
]]