Stat 406
Geoff Pleiss, Trevor Campbell
Last modified – 23 October 2024
So far: 0-1 loss. If we predict the correct class, we lose 0; otherwise we lose 1.
Generalization: asymmetric classification loss. If we predict the correct class, we lose 0; otherwise we lose something that depends on which mistake we made.
E.g. MRI screening. The two classes are "person OK" and "person has a disease", and the two kinds of mistake have very different costs.
This results in a 2x2 matrix of losses with 0 on the diagonal.
     [,1]    [,2]
[1,]    0 1000000
[2,]   10       0
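A minimal sketch of how such a matrix changes the decision rule (the loss and expected_loss names are just for illustration): with asymmetric losses, we predict the class that minimizes expected loss rather than simply the most probable class.

# loss[i, j]: cost of predicting class j when the true class is i
loss <- matrix(c(0, 10, 1e6, 0), nrow = 2)

# expected loss of each possible prediction, given P(class 2) = p
expected_loss <- function(p, loss) {
  c(pred_1 = (1 - p) * loss[1, 1] + p * loss[2, 1],
    pred_2 = (1 - p) * loss[1, 2] + p * loss[2, 2])
}

expected_loss(0.5, loss)             # predicting class 1 is far cheaper here
which.min(expected_loss(0.5, loss))  # so we predict class 1 even when the classes are equally likely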
Sometimes we output probabilities as well as class labels.
For example, logistic regression returns the probability that an observation is in class 1.
LDA and QDA produce probabilities as well. So do Neural Networks (typically)
(Trees “don’t”, neither does KNN, though you could fake it)
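For instance, a rough sketch of pulling class probabilities out of LDA (assuming a data frame dat with a binary response y coded 0/1 and a predictor x, like the one simulated below; the logistic regression version appears in the next code chunk):

library(MASS)  # for lda()

fit_lda <- lda(y ~ x, data = dat)
head(predict(fit_lda)$posterior)  # one column of class probabilities per class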
Suppose we predict some probabilities for our data. How often do those events actually happen?
In principle, if we predict $\hat{p}(x) = 0.2$ for a set of observations, then about 20% of those observations should turn out to be in class 1.
The same goes for the other probabilities: if we say "20% chance of rain", it should rain on about 20% of such days.
Of course, we don't predict exactly the same probability for many observations, so we group observations with similar predicted probabilities into bins and compare the average prediction with the observed frequency in each bin.
library(tidyverse)  # tibble, dplyr, and ggplot2 (used throughout)

# simulate data where the true P(y = 1 | x) follows a logistic curve
n <- 250
dat <- tibble(
  x = seq(-5, 5, length.out = n),
  p = 1 / (1 + exp(-x)),  # true probability of class 1
  y = rbinom(n, 1, p)     # observed 0/1 labels
)
fit <- glm(y ~ x, family = binomial, data = dat)
dat$phat <- predict(fit, type = "response")  # predicted probabilities
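As a quick check of the "20% of such days" idea (the 0.15 to 0.25 window here is an arbitrary choice for illustration), we can look at how often y = 1 among observations whose predicted probability is near 0.2:

# among observations with phat near 0.2, roughly 20% should have y = 1
mean(dat$y[dat$phat > 0.15 & dat$phat < 0.25])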
# blue and orange are colour strings assumed to be defined in the slides' setup
binary_calibration_plot <- function(y, phat, nbreaks = 10) {
  # group observations into bins of roughly equal size by predicted probability
  dat <- tibble(y = y, phat = phat) |>
    mutate(bins = cut_number(phat, n = nbreaks))
  # midpoint of each bin on the predicted-probability axis
  midpts <- quantile(dat$phat, seq(0, 1, length.out = nbreaks + 1), na.rm = TRUE)
  midpts <- midpts[-length(midpts)] + diff(midpts) / 2
  # observed frequency of y = 1 (and its standard error) within each bin
  sum_dat <- dat |>
    group_by(bins) |>
    summarise(
      p = mean(y, na.rm = TRUE),
      se = sqrt(p * (1 - p) / n())
    ) |>
    arrange(p)
  sum_dat$x <- midpts
  # well-calibrated predictions fall along the 45-degree line
  ggplot(sum_dat, aes(x = x)) +
    geom_errorbar(aes(ymin = pmax(p - 1.96 * se, 0), ymax = pmin(p + 1.96 * se, 1))) +
    geom_point(aes(y = p), colour = blue) +
    geom_abline(slope = 1, intercept = 0, colour = orange) +
    ylab("observed frequency") +
    xlab("average predicted probability") +
    coord_cartesian(xlim = c(0, 1), ylim = c(0, 1)) +
    geom_rug(data = dat, aes(x = phat), sides = "b")
}
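Calling this on the simulated data produces the calibration plot (this call just illustrates usage):

binary_calibration_plot(dat$y, dat$phat, nbreaks = 10)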
Given a logistic regression output $\hat{p}(x)$, we still have to make a decision.
E.g. an email arrives and the model reports $\hat{p}(x)$, the estimated probability that it is spam.
Do we send it to the spam folder ($\hat{y} = 1$) or leave it in the inbox ($\hat{y} = 0$)?
So far we've been making the "decision" $\hat{y} = 1$ whenever $\hat{p}(x) > 0.5$,
i.e. predicting the most probable class.
But maybe (for our application) a "better" decision is $\hat{y} = 1$ whenever $\hat{p}(x) > t$ for some other threshold $t$.
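A small illustration of how the threshold changes the errors we make (the 0.5 and 0.1 cutoffs are arbitrary choices, and dat is the simulated data from above):

table(predicted = as.integer(dat$phat > 0.5), truth = dat$y)
table(predicted = as.integer(dat$phat > 0.1), truth = dat$y)  # lower threshold: more predicted positives

Each threshold gives a different trade-off between false positives and true positives; the ROC curve traces out this trade-off as the threshold varies, which is what the next function computes.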
roc <- function(prediction, y) {
  # sort observations from highest to lowest predicted probability
  op <- order(prediction, decreasing = TRUE)
  preds <- prediction[op]
  y <- y[op]
  noty <- 1 - y
  # collapse tied predictions so each distinct threshold appears once
  if (any(duplicated(preds))) {
    y <- rev(tapply(y, preds, sum))
    noty <- rev(tapply(noty, preds, sum))
  }
  # sweeping the threshold from high to low, accumulate the
  # false positive rate and true positive rate
  tibble(
    FPR = cumsum(noty) / sum(noty),
    TPR = cumsum(y) / sum(y)
  )
}
ggplot(roc(dat$phat, dat$y), aes(FPR, TPR)) +
geom_step(colour = blue, size = 2) +
geom_abline(slope = 1, intercept = 0)
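A common one-number summary of the ROC curve is the area under it (AUC). A minimal sketch of computing it from the output of roc() with the trapezoid rule (the roc_tab name is just for illustration):

roc_tab <- roc(dat$phat, dat$y)
# prepend (0, 0) so the curve starts at the origin, then apply the trapezoid rule
fpr <- c(0, roc_tab$FPR)
tpr <- c(0, roc_tab$TPR)
sum(diff(fpr) * (head(tpr, -1) + tail(tpr, -1)) / 2)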
Module 4
boosting, bagging, random forests, and neural nets
UBC Stat 406 - 2024