Stat 406
Geoff Pleiss, Trevor Campbell
Last modified – 11 September 2024
Assume that $y_i = x_i^\top \beta + \epsilon_i$.
How would I create data from this model (draw a sample)?
Set up constants
Create the data
epsilon <- rnorm(n, sd = sigma) # this is random
X <- matrix(runif(n * p), n, p) # treat this as fixed, but I need numbers
beta <- (p + 1):1 # parameter, also fixed, but I again need numbers
Y <- cbind(1, X) %*% beta + epsilon # epsilon is random, so this is random too
## Equiv: Y <- beta[1] + X %*% beta[-1] + epsilon
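The constants themselves are not shown above, so here is a self-contained sketch of the whole simulation. The values of `n`, `p`, `sigma` and the seed are assumptions chosen only for illustration, and `Y_equiv` and `max_diff` are hypothetical names added to check that the two ways of writing the mean agree.

```r
set.seed(406)  # arbitrary seed, only for reproducibility (assumption)

# Illustrative constants (assumed values, not taken from the slides)
n <- 100     # number of observations
p <- 3       # number of predictors, excluding the intercept
sigma <- 2   # standard deviation of the noise

epsilon <- rnorm(n, sd = sigma)        # random noise
X <- matrix(runif(n * p), n, p)        # predictors, treated as fixed
beta <- (p + 1):1                      # intercept first, then p slopes
Y <- cbind(1, X) %*% beta + epsilon    # response, random through epsilon

# The equivalent form: intercept plus slopes times predictors
Y_equiv <- beta[1] + X %*% beta[-1] + epsilon
max_diff <- max(abs(Y - Y_equiv))

dim(Y)     # an n x 1 matrix
max_diff   # zero up to floating point error
```

Note that `beta` has `p + 1` entries because the column of ones added by `cbind(1, X)` carries the intercept.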
Why squared errors?
Why not absolute errors?
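We write this as
$$\hat\beta = \arg\min_\beta \sum_{i=1}^n (y_i - x_i^\top \beta)^2.$$
Find the $\beta$ which minimizes the sum of squared errors.
Note that this is the same as
$$\hat\beta = \arg\min_\beta \frac{1}{n} \sum_{i=1}^n (y_i - x_i^\top \beta)^2.$$
Find the $\beta$ which minimizes the mean squared error.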
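To see numerically that dividing by $n$ does not move the minimizer, here is a small sketch. The simulated data, the constants, and the helper names `sse`, `mse`, `fit_sse`, `fit_mse` and `gap` are all assumptions made for illustration; `optim()` with BFGS is just one convenient general-purpose minimizer.

```r
set.seed(406)                               # arbitrary seed (assumption)
n <- 100; p <- 3; sigma <- 2                # assumed illustrative values
X <- cbind(1, matrix(runif(n * p), n, p))   # design matrix with intercept column
beta <- (p + 1):1
y <- drop(X %*% beta + rnorm(n, sd = sigma))

sse <- function(b) sum((y - X %*% b)^2)     # sum of squared errors
mse <- function(b) sse(b) / n               # mean squared error

start <- rep(0, p + 1)
ctrl <- list(reltol = 1e-12)                # tight tolerance so the two runs agree closely
fit_sse <- optim(start, sse, method = "BFGS", control = ctrl)
fit_mse <- optim(start, mse, method = "BFGS", control = ctrl)

# Largest coordinate-wise difference between the two minimizers
gap <- max(abs(fit_sse$par - fit_mse$par))
gap
```

Multiplying an objective by a positive constant rescales its values but not the location of its minimum, so `gap` should be small, limited only by the optimizer's tolerance.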
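We differentiate and set to zero
$$\frac{\partial}{\partial \beta} \frac{1}{n} \sum_{i=1}^n (y_i - x_i^\top \beta)^2 = -\frac{2}{n} \sum_{i=1}^n x_i (y_i - x_i^\top \beta) = 0.$$
…this is (in matrix notation, stacking the $x_i^\top$ as the rows of $X$ and assuming $X^\top X$ is invertible)
$$X^\top (y - X\beta) = 0 \quad\Longrightarrow\quad \hat\beta = (X^\top X)^{-1} X^\top y.$$
The quantity being minimized, $\sum_{i=1}^n (y_i - x_i^\top \beta)^2$, is the sum of squared errors,
AKA, the SSE.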
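A quick check of the closed-form solution $\hat\beta = (X^\top X)^{-1} X^\top y$, sketched on simulated data. The constants and the names `beta_hat`, `beta_lm` and `grad` are assumptions for illustration. `solve()` with two arguments solves the linear system directly, which avoids forming the inverse explicitly.

```r
set.seed(406)                               # arbitrary seed (assumption)
n <- 100; p <- 3; sigma <- 2                # assumed illustrative values
X <- cbind(1, matrix(runif(n * p), n, p))   # design matrix with intercept column
beta <- (p + 1):1
y <- drop(X %*% beta + rnorm(n, sd = sigma))

# Solve the normal equations (X'X) b = X'y
beta_hat <- drop(solve(crossprod(X), crossprod(X, y)))

# lm() fits the same model; "- 1" because X already contains the intercept column
beta_lm <- unname(coef(lm(y ~ X - 1)))

# At the minimizer, the gradient of the SSE, -2 X'(y - X b), should vanish
grad <- drop(-2 * crossprod(X, y - X %*% beta_hat))

max(abs(beta_hat - beta_lm))   # agreement with lm(), up to rounding
max(abs(grad))                 # zero up to rounding
```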
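Method 2 didn’t use anything about the distribution of $\epsilon$.
But if we know that $\epsilon_i \sim N(0, \sigma^2)$ independently, then $y_i \sim N(x_i^\top \beta, \sigma^2)$.
So the probability density of $y = (y_1, \ldots, y_n)^\top$ is
$$f_Y(y; \beta) = \prod_{i=1}^n \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(y_i - x_i^\top \beta)^2}{2\sigma^2}\right) = \left(\frac{1}{2\pi\sigma^2}\right)^{n/2} \exp\left(-\frac{1}{2\sigma^2} \sum_{i=1}^n (y_i - x_i^\top \beta)^2\right).$$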
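In probability courses, we think of the density $f_Y(y; \beta)$ as a function of $y$ with $\beta$ fixed.
…instead, think of it as a function of $\beta$ with the data $y$ fixed.
We call this “the likelihood” of beta: $\mathcal{L}(\beta) = f_Y(y; \beta)$.
Given some data, we can evaluate the likelihood for any value of $\beta$ (treating $\sigma$ as known).
It won’t integrate to 1 over $\beta$.
But we can maximize it with respect to $\beta$.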
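As a sketch of "evaluating the likelihood for any value of beta", the code below computes it at two candidate values on simulated data. The constants, the function name `likelihood`, and the candidates `lik_true` and `lik_shifted` are assumptions for illustration, and $\sigma$ is treated as known.

```r
set.seed(406)                               # arbitrary seed (assumption)
n <- 100; p <- 3; sigma <- 2                # assumed illustrative values
X <- cbind(1, matrix(runif(n * p), n, p))   # design matrix with intercept column
beta <- (p + 1):1
y <- drop(X %*% beta + rnorm(n, sd = sigma))

# Likelihood of b: product of N(x_i' b, sigma^2) densities at the observed y_i
likelihood <- function(b) prod(dnorm(y, mean = drop(X %*% b), sd = sigma))

lik_true    <- likelihood(beta)       # at the beta used to simulate the data
lik_shifted <- likelihood(beta + 1)   # at a deliberately worse candidate

lik_true
lik_shifted
```

Both numbers are tiny because they are products of `n` densities, but they can still be compared: the candidate closer to the data-generating value gets the larger likelihood here.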
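The derivative of the likelihood $\mathcal{L}(\beta)$ is messy to work with directly.
I claim we can maximize $\ell(\beta) = \log \mathcal{L}(\beta)$ instead, because $\log$ is increasing:
$$\ell(\beta) = -\frac{n}{2} \log(2\pi\sigma^2) - \frac{1}{2\sigma^2} \sum_{i=1}^n (y_i - x_i^\top \beta)^2.$$
We can also throw out the constants. (Why?)
So…
$$\hat\beta = \arg\max_\beta \, -\sum_{i=1}^n (y_i - x_i^\top \beta)^2 = \arg\min_\beta \sum_{i=1}^n (y_i - x_i^\top \beta)^2.$$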
The same as before!
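A numerical sanity check that maximizing the Gaussian likelihood lands on the least-squares solution. This is only a sketch: the simulated data, the constants, and the names `neg_loglik`, `mle`, `ols` and `gap` are assumptions, $\sigma$ is treated as known, and `optim()` is used as a generic optimizer.

```r
set.seed(406)                               # arbitrary seed (assumption)
n <- 100; p <- 3; sigma <- 2                # assumed illustrative values
X <- cbind(1, matrix(runif(n * p), n, p))   # design matrix with intercept column
beta <- (p + 1):1
y <- drop(X %*% beta + rnorm(n, sd = sigma))

# Negative log-likelihood in b, with sigma treated as known
neg_loglik <- function(b) {
  -sum(dnorm(y, mean = drop(X %*% b), sd = sigma, log = TRUE))
}

# optim() minimizes, so minimizing the negative log-likelihood
# is the same as maximizing the likelihood
mle <- optim(rep(0, p + 1), neg_loglik, method = "BFGS",
             control = list(reltol = 1e-12))$par

# Least-squares solution from the normal equations
ols <- drop(solve(crossprod(X), crossprod(X, y)))

gap <- max(abs(mle - ols))
gap   # small, limited by the optimizer's tolerance
```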
(Not in real time)
UBC Stat 406 - 2024