Lecture 6a: Model Fitting

STAT 545 - Fall 2025

Learning Objectives

By the end of today’s class, students should be able to:

  • make a model object in R, using lm() as an example.

  • write a formula in R.

  • make predictions from a model object with the broom::augment() and predict() functions.

  • extract information from a model object using broom::tidy(), broom::glance(), and traditional means.

Note that there is a new tidyverse-like framework of packages to help with modelling. It’s called tidymodels.

Lecture Notes:

Set-up

The following packages will be used throughout this lecture:
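The set-up chunk itself is not reproduced in these notes. A plausible sketch, assuming the lecture relies on the tidyverse, broom, and gapminder packages (inferred from the functions and datasets used later, not confirmed by the source):

```r
# Assumed package list; the original set-up chunk is not shown in these notes.
library(tidyverse)  # ggplot2, dplyr, tibble, and friends
library(broom)      # tidy(), augment(), glance()
library(gapminder)  # gapminder dataset used in the later example
```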

Model Fitting in R

We cannot rely only on plotting to:

  • Investigate the relationship between two or more variables

  • Predict the outcome of a variable given information about other variables

These typically involve fitting models.



NOTE: This lecture is not about the specifics of fitting a model. Although a few statistical concepts are mentioned, take them at face value. The focus of this lecture is on how to fit models in R.

Example: Linear Model

  • Describes a relationship between:
    • an outcome y, and
    • variable(s) (or covariate(s), predictor(s), independent variable(s)) x.
  • Like a line of best fit
  • Can be used for both inference and prediction

Example: Linear Model

Here are a few questions that a linear model could be used to answer:

  1. What is the predicted fuel efficiency (mpg) for a 4000 lb car?

  2. Does the weight of a car influence its mpg?

  3. How many more miles per gallon can we expect of a car that weighs 1000 lbs less than another car?

Example: Linear Model

A simple scatterplot will give us a general idea, but can’t give us specifics. Here, we use the mtcars dataset from the datasets package. A linear model is one example of a model that can attempt to answer all three questions. The line corresponding to the fitted model has been added to the scatterplot.
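The plot itself is not reproduced here. One way such a scatterplot with a fitted line could be drawn, assuming ggplot2 is used (the object name p is only illustrative):

```r
library(ggplot2)

# Scatterplot of fuel efficiency against weight (wt is in 1000s of lbs),
# with the least-squares line from a linear model overlaid.
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)
p
```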

Fitting a Model in R

Fitting a model in R typically involves using a function in the following format: method(formula, data, options)

Method: A function such as:

  • Linear regression: lm

  • Generalized linear models: glm

  • Local regression: loess

Formula: In R, a formula takes the form y ~ x1 + x2 + ... + xp (using the column names in your data frame), where y is the outcome variable and x1, x2, …, xp are the variables you want to relate it to.

Data: The data frame or tibble containing the variables. This can be omitted if the variables in the formula are defined in the environment.

Options: Specific to the method, and include ways to customize the model.
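As a small illustration of the method(formula, data, options) pattern, here is a sketch using lm() on mtcars. The choice of hp as a second predictor and the names f and fit are assumptions made only for this example:

```r
# A formula is an unevaluated description of the model, not a computation.
f <- mpg ~ wt + hp
class(f)      # the object has class "formula"
all.vars(f)   # the column names the formula refers to

# method(formula, data, options): here the method is lm() and the data is mtcars.
fit <- lm(f, data = mtcars)
names(coef(fit))  # one coefficient per term, plus the intercept
```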

Fitting a Model in R

Running the code returns an object – usually a special type of list – that you can then work with to extract results.

Example: a linear model of a car’s fuel efficiency in miles per gallon (mpg, the “Y” variable) against the car’s weight (wt, the “X” variable).
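The fitting call is not reproduced in these notes. A minimal sketch of what it presumably looks like, using the my_lm name that the rest of the lecture refers to:

```r
# Fit mpg as a linear function of wt using the built-in mtcars data.
my_lm <- lm(mpg ~ wt, data = mtcars)

# Printing the object shows only the call and the two coefficients.
my_lm
```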

my_lm contains a lot more than what’s printed! Let’s explore it.

Summarizing the Model with broom

Now that you have the model object, there are typically three ways to probe and use it. The broom package has three core functions, each of which extracts one kind of information by converting your model into a tibble:

  1. tidy: extract statistical summaries about each “component” of the model.

  2. augment: add columns to the original data frame containing predictions.

  3. glance: extract statistical summaries about the model as a whole (a 1-row tibble).

Summarizing the Model with broom

tidy()

Use the tidy() function for a statistical summary of each component of your model, where each component gets a row in the output tibble. For lm(), tidy() gives one row per regression coefficient (slope and intercept).

tidy() only works if it makes sense to talk about model “components”.
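A sketch of tidy() applied to the mpg ~ wt model (the model is refit here so the example stands alone; coef_tbl is an illustrative name):

```r
library(broom)

my_lm <- lm(mpg ~ wt, data = mtcars)

# One row per coefficient: the intercept and the slope for wt.
coef_tbl <- tidy(my_lm)
coef_tbl
```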

Summarizing the Model with broom

augment()

Use the augment() function to make predictions on a dataset; the predictions are added to your data as a new column. By default, augment() uses the dataset that was used to fit the model.
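A sketch of the default behaviour, again refitting the model so the example is self-contained (fitted_tbl is an illustrative name):

```r
library(broom)

my_lm <- lm(mpg ~ wt, data = mtcars)

# With no new data supplied, augment() uses the data the model was fit on
# and adds columns such as .fitted (predictions) and .resid (residuals).
fitted_tbl <- augment(my_lm)
head(fitted_tbl)
```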

We can also predict on new datasets. Here are predictions of mpg for cars weighing 3, 4, and 5 thousand lbs. In the following code, we make a tibble with wt = 3, 4, and 5 and pass it to augment().

Notice that only the “X” variable (wt) is needed in the input tibble. Since the “Y” variable (mpg) wasn’t provided, augment() couldn’t calculate anything besides a prediction (for example, residuals).
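A sketch of predicting on new data through the newdata argument of augment() (new_cars and pred_tbl are illustrative names):

```r
library(broom)
library(tibble)

my_lm <- lm(mpg ~ wt, data = mtcars)

# wt is recorded in thousands of lbs, so 3, 4, 5 mean 3000, 4000, 5000 lbs.
new_cars <- tibble(wt = c(3, 4, 5))

# Only the predictor column is supplied; predictions land in .fitted.
pred_tbl <- augment(my_lm, newdata = new_cars)
pred_tbl
```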

Summarizing the Model with broom

glance()

Use the glance() function to extract a summary of the model as a whole, in the form of a one-row tibble. This will give you information related to the model fit.
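A sketch of glance() on the same model (fit_summary is an illustrative name):

```r
library(broom)

my_lm <- lm(mpg ~ wt, data = mtcars)

# One row describing the whole fit: R-squared, adjusted R-squared, AIC, etc.
fit_summary <- glance(my_lm)
fit_summary
```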

Summarizing the Model Without broom

In order for a model to work with the broom package, someone has to go out of their way to contribute to the broom package for that model. While this has happened for many common models, many others remain without broom compatibility.

Here’s how to work with these model objects in that case.

Summarizing the Model Without broom: Prediction

If broom::augment() doesn’t work, then the developer of the model almost surely made it so that the predict() function works (predict() is not part of the broom package). The predict() function typically takes the same arguments as augment(), but usually doesn’t return a tibble.

Here are the first 5 predictions of mpg on the my_lm object, defaulting to predictions made on the original data:
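A sketch of the call that would produce these predictions (preds is an illustrative name):

```r
my_lm <- lm(mpg ~ wt, data = mtcars)

# For an lm, predict() returns a named numeric vector, not a tibble.
preds <- predict(my_lm)
head(preds, 5)
```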

Summarizing the Model Without broom: Prediction

Here are the predictions of mpg made for cars with a weight of 3, 4, and 5 thousand lbs:
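A sketch of the same prediction with predict() and its newdata argument (new_cars and new_preds are illustrative names):

```r
my_lm <- lm(mpg ~ wt, data = mtcars)

# wt is in thousands of lbs; the column name must match the formula.
new_cars <- data.frame(wt = c(3, 4, 5))

new_preds <- predict(my_lm, newdata = new_cars)
new_preds
```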

Finding the documentation for the predict() function for your model isn’t obvious, because predict() is a “generic” function. Your best bet is to append the name of the model-fitting function (more precisely, the class of the model object) after a dot. For example:

  • For a model fit with lm(), try ?predict.lm

  • For a model fit with rq(), try ?predict.rq (from the quantreg package)

If that doesn’t work, just google it: "Predict function for rq"

Summarizing the Model Without broom: Inference

We can extract model information using summary() on the linear model:

There’s a lot of information here. To extract a specific piece, we use $. For example, summary(my_lm)$coefficients returns a matrix (not a data frame) holding part of the model output for the regression coefficients.
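A sketch of the summary() call and the $ extraction described above (lm_summary and coef_table are illustrative names):

```r
my_lm <- lm(mpg ~ wt, data = mtcars)

# summary() returns a list-like object with many components.
lm_summary <- summary(my_lm)
lm_summary

# Pull out the coefficient table with `$`; this is a numeric matrix.
coef_table <- lm_summary$coefficients
coef_table
```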

Summarizing the Model Without broom: Inference

There are a ton of other items we can access. To see them, we can call names() on the summary object.

As another example, to see the adjusted R-squared value of the model, we can use summary(my_lm)$adj.r.squared.
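A sketch of both steps, listing the available components and then extracting one of them (lm_summary is an illustrative name):

```r
my_lm <- lm(mpg ~ wt, data = mtcars)
lm_summary <- summary(my_lm)

# List every component that can be extracted with `$`.
names(lm_summary)

# Adjusted R-squared of the fitted model.
lm_summary$adj.r.squared
```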

Example: gapminder

Let’s visualize some relationships in the gapminder dataset.

Let’s inspect Zimbabwe, whose lifeExp shows an unusual relationship with year.
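The plotting code is not reproduced in these notes. One way to isolate and plot Zimbabwe, assuming dplyr and ggplot2 are used (zimbabwe and p_zim are illustrative names):

```r
library(gapminder)
library(dplyr)
library(ggplot2)

# Keep only Zimbabwe's rows (one per recorded year).
zimbabwe <- gapminder %>%
  filter(country == "Zimbabwe")

# Life expectancy over time for Zimbabwe.
p_zim <- ggplot(zimbabwe, aes(x = year, y = lifeExp)) +
  geom_point()
p_zim
```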

Example: gapminder

Now, let’s try fitting a linear model to this relationship.
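A sketch of a straight-line fit for Zimbabwe, both as a model object and as a plot layer. The use of geom_smooth(method = "lm") is an assumption based on the lecture summary, and zim_lm and p_lin are illustrative names:

```r
library(gapminder)
library(dplyr)
library(ggplot2)

zimbabwe <- gapminder %>%
  filter(country == "Zimbabwe")

# The model object, for use with tidy(), augment(), glance(), or predict().
zim_lm <- lm(lifeExp ~ year, data = zimbabwe)

# The same straight-line fit drawn over the data.
p_lin <- ggplot(zimbabwe, aes(x = year, y = lifeExp)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)
p_lin
```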

Example: gapminder

Now we will try to fit a second-degree polynomial and see what that looks like.
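A sketch of the quadratic fit, assuming poly(year, 2) is used inside the formula (other encodings such as year + I(year^2) would also work; zim_quad and p_quad are illustrative names):

```r
library(gapminder)
library(dplyr)
library(ggplot2)

zimbabwe <- gapminder %>%
  filter(country == "Zimbabwe")

# Second-degree polynomial in year: intercept plus two polynomial terms.
zim_quad <- lm(lifeExp ~ poly(year, 2), data = zimbabwe)

# The same quadratic fit drawn over the data via the formula argument.
p_quad <- ggplot(zimbabwe, aes(x = year, y = lifeExp)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ poly(x, 2), se = FALSE)
p_quad
```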

Summary

While this lecture focused only on two-dimensional data (one y and one x), more sophisticated statistical models can certainly handle more complexity and higher-dimensional data. If you’re interested, head to the Resources section at the end of these lecture notes to learn more.

  1. method(formula, data, options) - most model-fitting functions in R follow this structure.

  2. broom::augment() - uses a fitted model to obtain predictions and adds them as a new column to an existing tibble. The equivalent base R function is predict().

  3. broom::tidy() - extracts statistical information on each component of a model. For lm, the base R equivalent is coef(summary(lm_object)).

  4. broom::glance() - extracts statistical summaries of the whole model. Always returns a 1-row tibble.

  5. geom_smooth() - adds a layer showing a fitted line to a plot. The method and formula arguments can be used to customize the model.

Worksheet A4

We will now spend some time attempting questions on the last part of Worksheet A4.