Lecture 11: Automation

November 13, 2025

Modified: November 7, 2025

This is an optional topic with no in-class lecture!

From this lecture, students are expected to be able to:

Use make to record which files are inputs vs. intermediates vs. outputs
Use make to capture how scripts and commands convert inputs to outputs
Use make to re-run parts of an analysis that are out of date
Write a Makefile
Interact with make in RStudio
Use make from the Terminal

Video Lecture

Why Automation

Recall the reproducibility principle: our analysis should be easy for others to replicate so that they can verify the results. We can write documentation that tells users how to set up the project and rerun the code, but following those instructions by hand takes time and invites human error. Instead of doing things by hand, we can (and should) automate these processes.

Consider an analysis where we need to clean, summarize, plot, and model some data. You can think of each of these tasks as a separate step of the research process; together they form a pipeline, a system where the code for some tasks depends on the output of others. For example, if I were to change the data-cleaning chunk, that change would likely affect the summary, plots, and models (i.e., the “downstream” processes).
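To make the idea of a pipeline concrete, here is a minimal Python sketch. Python is used only for illustration; make itself reads these relationships from a Makefile. The file names (`raw_data.csv`, `clean.R`, `summary.csv`, and so on) and the helpers `classify` and `downstream` are hypothetical. Each key in `PIPELINE` is a target and its list holds the prerequisites it is built from, the same "target: prerequisites" shape a Makefile rule uses. `classify` sorts files into inputs, intermediates, and outputs, and `downstream` walks the graph to find everything a change would affect.

```python
from collections import deque

# Hypothetical pipeline: each target maps to the files it is built from,
# mirroring the "target: prerequisites" shape of a Makefile rule.
PIPELINE = {
    "clean_data.csv": ["raw_data.csv", "clean.R"],
    "summary.csv": ["clean_data.csv", "summarize.R"],
    "plot.png": ["clean_data.csv", "plot.R"],
    "model.rds": ["clean_data.csv", "model.R"],
}


def classify(pipeline):
    """Split files into inputs, intermediates, and outputs."""
    targets = set(pipeline)
    prereqs = {p for deps in pipeline.values() for p in deps}
    inputs = prereqs - targets          # only ever used, never built
    intermediates = targets & prereqs   # built, then used by another rule
    outputs = targets - prereqs         # built, never used by another rule
    return inputs, intermediates, outputs


def downstream(pipeline, changed):
    """Return every target that directly or indirectly depends on `changed`."""
    # Invert the graph: file -> targets that list it as a prerequisite.
    users = {}
    for target, deps in pipeline.items():
        for dep in deps:
            users.setdefault(dep, []).append(target)

    # Breadth-first walk from the changed file through everything built from it.
    affected, queue = set(), deque([changed])
    while queue:
        for target in users.get(queue.popleft(), []):
            if target not in affected:
                affected.add(target)
                queue.append(target)
    return affected


if __name__ == "__main__":
    inputs, intermediates, outputs = classify(PIPELINE)
    print("inputs:", sorted(inputs))
    print("intermediates:", sorted(intermediates))
    print("outputs:", sorted(outputs))
    print("after editing clean.R:", sorted(downstream(PIPELINE, "clean.R")))
    print("after editing plot.R:", sorted(downstream(PIPELINE, "plot.R")))
```

Editing `clean.R` affects every other target, while editing `plot.R` affects only `plot.png`; this is exactly the "downstream" distinction described above.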

Using a Makefile streamlines this process through automation. One of the major advantages: you no longer have to re-run all of the code every time you make a change. make compares the modification time of each output with those of the files it depends on, so it re-runs only the steps downstream of what you changed. How convenient!
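The check make performs to decide whether a target is out of date can be sketched in Python as below. This is a simplified model under stated assumptions: it covers only the basic rule (rebuild when the target is missing or older than any prerequisite, by modification time), assumes every prerequisite already exists, and ignores phony targets, pattern rules, and make's recursive updating of prerequisites. The names `is_out_of_date`, `touch`, and `demo`, and the file names, are hypothetical.

```python
import os
import tempfile
import time


def is_out_of_date(target, prerequisites):
    """Simplified version of make's basic rule: a target needs rebuilding if it
    does not exist or if any prerequisite was modified more recently than it.
    Assumes every prerequisite already exists on disk."""
    if not os.path.exists(target):
        return True
    target_mtime = os.path.getmtime(target)
    return any(os.path.getmtime(p) > target_mtime for p in prerequisites)


def touch(path, mtime):
    """Create `path` if needed and set its modification time to `mtime`."""
    with open(path, "a"):
        pass
    os.utime(path, (mtime, mtime))


def demo():
    """Walk one hypothetical input/output pair through three situations."""
    results = []
    with tempfile.TemporaryDirectory() as d:
        raw = os.path.join(d, "raw_data.csv")
        clean = os.path.join(d, "clean_data.csv")
        now = time.time()

        touch(raw, now - 100)
        results.append(is_out_of_date(clean, [raw]))  # output missing

        touch(clean, now - 50)
        results.append(is_out_of_date(clean, [raw]))  # output newer than input

        touch(raw, now)
        results.append(is_out_of_date(clean, [raw]))  # input edited afterwards
    return results


if __name__ == "__main__":
    print(demo())  # prints [True, False, True]
```

In an actual Makefile the same relationship would be written as a rule such as `clean_data.csv: raw_data.csv clean.R`, followed by a tab-indented recipe line that runs the script; make then applies this timestamp comparison for you.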

We will begin by setting up make (see stat545.com Chapter 35) and then work through the demonstration from stat545.com Chapter 36 together.

Both of these tasks will be outlined in the Video Lecture linked above.