Lecture 11: Automation
November 13, 2025
This is an optional topic with no in-class lecture!
After this lecture, students are expected to be able to:
Use make to record which files are inputs vs. intermediates vs. outputs.
Use make to capture how scripts and commands convert inputs to outputs.
Use make to re-run parts of an analysis that are out-of-date.
Write a Makefile.
Interact with make in RStudio.
Use make from the Terminal.
Video Lecture
Why Automation
Recall the reproducibility principle: others should be able to replicate our analysis easily in order to verify the results. While we can write documentation that tells users how to set up a project and rerun the code, following it involves manual work, which is time-consuming and can lead to human error. Instead of doing things by hand, we can (and should) automate processes.
Consider an analysis where we need to clean, summarize, plot, and model some data. You can think of each of these tasks as a separate part of the research process, often referred to as a pipeline (a system where the code for some tasks depends on the output of others). For example, if I were to make a change in the data cleaning step, then that will likely affect the summary, plots, and models (i.e., the “downstream” processes).
Using a Makefile can streamline this process through automation. One major advantage is that you no longer have to re-run all of the code every time you make a change. make compares the modification times of each output and its inputs, so only the parts downstream from what you changed are re-run. How convenient!
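To make the pipeline idea concrete, here is a minimal sketch of a Makefile for the clean, summarize, plot, and model example. All file and script names (data/raw.csv, scripts/clean.R, and so on) are hypothetical placeholders and not part of the course materials; the sketch also assumes each R script reads and writes the files listed in its rule.

```make
# A sketch only: every path below is a hypothetical placeholder.
# Each rule has the form
#   output: inputs
#   <TAB>command that builds the output from the inputs
# Recipe lines must begin with a tab character, not spaces.

# Default target: build all final outputs.
all: results/summary.csv figures/plot.png results/model.rds

# Cleaning: raw data -> cleaned data (an intermediate file).
data/clean.csv: data/raw.csv scripts/clean.R
	Rscript scripts/clean.R

# The three downstream steps all depend on the cleaned data.
results/summary.csv: data/clean.csv scripts/summarize.R
	Rscript scripts/summarize.R

figures/plot.png: data/clean.csv scripts/plot.R
	Rscript scripts/plot.R

results/model.rds: data/clean.csv scripts/model.R
	Rscript scripts/model.R

# Remove everything make can regenerate.
clean:
	rm -f data/clean.csv results/summary.csv figures/plot.png results/model.rds

# "all" and "clean" are task names, not files.
.PHONY: all clean
```

Under these assumptions, running make after editing only scripts/plot.R would rebuild just figures/plot.png, while editing scripts/clean.R would rebuild data/clean.csv and then everything downstream of it.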
We will begin by setting up make (see stat545.com Chapter 35) and then work through the demonstration from stat545.com Chapter 36 together.
Both of these tasks will be outlined in the Video Lecture linked above.