STAT 545 - Fall 2025
create effective data visualizations
explore the grammar of ggplot2
Install (if necessary) and load in the following packages:
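The package list itself is not preserved in this copy of the slides. As a minimal sketch, assuming ggplot2 is the only package needed for the plots discussed here (the `pkgs` vector is a placeholder to adjust), installing and loading could look like this:

```r
# Assumption: ggplot2 is the package these notes rely on; the original
# package list is not shown, so edit this vector as needed.
pkgs <- c("ggplot2")

# Find packages that are not yet installed and install only those.
is_installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
if (any(!is_installed)) install.packages(pkgs[!is_installed])

# Attach every package in the list.
invisible(lapply(pkgs, library, character.only = TRUE))
```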
We want to investigate if there is a relationship between the number of cylinders (cyl), fuel efficiency (mpg), and horsepower (hp) in the mtcars data set.
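The figure from the slides is not reproduced here. A plot along the following lines would show all three variables at once; this is a sketch under assumptions (the object name `p_overview`, the axis choice, and the labels are illustrative), not necessarily the plot used in the lecture:

```r
library(ggplot2)

# mtcars ships with base R. cyl is stored as a number, so it is converted
# to a factor to get one discrete colour per cylinder count.
p_overview <- ggplot(mtcars, aes(x = hp, y = mpg, colour = factor(cyl))) +
  geom_point(size = 3) +
  labs(
    x = "Horsepower (hp)",
    y = "Fuel efficiency (mpg)",
    colour = "Cylinders"
  )

p_overview
```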
We can clearly see that cars with higher horsepower tend to be less fuel efficient (have lower miles per gallon).
Cars with higher horsepower also tend to have a higher number of cylinders.
That was a lot easier to deduce than when we were reading numbers off of a large table. We will show how to build this exact plot in this lecture!
Data visualizations are often preferred because they help identify patterns and relationships and emphasize important findings in a research project.
So, what’s the best visualization?
Once you know what you want to show, convey the message as clearly and simply as possible.
Make it clear as a “standalone”
Minimize noise
Don’t use colours randomly - it may cause us to see false patterns
Don’t overlap plots
Use as few plots as possible, but don’t be afraid to separate data to plot if it makes things clearer
Consider the accessibility of your chart
up to 10% of the population has colourblindness
make it low-vision friendly and provide descriptions where possible
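Several of the guidelines above can be applied directly in ggplot2. The sketch below is one possible approach, not a prescribed recipe: it assumes ggplot2 3.3.4 or later (for the `alt` argument of `labs()`), and the object name, title, and alt text wording are illustrative.

```r
library(ggplot2)

p_accessible <- ggplot(
  mtcars,
  # Encode cylinders with both colour and shape so the groups remain
  # distinguishable without relying on colour alone.
  aes(x = hp, y = mpg, colour = factor(cyl), shape = factor(cyl))
) +
  geom_point(size = 3) +
  # viridis palettes are designed to stay distinguishable under common
  # forms of colour vision deficiency.
  scale_colour_viridis_d() +
  labs(
    title = "Fuel efficiency versus horsepower in mtcars",
    x = "Horsepower (hp)",
    y = "Fuel efficiency (mpg)",
    colour = "Cylinders",
    shape = "Cylinders",
    # Alt text travels with the plot object for screen-reader descriptions.
    alt = "Scatterplot of miles per gallon against horsepower, with points coloured and shaped by number of cylinders."
  ) +
  # A larger base font size helps low-vision readers.
  theme_minimal(base_size = 14)

p_accessible
```

A descriptive title and axis labels help the chart stand alone, and `get_alt_text(p_accessible)` retrieves the stored description.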
ggplot2 and the Grammar of Graphics
ggplot2 is based on the grammar of graphics, which is a systematic approach for describing different components or aspects of a graph. It involves seven components (required components are indicated with the *):
Data*
Aesthetic mappings*
Geometric objects*
Scales
Statistical transformations
Coordinate system
Facet
It’s okay if you don’t quite understand all of these components yet. We will walk through examples of commonly used plots and discuss which components are necessary!
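To make the seven components concrete, here is a sketch in which each one appears explicitly in a single ggplot2 call. The particular choices (axis breaks, a linear smoother, faceting by `cyl`) are illustrative assumptions; in everyday plots only the first three are written out and ggplot2 supplies defaults for the rest.

```r
library(ggplot2)

p_grammar <- ggplot(
  data = mtcars,                                 # 1. data*
  mapping = aes(x = hp, y = mpg)                 # 2. aesthetic mappings*
) +
  geom_point() +                                 # 3. geometric object*
  scale_x_continuous(                            # 4. scale
    breaks = seq(50, 350, by = 100)
  ) +
  stat_smooth(                                   # 5. statistical transformation
    method = "lm", formula = y ~ x, se = FALSE
  ) +
  coord_cartesian(ylim = c(10, 35)) +            # 6. coordinate system
  facet_wrap(vars(cyl))                          # 7. facet

p_grammar
```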
mtcars
Let’s say you’d like to see if there is a relationship between miles per gallon and horsepower of cars in the mtcars data set. Both of these variables are quantitative (numeric), so a scatterplot is a natural choice for comparing them.
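A minimal sketch of such a scatterplot, using only the three required components (data, aesthetic mappings, and a geometric object), might look like this. Placing `hp` on the x-axis and the object name `p_scatter` are illustrative choices:

```r
library(ggplot2)

# Data: mtcars; aesthetics: hp on x and mpg on y; geometry: points.
p_scatter <- ggplot(mtcars, aes(x = hp, y = mpg)) +
  geom_point()

p_scatter
```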
To Print to PDF (with your code!), do the following:
Open the in-browser print dialog (CMD/CTRL + P, or right click > Print).
Change the Destination setting to Save as PDF.
Change the Layout to Landscape.
Change the Margins to None.
Enable the Background graphics option.
Click Save 🎉
In all cases, ensure the code chunks don’t get cut off (e.g., because a line is too long). You will not be able to edit these from the PDF directly, though you could open the slides again and copy and paste the code to re-run it.