Lecture 2A: Reproducibility with RMarkdown
September 9, 2025
Learning Objectives
From today’s class, students are anticipated to be able to:
Use basic markdown features.
Write documents in markdown.
Choose whether html or pdf is an appropriate output.
Style an .Rmd document by editing the YAML header.
Customize code chunk output using Rmd code chunk options.
Render your finalized document to HTML & PDF.
Video Lecture
Reproducibility
Reproducibility is the ability of an independent researcher to repeat an experiment using the same data and workflow to obtain the same results [1]. One way to ensure reproducible research is sharing clear details on the analysis and providing the code used to produce the results. Reproducibility is often confused with replication. However, replication is a separate concept where the results of a study should be validated in an independent study using new data. A “good” study is both replicable and reproducible.
There are also ethical benefits to reproducible research [1]. Performing open research can reduce the chance of data fabrication or other ethical issues such as p-hacking, where a researcher tests a number of hypotheses until they find one that is statistically significant. Open, transparent research imposes a sense of accountability on the researcher(s). Of course, you can’t always share data (for example, health data containing personal identifying information), but you should share as much as possible. Always check with your ethics board or principal investigator before sharing data online.
Reproducible studies can also advance research by providing the code used for analysis. Not only can we reproduce the study to make the findings more trustworthy, but we can learn more about how the analysis was performed and possibly apply it to other studies.
Reproducible research can also force you to have better, more automated workflows. The first analysis I ever did in R, I was manually changing things in Excel documents, and then saving them in a certain place, and then using R to fit a model, and then exporting the data, and then changing the worksheet format, and so on. I had a sticky note of instructions on how to produce the results on my desk. This was neither reproducible nor a productive use of my time. While automating some steps and using tools like GitHub for version control can be more work upfront, it can save you a lot of headaches down the road. It is something I personally wish I had learned to use earlier in my career.
R Markdown
Using an editor like MS Word is like painting: you decorate the page with text, graphs, and tables, making sure things are positioned, sized, and coloured appropriately.
This is great for a letter to a friend, but is less great for scientific work, because it hampers reproducibility and shareability.
R Markdown lets you write a single “blueprint” for your analysis that includes everything (positioning/sizing/colouring/formatting, analysis/graph/table code and results, and text) and then “knit” all of those components together into a complete report with a single button press.
There is a large community of R Markdown users, making it easy to find tutorials and blog posts about a ton of different types of documents. We’re going to start simple and dive right into R Markdown.
Getting Started with Markdown
In lecture, we are going to work through this online Markdown tutorial together in small groups. Challenge yourself to meet someone new in the class!
After completing the tutorial, try making your own Markdown document!
In RStudio, go to “File” -> “New File” -> “Markdown File”
Write a Markdown document all about you that includes:
A header
A link to your GitHub profile
A list of courses you’re taking this semester
Click the “Preview” button to generate an output .html file from the source .md file.
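As a rough sketch, a Markdown file meeting the three requirements above might look like the following. The name, GitHub username, and course list are placeholders made up for illustration; swap in your own.

```markdown
# About Me

My name is Jane Doe (placeholder name).

You can find my work on [my GitHub profile](https://github.com/your-username).

## Courses this semester

- STAT 545
- A second course (placeholder)
- A third course (placeholder)
```

The `#` line is the header, the `[text](url)` pattern is the link, and the lines starting with `-` form the list.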
Install RMarkdown
To get started with R Markdown, you’ll need to install the rmarkdown and DT R packages:
install.packages('rmarkdown')
install.packages('DT')
Create an R Markdown file by going to File > New File > R Markdown...
Change the title to “STAT545 Lecture 2a” and keep the other defaults. Click OK.
Click the “Knit” button to generate the default document.
Things to notice:
The YAML header is contained between two --- lines at the top of the .Rmd source, and contains metadata on the document. This is where you specify the output type to be HTML.
Text is formatted using Markdown. There are three chunks of R code, and knitting executes the R code and displays the output in the output file.
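For orientation, a pared-down YAML header for this document might look like the sketch below. The default template that RStudio generates usually also fills in fields such as the author and date, which are left out here.

```yaml
---
title: "STAT545 Lecture 2a"
output: html_document
---
```

The `output` field is the line that tells R Markdown to produce HTML.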
How does it all work?
The key drivers under the hood are knitr and Pandoc! When you press “Knit”, R Markdown passes the .Rmd file to knitr, which executes all of the code and creates a new .md file including the code and output. Then, that .md file is processed into the final output format (e.g. .html) by Pandoc.
What if I want a PDF?
You’ll need to install LaTeX and set the document type as a PDF document. TinyTeX is recommended:
install.packages('tinytex')
tinytex::install_tinytex() # install TinyTeX
For this lecture, we will be knitting to HTML.
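If you do want to try PDF later, setting the document type comes down to the `output` field of the YAML header. A minimal sketch, reusing the lecture's title and assuming a LaTeX installation such as TinyTeX is already in place:

```yaml
---
title: "STAT545 Lecture 2a"
output: pdf_document
---
```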
Code Chunk Options
As we saw in the demo, we can add code chunks by selecting “Code” > “Insert Chunk”, or by using a keyboard shortcut: cmd + option + I (Mac) / ctrl + alt + I (Windows). We can also toggle to Visual view to add a code chunk using the “Insert” > “Executable Cell” option.
Just like YAML is metadata for the Rmd document, code chunk options are metadata for the code chunk. Specify them within the {r} at the top of a code chunk, separated by commas. For a list of chunk options, check out Yihui Xie’s knitr book. Let’s try some:
Hide the code from the output with echo = FALSE.
Change the figure width and height with fig.width = 5 and fig.height = 3.
Knit the results. Can you spot the differences?
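Put together, a chunk using all three options could look like the sketch below. The plotting call is only an example, using R's built-in `pressure` dataset; any chunk that draws a figure would do.

````markdown
```{r, echo = FALSE, fig.width = 5, fig.height = 3}
# The code is hidden in the output (echo = FALSE); only the figure appears
plot(pressure)
```
````

With these options, the knitted document shows the figure at 5 by 3 inches and omits the `plot(pressure)` source line.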
Resources
Here are a number of resources that may help supplement today’s lesson:
The Official R Markdown Tutorial from the “Introduction” up to and including the “Inline Code” section.
Many cheat sheets can be found from RStudio: go to “Help” -> “Cheatsheets”.