How to create and distribute R packages

Welcome!

  • We’re all affiliated with the Department of Statistics at UBC

Katie Burak, Assistant Professor of Teaching

Daniel Chen, Postdoctoral Teaching and Learning Fellow

G. Alexi Rodríguez-Arelis, Assistant Professor of Teaching

Tiffany Timbers, Associate Professor of Teaching
Figure 1: Teaching Team

What is an R package and when should I make one?

This question is key!

  • An R package is a collection of functions, tests, data, compiled code, and standardized documentation
  • The main reason to create a package is to make code easier to reuse and share
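To make that definition concrete, the sketch below uses base R's `utils::package.skeleton()` to generate the bare directory layout of a package around one function. The package name `countr` and the function `count_classes()` are hypothetical, chosen only for illustration, and the workshop may well use a different tool (such as the usethis package) to create packages.

```r
# Hypothetical function to include in the package
count_classes <- function(x) {
  # table() tallies each distinct value; as.data.frame() turns it into rows
  as.data.frame(table(class = x), responseName = "count")
}

# Build the skeleton in a fresh temporary folder so nothing is overwritten
pkg_root <- tempfile("pkgs")
dir.create(pkg_root)
utils::package.skeleton(
  name = "countr",              # hypothetical package name
  list = "count_classes",       # objects to write into R/
  environment = environment(),  # where to look up those objects
  path = pkg_root
)

# DESCRIPTION (metadata), NAMESPACE (exports), R/ (code), man/ (documentation)
list.files(file.path(pkg_root, "countr"), recursive = TRUE)
```

The generated files are placeholders that still need editing, but they show the standardized structure that lets R install, document, and test a package.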

More specifically…

  • R packages are created:
    • When your DRY (“Don’t Repeat Yourself”) radar goes off
    • When you think your code has general usefulness others could benefit from (this could include future you)
    • When you want to share data (R packages in particular are used quite heavily for this)

Let’s start with a motivating example

  • Suppose that, in many of your analyses, you repeatedly write code to count how the observations are distributed across the classes (i.e., categories) of a given variable in your dataset
  • You end up rewriting that code, copying and pasting it, and/or copying and pasting the files that contain it

Moreover…

  • In doing so, you sometimes (or often!) make trivial mistakes
  • You wonder whether others doing similar analyses face this problem as well, and want to share your (eventual) solution with others

Counting observations across classes code example

  • The code below counts how many cars in mtcars (a dataset of 32 observations) have 4, 6, and 8 cylinders
library(tidyverse, quietly = TRUE)

mtcars |>
  group_by(cyl) |>
  summarize(count = n()) |>
  rename("class" = cyl)
# A tibble: 3 × 2
  class count
  <dbl> <int>
1     4    11
2     6     7
3     8    14
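A first step toward packaging this code is to turn the pipeline into a reusable function. The minimal sketch below assumes a hypothetical function name, `count_classes()`, and uses dplyr's `{{ }}` ("curly-curly") operator so the caller can pass an unquoted column name, just as in the original pipeline.

```r
library(dplyr, warn.conflicts = FALSE)

# Hypothetical helper: count the observations in each class of `class_col`
count_classes <- function(data, class_col) {
  data |>
    group_by({{ class_col }}) |>     # one group per distinct class
    summarize(count = n()) |>        # number of rows in each group
    rename("class" = {{ class_col }})
}

result <- count_classes(mtcars, cyl)
result

# The same function now works on any data frame and column
count_classes(iris, Species)
```

Once the logic lives in a single function, a fix or improvement made there applies everywhere the function is used, instead of to one copied-and-pasted snippet at a time.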

Is an R package the answer to the pain of reusing our code the way we have been doing so far?


Yes, it is!

Packaging our code

  • Lets us reuse it more easily, and with fewer errors, across many different projects
  • Lets others benefit from the code we have written
  • Increases our code quality because when packaging we:
    • must modularize our code into functions and write function reference documentation
    • have our code organized in a way that works well with tools for creating tutorials/vignettes, as well as formal code/software testing
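As an illustration of the last two points, the sketch below pairs a function with roxygen2-style reference documentation (the `#'` comments) and a testthat unit test. The function `count_classes()` is hypothetical, and this version uses `dplyr::count()` as a shorter alternative to the `group_by()`/`summarize()` pipeline. In a real package, the `#'` comments would be turned into help files (e.g., with `roxygen2::roxygenise()`), and the test would live in its own file under `tests/testthat/`.

```r
library(dplyr, warn.conflicts = FALSE)
library(testthat)

#' Count observations in each class of a variable
#'
#' @param data A data frame.
#' @param class_col Unquoted name of the column that holds the classes.
#' @return A data frame with one row per class and columns `class` and `count`.
#' @export
count_classes <- function(data, class_col) {
  # count() groups by the (renamed) column and tallies the rows in each group
  dplyr::count(data, class = {{ class_col }}, name = "count")
}

# In a package, this would sit in e.g. tests/testthat/test-count_classes.R
test_that("count_classes() tallies every observation", {
  result <- count_classes(mtcars, cyl)
  expect_equal(result$count, c(11L, 7L, 14L))
  expect_equal(sum(result$count), nrow(mtcars))
})
```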

How are R packages shared and downloaded?

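R packages are most commonly shared through CRAN (the Comprehensive R Archive Network), a network of FTP and web servers around the world that store identical, up-to-date versions of code and documentation for R. Users download and install packages from CRAN with install.packages(). Packages can also be shared through GitHub.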
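A small sketch of how R ties into that network: the `repos` option selects a CRAN mirror, and `utils::contrib.url()` shows where source packages are fetched from on it. The choice of the cloud mirror is arbitrary, and the GitHub line uses the remotes package with a placeholder repository name, not a real one. The network calls are left commented out.

```r
# Tell R which CRAN mirror to use; the cloud mirror is one arbitrary choice
options(repos = c(CRAN = "https://cloud.r-project.org"))

# Mirrors share the same layout: source packages live under src/contrib
src_url <- contrib.url(getOption("repos"), type = "source")
print(src_url)

# Installing from CRAN downloads from that location (requires internet):
# install.packages("usethis")

# Packages hosted on GitHub can be installed with the remotes package;
# "owner/repo" is a placeholder, not a real repository:
# remotes::install_github("owner/repo")
```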

What topics will the workshop cover?

  • Development of R packages via a toy example
  • Introduction to package testing
  • Setting up documentation
  • Introduction to continuous integration via GitHub Actions
  • Sharing and publishing packages on GitHub and CRAN
  • Definition of copyright rules
  • Choosing the most appropriate license