R package
Attribution: This content is based on Chapter 1: The Whole Game (R Packages book by Hadley Wickham & Jenny Bryan, 2e) and the UBC course notes Reproducible and Trustworthy Workflows for Data Science by Tiffany Timbers, Joel Östblom, Florencia D’Andrea, and Rodolfo Lourenzutti.
We assume you have followed the installation instructions we shared before the workshop, registered for a GitHub account, and installed Git (more information here).
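A quick way to confirm the setup from the R Console is sketched below. The package list is an assumption drawn from the tools named in this walkthrough ({devtools}, {usethis}, {roxygen2}, {dplyr}); adjust it to match the installation instructions you received.

```r
# Check that Git is on the PATH (Sys.which() returns "" when not found).
has_git <- nzchar(Sys.which("git"))

# Packages used in this walkthrough (assumed list, see the note above).
needed <- c("devtools", "usethis", "roxygen2", "dplyr")
installed <- vapply(needed, requireNamespace, logical(1), quietly = TRUE)

if (!has_git) {
  message("Git was not found on the PATH.")
}
if (any(!installed)) {
  message("Missing packages: ", paste(needed[!installed], collapse = ", "))
}
```

If neither message appears, Git and all four packages were found.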
create_package()
create_package() will initialize our new package in a directory of our choice
We will use the Desktop folder for easier reference
Don’ts when choosing your home directory:
It should not be nested inside another RStudio project, R package, or Git repository
It should not be in an R package library (i.e., where we usually install other packages from CRAN)

Files created in {eda} (including the ignore-type files):
.gitignore is used by Git and lists all “hidden” files created by R and RStudio that aren’t necessary for the repository
.Rbuildignore lists all files created via R and RStudio that won’t be necessary when building our package (e.g., eda.Rproj)
DESCRIPTION contains the metadata and dependency installation instructions for our package
eda.Rproj is the RStudio project file
NAMESPACE contains the package’s functions to export along with imports from other packages
R/ is the directory which will contain all the package’s functions as .R scripts

use_git()
In eda.Rproj, we will initialize a Git repository via use_git()
This creates a .git directory in the folder {eda}
In the Git tab, click on the clock icon to check your commit history (note your GitHub user is shown in the Author column)

We will write a function to count class observations in a data frame (e.g., mtcars) so we (and others) can reuse this code more easily in other projects

count_classes()
count_classes() takes a data frame data_frame or data frame extension (e.g., a tibble) along with an unquoted column name containing the class label class_col
Note the curly brackets {{ }} around class_col, i.e., {{ class_col }}
The unquoted column name (class_col) needs extra support (via the curly brackets) because the global environment is not aware of the data frame column names
Functions from other packages are called as package::function()
count_classes() includes four {dplyr} functions:
group_by(), summarize(), n(), and rename()
Inside the package, we call them as dplyr::group_by(), dplyr::summarize(), dplyr::n(), and dplyr::rename()

count_classes <- function(data_frame, class_col) {
  # Fail early if the input is not a data frame (tibbles pass this check)
  if (!is.data.frame(data_frame)) {
    stop("`data_frame` should be a data frame or data frame extension (e.g. a tibble)")
  }
  # Count the rows per class, then name the grouping column "class"
  data_frame |>
    dplyr::group_by({{ class_col }}) |>
    dplyr::summarize(count = dplyr::n()) |>
    dplyr::rename("class" = {{ class_col }})
}

use_r()
We will save the function in an .R script in the R/ subdirectory of {eda}
use_r() creates the .R script count_classes.R
The Git tab keeps track of all our changes in the repository after our initial commit
Let’s commit count_classes.R
In the Git tab, check the box in column Staged
Click on the Commit button
Type the commit message Add count_classes()
Click on the Commit button

load_all()
Let’s try count_classes()
load_all() from {devtools} makes function count_classes() available for experimentation
We will use the dataset mtcars and column cyl as data_frame and class_col, respectively

On GitHub.com, create a new repository for {eda} without a README (we will add a README.md file later)
In the Terminal tab of our RStudio session, we will paste the Git commands shown by GitHub.com from section ...or push an existing repository from the command line

check()
check() verifies that all parts of our R add-on package work correctly
check() executes R CMD check in the shell (i.e., terminal)
Run check() from {devtools} via the R Console (note the license-related warning in the check() output)

DESCRIPTION
This is the initial DESCRIPTION file:

Package: eda
Title: What the Package Does (One Line, Title Case)
Version: 0.0.0.9000
Authors@R:
    person("First", "Last", , "first.last@example.com", role = c("aut", "cre"),
           comment = c(ORCID = "YOUR-ORCID-ID"))
Description: What the package does (one paragraph).
License: `use_mit_license()`, `use_gpl3_license()` or friends to pick a
    license
Encoding: UTF-8
Roxygen: list(markdown = TRUE)
RoxygenNote: 7.3.1
Your DESCRIPTION template might look different if you followed the instructions in system setup from here: https://r-pkgs.org/setup.html#personal-startup-configuration
We will edit Title, Authors@R, and Description
If you do not have an ORCID, you can delete comment = c(ORCID = "YOUR-ORCID-ID")

Package: eda
Title: A Package for Data Wrangling
Version: 0.0.0.9000
Authors@R:
    person("G. Alexi", "Rodriguez-Arelis", , "alexrod@stat.ubc.ca", role = c("aut", "cre"))
Description: Provide data wrangling and summary functions to conduct a proper
    exploratory data analysis.
License: `use_mit_license()`, `use_gpl3_license()` or friends to pick a
    license
Encoding: UTF-8
Roxygen: list(markdown = TRUE)
RoxygenNote: 7.3.1
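One way to confirm that the edited fields parse as intended is to read the file back with base R's read.dcf(). The sketch below writes a trimmed copy of the fields above to a temporary file purely for illustration; inside the package you would point read.dcf() at the real DESCRIPTION instead.

```r
# Write a trimmed copy of the edited DESCRIPTION to a temporary file.
# (Assumption for illustration: in {eda} you would read "DESCRIPTION" directly.)
desc_path <- tempfile("DESCRIPTION")
writeLines(c(
  "Package: eda",
  "Title: A Package for Data Wrangling",
  "Version: 0.0.0.9000",
  "Description: Provide data wrangling and summary functions to conduct a proper",
  "    exploratory data analysis.",
  "Encoding: UTF-8"
), desc_path)

# read.dcf() returns a character matrix with one column per requested field.
fields <- read.dcf(desc_path, fields = c("Package", "Title", "Version"))
fields[1, "Title"]
```

Note the indented second line of Description: continuation lines in DESCRIPTION must start with whitespace, otherwise the file does not parse.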
After editing the DESCRIPTION file, we need to save our changes
As with the count_classes() function, let’s locally commit these changes (use the commit message Edit DESCRIPTION)
In the Git tab in RStudio, we will remotely push our edits to our public repository on GitHub by clicking on the Push button

use_mit_license()
As flagged by check(), we need to include a LICENSE.md
Run use_mit_license() from {usethis} via the R Console
What does LICENSE.md look like?
Note: More about license matters later on in this workshop
Check the DESCRIPTION file! The License field gets updated as follows:

Package: eda
Title: A Package for Data Wrangling
Version: 0.0.0.9000
Authors@R:
    person("G. Alexi", "Rodriguez-Arelis", , "alexrod@stat.ubc.ca", role = c("aut", "cre"))
Description: Provide data wrangling and summary functions to conduct a proper
    exploratory data analysis.
License: MIT + file LICENSE
Encoding: UTF-8
Roxygen: list(markdown = TRUE)
RoxygenNote: 7.3.1
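Commit and push these changes (use the commit message Use MIT license)

document()
We will document our count_classes() function via package {roxygen2}
Open R/count_classes.R in the source editor
Place the cursor inside the function and select Code > Insert roxygen skeleton
Fill in the skeleton for count_classes():

#' Count class observations
#'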
#' Creates a new data frame with two columns,
#' listing the classes present in the input data frame,
#' and the number of observations for each class.
#'
#' @param data_frame A data frame or data frame extension (e.g. a tibble).
#' @param class_col Unquoted column name of column containing class labels.
#'
#' @return A data frame with two columns.
#' The first column (named class) lists the classes from the input data frame.
#' The second column (named count) lists the number of observations for each class from the input data frame.
#' It will have one row for each class present in input data frame.
#' @export
#'
#' @examples
#' count_classes(mtcars, cyl)
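
To make the @examples entry concrete, the sketch below repeats the function definition so the snippet runs on its own (it assumes {dplyr} is installed) and then runs the documented example on mtcars.

```r
# Same definition as in R/count_classes.R, repeated so this snippet is standalone.
count_classes <- function(data_frame, class_col) {
  if (!is.data.frame(data_frame)) {
    stop("`data_frame` should be a data frame or data frame extension (e.g. a tibble)")
  }
  data_frame |>
    dplyr::group_by({{ class_col }}) |>
    dplyr::summarize(count = dplyr::n()) |>
    dplyr::rename("class" = {{ class_col }})
}

# The documented example: one row per distinct value of cyl.
result <- count_classes(mtcars, cyl)
result
# Columns are class and count; mtcars has 11, 7, and 14 cars
# with 4, 6, and 8 cylinders, respectively.
```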
Save R/count_classes.R, then commit and push (use the commit message Add roxygen header to document count_classes())
Next, we will build the documentation via document() from {devtools}
Run the document() function in the R Console
This creates man/count_classes.Rd in {eda}, which is the help we get when typing ?count_classes in the R Console
Commit and push the changes made by document() (use the commit message Run document())

check() again
Now that we have LICENSE.md in {eda}, let’s use check() again in the R Console to ensure the license-related warning is gone

install()
Instead of install.packages() as with any package on CRAN, we will use install() from {devtools}
install() installs a local package in the current working directory, whereas install.packages() installs from a package repository
Run install() in the R console

use_package()
count_classes() uses functions from package {dplyr}
We declare this dependency via use_package() from {usethis}
This adds {dplyr} to DESCRIPTION, more specifically in Imports
Run it in the R console
Commit and push these changes (use the commit message Import dplyr)

use_readme_rmd()
Our package needs a README.md file describing the package, installation, and usage
We will create it via use_readme_rmd() from {usethis}
This opens an .Rmd template, which we have to fill out
Save the .Rmd file
To render it to .md, use build_readme(), commit, and push these changes to the remote repository (use the commit message Write README.Rmd and render)

check() and install()
We now have a working R package!
Let’s run check() (to ensure all warnings are gone!), and then re-build via install()
We can review the whole workflow (test() will be covered later on) via the below diagram from Chapter 1: The Whole Game (R packages book by Hadley Wickham & Jenny Bryan, 2e)
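
The document, check, and install steps used throughout this walkthrough tend to be repeated after every change, so they can be wrapped in a small helper. This is only a sketch: rebuild_package() is a hypothetical name, not part of {devtools}, and it assumes {devtools} is installed and that path points at the package directory.

```r
# Hypothetical convenience wrapper around the three {devtools} steps.
# Defining it has no side effects; calling it runs the full cycle.
rebuild_package <- function(path = ".") {
  devtools::document(path)  # regenerate man/*.Rd and NAMESPACE from roxygen headers
  devtools::check(path)     # run R CMD check on the package
  devtools::install(path)   # install the package into the local library
  invisible(path)
}
```

From the R Console inside eda.Rproj, rebuild_package() would run the cycle on the current directory.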