Lecture 7A: Reading and Writing Data
October 14, 2025
Reading and Writing Data in R
From today’s class, students are anticipated to be able to:
- Read and write a delimited file, like a csv, from R using the readr package.
- Make relative paths using the here::here() function.
- Read data from a spreadsheet.
- Read and write R binary files (rds files) from R.
Required packages:
library(tidyverse)
library(here)
In some of the earlier examples, I will be demonstrating using absolute and relative file paths. On macOS, file paths are written with forward slashes (e.g., datasets/penguins.csv). On Windows, backslashes are used (e.g., datasets\penguins.csv). My examples are written for macOS, so the only thing you’d need to change is the direction of the slashes if you plan to reproduce my code!
Later in the lecture, we discuss the here::here() function, which solves this problem completely.
Common file formats include:
- Spreadsheets: Excel, Google Sheets, Numbers
- Delimited files: Plaintext files containing data, e.g. comma separated values (CSVs), tab separated values (TSVs)
- R binary: A serialization of an R object to a binary file. Basically, that means that it can be loaded in and out of R, but it can’t be opened by anything but R.
CSVs are the most “one-size-fits-all”: you can open them in spreadsheet software, but they are also plaintext, so are lightweight, can be opened in any text editor, and can be “diff”ed. Spreadsheets are nice for human interaction (like through Excel), but can be clunky in R. R binary is quite restrictive and we don’t tend to store data this way. Our lecture will focus on CSVs.
Comma Separated Values (CSVs)
Jenny Bryan’s website has a fabulous section on reading and writing files in R. We’re going to summarize a few of the important functions here, but if you’d like to learn more then check out that website for more in-depth explorations!
We are going to focus on reading and writing data using the readr package, because we think it has the most “work right out of the box” experience.
We will start by talking about how to read and write Comma Separated Value (CSV) files (files that end in .csv). CSVs are often used to store data. When the penguins data set is stored as a .csv, the first few entries look like this when opened as a text file:
species,island,bill_len,bill_dep,flipper_len,body_mass,sex,year
Adelie,Torgersen,39.1,18.7,181,3750,male,2007
Adelie,Torgersen,39.5,17.4,186,3800,female,2007
Adelie,Torgersen,40.3,18,195,3250,female,2007
Adelie,Torgersen,NA,NA,NA,NA,NA,2007
Adelie,Torgersen,36.7,19.3,193,3450,female,2007
Adelie,Torgersen,39.3,20.6,190,3650,male,2007
Adelie,Torgersen,38.9,17.8,181,3625,female,2007
Adelie,Torgersen,39.2,19.6,195,4675,male,2007
Now, this isn’t exactly easy for humans to read, but saving data as CSVs has its advantages. The data is stored in a simple form (lightweight - files aren’t large) that has broad compatibility and can be used in a wide range of applications. And of course, we can use functions in R to make it more readable. A few main functions of note are
- read_csv(): tidyverse equivalent of read.csv(), used to read from a CSV to a tibble
- write_csv(): tidyverse equivalent of write.csv(), used to export a tibble into CSV format
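The same read/write pattern carries over to the other delimited files mentioned earlier. Here is a minimal sketch using readr's read_tsv(), write_tsv(), and read_delim(); the toy tibble and the temporary file are made up for illustration and are not part of the lecture's data.

```r
library(readr)
library(tibble)

# A made-up two-row tibble, just for illustration
toy <- tibble(species = c("Adelie", "Gentoo"), body_mass = c(3750, 5000))

# Write it as a tab separated file in a temporary location
tsv_path <- tempfile(fileext = ".tsv")
write_tsv(toy, tsv_path)

# read_tsv() assumes tabs as the delimiter
from_tsv <- read_tsv(tsv_path, show_col_types = FALSE)

# read_delim() takes the delimiter explicitly, so it covers other separators too
from_delim <- read_delim(tsv_path, delim = "\t", show_col_types = FALSE)
```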
Let’s assume that a file called penguins.csv is saved in a datasets folder in our current directory. We can read it in and save the tibble as a variable called penguins using:
penguins <- read_csv("datasets/penguins.csv")
Rows: 344 Columns: 8
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (3): species, island, sex
dbl (5): bill_len, bill_dep, flipper_len, body_mass, year
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(penguins)
# A tibble: 6 × 8
species island bill_len bill_dep flipper_len body_mass sex year
<chr> <chr> <dbl> <dbl> <dbl> <dbl> <chr> <dbl>
1 Adelie Torgersen 39.1 18.7 181 3750 male 2007
2 Adelie Torgersen 39.5 17.4 186 3800 female 2007
3 Adelie Torgersen 40.3 18 195 3250 female 2007
4 Adelie Torgersen NA NA NA NA <NA> 2007
5 Adelie Torgersen 36.7 19.3 193 3450 female 2007
6 Adelie Torgersen 39.3 20.6 190 3650 male 2007
Pretty easy! Note that the file path needs to be a string, relative to your current working directory (often the folder where the R script you’re working on is saved). You can always call getwd() to see which directory you’re currently working in, and we’ll show more tools for dealing with directories later in this lecture.
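The column specification message printed by read_csv() suggests two options: specify the column types, or set show_col_types = FALSE. A minimal sketch of both is below. It assumes a tiny stand-in file written to a temporary location (not the real penguins.csv), and the choice of col_integer() for year is just one reasonable option.

```r
library(readr)

# A tiny stand-in for penguins.csv, written to a temporary file
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "species,island,bill_len,sex,year",
  "Adelie,Torgersen,39.1,male,2007",
  "Adelie,Torgersen,NA,NA,2007"
), csv_path)

# Option 1: declare the column types up front instead of letting readr guess
penguins_small <- read_csv(
  csv_path,
  col_types = cols(
    species  = col_character(),
    island   = col_character(),
    bill_len = col_double(),
    sex      = col_character(),
    year     = col_integer()
  )
)

# Option 2: keep the guessing, but silence the message
penguins_quiet <- read_csv(csv_path, show_col_types = FALSE)
```

With guessing, year is read as a double; declaring it with col_integer() stores it as an integer instead.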
We can manipulate the data, and save the output as a new CSV. For example,
penguins_2007 <- penguins %>%
  filter(year == 2007) # keep only rows from 2007
write_csv(penguins_2007, "datasets/penguins_2007.csv") # save as penguins_2007.csv in the datasets folder
Now, in my datasets folder, I have penguins.csv and penguins_2007.csv.
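Want to read from an Excel file? The readxl package is for you! It is installed with the tidyverse but is not attached by library(tidyverse), so load it with library(readxl). Note that readxl only reads Excel files; writing them requires a different package, such as writexl.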
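As a quick sketch of reading a spreadsheet, the example below uses the datasets.xlsx workbook that ships with readxl (located with readxl_example()), so it does not depend on any file from this lecture. The variable names are only illustrative.

```r
library(readxl)

# Path to an example workbook bundled with the readxl package
xlsx_path <- readxl_example("datasets.xlsx")

# List the sheets in the workbook
sheets <- excel_sheets(xlsx_path)

# Read one sheet by name into a tibble
cars <- read_excel(xlsx_path, sheet = "mtcars")
head(cars)
```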
For the very niche option of R binary: read_rds() and write_rds().
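One situation where an rds file can be handy is when you want to keep R-specific information, such as factor columns, which a plaintext CSV cannot store. The sketch below uses a made-up tibble and temporary files to compare the two round trips.

```r
library(readr)
library(tibble)

# A made-up tibble with a factor column and an integer column
toy <- tibble(
  species = factor(c("Adelie", "Gentoo")),
  year = c(2007L, 2008L)
)

csv_path <- tempfile(fileext = ".csv")
rds_path <- tempfile(fileext = ".rds")

# Save the same tibble in both formats
write_csv(toy, csv_path)
write_rds(toy, rds_path)

# Read both back in
from_csv <- read_csv(csv_path, show_col_types = FALSE)
from_rds <- read_rds(rds_path)

class(from_csv$species) # "character": the factor was lost in the CSV
class(from_rds$species) # "factor": the rds file kept the R object as it was
```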
Using Relative Paths
As previously mentioned, we need to specify where we are reading/writing our data from/to. The best practice is to use a relative path. This helps with reproducibility and automation!
I could always read in my penguins dataset using an absolute file path, where the file path begins at the root of my computer (for me, it is a long chain of folders in /Users/......../STAT545/stat454.github.io/webpages/lectures_i/datasets/penguins.csv). This, however, is not best practice: that chain of folders exists only on my computer, so the code would not run for anyone else. As previously mentioned, the path separators also differ for Windows users! I’d have to manually change all of the forward slashes to backslashes to make this run on Windows.
We will use the here package for relative file paths. Let’s (install, if necessary, and) load it and call the function here():
# install.packages("here")
library(here)
here()
[1] "/Users/gracetompkins/Library/CloudStorage/OneDrive-UBC/Teaching/STAT545/STAT545.github.io"
I get that long chain of folders where this R Project (which I used to build this website) is stored. The cool thing about here is that I can specify a file path relative to my project root (the above location) without using any operating system-specific strings.
For example, the penguins.csv data set is located in webpages > lectures_i > datasets within my R project folder. I can access it by:
penguins <- read_csv(here("webpages", "lectures_i", "datasets", "penguins.csv"))
Rows: 344 Columns: 8
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (3): species, island, sex
dbl (5): bill_len, bill_dep, flipper_len, body_mass, year
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(penguins) #view first few entries of the tibble
# A tibble: 6 × 8
species island bill_len bill_dep flipper_len body_mass sex year
<chr> <chr> <dbl> <dbl> <dbl> <dbl> <chr> <dbl>
1 Adelie Torgersen 39.1 18.7 181 3750 male 2007
2 Adelie Torgersen 39.5 17.4 186 3800 female 2007
3 Adelie Torgersen 40.3 18 195 3250 female 2007
4 Adelie Torgersen NA NA NA NA <NA> 2007
5 Adelie Torgersen 36.7 19.3 193 3450 female 2007
6 Adelie Torgersen 39.3 20.6 190 3650 male 2007
This is reproducible!
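The same idea works for writing. As a rough sketch, here() can build the output path for the filtered data from earlier. The folder names mirror my project layout, so they are an assumption you would swap for your own, and the write_csv() call is left as a comment because that folder may not exist in your project.

```r
library(here)

# Build the path piece by piece; no slashes are typed by hand
out_path <- here("webpages", "lectures_i", "datasets", "penguins_2007.csv")

# The path is anchored at the project root, not at the working directory
startsWith(out_path, here())
basename(out_path)

# With the folders in place, the file would be written with:
# write_csv(penguins_2007, out_path)
```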
Resources
- Video lecture: Reading and Writing Data
- The “Writing and Reading files” chapter of stat545.com.