Lecture 7A: Reading and Writing Data

October 14, 2025

Modified

October 24, 2025

Learning Outcomes

From today’s class, students are anticipated to be able to:

Read and write a delimited file, like a csv, from R using the readr package.
Make relative paths using the here::here() function.
Read data from a spreadsheet.
Read and write R binary files (rds files) from R.

Lecture Slides

Lecture 7 (A and B) - Reading, Writing, and Joining Tibbles Slides

Set-up

Required packages:

library(tidyverse)
library(readr)
library(here)

Data Formats

Data has to be stored somewhere. When saving data locally, common file formats include

Spreadsheets: Excel (.xlsx), Google Sheets (.gsheet)
Delimited files: Plaintext files containing data, e.g., text files (.txt), comma separated values (.csv), tab separated values (.tsv)
R binary: A serialization of an R object to a binary file (.rds). Basically, that means that it can be loaded in and out of R, but it can’t be opened by anything but R.

CSVs are the most “one-size-fits-all”: you can open them in spreadsheet software, but they are also plaintext under the hood, meaning they are lightweight (don’t take a lot of storage) and can be opened in any text editor.

Spreadsheets are nice for human interaction (like through Excel), but can be clunky in R and often use more memory to store due to their extra features.

R binary can be useful for storing results that you don’t want to rerun in R, but it is not as useful for storing raw data. The R binary data type is quite restrictive and we don’t tend to store data this way. Our lecture will focus on CSVs.

Comma Separated Values (CSVs)

Jenny Bryan’s website has a fabulous section on reading and writing files in R. We’re going to summarize a few of the important functions here, but if you’d like to learn more then check out that website for more in-depth explorations!

We will start by talking about how to read and write comma separated value (CSV) files. CSVs are often used to store data. When the penguins data set is stored as a .csv, the first few entries look like this when opened as a text file (see for yourself here):

species,island,bill_len,bill_dep,flipper_len,body_mass,sex,year
Adelie,Torgersen,39.1,18.7,181,3750,male,2007
Adelie,Torgersen,39.5,17.4,186,3800,female,2007
Adelie,Torgersen,40.3,18,195,3250,female,2007
Adelie,Torgersen,NA,NA,NA,NA,NA,2007
Adelie,Torgersen,36.7,19.3,193,3450,female,2007
Adelie,Torgersen,39.3,20.6,190,3650,male,2007
Adelie,Torgersen,38.9,17.8,181,3625,female,2007
Adelie,Torgersen,39.2,19.6,195,4675,male,2007

Now, this isn’t exactly easy for humans to read, but saving data as CSVs has its advantages. The data is stored in a simple form (lightweight - files aren’t large) that has broad compatibility and can be used in a wide range of applications. And of course, we can use functions in R to make it more readable. A few main functions of note, which are from the readr package, are:

read_csv(): tidyverse equivalent of read.csv() used to read from a CSV to a tibble
write_csv(): tidyverse equivalent of write.csv() used to export a tibble into CSV format
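
The same pattern carries over to other delimited files. As a minimal sketch, using a small made-up tibble (toy) and a temporary file rather than the lecture's data, write_tsv() and read_tsv() handle tab separated values, while read_delim() takes the delimiter as an argument:

```r
library(readr)
library(tibble)

# A tiny made-up tibble, for illustration only
toy <- tibble(species = c("Adelie", "Gentoo"), body_mass = c(3750, 5000))

# Write it out as tab separated values to a temporary file
tsv_path <- tempfile(fileext = ".tsv")
write_tsv(toy, tsv_path)

# read_tsv() is the tab-delimited sibling of read_csv()
toy_tsv <- read_tsv(tsv_path, show_col_types = FALSE)

# read_delim() does the same job with the delimiter given explicitly
toy_delim <- read_delim(tsv_path, delim = "\t", show_col_types = FALSE)
```

Both calls return a tibble with the same two columns as toy.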

Let’s assume that a file called penguins.csv is saved in the same folder as our code. We can read it in and save the resulting tibble as a variable called penguins using:

penguins <- read_csv("penguins.csv")

Rows: 344 Columns: 8
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (3): species, island, sex
dbl (5): bill_len, bill_dep, flipper_len, body_mass, year

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

head(penguins)

# A tibble: 6 × 8
  species island    bill_len bill_dep flipper_len body_mass sex     year
  <chr>   <chr>        <dbl>    <dbl>       <dbl>     <dbl> <chr>  <dbl>
1 Adelie  Torgersen     39.1     18.7         181      3750 male    2007
2 Adelie  Torgersen     39.5     17.4         186      3800 female  2007
3 Adelie  Torgersen     40.3     18           195      3250 female  2007
4 Adelie  Torgersen     NA       NA            NA        NA <NA>    2007
5 Adelie  Torgersen     36.7     19.3         193      3450 female  2007
6 Adelie  Torgersen     39.3     20.6         190      3650 male    2007

Pretty easy! Note that the file path needs to be a string. A bare file name like this one is interpreted relative to your working directory, which we assume here is the folder where the R script you’re working on is saved. You can always call getwd() to see which directory you’re currently working in, and we’ll show more tools for dealing with directories later in this lecture.
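
If read_csv() reports that it cannot find the file, a quick check along these lines can help. This is only a sketch using base R functions; penguins.csv is the file name from the example above, and the variable names are made up for illustration.

```r
# Where is R currently looking? (returns an absolute path)
current_dir <- getwd()
current_dir

# Which CSV files sit directly in the working directory?
csv_files <- list.files(pattern = "\\.csv$")
csv_files

# TRUE only if penguins.csv is in the working directory
found <- file.exists("penguins.csv")
found
```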

We can also manipulate the data, and save the output as a new CSV. For example,

penguins_2007 <- penguins %>%
  filter(year == 2007) #filter only on year 2007

write_csv(penguins_2007, "penguins_2007.csv") #save new data as penguins_2007.csv

Note

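Want to read from an Excel file? The readxl package is for you! It is installed along with the tidyverse but is not attached by library(tidyverse), so load it with library(readxl). readxl only reads; to write an Excel file you need another package, such as writexl.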
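
As a minimal sketch of reading a spreadsheet, assuming the readxl package is installed: the workbook below is one of the example files that ships with readxl, not a file from this lecture.

```r
library(readxl)

# Path to an example workbook bundled with readxl
xlsx_path <- readxl_example("datasets.xlsx")

# A workbook can hold several sheets; list their names first
sheet_names <- excel_sheets(xlsx_path)
sheet_names

# Read one sheet into a tibble (sheet = 1 picks the first sheet)
first_sheet <- read_excel(xlsx_path, sheet = 1)
head(first_sheet)
```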

For the very niche option of R binary: read_rds() and write_rds().
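
To see how the two formats differ, here is a small sketch with a made-up tibble (toy) written to temporary files. An rds file restores the R object as it was, while a CSV keeps only the text, so the factor column comes back as character.

```r
library(readr)
library(tibble)

# Made-up data with a factor column, for illustration only
toy <- tibble(
  island = factor(c("Dream", "Biscoe", "Dream")),
  year = c(2007L, 2008L, 2009L)
)

rds_path <- tempfile(fileext = ".rds")
csv_path <- tempfile(fileext = ".csv")

# Save the same tibble in both formats
write_rds(toy, rds_path)
write_csv(toy, csv_path)

# Read both back in
toy_rds <- read_rds(rds_path)
toy_csv <- read_csv(csv_path, show_col_types = FALSE)

class(toy_rds$island)  # "factor": the R object comes back unchanged
class(toy_csv$island)  # "character": plain text cannot record the factor type
```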

File Locations and Paths

In the previous example, we saved and read in data that was stored in the same folder. However, we will often want to read from or write to other locations, including sub-folders in our project.

To do so, we need to specify where we are reading/writing our data from/to.

Absolute Paths

Absolute paths begin at the root of your computer: “/” on Mac and Linux, or a drive letter such as “C:\” on Windows. This is a looooong set of “directions” that tell you where the file is located.

I could always read in my penguins dataset using an absolute file path, that is, a path that begins at the root of my computer. Consider the following file structure:

Users/
  grace/
    documents/
      STAT545/
        Lec7A/
          mycode.R
          datasets/
            penguins.csv
      thesis/

The absolute path to the penguins.csv data set is /Users/grace/documents/STAT545/Lec7A/datasets/penguins.csv. Note that the “/” at the beginning of the string indicates that you start at the root of your computer. This will work to load in the data. However, it is not best practice in terms of reproducibility. If I moved my project folder anywhere else on my computer, or sent this code to someone else to read in the data, this long file path string would have to be updated.

Important

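Because I wrote this on a Mac, the slashes are forward “/”. Windows displays file paths with backslashes “\”. Inside an R string a backslash must be doubled (“\\”), although R on Windows also accepts forward slashes.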

Later in the lecture, we discuss the here::here() function which solves this problem completely.

Relative Paths

The best practice is to use a relative path. This helps with reproducibility and automation!

Instead of starting at the root of your computer, you can give directions to the file you want to load in relative to the working directory (i.e., where you are now).

If we are working in the Lec7A directory on mycode.R, all we need to do in order to access penguins.csv is go into the datasets folder (which is in our working directory) and load it in! The relative file path is datasets/penguins.csv (note there is no back or forward slash at the beginning of the file path). This means that if I move my Lec7A folder, or share it with someone else, anyone can load in the data with this line of code (well, almost… so long as their working directory is also Lec7A and the path is written with slashes their operating system understands!)

If you’re having trouble visualizing the working directory, you could consider the folders nested this way as well:

Some useful tips for relative paths:

they do not start with a slash
. represents the current directory
.. means go up one folder from the current directory (open the parent folder)
- e.g., to go to the thesis folder if my current working directory is Lec7A, the path is ../../thesis (leave the Lec7A folder to go to the STAT545 folder, then leave the STAT545 folder to go to documents, then go into thesis).
you can call getwd() in R to confirm where your working directory is (it will show the absolute file path as the output)
in R projects, by default your working directory is your R project folder.
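
The tips above can be tried out safely with a throwaway copy of the folder layout. This is a sketch only: it rebuilds the documents, STAT545, Lec7A, datasets and thesis folders inside R's temporary directory, and the variable names are made up for illustration.

```r
# Build a throwaway copy of the lecture's folder layout in a temp directory
root <- file.path(tempdir(), "documents")
lec  <- file.path(root, "STAT545", "Lec7A")
dir.create(file.path(lec, "datasets"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(root, "thesis"), showWarnings = FALSE)

# file.path() joins the pieces with "/" so no separator has to be typed
rel_path <- file.path("datasets", "penguins.csv")
rel_path  # "datasets/penguins.csv"

# Starting from Lec7A, ".." twice climbs to documents, then down into thesis
up_two <- normalizePath(file.path(lec, "..", "..", "thesis"))
direct <- normalizePath(file.path(root, "thesis"))
identical(up_two, direct)  # TRUE: both point at the same folder
```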

The `here` Package

As we stated before, things can get frustrating when sharing files between operating systems. Even with relative paths, we may need to manually replace forward slashes and backslashes when switching between Mac and Windows operating systems.
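
To make the slash problem concrete, here is a small base R sketch; the two path strings are made up for illustration. A backslash has to be doubled inside an R string, and chartr() can convert one style to the other.

```r
# A Windows-style relative path: each backslash is doubled inside the string
win_path <- "datasets\\penguins.csv"

# The same path written with forward slashes
fwd_path <- "datasets/penguins.csv"

# chartr() swaps every backslash for a forward slash
converted <- chartr("\\", "/", win_path)
converted == fwd_path  # TRUE
```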

Thankfully, there is a package that allows us to use relative paths without specifying a file path string that is operating system dependent. Let’s (install, if necessary, and) load the here package:

# install.packages("here")
library(here)

Now, let’s call here():

here::here()

[1] "/Users/gracetompkins/Desktop/STAT545.github.io"

Side note: we will explicitly call here() from the here package using here::, as other packages (plyr, for example) also have a here() function.

I get a long chain of folders where this R Project (which I used to build this website) is stored. The cool thing about here is that I can specify a file path relative to my project root (the above location) without using any operating system-specific strings.

For example, the penguins.csv data set is located in webpages > lectures_i > datasets within my R project folder. I can access it by:

penguins <- read_csv(here::here("webpages", "lectures_i", "datasets", "penguins.csv"))

Rows: 344 Columns: 8
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (3): species, island, sex
dbl (5): bill_len, bill_dep, flipper_len, body_mass, year

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

head(penguins) #view first few entries of the tibble

# A tibble: 6 × 8
  species island    bill_len bill_dep flipper_len body_mass sex     year
  <chr>   <chr>        <dbl>    <dbl>       <dbl>     <dbl> <chr>  <dbl>
1 Adelie  Torgersen     39.1     18.7         181      3750 male    2007
2 Adelie  Torgersen     39.5     17.4         186      3800 female  2007
3 Adelie  Torgersen     40.3     18           195      3250 female  2007
4 Adelie  Torgersen     NA       NA            NA        NA <NA>    2007
5 Adelie  Torgersen     36.7     19.3         193      3450 female  2007
6 Adelie  Torgersen     39.3     20.6         190      3650 male    2007

This is reproducible!
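
To see what here::here() hands to read_csv(), the path can be built and inspected on its own. This is a sketch: the folder names are the ones from the example above and will not exist in your own project, which is why the read is guarded with file.exists().

```r
library(here)

# The project root that here has detected (an absolute path)
root <- here::here()

# Each argument is one folder or file name; no slashes are typed
csv_path <- here::here("webpages", "lectures_i", "datasets", "penguins.csv")
csv_path

# Only attempt the read if the file is really there
if (file.exists(csv_path)) {
  penguins <- readr::read_csv(csv_path, show_col_types = FALSE)
}
```

Because the path is always built from the project root, the same line works no matter which sub-folder the script is run from.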

Exercise

Open RStudio. Go to Session => Set Working Directory => Choose Directory and then pick a folder you would like to read and write data into. Then, run the following piece of code in a new R Script:

library(tidyverse) 
library(gapminder)

gap_asia_2007 <- gapminder %>% 
  filter(year == 2007, continent == "Asia")
head(gap_asia_2007)

Write gap_asia_2007 to a comma-separated value (csv) file named exported_file.csv with just one command:

write_csv(FILL_THIS_IN, "exported_file.csv")

Check out your files after executing this line!

Now, let’s practice reading CSVs by reading the file we just wrote back into R:

gap_asia_2007_in <- read_csv("FILL_THIS_IN")

Check out your R environment after executing this line!

Also notice the output of running read_csv. This tells us about the types of variables that were read in. It’s a good habit to check this every time you run a read_ function. Sometimes we might want to change how these variable types are specified.
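
One way to change the types is the col_types argument of read_csv(). The sketch below uses a few lines of made-up CSV text (wrapped in I() so that readr treats it as data rather than a file name) instead of the exercise file.

```r
library(readr)

# A few lines of CSV text, made up for illustration
csv_text <- I("country,year,pop\nJapan,2007,127467972\nNepal,2007,28901790\n")

# Let readr guess the types: year and pop both come back as doubles
guessed <- read_csv(csv_text, show_col_types = FALSE)
spec(guessed)

# State the types explicitly instead of relying on the guess
typed <- read_csv(
  csv_text,
  col_types = cols(
    country = col_character(),
    year = col_integer(),
    pop = col_double()
  )
)
class(typed$year)  # "integer"
```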

Some notes on here::here():

By default in an R project, here::here() returns the path to the project folder.
here::here() is meant for files inside the project root. If the root is not detected the way you expect, here::i_am() can be used to declare where the current script sits relative to it.
Calling here::here() does not change the working directory. We also recommend against using setwd() and similar functions to play around with directories in R projects, as this again affects reproducibility.

Resources

Video lecture: Reading and Writing Data
The “Writing and Reading files” chapter of stat545.com.
