Lecture 13: Lists, Iterations, and purrr

November 25, 2025

Modified

November 25, 2025

Learning Objectives

From this topic, students are expected to be able to:

  • Use the map family of functions from the purrr package to iteratively apply a function.

  • Create and operate on list columns in a tibble using nest(), unnest(), and the map_*() family of functions.

  • Define functions on-the-fly within a map function using shortcuts.

  • Apply list columns to cases in data analysis: columns of models, columns of nested lists (JSON-style data), and operating on entire groups within a tibble.

We will need to load in the tidyverse package for this lecture:

library(tidyverse)

Lecture Slides

Lists

Here is a list in R; it holds multiple items.

my_list <- list(1:10, c("a", "b", "c"), "samplestring")
my_list
[[1]]
 [1]  1  2  3  4  5  6  7  8  9 10

[[2]]
[1] "a" "b" "c"

[[3]]
[1] "samplestring"

A list might sound like a vector, which we have worked with before – remember, we construct them using the c() function. Indeed, vectors and lists can both hold multiple items. But there are key differences.

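| Vectors | Lists |
|---|---|
| Access elements with single square brackets [] | Access elements with double square brackets [[]]; single brackets [] return a smaller list rather than the element itself |
| Each element must be a single value of an atomic type (e.g., numeric, character, logical) | Elements can be anything, even another list or another vector |
| All elements must be of the same type | Elements can be as different as you like |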
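The difference between single and double brackets on a list can be checked directly. This is a small sketch that re-creates my_list from this section so it runs on its own; sub_list and element are illustrative names.

```r
my_list <- list(1:10, c("a", "b", "c"), "samplestring")

# Single brackets return a smaller list that still wraps the element
sub_list <- my_list[2]
class(sub_list)    # "list"
length(sub_list)   # 1

# Double brackets extract the element itself (here a character vector)
element <- my_list[[2]]
class(element)     # "character"
length(element)    # 3
```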

Let’s take our sample list and access some items stored in it.

#returns the vector stored in the second element of the list
my_list[[2]] 
[1] "a" "b" "c"

To access the data within the vector, we can index it as well:

#returns third element of the vector stored in the second element of the list
my_list[[2]][3] 
[1] "c"

Elements within a list can also be named using names().

names(my_list) <- c("NumericVec", "CharacterVec", "String")
my_list
$NumericVec
 [1]  1  2  3  4  5  6  7  8  9 10

$CharacterVec
[1] "a" "b" "c"

$String
[1] "samplestring"

Once the elements are named, you can access them using the $ operator, similar to how you can grab columns from a data frame or tibble:

my_list$CharacterVec
[1] "a" "b" "c"

Tibbles, as a list?

Speaking of data frames and tibbles, did you know that data frames and tibbles are actually a special type of list? It’s true!

typeof(mtcars) 
[1] "list"
typeof(palmerpenguins::penguins)
[1] "list"

They are lists in which each element stores one column. Each column is either a vector or a list, and in both cases it has exactly one entry per row of the tibble.

This has an important implication: once we learn how to apply a function to each entry of a list, we can also efficiently apply a function to each column of a tibble. This is yet another way (beyond writing functions) of avoiding duplicated code, which, as you will recall from the functions topic, has many advantages.
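As a quick illustration, because a data frame is a list of columns, purrr's map_dbl() can compute a summary of every column in a single call. This sketch uses the built-in mtcars data and base mean(); col_means is an illustrative name.

```r
library(purrr)

# mtcars is a data frame, i.e., a list of 11 numeric columns,
# so map_dbl() visits one column at a time
col_means <- map_dbl(mtcars, mean)
col_means   # named double vector: one mean per column

# The same value for a single column, computed by hand for comparison
mean(mtcars$mpg)
```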

Iteration

If you have programmed before, you probably have an idea of how to do this with a for loop. Here’s an example of a for loop in R that iterates over the entries of a numeric vector x, squares each entry, and stores the result in a numeric vector output:

x <- 1:10 
output <- vector("double", length(x)) #init a vector of 0s

for(i in seq_along(x)) { 
    output[i] <- x[i]^2  
}

output
 [1]   1   4   9  16  25  36  49  64  81 100

Iteration with purrr

Often, you can replace a loop with a compact call to a function from the purrr package (which is loaded as part of the tidyverse). This makes our code more readable, since it expresses the same logic in less space.
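For example, the squaring loop above can be written as one map_dbl() call. This sketch assumes R >= 4.1.0 for the \(v) anonymous-function syntax; output_map is an illustrative name.

```r
library(purrr)

x <- 1:10

# The for loop from above, for comparison
output <- vector("double", length(x))
for (i in seq_along(x)) {
  output[i] <- x[i]^2
}

# The same computation as a single map_dbl() call
output_map <- map_dbl(x, \(v) v^2)

identical(output, output_map)   # TRUE
```

The loop needs a pre-allocated output vector and an index; map_dbl() handles both, and also guarantees the result is a double vector.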

There are many map_*() functions, a few of which we will highlight here:

  • map(): applies a function to each element in a list or vector, returns a list
  • map_dbl(): applies a function to each element in a list or vector, returns a double (numeric) vector
  • map_dfr(): applies a function to each element in a list or vector, and row-binds the results into a data frame
  • map2(): like map(), but iterates over two inputs in parallel

For map(), map_dbl(), and map_dfr(), the first argument specifies the list/vector we want to iterate over, and the second argument specifies a function that we want to apply to each entry. map2() takes two lists/vectors to iterate over, followed by the function.
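Here is a brief sketch of the two-input case, using map2_dbl() (the variant of map2() that returns a double vector). The ages and years_ahead values are made up for illustration, and the \(age, yrs) syntax assumes R >= 4.1.0.

```r
library(purrr)

ages        <- c(18, 25, 63, 22)
years_ahead <- c(1, 5, 10, 2)   # made-up offsets, one per person

# The function receives ages[i] and years_ahead[i] together on each step
future_ages <- map2_dbl(ages, years_ahead, \(age, yrs) age + yrs)
future_ages   # 19 30 73 24
```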

Let’s use these map functions on a list containing the ages of some made-up people.

sample_list <- list(18, 25, 63, 22)
names(sample_list) <- c("Amir", "Jenna", "Logan", "Phum")

sample_list
$Amir
[1] 18

$Jenna
[1] 25

$Logan
[1] 63

$Phum
[1] 22

We will compare the outputs of these functions when applying a simple square function to their ages.

map()

map(sample_list, function(x) x^2) # apply a squaring function to each element of sample_list
$Amir
[1] 324

$Jenna
[1] 625

$Logan
[1] 3969

$Phum
[1] 484

A list is returned!

map_dbl()

map_dbl(sample_list, function(x) x^2) # apply a squaring function to each element of sample_list
 Amir Jenna Logan  Phum 
  324   625  3969   484 

A named double (numeric) vector is returned!

map_dfr()

map_dfr(sample_list, function(x) x^2) # apply a squaring function to each element of sample_list
# A tibble: 1 × 4
   Amir Jenna Logan  Phum
  <dbl> <dbl> <dbl> <dbl>
1   324   625  3969   484

A data frame (tibble) is returned!
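Since purrr 1.0.0, map_dfr() is marked as superseded; the documented replacement is map() followed by list_rbind(). A minimal sketch, assuming purrr >= 1.0.0 and R >= 4.1.0 (for |> and \(x)), where each person becomes one row; ages_tbl, age, and age_sq are illustrative names.

```r
library(purrr)
library(tibble)

sample_list <- list(Amir = 18, Jenna = 25, Logan = 63, Phum = 22)

# Return a one-row tibble per person, then stack the rows;
# names_to = "name" turns the list names into a column
ages_tbl <- map(sample_list, \(x) tibble(age = x, age_sq = x^2)) |>
  list_rbind(names_to = "name")

ages_tbl
```

The older spelling map_dfr(sample_list, \(x) tibble(age = x, age_sq = x^2), .id = "name") should give the same rows, and you will still see it in existing code.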

You will explore map2() in the Worksheet.

purrr Shortcuts

Here’s an example using purrr::map_dbl() and a custom function:

purrr::map_dbl(sample_list, function(x) x^2)
 Amir Jenna Logan  Phum 
  324   625  3969   484 

Options for specifying functions include the name of a function, a fully specified custom function (as demonstrated above), or one of the “shortcuts” the purrr developers have provided.
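The first option, passing a function by name, looks like this. The square() helper is hypothetical and defined here only for illustration; sqrt() is a base R function.

```r
library(purrr)

sample_list <- list(Amir = 18, Jenna = 25, Logan = 63, Phum = 22)

# Pass an existing function by name (no parentheses after it)
sqrt_ages <- map_dbl(sample_list, sqrt)

# Or define a named helper first, then pass its name
square <- function(x) x^2   # hypothetical helper, not from the lecture
squared_ages <- map_dbl(sample_list, square)
squared_ages
```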

Here are two examples of “shortcuts”:

purrr::map_dbl(sample_list, ~ (.x)^2)
 Amir Jenna Logan  Phum 
  324   625  3969   484 
purrr::map_dbl(sample_list, \(x) x^2)
 Amir Jenna Logan  Phum 
  324   625  3969   484 

The second one is base R’s shorthand for function(x), available since R 4.1.0. It is easier to remember and appears to be the style the purrr developers now recommend; see the purrr cheatsheet. This change in recommendation appears to have happened around 2022/2023, so you may still see the first type of shortcut (the ~ .x formula) in many places in the wild.
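To see that the \(x) form does not depend on purrr, here is a sketch using only base R (R >= 4.1.0). By contrast, the ~ .x formula is only understood by purrr and other tidyverse functions, not by base functions such as sapply().

```r
# No packages needed: \(x) is base R syntax
square <- \(x) x^2
square(4)   # 16

# It works anywhere a function is expected, e.g. base sapply()
sample_list <- list(Amir = 18, Jenna = 25, Logan = 63, Phum = 22)
sapply(sample_list, \(x) x^2)   # named numeric vector of squared ages
```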

List Columns

Did you know columns in a tibble can have type “list”? We call these types of columns “list columns”.

Consider the following example: a snippet of the Game of Thrones data from An API of Ice and Fire.

# A tibble: 6 × 3
  name              gender titles   
  <chr>             <chr>  <list>   
1 Theon Greyjoy     Male   <chr [2]>
2 Tyrion Lannister  Male   <chr [2]>
3 Victarion Greyjoy Male   <chr [2]>
4 Will              Male   <chr [1]>
5 Areo Hotah        Male   <chr [1]>
6 Chett             Male   <chr [1]>

Some characters have one title (e.g., Will); others have more than one title (e.g., Theon Greyjoy). Consequently, the titles column is a list column, where each entry is a character vector containing as many or as few strings as we like.
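A list column like titles can be built directly with tibble() by passing a list, and then processed with a map_*() function. This sketch uses made-up people and titles (not the Game of Thrones data); people and n_titles are illustrative names.

```r
library(dplyr)
library(purrr)

# Made-up example: each person can have any number of titles
people <- tibble(
  name   = c("Amir", "Jenna", "Logan"),
  titles = list(c("Dr.", "Captain"), "Ms.", c("Sir", "Lord", "Professor"))
)

people   # titles prints as <list>, with entries like <chr [2]>

# Operate on the list column with map_int(): count titles per person
people <- people |> mutate(n_titles = map_int(titles, length))
people$n_titles   # 2 1 3
```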

A list column can even hold a whole tibble in each entry (a tibble is itself a list, so the column is still of type list).

List columns can also hold lists nested within lists, which we refer to as JSON-style data. You’ll see this in Worksheet B3!

Create List Columns using nest()

# Artificial dataset of families
family_data <- tibble(
  family_id = c(1, 1, 2, 3, 3, 3, 4),
  lastname  = c("Smith", "Smith", "Lee", "Patel", "Patel", "Patel", "LeBlanc"),
  child     = c("Mia", "Leo", "Noah",
                "Ella", "Sofia", "Lucas",
                "Harper")
)

family_data
# A tibble: 7 × 3
  family_id lastname child 
      <dbl> <chr>    <chr> 
1         1 Smith    Mia   
2         1 Smith    Leo   
3         2 Lee      Noah  
4         3 Patel    Ella  
5         3 Patel    Sofia 
6         3 Patel    Lucas 
7         4 LeBlanc  Harper

Instead of having one row per child, perhaps we want to nest the data so that there is only one row per family. We can nest the child column! (Aside: in general, this data will not be tidy, but we’re demonstrating how to nest and un-nest data here.)

Here we group by family_id and lastname to gather the children from each family, and then create a nested column called “children”:

nested_families <- family_data %>%
  group_by(family_id, lastname) %>%   # group by family
  nest(children = child) %>%           # create list-column "children" (tibble type)
  mutate(children = map(children, ~ .x$child)) #overwrite children: change from tibble to vector

nested_families
# A tibble: 4 × 3
# Groups:   family_id, lastname [4]
  family_id lastname children 
      <dbl> <chr>    <list>   
1         1 Smith    <chr [2]>
2         2 Lee      <chr [1]>
3         3 Patel    <chr [3]>
4         4 LeBlanc  <chr [1]>

Now we have a list column that contains vectors of varying lengths with the children’s names within each family.
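An alternative that produces the character-vector list column in one step is to wrap each group's values with list() inside summarise(). This is a sketch, not the lecture's method; it assumes dplyr >= 1.1.0 for the .by argument, and nested_alt is an illustrative name.

```r
library(dplyr)

family_data <- tibble(
  family_id = c(1, 1, 2, 3, 3, 3, 4),
  lastname  = c("Smith", "Smith", "Lee", "Patel", "Patel", "Patel", "LeBlanc"),
  child     = c("Mia", "Leo", "Noah", "Ella", "Sofia", "Lucas", "Harper")
)

# list(child) packs each family's children into a single list entry;
# .by groups only for this step, so the result is not grouped afterwards
nested_alt <- family_data |>
  summarise(children = list(child), .by = c(family_id, lastname))

nested_alt
```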

Remove List Columns using unnest()

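We can revert to the original one-row-per-child layout using unnest(). (The column keeps the name children, and the grouping from group_by() is still attached.)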

original_family_data <- nested_families %>%
  unnest(children) # expand the children list column back into one row per child

original_family_data
# A tibble: 7 × 3
# Groups:   family_id, lastname [4]
  family_id lastname children
      <dbl> <chr>    <chr>   
1         1 Smith    Mia     
2         1 Smith    Leo     
3         2 Lee      Noah    
4         3 Patel    Ella    
5         3 Patel    Sofia   
6         3 Patel    Lucas   
7         4 LeBlanc  Harper  
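If you skip group_by() and keep the nested tibbles (instead of converting them to vectors), nest() and unnest() make an exact round trip, and no grouping is left behind. A sketch, assuming tidyr >= 1.0.0; nested and restored are illustrative names.

```r
library(dplyr)
library(tidyr)

family_data <- tibble(
  family_id = c(1, 1, 2, 3, 3, 3, 4),
  lastname  = c("Smith", "Smith", "Lee", "Patel", "Patel", "Patel", "LeBlanc"),
  child     = c("Mia", "Leo", "Noah", "Ella", "Sofia", "Lucas", "Harper")
)

# Without group_by(), nest() uses every column not being nested
# (family_id and lastname) to define the groups;
# children holds one small tibble (with a child column) per family
nested <- family_data |> nest(children = child)

# unnest() expands each tibble back into rows, restoring the child column
restored <- nested |> unnest(children)

identical(restored$child, family_data$child)   # TRUE
```

On the grouped result from the lecture, calling ungroup() removes the leftover grouping.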

Resources

Video lectures:

Written material:

Want to dig deeper? These resources can help.
