Unraveling data types in R – The list – A gentle introduction

List of drugs signed by George Washington
Image source: Wellcome Collection
https://www.jstor.org/stable/community.24800219

In R, a list is an ordered collection of objects, like a vector, but unlike a vector it can combine objects of different types. A list element can hold any type of object that exists in R.

Lists are useful because of their flexibility. For example, the result of a linear model fit in R is essentially a list that holds the regression coefficients (a numeric vector), the residuals (a numeric vector), and other relevant information. Because these results are all packed into one list, it is easy to extract whatever the analysis needs.
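
To make this concrete, here is a minimal sketch that fits a model on R's built-in cars dataset (our choice for illustration; any data frame would do) and checks that the result behaves like a list:

```r
# Fit a simple linear model on the built-in `cars` dataset
fit <- lm(dist ~ speed, data = cars)

# An lm object is built on top of a list
is.list(fit)

# The coefficients and residuals are stored as named list elements
c("coefficients", "residuals") %in% names(fit)

# Each piece can be pulled out like any other list element
fit$coefficients
head(fit$residuals)
```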

In this article, we will have a closer look at list objects. So, let’s get started with creating a list.

We can use the list( ) function to create a list. It accepts an arbitrary number of arguments, one for each object you want to pack into the list.

The following example shows how flexible lists can be. It creates a list that contains a number, a character string, a logical value, a matrix, and even another list (a list inside a list is known as a nested list).

list_temp <- list(1,
                  "A",
                  TRUE,
                  matrix(c(1:4),nrow=2),
                  list(F=2,G="B",H=FALSE))
list_temp

[[1]]
[1] 1

[[2]]
[1] "A"

[[3]]
[1] TRUE

[[4]]
     [,1] [,2]
[1,]    1    3
[2,]    2    4

[[5]]
[[5]]$F
[1] 2

[[5]]$G
[1] "B"

[[5]]$H
[1] FALSE
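
Printing a long list can get noisy. As a quick sketch, length( ) and typeof( ) summarise what the list above holds, and str( ) prints a compact overview (the variable name element_types is ours, chosen for illustration):

```r
list_temp <- list(1,
                  "A",
                  TRUE,
                  matrix(c(1:4), nrow = 2),
                  list(F = 2, G = "B", H = FALSE))

# Number of top-level elements
length(list_temp)

# Storage type of each element; vapply() guarantees a character vector back
element_types <- vapply(list_temp, typeof, character(1))
element_types

# Compact, one-line-per-element overview of the whole structure
str(list_temp)
```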

We can assign names to each list entry using named arguments.

list_temp <- list(A=1,
                  B = "A",
                  C = TRUE,
                  D = matrix(c(1:4),nrow=2),
                  E = list(F=2,G="B",H=FALSE))
list_temp

$A
[1] 1

$B
[1] "A"

$C
[1] TRUE

$D
     [,1] [,2]
[1,]    1    3
[2,]    2    4

$E
$E$F
[1] 2

$E$G
[1] "B"

$E$H
[1] FALSE

We can also name the elements of an existing list by assigning a character vector to names( ).

names(list_temp) <- c("name1","name2","name3","name4","name5")

list_temp

$name1
[1] 1

$name2
[1] "A"

$name3
[1] TRUE

$name4
     [,1] [,2]
[1,]    1    3
[2,]    2    4

$name5
$name5$F
[1] 2

$name5$G
[1] "B"

$name5$H
[1] FALSE
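
The same names( ) function also reads the names back, and it can rename a single element by position. A small sketch on a shortened list (the replacement name "second" is just an example):

```r
list_temp <- list(A = 1, B = "A", C = TRUE)

# Read the current names
original_names <- names(list_temp)
original_names

# Rename only the second element
names(list_temp)[2] <- "second"
names(list_temp)
```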

To remove the names, assign NULL to the names of the list.

names(list_temp) <- NULL

list_temp

[[1]]
[1] 1

[[2]]
[1] "A"

[[3]]
[1] TRUE

[[4]]
     [,1] [,2]
[1,]    1    3
[2,]    2    4

[[5]]
[[5]]$F
[1] 2

[[5]]$G
[1] "B"

[[5]]$H
[1] FALSE

There are several ways to access the elements of a list. The most common is the dollar sign $, which extracts the value of a list element by name:

names(list_temp) <- c("name1","name2","name3","name4","name5")

list_temp$name1 # extract the first list element
[1] 1

list_temp$name2 # extract the second list element
[1] "A"

list_temp$name3 # extract the third list element
[1] TRUE

Alternatively, we can supply a number inside double square brackets [[ ]] to extract the value of the nth list element. For example, we can extract the second element of list_temp as follows:

list_temp[[2]] # Extract second element of a list
[1] "A"

We can also supply a name to extract the value of the list member with that name.

list_temp[["name4"]] # extract a list element by its name

     [,1] [,2]
[1,]    1    3
[2,]    2    4
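
One difference between the two name-based forms is worth knowing: $ accepts an unambiguous abbreviation of a name (partial matching), while [[ ]] requires the exact name unless told otherwise. A minimal sketch with a hypothetical two-element list:

```r
x <- list(name1 = 1, value = "A")

# `$` partially matches "val" to "value"
by_dollar <- x$val

# `[[` matches exactly by default, so the abbreviation finds nothing
by_bracket <- x[["val"]]

# Partial matching can be switched on explicitly
by_inexact <- x[["val", exact = FALSE]]

by_dollar
by_bracket
by_inexact
```

Because a typo can silently match the wrong element, spelling names out in full is the safer habit.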

We can extract multiple elements from a list using a numeric, character, or logical vector inside single square brackets.

list_temp[c(1,3,5)] # Extract the first, third and fifth elements of a list

$name1
[1] 1

$name3
[1] TRUE

$name5
$name5$F
[1] 2

$name5$G
[1] "B"

$name5$H
[1] FALSE

list_temp[c("name1","name4")] # Extract the first and fourth elements of a list using a character vector

$name1
[1] 1

$name4
     [,1] [,2]
[1,]    1    3
[2,]    2    4

list_temp[c(FALSE,FALSE,TRUE,TRUE,FALSE)] # Extract the third and fourth elements of a list using a logical vector

$name3
[1] TRUE

$name4
     [,1] [,2]
[1,]    1    3
[2,]    2    4

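Remember: single square brackets [ ] subset a list and always return a list, while double square brackets [[ ]] extract a single element. We cannot use [[ ]] to extract several elements at once. R treats a vector inside [[ ]] as a recursive index (the first value selects an element, the second indexes into that element), so the following call fails: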

list_temp[[c("name1","name4")]]

Error in list_temp[[c("name1", "name4")]] : subscript out of bounds
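
The difference between [ ] and [[ ]] shows up in what each returns. The sketch below uses a shortened copy of list_temp (only two of its elements, to keep it brief) and also shows the one case where a vector inside [[ ]] does work: reaching into a nested list.

```r
list_temp <- list(name1 = 1,
                  name5 = list(F = 2, G = "B", H = FALSE))

# Single brackets return a (smaller) list
sub_list <- list_temp["name1"]
is.list(sub_list)

# Double brackets return the element itself
element <- list_temp[["name1"]]
is.list(element)

# A vector inside [[ ]] is applied step by step:
# first pick "name5", then pick "G" inside it
nested_value <- list_temp[[c("name5", "G")]]

# Equivalent, and usually easier to read
same_value <- list_temp$name5$G
```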

R has many functions for working with lists. For example, if we are not sure whether an object is a list, we can check it with the is.list( ) function. Here, list_temp is a list, but list_temp$name3 is a logical vector rather than a list.

is.list(list_temp)
[1] TRUE

is.list(list_temp$name3)
[1] FALSE

We can use the as.list( ) function to convert a vector into a list, with one list element per vector element.

list_temp1 <- as.list(c(1:3,23))
list_temp1

[[1]]
[1] 1

[[2]]
[1] 2

[[3]]
[1] 3

[[4]]
[1] 23
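
It is easy to confuse as.list( ) with list( ). As a short sketch (the variable names are ours), list( ) wraps the whole vector as a single element, whereas as.list( ) splits it into one element per value:

```r
v <- c(1:3, 23)

# list() keeps the vector intact as one element
wrapped <- list(v)
length(wrapped)

# as.list() creates one list element per vector element
split_up <- as.list(v)
length(split_up)
```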

The unlist( ) function does the reverse: it flattens a list into a vector, coercing the elements to a common compatible type:

temp_vec <- unlist(list_temp1)
temp_vec

[1]  1  2  3 23
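
When the elements have different types, unlist( ) coerces everything to the most general one, and for nested named lists it builds compound names. A minimal sketch with two made-up lists:

```r
# Mixed types are all coerced to character
mixed <- list(1, "A", TRUE)
flat_mixed <- unlist(mixed)
flat_mixed

# Nested names are joined with a dot
nested <- list(a = 1, b = list(c = 2, d = 3))
flat_nested <- unlist(nested)
names(flat_nested)
```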

As we saw in the previous sections, lists are ideal for storing diverse data structures in a single container. The following use case shows how a list can simplify data management and analysis. Consider a scenario where you want to fit a model and store its outputs:

# Function to fit a model and store outputs
fit_model <- function(train_data, test_data){
  model <- lm(y~x, data=train_data) # Linear regression model

  # compute predictions
  predictions <- predict(model, newdata=test_data)

  # Store output in a list
  model_output <- list(fitted_model = model,
                       predictions = predictions,
                       coefficients = coef(model))
  return(model_output)
}

# usage of the function and storing outputs in a list
train_data <- data.frame(x=1:10,y=2*(1:10) + rnorm(10))
test_data <- data.frame(x=11:15)
model_results <- fit_model(train_data,test_data)
model_results

$fitted_model

Call:
lm(formula = y ~ x, data = train_data)

Coefficients:
(Intercept)            x  
     0.9862       1.8425  


$predictions
       1        2        3        4        5 
21.25379 23.09630 24.93881 26.78131 28.62382 

$coefficients
(Intercept)           x 
  0.9862288   1.8425059 

In this example, model_results is a list containing the fitted model object, the predictions on the test data, and the coefficients of the linear regression model. Because train_data includes random noise from rnorm( ), the exact numbers will differ from run to run. All the model results are stored together in a single list, which makes it easy to retrieve them for further analysis and reference.
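
Retrieving individual results then uses the same accessors covered earlier. The sketch below rebuilds an equivalent model_results list inline so that it runs on its own; the call to set.seed( ) and the seed value are our additions for reproducibility and are not part of the example above.

```r
set.seed(42)  # assumption: fixed seed so repeated runs give the same numbers
train_data <- data.frame(x = 1:10, y = 2 * (1:10) + rnorm(10))
test_data <- data.frame(x = 11:15)

model <- lm(y ~ x, data = train_data)
model_results <- list(fitted_model = model,
                      predictions = predict(model, newdata = test_data),
                      coefficients = coef(model))

# Pull a single coefficient out of the stored coefficient vector
slope <- model_results$coefficients[["x"]]
intercept <- model_results$coefficients[["(Intercept)"]]

# First prediction on the test data
first_pred <- model_results$predictions[[1]]

# The stored model object can still be passed to other functions
r_squared <- summary(model_results$fitted_model)$r.squared
```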

Lists are the most flexible data structures in R because they can contain any type of object that exists in R. Thanks to this flexibility, lists are widely used for storing model outputs and for building collections that mix data types and hierarchies, which makes data analysis and manipulation easier. In this article, we created a list, named its elements, and extracted elements in several ways. Finally, we saw a use case in which a list stores the output of a regression model.

Lists are very flexible, but their nested structure can sometimes make them difficult to work with. In the next article, we will look at the most widely used data structure in R: the data frame.

A data frame is like a spreadsheet table. Since most of the data generated in business is tabular, the data frame is an incredibly useful object for working efficiently with such data. Let’s delve deeper into data frames in the next article.
