In R, a for loop repeatedly evaluates an expression while an iterator walks over the elements of a list or vector. In practice, a for loop is often the last choice: when the iterations are independent of each other, there are alternatives that are much cleaner and easier to write and read.

The following code uses a for loop to create a list of three independent, normally distributed random vectors whose lengths are given by the vector `len`:

```
len <- c(3, 4, 5)
x <- list() # create an empty list
set.seed(123) # seed the random number generator for reproducibility
for (i in seq_along(len)) {
  x[[i]] <- rnorm(len[i]) # draw len[i] standard normal values
}
x
[[1]]
[1] -0.5604756 -0.2301775 1.5587083
[[2]]
[1] 0.07050839 0.12928774 1.71506499 0.46091621
[[3]]
[1] -1.2650612 -0.6868529 -0.4456620 1.2240818 0.3598138
```

The preceding example is simple, but the code is quite verbose compared with the equivalent `lapply` call:

```
set.seed(123)
lapply(len,rnorm)
[[1]]
[1] -0.5604756 -0.2301775 1.5587083
[[2]]
[1] 0.07050839 0.12928774 1.71506499 0.46091621
[[3]]
[1] -1.2650612 -0.6868529 -0.4456620 1.2240818 0.3598138
```

The `lapply` version is much simpler. It applies `rnorm()` to each element of `len` and puts each result into a list.
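
As a quick sanity check, the sketch below reruns both versions from the same seed and confirms that they produce the same list. The names `x_loop` and `x_apply` are introduced here for illustration only.

```r
len <- c(3, 4, 5)

# for-loop version: preallocate the list, then fill it
set.seed(123)
x_loop <- vector("list", length(len))
for (i in seq_along(len)) {
  x_loop[[i]] <- rnorm(len[i])
}

# lapply version, restarting from the same seed
set.seed(123)
x_apply <- lapply(len, rnorm)

# both approaches draw the same numbers in the same order
identical(x_loop, x_apply)
```

Resetting the seed before each version matters: without it, the second run would continue the random stream and the two lists would differ.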

This succinct code is only possible because R lets us pass functions around as objects. The `rnorm` function is passed to `lapply` just like any ordinary argument. This feature makes code considerably more flexible.
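
To make the idea of "functions as objects" concrete, here is a minimal sketch. The helper `summarise_with` and the variables `draw` and `heights` are hypothetical names made up for this illustration; they are not part of base R.

```r
# A function is an ordinary object: it can be stored in a variable ...
draw <- rnorm
class(draw)  # "function"

# ... or passed to another function as an argument.
# `summarise_with` is a hypothetical helper, not part of base R.
summarise_with <- function(values, f) {
  f(values)  # call whatever function was handed in
}

heights <- c(15, 10, 12, 9, 17)
summarise_with(heights, mean)  # 12.6
summarise_with(heights, max)   # 17
```

The same helper works with any function that accepts a vector, which is the flexibility the `apply` family builds on.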

The functions in the apply family are **higher-order functions**: each accepts another function as an argument. There are several functions in the `apply` family, and each performs a specific task.

In this article we will look at the important `apply` family functions listed below. We will explore their usage and limitations using simple examples.

- apply
- lapply
- sapply
- tapply

Let’s get started!

**apply function**

The `apply` function is a higher-order function that accepts a function as an argument. It *applies* this function to the rows or columns of a data frame or matrix.

The data set below describes the height, in inches, of five individual plants at three time points (0 days, 10 days and 20 days). The first column is the plant ID, and each of the next three columns gives the plant height at one of the time points.

```
example_df <- data.frame(plant_ID = c("A", "B", "C", "D", "E"),
                         height_0 = c(15, 10, 12, 9, 17),
                         height_10 = c(20, 18, 14, 15, 19),
                         height_20 = c(23, 24, 18, 17, 26))
head(example_df)
```

We are interested in the `mean` height at each stage of plant growth. We can use either a *for* loop or the `apply` function to get the answer. Comparing the two, the `apply` version is more compact and readable.

Let’s use the `apply` function on the data set to find the mean values. The function `mean` is passed as an argument to `apply`, which calls it on every column of the data frame.

```
# drop the first column since it is a character vector
apply(example_df[-1], MARGIN = 2, FUN = mean)
height_0 height_10 height_20
12.6 17.2 21.6
```
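
For comparison, a for-loop version of the same column means might look like the sketch below. The names `heights` and `col_means` are introduced here for illustration; the data frame is repeated so the block runs on its own.

```r
example_df <- data.frame(
  plant_ID  = c("A", "B", "C", "D", "E"),
  height_0  = c(15, 10, 12, 9, 17),
  height_10 = c(20, 18, 14, 15, 19),
  height_20 = c(23, 24, 18, 17, 26)
)

heights <- example_df[-1]  # numeric columns only

# for-loop version: preallocate a named result, then fill it column by column
col_means <- numeric(ncol(heights))
names(col_means) <- names(heights)
for (j in seq_along(heights)) {
  col_means[j] <- mean(heights[[j]])
}

# same values and names as the apply() call
all.equal(col_means, apply(heights, MARGIN = 2, FUN = mean))
```

The loop needs explicit preallocation, naming and indexing, all of which `apply` handles in a single call.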

Here is the syntax of the `apply` function.

The first argument is the object (data frame or matrix) you want to analyze.

The second argument is `MARGIN`. It specifies the dimension along which the function is applied. For a two-dimensional object such as a data frame or matrix:

- `MARGIN = 1` applies the function to each row
- `MARGIN = 2` applies the function to each column

The last argument, `FUN`, is the function that will be applied to each row or column.

Calculations in `apply` are carried out row-wise or column-wise, depending on the `MARGIN` value you set. In the example above, `MARGIN = 1` produces a different result: the mean height of each plant across the three time points.

```
apply(example_df[-1], MARGIN = 1, FUN = mean)
[1] 19.33333 17.33333 14.66667 13.66667 20.66667
```
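
The effect of `MARGIN` is easiest to see on a tiny matrix. The sketch below uses made-up matrices `m` and `m_na`; it also shows that additional arguments placed after `FUN` are forwarded to the applied function.

```r
# A small 2 x 3 matrix, filled column by column
m <- matrix(1:6, nrow = 2)
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6

apply(m, MARGIN = 1, FUN = sum)  # one value per row:    9 12
apply(m, MARGIN = 2, FUN = sum)  # one value per column: 3 7 11

# Extra arguments after FUN are forwarded to it, e.g. na.rm for mean()
m_na <- m
m_na[1, 1] <- NA
apply(m_na, MARGIN = 2, FUN = mean, na.rm = TRUE)  # 2.0 3.5 5.5
```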

We can also pass a custom function to `apply`. For example, suppose we want to find out at which stages of plant growth the average height exceeds 15 inches. We can create a custom function `is_tall` to check the condition and pass it to `apply`.

```
is_tall <- function(x) {
  value <- mean(x) > 15
  return(value)
}
apply(example_df[, -1], MARGIN = 2, FUN = is_tall) # apply with custom function
```

This tells us that at time point 0 the plants are not taller than 15 inches on average, while the opposite is true for time points 10 and 20.
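
For a one-off check, the custom function does not need a name. The sketch below writes the same test inline as an anonymous function, and then generalizes it with a hypothetical helper `is_taller_than` whose `threshold` parameter is supplied through the extra arguments of `apply`. The helper and the threshold of 20 are assumptions made up for illustration.

```r
example_df <- data.frame(
  plant_ID  = c("A", "B", "C", "D", "E"),
  height_0  = c(15, 10, 12, 9, 17),
  height_10 = c(20, 18, 14, 15, 19),
  height_20 = c(23, 24, 18, 17, 26)
)

# Same check as is_tall(), written inline as an anonymous function
inline <- apply(example_df[-1], MARGIN = 2, FUN = function(x) mean(x) > 15)
inline  # height_0: FALSE, height_10: TRUE, height_20: TRUE

# The threshold can also be a parameter, forwarded by apply()
# (`is_taller_than` is a hypothetical helper for illustration)
is_taller_than <- function(x, threshold) mean(x) > threshold
above_20 <- apply(example_df[-1], MARGIN = 2, FUN = is_taller_than, threshold = 20)
above_20  # only height_20 (mean 21.6) is TRUE
```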

**lapply function**

One disadvantage of the `apply` function is that it does not work on lists. So if we have a list to work on, we must use the `lapply` function.

Suppose we have a simple list with two elements. If we wanted to calculate the average value of each list element, we could call the `mean` function on each element individually.

This method is pretty inefficient and makes us repeat our code. And what if we have more than, say, 100 list elements? That would be a pain to type out. Let’s try another method.

We could create a for loop and save the results in a vector. This method is better because it automates the process, which would be especially useful if our list had a large number of elements. But for loops take more effort to write, can be slow when the result is grown one element at a time, and still take up quite a bit of space in our code.

The last method is to use the `lapply` function, which lets us wrap all the steps into a single line of code.
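
The three approaches described above can be sketched side by side. The list `my_list` and its values are made up for illustration and are not the article's original data.

```r
# Hypothetical two-element list, made up for illustration
my_list <- list(a = c(1, 2, 3, 4), b = c(10, 20, 30))

# 1. One call per element: fine for two elements, tedious for a hundred
mean(my_list$a)  # 2.5
mean(my_list$b)  # 20

# 2. A for loop that stores the results in a preallocated vector
means <- numeric(length(my_list))
names(means) <- names(my_list)
for (i in seq_along(my_list)) {
  means[i] <- mean(my_list[[i]])
}
means

# 3. lapply(): the same work in a single line, returned as a list
lapply(my_list, mean)
```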

Here is an example.

We will create a list called `plants` containing three elements, each a vector of length ten. Each element holds a different plant attribute (height, mass, and number of flowers). We use the uniform distribution (`runif`) to generate random heights and masses, and the `sample` function to draw the integers 1 to 10 in random order for the flower counts.

```
plants <- list(height = runif(10, min = 10, max = 20),
               mass = runif(10, min = 5, max = 10),
               flowers = sample(1:10, 10))
plants
$height
[1] 12.81165 11.30546 12.79607 10.22552 12.28770 11.78231
[7] 17.53214 14.35947 11.67449 12.37116
$mass
[1] 5.423982 5.290442 6.548579 8.275295 6.344244 7.635298
[7] 9.136648 9.250006 7.958576 5.585793
$flowers
[1] 6 3 9 10 5 2 8 1 4 7
```

We can use the `lapply` function to find the `mean` value of each list element.

```
lapply(plants, FUN = mean)
$height
[1] 12.7146
$mass
[1] 7.144886
$flowers
[1] 5.5
```

Note that the output of `lapply` is always a list. Also, `lapply` has no `MARGIN` argument: the function `mean` is simply applied to each list element.
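
The sketch below illustrates both points and shows how the resulting list can be flattened with `unlist()`. The article's `plants` list was generated without a seed, so the numbers here will differ from the output shown above; `set.seed(42)` is an arbitrary choice made only to keep this example reproducible.

```r
set.seed(42)  # arbitrary seed so the sketch is reproducible
plants <- list(
  height  = runif(10, min = 10, max = 20),
  mass    = runif(10, min = 5, max = 10),
  flowers = sample(1:10, 10)
)

res <- lapply(plants, FUN = mean)
is.list(res)   # TRUE: one list element per input element
names(res)     # names are carried over from `plants`

# unlist() flattens the result into a named numeric vector
flat <- unlist(res)
flat

# Arguments after FUN are forwarded to it, here `probs` for quantile()
lapply(plants, FUN = quantile, probs = c(0.25, 0.75))
```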

**sapply function**

The output of `lapply` is always a list. If we want the output as a vector or a matrix, we can use the `sapply` function. `sapply` works the same way as `lapply`, but instead of always returning a list, it simplifies the result to a vector or matrix where possible.

```
sapply(plants, FUN = mean)
height mass flowers
12.714597 7.144886 5.500000
```

Notice that the output is a simple named numeric vector and not a list. You can confirm its type using the `class` function.

```
class(sapply(plants, FUN = mean))
[1] "numeric"
```
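
What `sapply` returns depends on the shape of the individual results. The sketch below rebuilds a `plants` list with an arbitrary seed (an assumption, so the values differ from the article's) and shows a matrix result as well as the case where no simplification is possible. The names `r` and `big` are introduced for illustration.

```r
set.seed(42)  # arbitrary seed so the sketch is reproducible
plants <- list(
  height  = runif(10, min = 10, max = 20),
  mass    = runif(10, min = 5, max = 10),
  flowers = sample(1:10, 10)
)

# range() returns two values per element, so sapply() builds a 2 x 3 matrix
r <- sapply(plants, FUN = range)
dim(r)        # 2 3
colnames(r)   # "height" "mass" "flowers"

# When results have different lengths, no simplification is possible
# and sapply() falls back to returning a list, like lapply()
big <- sapply(plants, FUN = function(x) x[x > 8])
class(big)    # "list"
```

Because the return type can change with the data, it is worth checking the result with `class()` or `str()` before relying on it downstream.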

**tapply function**

The `tapply` function works in much the same way as the other functions, but it lets you perform an operation across specified groups in your data. For those familiar with the *dplyr* package, this is similar to combining the `group_by()` and `summarise()` functions.

Here is an example. Suppose we have a data set that records the service time needed to repair a product, and we would like to find the mean service time for each product. We first have to group the data by product, and then find the mean service time within each group.
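
A minimal sketch of that scenario follows. The data frame `repairs`, its column names, its values and the unit (hours) are all invented for illustration; the article does not provide this data.

```r
# Made-up data for illustration: one row per repair job
repairs <- data.frame(
  product      = c("phone", "laptop", "phone", "tablet", "laptop", "phone"),
  service_time = c(2.0, 5.5, 3.0, 4.0, 6.5, 2.5)  # hours (assumed unit)
)

# Group service_time by product, then take the mean within each group
mean_time <- tapply(repairs$service_time, INDEX = repairs$product, FUN = mean)
mean_time
# laptop: 6, phone: 2.5, tablet: 4
```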

Let’s use the `tapply` function on the `mtcars` data set. This data set comprises fuel consumption and various other automobile parameters for 32 car models. We want to determine the average fuel consumption, in miles per gallon, for engines with different numbers of cylinders. This takes two steps:

- group the data by number of engine cylinders
- use the `mean` function on the `mpg` variable to find the average within each group

`head(mtcars)`

We can perform both steps with a single call to the `tapply` function.

```
# tapply function to calculate the average fuel consumption
# for different engine cylinders.
tapply(mtcars$mpg, INDEX = mtcars$cyl, FUN = mean)
4 6 8
26.66364 19.74286 15.10000
```

You can observe the trend here: a car’s mileage decreases as the number of cylinders increases.

Let’s decode the syntax of the `tapply` function.

The first argument is the variable on which we want to perform the calculation, here `mpg`.

The `INDEX` argument is the grouping variable, here `cyl`.

The last argument, `FUN`, is the function to apply. Here it is `mean`, which calculates the average `mpg` within each group.
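
To see the grouping and the averaging as separate steps, the sketch below spells them out with `split()` and `sapply()` and compares the result with the single `tapply` call. The variable names `mpg_by_cyl`, `step_wise` and `one_call` are introduced for illustration.

```r
# Step 1: split mpg into one vector per cylinder count
mpg_by_cyl <- split(mtcars$mpg, mtcars$cyl)
names(mpg_by_cyl)   # "4" "6" "8"

# Step 2: apply mean() to each group
step_wise <- sapply(mpg_by_cyl, FUN = mean)

# tapply() does both steps in one call
one_call <- tapply(mtcars$mpg, INDEX = mtcars$cyl, FUN = mean)

# Same numbers either way; tapply() returns a 1-d array, so compare plain values
all.equal(unname(step_wise), as.vector(one_call))
```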

**Summary**

The `apply` family consists of higher-order functions that accept other functions as arguments. These functions are *applied* to vectors, lists, and the rows and columns of a data frame or matrix, leading to concise and readable code.

In this article we looked at four functions from the `apply` family:

- `apply` takes another function as an argument and *applies* it to the rows or columns of a data frame or matrix.
- `lapply` applies a function to each element of a list or vector and always returns a list.
- `sapply` works like `lapply` but is used when the output is required in vector or matrix form.
- `tapply` splits a variable into groups and applies a function to each group.