How to import Excel file into R?

In business, perhaps the most widely used file format to store the data is the excel workbook. An excel workbook contains tables which can easily perform calculations. Just as CSV file can be imported using read.csv() function, Excel file can be imported using read_excel() function. In this article we will look at various options to import excel file into R. So let’s get started!

Although no built in function is provided to work with the excel file, there are several R packages designed to work seamlessly with the Excel data. We will make use of read_excel() function from readxl package to import the excel file. Lets start with installing and loading readxl package. We will also load tidyverse package to use some of its functions.

install.packages("readxl",dependencies = TRUE)
library(readxl)
library(tidyverse)

Create the excel file with three worksheets as follows:

Excel Sheet – 1

Excel Sheet – 2

Excel Sheet – 3

Save the excel file as employee_data.xlsx

We can import the excel file using read_excel() function by supplying excel file name with the extension. We will store the data into object named emp_data

emp_data <- read_excel('employee_data.xlsx')
emp_data

Employee_ID Employee_Name
<dbl>       <chr>
10    Rahul
20    Saket
30    Rohini

By default, the read_excel function imports first excel worksheet.

If we want to import specific worksheet, we can use sheet argument. If we want to import second worksheet, we will use sheet=2 in the read_excel function. We will import second workbook and store it in a object named emp_data1

emp_data1 <- read_excel('employee_data.xlsx',sheet=2)
emp_data1

Employee_ID Employee_qualification
<dbl>  <chr>
10     Engineer
20     CA
30     MBA

As you can observe, we have imported second worksheet from the excel file.

Before importing excel worksheet, we can also skip number of rows.

Let’s import the third worksheet and store it into object called emp_data2

emp_data2 <- read_excel('employee_data.xlsx',sheet=3)
emp_data2

this is employee data      ...2
<chr>                      <chr>
this is employee data   NA
this is employee data   NA
Employee_ID           Employee_Designation
10                   Manager
20                   VP
30                   Trainee

The initial few rows is a text data which we want to skip. This text data is not useful for our analysis, so we won’t import it. You may also notice that the first row is converted into column names. This also needs to be corrected.

We will use skip argument to skip the initial rows from worksheet. Since we want to skip 3 rows, we will pass skip=3 argument to read_excel() function. We will store the data in object called emp_data3

emp_data3 <- read_excel('employee_data.xlsx',sheet=3,skip=3)
emp_data3

Employee_ID  Employee_Designation
<chr>       <dbl>
10          Manager
20          VP
30         Trainee

Imagine you have 100s of excel worksheets in your excel file. Calling separate read_excel() function for all these worksheets shall be repetitive and cumbersome task. (you will have to write the same line of code 100 times) Can we automate this boring stuff?

The apply function comes to our rescue. We can create a character vector of all excel worksheets and pass it on to the read_excel() function. Using lapply function, we can loop over the read_excel() function on all the worksheets and import them one by one in a list.

This is how it works

work_sheet <- excel_sheets('employee_data.xlsx')
work_sheet

[1] "Sheet1" "Sheet2" "Sheet3"

The character vector work_sheet contains the name of worksheets in the excel file. We can use lapply function on this character vector to import all the worksheets in excel file.

emp_data_all <- lapply(work_sheet, function(X) readxl::read_excel('employee_data.xlsx', sheet = X))

emp_data_all

Excel Worksheets

emp_data_all consists list with three elements. Each list element is a data from a worksheet. Let’s check the data from worksheet 1

You can further append all the data into one spread sheet table using reduce function from purrr package. Using merge function and by supplying common column name we can merge all the separate excel worksheets together and produce one single consolidated excel spreadsheet.

Since we had to remove the initial rows from data, We will drop the third worksheet to maintain the consistency in column names.

lapply(work_sheet[-3], function(X) readxl::read_excel('employee_data.xlsx', sheet = X))%>%
  purrr::reduce(merge,by="Employee_ID")

Merged Excel worksheet

In this article we used read_excel function from readxl package to import the excel file. We also explored various options from read_excel function used while importing the data.
Finally we used lapply function along with read_excel to import multiple excel worksheet at once. We combined these worksheets into one excel spreadsheet using reduce function from purr package.

Leave a Comment

Your email address will not be published. Required fields are marked *

Scroll to Top