In business, perhaps the most widely used file format to store the data is the excel workbook. An excel workbook contains tables which can easily perform calculations. Just as CSV file can be imported using read.csv()
function, Excel file can be imported using read_excel()
function. In this article we will look at various options to import excel file into R. So let’s get started!
Load the Packages
Although no built in function is provided to work with the excel file, there are several R packages designed to work seamlessly with the Excel data. We will make use of read_excel()
function from readxl
package to import the excel file. Lets start with installing and loading readxl
package. We will also load tidyverse
package to use some of its functions.
install.packages("readxl",dependencies = TRUE)
library(readxl)
library(tidyverse)
Create the Excel file
Create the excel file with three worksheets as follows:
Save the excel file as employee_data.xlsx
Import the Excel file
We can import the excel file using read_excel()
function by supplying excel file name with the extension. We will store the data into object named emp_data
emp_data <- read_excel('employee_data.xlsx')
emp_data
Employee_ID Employee_Name
<dbl> <chr>
10 Rahul
20 Saket
30 Rohini
By default, the read_excel
function imports first excel worksheet.
If we want to import specific worksheet, we can use sheet
argument. If we want to import second worksheet, we will use sheet=2
in the read_excel function. We will import second workbook and store it in a object named emp_data1
emp_data1 <- read_excel('employee_data.xlsx',sheet=2)
emp_data1
Employee_ID Employee_qualification
<dbl> <chr>
10 Engineer
20 CA
30 MBA
As you can observe, we have imported second worksheet from the excel file.
Skip the number of rows
Before importing excel worksheet, we can also skip number of rows.
Let’s import the third worksheet and store it into object called emp_data2
emp_data2 <- read_excel('employee_data.xlsx',sheet=3)
emp_data2
this is employee data ...2
<chr> <chr>
this is employee data NA
this is employee data NA
Employee_ID Employee_Designation
10 Manager
20 VP
30 Trainee
The initial few rows is a text data which we want to skip. This text data is not useful for our analysis, so we won’t import it. You may also notice that the first row is converted into column names. This also needs to be corrected.
We will use skip
argument to skip the initial rows from worksheet. Since we want to skip 3 rows, we will pass skip=3
argument to read_excel()
function. We will store the data in object called emp_data3
emp_data3 <- read_excel('employee_data.xlsx',sheet=3,skip=3)
emp_data3
Employee_ID Employee_Designation
<chr> <dbl>
10 Manager
20 VP
30 Trainee
Import multiple excel sheets
Imagine you have 100s of excel worksheets in your excel file. Calling separate read_excel()
function for all these worksheets shall be repetitive and cumbersome task. (you will have to write the same line of code 100 times) Can we automate this boring stuff?
The apply
function comes to our rescue. We can create a character vector of all excel worksheets and pass it on to the read_excel()
function. Using lapply
function, we can loop over the read_excel()
function on all the worksheets and import them one by one in a list.
This is how it works
work_sheet <- excel_sheets('employee_data.xlsx')
work_sheet
[1] "Sheet1" "Sheet2" "Sheet3"
The character vector work_sheet
contains the name of worksheets in the excel file. We can use lapply
function on this character vector to import all the worksheets in excel file.
emp_data_all <- lapply(work_sheet, function(X) readxl::read_excel('employee_data.xlsx', sheet = X))
emp_data_all
emp_data_all
consists list with three elements. Each list element is a data from a worksheet. Let’s check the data from worksheet 1
You can further append all the data into one spread sheet table using reduce
function from purrr
package. Using merge
function and by supplying common column name we can merge all the separate excel worksheets together and produce one single consolidated excel spreadsheet.
Since we had to remove the initial rows from data, We will drop the third worksheet to maintain the consistency in column names.
lapply(work_sheet[-3], function(X) readxl::read_excel('employee_data.xlsx', sheet = X))%>%
purrr::reduce(merge,by="Employee_ID")
Summary
In this article we used read_excel
function from readxl
package to import the excel file. We also explored various options from read_excel
function used while importing the data.
Finally we used lapply
function along with read_excel
to import multiple excel worksheet at once. We combined these worksheets into one excel spreadsheet using reduce
function from purr
package.